ANR即Application Not Responding
(应用程序无响应),一般在ANR的时候会弹出一个应用无响应对话框,同时会候产生一个日志文件trace.txt,位于/data/anr/文件夹下面,trace文件是Android Davik虚拟机在收到异常终止信号时产生的,最常见的一个触发条件就是Android应用中产生了FC(force close)
。由于该文件的产生是在DVM中的,所以只有运行DVM实例的进程才能产生该文件,也就是说只有Java代码才能产生该文件,App应用的Native层(如Android Library、用c/c++编译的库)即使异常也不会产生ANR日志文件。我们可以通过ANR产生的traces日志文件分析应用在哪里产生了ANR,以此来有效解决应用中的ANR。
只用当应用程序的UI线程响应超时才会引起ANR,超时原因一般有两种:
根据ANR产生的原因不同,超时时间也不尽相同,从本质上讲,产生ANR的场景有四种,大致可以对应到Android中的(Activity、BroadcastReceive、Service、InputDispatch)
注意事项: Input的超时机制与其他的不同,对于input来说即便某次事件执行时间超过timeout时长,只要用户后续在没有再生成输入事件,则不会触发ANR
1.Service超时检测机制:
2.BroadcastReceiver超时检测机制:
3.另外:
1. 前台ANR发生后,系统会马上去抓取现场的信息,用于调试分析,收集的信息如下:
2. 分析步骤
1.定位发生ANR时间点
2.查看trace信息
3.分析是否有耗时的message,binder调用,锁的竞争,CPU资源的抢占
4.结合具体的业务场景的上下文来分析
1.人为的收集trace.txt的命令
adb shell ki11-3 888 //可指定进程pid
执行完该命令后traces信息的结果保存到文件/data/anr/traces.txt
2.trace文件解读
----- pid 888 at 2016-11-11 22:22:22 -----
Cmd line:system server
ABI:arm
Build type: optimized
Zygote loaded classes=4113 post zygote classes=3239
Intern table:57550 strong;9315 weak
JNI:CheckINI is off;globals=2418(plus 115 weak)
Libraries:/system/lib/libandroid.so /system/lib/libandroid_servers.so
/system/lib/libaudioeffect jni.so /system/lib/libcompiler rt.so /system/lib/libjavacrypto.so
/system/lib/libjnigraphics.so /system/lib/libmedia jni.so /system/lib/librs_jni.so
/system/lib/libsechook.so /system/lib/libshell_jni.so /system/lib/libsoundpool.so
/system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so
/vendor/lib/libalarmservice_jni.so /vendor/lib/liblocationservice.so libjavacore.so (16)
//已分配堆内存大小40MB,其中29M已用,总分配207772个对象
Heap:27%free,29MB/40MB;307772 objects
·.· //省略GC相关信息
//当前进程总99个线程
DALVIK THREADS (99):
//主线程调用栈
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x75bd9fb0 self=0x5573d4f770
| sysTid=12078 nice=-2 cgrp=default sched=0/0 handle=0x7fa75fafe8
| state=S schedstat=(5907843636 827600677 5112)utm=453 stm=137 core=0 HZ=100
| stack=0x7fd64ef000-0x7fd64f1000 stackSize=8MB
| held mutexes=
//内核栈
kernel:_switch_to+0x70/0x7c
Kernel:SyS_epoll_wait+0x2a0/0x324
kernel:SyS epoll_pwait+0xa4/0x120
kernel:cpu_switch_to+x48/0x4c
native:#00 pc 0000080000869be4 /system/lib64/libc.so(_epoll_pwait+8)
native:#01 pc 008000000001cca4 /system/lib64/libc.so(epoll_pwait+32)
native:#82 pc 888808008881ad74 /system/lib64/libutils.so(_ZN7android6Looper9pollInnerEi+144)
native:#03 pc 000000000001b154 /system/lib64/libutils.so
(_ZN7android6Looper8poll0nceEiPiS1 PPv+80)
native: #04 pc 00000000000d4bc0 /system/lib64/libandroid_runtime.so
(_ZN7android18NativeMessageQueue8poll0nceEP7 INIEnvP8 jobjecti+48)
native:#05 pc 000000000000082c /data/dalvik-cache/arm64/system@framework@boot.oat
(Java android os MessageQueue nativePollOnce_JI+144)
at android.os.MessageQueue.nativePoll0nce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:323)
at android.os.Looper.loop(Looper.java:135)
at com.android.server.SystemServer.run(SystemServer.java:290)
at com.android.server.SystemServer.main(SystemServer.java:175)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(zygoteInit.java:738)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:628)
"Binder 1"prio=5 tid=8 Native
| group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
| sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
| state=S schedstat=(796240075 863170759 3586)utm=50 stm=29 core=1 HZ=100
| stack=0x7fa2647000-0x7fa2649000stackSize=1013KB
| held mutexes=
kernel:__switch_to+0x70/0x7c
kernel:binder_thread_read+0xd78/0xeb0
kernel:binder_ioctl_write_read+0x178/0x24c
kernel:binder_ioctl+0x2b0/0x5e0
kernel:do_vfs_ioct1+0x4a4/0x578
kernel:SyS_ioctl+x5c/0x88
kernel:cpu_switch_to+x48/0x4c
native:#00 pc 000000000069cd0/system/lib64/libc.so(__ioctl+4)
native:#01 pc 0000000000073cf4 /system/lib64/libc.so(ioctl+100)
native:#02 pc 000000000002d6e8 /system/lib64/libbinder.so
(_ZN7android14IPcThreadstate14talkWithDriverEb+164)
native:#03 pc 800000000002df3c /system/lib64/libbinder.so
(_ZN7android14IPCThreadstate20getAndExecuteCommandEv+24)
native:#04 pc 800000000002e114/system/lib64/libbinder.so
(_ZN7android14IPcThreadState14ioinThreadPoolEb+124)
native:#05 pc 800000800036c38 /system/lib64/libbinder.so(???)
native:#06 pc 0000000001579c/system/lib64/libutils.so
(_ZN7android6Thread11threadLoopEPv+208)
native:#07 pc 0000000000090598 /system/lib64/libandroid_runtime.so
(_ZN7android14AndroidRuntime15javaThreadShellEPv+96)
native:#08 pc 0000000000014fec /system/lib64/libutils.so(???)
native:#09 pc 8888000000067754 /system/lib64/libc.so(_ZL15_pthread_startPv+52)
native:#10 pc 000000000001c644 /system/lib64/libc.so(_start_thread+16)
(no managed stack frames)
.·//此处省略剩余的N个线程
3.trace参数解读
"Binder 1"prio=5 tid=8 Native
| group="main" sCount=1 dsCount=0 obi=0x12c610a0 self=0x5573e5c750
| sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
| state=S schedstat=(796240075 863170759 3586)utm=50 stm=29 core=1 HZ=100
| stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
| held mutexes=
线程优先级
//最小取值
public final static int MIN_PRIORITY = 1;
//默认优先级
public final static int NORM_PRIORITY = 5;
//最大取值
public final static int MAX_PRIORITY = 10;
线程状态
NEW, //线程还没启动
RUNNABLE, //正在执行
BLOCKED, //等待获取锁
WAITING, //等待其他线程执行一个特定的动作,比如说调用Object.notify或Object.notifyAll()
TIMED_WAITING, //等待一定时间
TERMINATED //执行完毕
可见,该线程Running=796240075 ns,也约等于796ms。在CPU运行时间包括用户态(utm)和内核态(stm)。utm+stm=(50+29)x10 ms=790ms。
结论:utm+stm=schedstat 第一个参数值
查询关键字(ANR|ActivityManager)
/system_process E/ActivityManager:
ANR in com.anrdemo (com.anrdemo/.MainActivity)
PID: 12639
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5841.3ms.)
Load: 13.28 / 11.53 / 9.48
CPU usage from 71228ms to 0ms ago (1970-03-01 00:08:47.864 to 1970-03-01 00:09:59.093):
15% 920/system_server: 9.2% user + 5.9% kernel / faults: 10139 minor 285 major
8.2% 13985/com.tencent.mm: 6.5% user + 1.6% kernel / faults: 35481 minor 1879 major
0.1% 555/fingerprintd: 0% user + 0.1% kernel / faults: 51 minor 22 major
3.5% 266/mmcqd/0: 0% user + 3.5% kernel
3.2% 4424/com.tencent.android.qqdownloader:daemon: 1.7% user + 1.5% kernel / faults: 12007 minor 357 major
3.2% 430/adbd: 0.4% user + 2.7% kernel / faults: 2002 minor
3.1% 135/kswapd0: 0% user + 3.1% kernel
1.8% 12271/com.tencent.mobileqq: 1.2% user + 0.5% kernel / faults: 8799 minor 580 major
1.6% 11695/com.cleanmaster.mguard: 1.5% user + 0.1% kernel / faults: 9171 minor 72 major
1.3% 13039/com.tencent.mm:push: 0.9% user + 0.3% kernel / faults: 9969 minor 192 major
1.2% 4954/com.google.android.gms: 0.9% user + 0.2% kernel / faults: 7546 minor 209 major
1.1% 342/logd: 0.3% user + 0.8% kernel / faults: 628 minor 3 major
1.1% 1485/kworker/u13:3: 0% user + 1.1% kernel
1.1% 10/rcu_preempt: 0% user + 1.1% kernel
1.1% 12166/com.android.vending: 0.8% user + 0.3% kernel / faults: 9897 minor 380 major
1% 3846/com.tencent.qqmusic:QQPlayerService: 0.6% user + 0.4% kernel / faults: 7354 minor 485 major
1% 10486/kworker/u13:4: 0% user + 1% kernel
1% 11251/kworker/u13:1: 0% user + 1% kernel
1% 9498/com.cleanmaster.security:DefendService: 0.7% user + 0.3% kernel / faults: 11998 minor 287 major
0.9% 1973/kworker/u13:5: 0% user + 0.9% kernel
0.9% 14113/com.tencent.mm:exdevice: 0.7% user + 0.1% kernel / faults: 8802 minor 142 major
0.8% 10814/com.tencent.qqmusic: 0.4% user + 0.4% kernel / faults: 9823 minor 221 major
0.8% 2062/sogou.mobile.explorer: 0.5% user + 0.3% kernel / faults: 8532 minor 1288 major
0.8% 528/netd: 0.1% user + 0.7% kernel / faults: 4203 minor 35 major
0.8% 3925/com.google.android.gms.persistent: 0.4% user + 0.3% kernel / faults: 12578 minor 546 major
0.7% 556/cnss_diag: 0.6% user + 0% kernel / faults: 121 minor 5 major
0.6% 4971/com.cleanmaster.mguard:service: 0.4% user + 0.2% kernel / faults: 7468 minor 174 major
0.5% 9180/com.tencent.android.qqdownloader: 0.2% user + 0.3% kernel / faults: 9571 minor 216 major
0.5% 15/ksoftirqd/1: 0% user + 0.5% kernel
0.4% 3280/com.android.phone: 0.2% user + 0.2% kernel / faults: 1857 minor 133 major
0.4% 8489/kworker/0:1: 0% user + 0.4% kernel
0.4% 3127/sdcard: 0% user + 0.4% kernel / faults: 92 minor 17 major
0.4% 397/servicemanager: 0.1% user + 0.2% kernel / faults: 127 minor
0.4% 11878/swift.space.cleaner.free:service: 0.2% user + 0.1% kernel / faults: 6087 minor 140 major
0.4% 20/ksoftirqd/2: 0% user + 0.4% kernel
0.4% 8761/com.netease.cloudmusic:play: 0.3% user + 0% kernel / faults: 4819 minor 977 major
0.3% 6525/com.tencent.qqlive:services: 0.2% user + 0.1% kernel / faults: 7834 minor 483 major
0.3% 12250/com.tencent.mobileqq:MSF: 0.2% user + 0.1% kernel / faults: 5368 minor 114 major
0.3% 9717/com.speed.boost.booster:service: 0.1% user + 0.1% kernel / faults: 5526 minor 77 major
0.3% 8524/sogou.mobile.explorer:push_service: 0.2% user + 0.1% kernel / faults: 6280 minor 42 major
0.3% 427/irq/215-fc38800: 0% user + 0.3% kernel
0.3% 8721/kworker/1:2: 0% user + 0.3% kernel
0.3% 260/cfinteractive: 0% user + 0.3% kernel
0.3% 294/msm-core:sampli: 0% user + 0.3% kernel
0.1% 8756/kworker/u12:3: 0% user + 0.1% kernel
0.1% 11204/kworker/2:0: 0% user + 0.1% kernel
0.3% 3150/com.android.systemui: 0.1% user + 0.1% kernel / faults: 3318 minor 129 major
0.2% 7244/kworker/u12:2: 0% user + 0.2% kernel
0% 3084/VosMCThread: 0% user + 0% kerne
可以看到,Logcat日志信息中主要包含如下内容:
1)导致ANR的类名及所在包名:
com.anrdemo (com.anrdemo/.MainActivity)
2)导致ANR的进程名及ID:com.anrdemo,12639
3)ANR产生的原因(类型):Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5841.3ms.),明显属于KeyDispatchTimeout类型。
4)系统中CPU使用率的统计信息及某段时间内的变换情况
从trace.txt文件分析
ANR Logcat信息可以帮助我们分析引发ANR的具体类的信息以及ANR的类型,但是却无法帮助我们定位到具体引发问题的代码行,这时候就需要借助ANR发生过程中生成的堆栈信息文件data/anr/trace.txt
获取trace文件:
adb pull /data/anr/trace.txt
由于trace文件记录的是整个手机的ANR日志,所以我们需要根据进程名(包名)和ANR发生的时间找到对应日志,具体以pid 进程id开始,以pid进程id结束。由于阻塞始终会发生在主线程,因此我们需要关注线程名为main的线程状态。下面摘出一部分关键日志进行分析:
//关键日志的起始标记
----- pid 20678 at 2024-05-13 21:58:59 -----
//应用程序包名
Cmd line: com.anrdemo
Build fingerprint: 'Android/aosp_bullhead/bullhead:7.1.2/N2G48C/han-li03221443:userdebug/test-keys'
//手机的CPU架构
ABI: 'arm64'
//堆内存信息
Heap: 33% free, 4MB/6MB; 27144 objects
Dumping cumulative Gc timings
Total time spent in GC: 10.680ms
Mean GC size throughput: 68MB/s
Mean GC object throughput: 1.32996e+06 objects/s
Total number of allocations 41348
Total bytes allocated 5MB
Total bytes freed 753KB
//手机Memory内存信息
Free memory 2MB
Free memory until GC 2MB
Free memory until OOME 187MB
Total memory 6MB
Max memory 192MB
Zygote space size 1128KB
Total mutator paused time: 463us
Total time waiting for GC to complete: 938ns
Total GC count: 1
Total GC time: 10.680ms
Total blocking GC count: 0
Total blocking GC time: 0
//主线程日志分析
//基本信息:main-线程名称,prio-线程优先级,tid-线程id,Sleeping-线程状态
"main" prio=5 tid=1 Sleeping
//详细信息:group-线程组名称,sCount-线程被挂起的次数,dsCount-线程被调试器刮起的次数,obj-线程的Java对象地址,self-线程本身的Native对象地址
| group="main" sCount=1 dsCount=0 obj=0x75190ed0 self=0x7f72095a00
//线程的调度信息:sysTid-linux系统中内核线程id(观察发现主线程的线程号和进程号相同),nice-线程调度的优先级,cgrp-线程调度组,sched-线程调度策略和优先级,handle-线程处理函数地址
| sysTid=20678 nice=-10 cgrp=default sched=0/0 handle=0x7f75fe0a98
//上下文信息:state-线程调度状态,schedstat-线程在CPU中的执行时间、线程等待时间、线程执行的时间片长度,utm-线程在用户状态中调度的时间值,core-最后执行这个线程的CPU核序号
| state=S schedstat=( 274637713 30817705 229 ) utm=20 stm=6 core=1 HZ=100
//堆栈信息:stack-堆栈地址,stackSize-堆栈大小,余下的为堆栈信息,也是我们分析引发ANR问题的关键
| stack=0x7fc8318000-0x7fc831a000 stackSize=8MB
| held mutexes=
at java.lang.Thread.sleep!(Native method)
- sleeping on <0x02f69763> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:371)
- locked <0x02f69763> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:313)
at com.anrdemo.MainActivity$1.onClick(MainActivity.java:24)
at android.view.View.performClick(View.java:5637)
at android.view.View$PerformClick.run(View.java:22429)
at android.os.Handler.handleCallback(Handler.java:751)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:154)
at android.app.ActivityThread.main(ActivityThread.java:6121)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:889)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:779)
"Jit thread pool worker thread 0" daemon prio=5 tid=2 Native
| group="main" sCount=1 dsCount=0 obj=0x12c64790 self=0x7f6a635000
| sysTid=20683 nice=9 cgrp=default sched=0/0 handle=0x7f71701450
| state=S schedstat=( 696978 5677 2 ) utm=0 stm=0 core=2 HZ=100
| stack=0x7f71603000-0x7f71605000 stackSize=1021KB
| held mutexes=
kernel: __switch_to+0x8c/0x98
kernel: futex_wait_queue_me+0xd4/0x130
kernel: futex_wait+0xfc/0x210
kernel: do_futex+0xe0/0x920
kernel: SyS_futex+0x11c/0x1b0
kernel: cpu_switch_to+0x48/0x4c
native: #00 pc 000000000001bcec /system/lib64/libc.so (syscall+28)
native: #01 pc 00000000000e7e4c /system/lib64/libart.so (_ZN3art17ConditionVariable16WaitHoldingLocksEPNS_6ThreadE+156)
native: #02 pc 000000000046c910 /system/lib64/libart.so (_ZN3art10ThreadPool7GetTaskEPNS_6ThreadE+248)
native: #03 pc 000000000046bdac /system/lib64/libart.so (_ZN3art16ThreadPoolWorker3RunEv+124)
native: #04 pc 000000000046b6d0 /system/lib64/libart.so (_ZN3art16ThreadPoolWorker8CallbackEPv+132)
native: #05 pc 0000000000068734 /system/lib64/libc.so (_ZL15__pthread_startPv+208)
native: #06 pc 000000000001da7c /system/lib64/libc.so (__start_thread+16)
(no managed stack frames)
//关键日志的结束标记
----- end 20678 -----
由于ANR只会发生在主线程,所以我们需要关注主线程的状态,从上 面日志中分析可以知道:发生ANR时主线程的状态为Sleep,且引起该状态的地方在MainActivity$1.onClick(MainActivity.java:24)
下面分析Service内onCreate发生ANR异常时源代码的执行情况
大家都知道Android中的四大组件启动时,都会通过跨进程的方式利用Handler消息机制最终将消息Push到ActivityThread中的内部类去处理,因此我先在ActivityThread中搜索service.onCreate调用,然后依次追溯:
ActivityThread中调用:
private void handleCreateService(CreateServiceData data) {
//调用service的onCreate
service.onCreate();
}
该方法由Activity的内部类H(继承与Handler)接收到Servic onCreate消息时调用,该消息由通过scheduleCreateService发出,该方法被ActiveService类调用
public final void scheduleCreateService(IBinder token,
ServiceInfo info, CompatibilityInfo compatInfo, int processState) {
updateProcessState(processState, false);
CreateServiceData s = new CreateServiceData();
s.token = token;
s.info = info;
s.compatInfo = compatInfo;
sendMessage(H.CREATE_SERVICE, s);
}
ActiveServicde的realStartServiceLocked方法调用上面方法,这个方法比较关键需要认真分析:
private final void realStartServiceLocked(ServiceRecord r,
ProcessRecord app, boolean execInFg) throws RemoteException {
r.app = app;
r.restartTime = r.lastActivity = SystemClock.uptimeMillis();
final boolean newService = app.services.add(r);
//发送delay消息(SERVICE_TIMEOUT_MSG)
bumpServiceExecutingLocked(r, execInFg, "create");
mAm.updateLruProcessLocked(app, false, null);
updateServiceForegroundLocked(r.app, /* oomAdj= */ false);
mAm.updateOomAdjLocked();
boolean created = false;
try {
synchronized (r.stats.getBatteryStats()) {
r.stats.startLaunchedLocked();
}
mAm.notifyPackageUse(r.serviceInfo.packageName,
PackageManager.NOTIFY_PACKAGE_USE_SERVICE);
app.forceProcessStateUpTo(ActivityManager.PROCESS_STATE_SERVICE);
//最终执行服务的onCreate()方法
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
app.repProcState);
r.postNotification();
created = true;
} catch (DeadObjectException e) {
Slog.w(TAG, "Application dead when creating service " + r);
mAm.appDiedLocked(app);
throw e;
} finally {
if (!created) {
// Keep the executeNesting count accurate.
final boolean inDestroying = mDestroyingServices.contains(r);
//当service启动完毕,则remove SERVICE_TIMEOUT_MSG消息
serviceDoneExecutingLocked(r, inDestroying, inDestroying);
// Cleanup.
if (newService) {
app.services.remove(r);
r.app = null;
}
// Retry.
if (!inDestroying) {
scheduleServiceRestartLocked(r, false);
}
}
}
if (r.whitelistManager) {
app.whitelistManager = true;
}
requestServiceBindingsLocked(r, execInFg);
updateServiceClientActivitiesLocked(app, null, true);
// If the service is in the started state, and there are no
// pending arguments, then fake up one so its onStartCommand() will
// be called.
if (r.startRequested && r.callStart && r.pendingStarts.size() == 0) {
r.pendingStarts.add(new ServiceRecord.StartItem(r, false, r.makeNextStartId(),
null, null, 0));
}
sendServiceArgsLocked(r, execInFg, true);
}
bumpServiceExecutingLocked方法内又调用了scheduleServiceTimeoutLocked方法:
void scheduleServiceForegroundTransitionTimeoutLocked(ServiceRecord r) {
if (r.app.executingServices.size() == 0 || r.app.thread == null) {
return;
}
//指定id为SERVICE_TIMEOUT_MSG的消息
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG);
msg.obj = r;
r.fgWaiting = true;
//前台Service和后台Service发送的Delay消息时间不同
mAm.mHandler.sendMessageDelayed(msg, SERVICE_START_FOREGROUND_TIMEOUT);
}
serviceDoneExecutingLocked方法
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
r.executeNesting--;
if (r.executeNesting <= 0) {
if (r.app != null) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
"Nesting at 0 of " + r.shortName);
r.app.execServicesFg = false;
r.app.executingServices.remove(r);
if (r.app.executingServices.size() == 0) {
if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
"No more executingServices of " + r.shortName);
//remove掉id为SERVICE_TIMEOUT_MSG的消息,从上面的分析可知,该方法是onCreate调用之前发出的一个Delay执行的消息
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
} else if (r.executeFg) {
// Need to re-evaluate whether the app still needs to be in the foreground.
for (int i=r.app.executingServices.size()-1; i>=0; i--) {
if (r.app.executingServices.valueAt(i).executeFg) {
r.app.execServicesFg = true;
break;
}
}
}
if (inDestroying) {
mDestroyingServices.remove(r);
r.bindings.clear();
}
mAm.updateOomAdjLocked(r.app, true);
}
r.executeFg = false;
if (r.tracker != null) {
r.tracker.setExecuting(false, mAm.mProcessStats.getMemFactorLocked(),
SystemClock.uptimeMillis());
if (finishing) {
r.tracker.clearCurrentOwner(r, false);
r.tracker = null;
}
}
if (finishing) {
if (r.app != null && !r.app.persistent) {
r.app.services.remove(r);
if (r.whitelistManager) {
updateWhitelistManagerLocked(r.app);
}
}
r.app = null;
}
}
}
小结:
Service启动调用onCreate方法之前send一个Delay执行的消息,当发生异常或者Service的onCreate方法执行完毕之后,remove掉之前send的消息,如果onCreate方法执行时间超过Delay的时间,那么该消息就会被处理,此时如果Delay的时间是我们设定的ANR时间,则发生ANR,系统作出处理,否则该消息被remove,不会被执行。以一种观察者模式的角度去实现。
id为SERVICE_TIMEOUT_MSG
的消息被AMS中MainHandler接收
case SERVICE_TIMEOUT_MSG: {
if (mDidDexOpt) {
mDidDexOpt = false;
Message nmsg = mHandler.obtainMessage(SERVICE_TIMEOUT_MSG);
nmsg.obj = msg.obj;
mHandler.sendMessageDelayed(nmsg, ActiveServices.SERVICE_TIMEOUT);
return;
}
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
ActiveService的serviceTimeout方法中调用AppErrors中的appNotResponding方法,很明显是ANR异常处理的方法:
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
if (mService.mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
long anrTime = SystemClock.uptimeMillis();
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
}
// Unless configured otherwise, swallow ANRs in background processes & kill the process.
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
boolean isSilentANR;
synchronized (mService) {
// PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
if (mService.mShuttingDown) {
Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
return;
} else if (app.notResponding) {
Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
return;
} else if (app.crashing) {
Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
return;
} else if (app.killedByAm) {
Slog.i(TAG, "App already killed by AM skipping ANR: " + app + " " + annotation);
return;
} else if (app.killed) {
Slog.i(TAG, "Skipping died app ANR: " + app + " " + annotation);
return;
}
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// Log the ANR to the event log.
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
// Don't dump other PIDs if it's a background ANR
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
if (!isSilentANR) {
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) {
parentPid = parent.app.pid;
}
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mService.mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(pid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
//输出ANR异常Log
// Log the ANR to the main log.
StringBuilder info = new StringBuilder();
info.setLength(0);
//ANR发生时的进程Name
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
//进程ID
info.append("PID: ").append(app.pid).append("\n");
//ANR发生的原因
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// don't dump native PIDs for background ANRs unless it is the process of interest
String[] nativeProcs = null;
if (isSilentANR) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
nativeProcs = new String[] { app.processName };
break;
}
}
} else {
nativeProcs = NATIVE_STACKS_OF_INTEREST;
}
int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
ArrayList<Integer> nativePids = null;
if (pids != null) {
nativePids = new ArrayList<Integer>(pids.length);
for (int i : pids) {
nativePids.add(i);
}
}
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
//dump栈信息
File tracesFile = mService.dumpStackTraces(true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
//各进程使用CPU情况
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
//CPU当前负载
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
Slog.e(TAG, info.toString());
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
//将anr信息添加到dropbox
mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
if (mService.mController != null) {
try {
// 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
int res = mService.mController.appNotResponding(
app.processName, app.pid, info.toString());
if (res != 0) {
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
} else {
synchronized (mService) {
mService.mServices.scheduleServiceTimeoutLocked(app);
}
}
return;
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
synchronized (mService) {
mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
if (isSilentANR) {
app.kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
通过Handler消息机制弹出ANR对话框
// Bring up the infamous App Not Responding dialog
Message msg = Message.obtain();
HashMap<String, Object> map = new HashMap<String, Object>();
msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
msg.obj = map;
msg.arg1 = aboveSystem ? 1 : 0;
map.put("app", app);
if (activity != null) {
map.put("activity", activity);
}
mService.mUiHandler.sendMessage(msg);
}
}
Android中的ANR监测以Handler消息机制实现,使用观察者模式,当开始监测时注册消息(该消息在规定时间后执行),事件处理完之后移除消息,当该消息执行时说明事件处理超过了规定的时间,即ANR。
//使用冒泡排序对一个大数组排序
private fun sortBigArray() {
val currTime = System.currentTimeMillis()
val random = IntArray(1000000)
for (i in random.indices) {
random[i] = (Math.random() * 10000000).toInt()
}
BubbleSort.sort(random)
println("耗时" + (System.currentTimeMillis() - currTime) + "ms")
for (i in random.indices) {
println(random[i].toString())
}
}
点击一个按钮调用sortBigArray()方法,内部调用BubbleSort类的sort()方法对一个大数组(100万)进行排序,然后点击几次返回键,然后就出现ANR了。
看一下Logcat日志输出
//debug级别日志
2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'
//error级别日志
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 15564
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.7 / 7.48 / 7.35
CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
0.7% 2661/com.android.systemui: 0.6% user + 0.1% kernel / faults: 1648 minor 1 major
0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 21 minor
0.7% 24463/com.huawei.hwid.persistent: 0.6% user + 0% kernel / faults: 4650 minor 1 major
0.5% 4018/com.huawei.android.launcher: 0.4% user + 0% kernel / faults: 16025 minor 3 major
0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
0.4% 28932/com.huawei.appmarket: 0.3% user + 0% kernel / faults: 2526 minor
//...
2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
99% 15564/oid.jetpackdemo: 99% user + 0% kernel
7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
1.8% 2862/Binder:2001_6: 0% user + 1.8% kernel
1.8% 3089/Binder:2001_7: 1.8% user + 0% kernel
5.3% 29428/adbd: 0% user + 5.3% kernel / faults: 480 minor
3.5% 29430/->transport: 0% user + 3.5% kernel
1.7% 29428/adbd: 0% user + 1.7% kernel
1.3% 53/rcuop/6: 0% user + 1.3% kernel
16% TOTAL: 14% user + 2.1% kernel + 0.2% irq + 0.2% softirq
上面的日志中输出了堆栈信息的保存在 /data/anr/traces.txt文件中
com.example.android.jetpackdemo I/dalvikvm: Wrote stack traces to '/data/anr/traces.txt'
发生ANR进程的包名信息,所在的类,进程id和ANR的类型
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 15564
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
包名com.example.android.jetpackdemo,
具体的类com.example.android.jetpackdemo.StartActivity,
进程号是PID: 15564,
ANR的类型是Input dispatching timed out。
CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
//...
2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
99% 15564/oid.jetpackdemo: 99% user + 0% kernel
7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
//...
注意:
在ANR发生之前,2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139,这段时间CPU的使用并不高。
在ANR发生的时候,2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389,这段时间CPU的使用相当高,已经达到99%了。
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel
99%:内存占用率
15564/com.example.android.jetpackdemo:进程id和进程名。
这两段CPU 信息分别代表ANR发生前和ANR时的CPU占用率,在输出的CPU使用信息中我们也可以看出一些端倪,注意到进程CPU的占用率比较高,说明我们的进程比较忙碌,这里需要说明一下,进程忙碌并不一定代表主线程忙碌,也可能是进程中的后台线程忙碌。
现在知道了ANR发生的所在的类,需要分析发生ANR的时候保存的traces.txt文件了。
导出traces文件
使用adb命令导出traces.txt文件
adb pull /data/anr/traces.txt traces_1.txt
/data/anr/traces.txt: 1 file pulled, 0 skipped. 28.5 MB/s (701726 bytes in 0.023s)
traces.txt部分信息
----- pid 15564 at 2020-06-03 21:20:24 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
ABI: 'arm64'
//...
在traces.txt文件的最顶部,首先输出的是发生ANR的进程号和包名信息,然后在traces.txt中搜索我们的进程号或者包名。
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
//...
看一下和线程相关的部分信息。
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
线程基本信息:
线程名:main
线程优先级:prio=5,优先级取值范围[1,10],详见Thread类:
//最小取值
public final static int MIN_PRIORITY = 1;
//默认优先级
public final static int NORM_PRIORITY = 5;
//最大取值
public final static int MAX_PRIORITY = 10;
线程id: tid=1,1代表主线程
线程状态:Runnable,状态取值如下,详见Thread.State枚举类:
NEW, //线程还没启动
RUNNABLE, //正在执行
BLOCKED, //等待获取锁
WAITING, //等待其他线程执行一个特定的动作,比如说调用Object.notify或Object.notifyAll()
TIMED_WAITING, //等待一定时间
TERMINATED //执行完毕
从上面traces.txt文件中这段信息可以看出,导致ANR的最终原因是在BubbleSort.java的第45行。
at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
at java.lang.reflect.Method.invoke!(Native method)
/**
* 拷贝文件,注意要有读写权限
*/
private fun doIo() {
val prePath = Environment.getExternalStorageDirectory().path
val file = File("${prePath}/test/View.java")
if (file.exists()) {
Log.d(TAG, "doIo: ${file.length()}")
val reader = FileReader(file)
val fileWriter = FileWriter("${prePath}/test/ViewCopy.java", true)
for (index in 0 until 5) {
var count: Int
while (reader.read().also { count = it } != -1) {
fileWriter.write(count)
}
try {
reader.reset()
} catch (e: IOException) {
Log.d(TAG, "doIo: error ${e.message}")
}
}
}
}
调用doIo()方法以后,多次点击返回键,制造ANR。
Logcat日志输出
2020-06-04 21:05:24.462 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 16295
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.49 / 7.45 / 7.24
CPU usage from 87491ms to 0ms ago (2020-06-04 21:03:53.035 to 2020-06-04 21:05:20.526):
7.8% 2001/system_server: 6.1% user + 1.7% kernel / faults: 34095 minor 3 major
4.2% 28932/com.huawei.appmarket: 3.7% user + 0.5% kernel / faults: 12314 minor 5 major
2.8% 2661/com.android.systemui: 2.2% user + 0.5% kernel / faults: 4222 minor 1 major
2% 412/msm-core:sampli: 0% user + 2% kernel
1.7% 24463/com.huawei.hwid.persistent: 1.5% user + 0.1% kernel / faults: 3317 minor 1 major
1.5% 607/surfaceflinger: 1% user + 0.5% kernel / faults: 24 minor
//...
2020-06-04 21:05:24.462 ? E/ActivityManager: CPU usage from 1696ms to 2226ms later (2020-06-04 21:05:22.222 to 2020-06-04 21:05:22.752):
84% 16295/com.example.android.jetpackdemo: 84% user + 0% kernel / faults: 562 minor 1 major
68% 16295/oid.jetpackdemo: 68% user + 0% kernel
12% 16317/RenderThread: 12% user + 0% kernel
1.8% 16307/HeapTaskDaemon: 1.8% user + 0% kernel
+0% 16461/DeferredSaveThr: 0% user + 0% kernel
9.1% 2001/system_server: 1.8% user + 7.3% kernel / faults: 7 minor
7.3% 2014/ActivityManager: 0% user + 7.3% kernel
3.6% 2536/Binder:2001_3: 3.6% user + 0% kernel
5.5% 607/surfaceflinger: 2.7% user + 2.7% kernel
1.3% 607/surfaceflinger: 1.3% user + 0% kernel
1.3% 658/Binder:607_1: 0% user + 1.3% kernel
4.3% 2661/com.android.systemui: 2.9% user + 1.4% kernel / faults: 26 minor
4.3% 3614/RenderThread: 2.9% user + 1.4% kernel
1.4% 2661/ndroid.systemui: 1.4% user + 0% kernel
1.3% 25/rcuop/2: 0% user + 1.3% kernel
1.3% 339/irq/171-tsens_i: 0% user + 1.3% kernel
1.5% 11851/mdss_fb0: 0% user + 1.5% kernel
1.6% 14246/kworker/u16:5: 0% user + 1.6% kernel
1.6% 16318/kworker/u16:4: 0% user + 1.6% kernel
15% TOTAL: 13% user + 1.8% kernel
从上面的日志信息中我们也看出来发生ANR的时候,进程com.example.android.jetpackdemo
CPU占用率是比较高的,说明我们进程内存在比较忙碌的线程。然后我们继续看一下对应的traces.txt文件。
traces.txt部分信息
----- pid 16295 at 2020-06-04 21:05:20 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
通过进程号pid 16295搜索
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=16295 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 16406184130 12254163 407 ) utm=1630 stm=10 core=6 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 0000000000478088 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
native: #01 pc 0000000000478084 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
native: #02 pc 000000000044c604 /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+524)
native: #03 pc 0000000000463f60 /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
native: #04 pc 000000000044d510 /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
native: #05 pc 00000000000ff870 /system/lib64/libart.so (_ZN3art27ScopedObjectAccessUncheckedD2Ev+576)
native: #06 pc 000000000010a764 /system/lib64/libart.so (_ZN3art8CheckJNI23GetPrimitiveArrayRegionEPKcNS_9Primitive4TypeEP7_JNIEnvP7_jarrayiiPv+1164)
native: #07 pc 0000000000022ee4 /system/lib64/libjavacore.so (???)
native: #08 pc 00000000004747a8 /data/dalvik-cache/arm64/system@framework@boot-core-libart.oat (Java_libcore_icu_NativeConverter_encode__J_3CI_3BI_3IZ+244)
at libcore.icu.NativeConverter.encode(Native method)
at java.nio.charset.CharsetEncoderICU.encodeLoop(CharsetEncoderICU.java:169)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:579)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:271)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
- locked <0x05b5279d> (a java.io.FileWriter)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:113)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
重点看一下这段信息
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
从上面这段信息可以看出,导致ANR的最终原因是在OutputStreamWriter.java的第194行。而我们的代码出问题的地方是StartActivity.kt的116行。
//锁资源
val lockedResource: Any = Any()
fun onClick(v: View) {
when (v.id) {
R.id.btnWaitLockedResource -> {
LockTask().execute(arrayListOf<Int>())
Log.d(TAG, "onClick: 主线程先睡眠一会,避免先获取到锁")
Thread.sleep(200)
Log.d(TAG, "onClick: 主线程先睡眠结束,尝试获取锁")
synchronized(lockedResource) {
for (index in 0 until 10) {
Log.d(TAG, "onClick: 主线程获取到锁了$index")
}
}
}
}
}
//LockTask后台线程
inner class LockTask : AsyncTask<MutableList<Int>, Int, Unit>() {
override fun doInBackground(vararg params: MutableList<Int>) =
synchronized(lockedResource) {
val list = params[0]
for (i in 0 until 1000000) {
list.add((Math.random() * 10000000).toInt())
}
list.forEach {
Log.d(TAG, "doInBackground: for each element is $it")
}
}
}
调用onClick()方法以后,先让后台线程获取锁,然后主线程再尝试获取锁。然后多次点击返回键,制造ANR。
Logcat日志输出
2020-06-04 09:55:04.396 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 20008
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 8.27 / 7.73 / 7.37
CPU usage from 83152ms to 0ms ago (2020-06-04 09:53:36.842 to 2020-06-04 09:54:59.995) with 99% awake:
19% 508/logd: 15% user + 3.5% kernel / faults: 533 minor 1 major
5.5% 2001/system_server: 3.9% user + 1.5% kernel / faults: 10843 minor 7 major
4.9% 28932/com.huawei.appmarket: 4.3% user + 0.6% kernel / faults: 13003 minor 79 major
2.6% 2661/com.android.systemui: 2.2% user + 0.3% kernel / faults: 7158 minor 2 major
1.5% 607/surfaceflinger: 0.9% user + 0.6% kernel / faults: 190 minor 1 major
1.2% 24307/logcat: 0.7% user + 0.4% kernel
0.8% 11161/com.android.settings: 0.6% user + 0.1% kernel / faults: 9084 minor 20 major
0.6% 24305/logcat: 0.2% user + 0.3% kernel
0.4% 24301/fingerprint_log: 0% user + 0.4% kernel
0.3% 15363/kworker/u16:10: 0% user + 0.3% kernel
0.2% 6831/kworker/u16:5: 0% user + 0.2% kernel
0.2% 837/imonitor: 0% user + 0.1% kernel
//...
2020-06-04 09:55:04.396 ? E/ActivityManager: CPU usage from 2211ms to 2742ms later (2020-06-04 09:55:02.206 to 2020-06-04 09:55:02.737):
105% 20008/com.example.android.jetpackdemo: 92% user + 13% kernel / faults: 220 minor
99% 20096/AsyncTask #1: 86% user + 13% kernel
5.6% 20019/HeapTaskDaemon: 5.6% user + 0% kernel
103% 508/logd: 99% user + 3.7% kernel / faults: 8 minor
92% 24315/logd.reader.per: 92% user + 0% kernel
7.5% 511/logd.writer: 5.6% user + 1.8% kernel
3.7% 24314/logd.reader.per: 0% user + 3.7% kernel
1.8% 24313/logd.reader.per: 0% user + 1.8% kernel
11% 2661/com.android.systemui: 11% user + 0% kernel / faults: 52 minor
9.3% 3614/RenderThread: 7.5% user + 1.8% kernel
1.8% 2661/ndroid.systemui: 1.8% user + 0% kernel
9.3% 607/surfaceflinger: 9.3% user + 0% kernel
3.7% 607/surfaceflinger: 3.7% user + 0% kernel
1.8% 2614/Binder:607_4: 0% user + 1.8% kernel
5.6% 2001/system_server: 1.8% user + 3.7% kernel / faults: 2 minor
5.6% 2014/ActivityManager: 0% user + 5.6% kernel
3.3% 19794/adbd: 1.6% user + 1.6% kernel / faults: 147 minor
1.6% 19794/adbd: 0% user + 1.6% kernel
1.6% 19796/->transport: 0% user + 1.6% kernel
1.6% 19797/<-transport: 0% user + 1.6% kernel
3.4% 24307/logcat: 0% user + 3.4% kernel
1.3% 624/mm-pp-dpps: 0% user + 1.3% kernel
1.3% 717/ABA_THREAD: 1.3% user + 0% kernel
1.6% 18971/kworker/0:2: 0% user + 1.6% kernel
1.6% 18974/kworker/u16:0: 0% user + 1.6% kernel
1.6% 19095/mdss_fb0: 0% user + 1.6% kernel
1.7% 24301/fingerprint_log: 1.7% user + 0% kernel
1.7% 24305/logcat: 1.7% user + 0% kernel
31% TOTAL: 26% user + 4% kernel + 0.2% irq + 0.2% softirq
进程CPU占用率是比较高的,说明进程内存在比较忙碌的线程。然后继续看一下对应的traces.txt文件。
traces.txt部分信息
----- pid 20008 at 2020-06-04 09:55:00 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
通过进程号pid 20008搜索
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=20008 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 278831875 7233747 156 ) utm=22 stm=5 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
关键信息
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
在StartActivity的61行,在等待一个锁对象<0x0f8c80b0>,该对象是一个Object对象
(a java.lang.Object),这个锁对象正在被线程id为16的线程持有。那么我们下面在traces.txt文件中搜索一下这个锁对象<0x0f8c80b0>。如下所示:
DALVIK THREADS (16):
"AsyncTask #1" prio=5 tid=16 Runnable
| group="main" sCount=0 dsCount=0 obj=0x12cd61f0 self=0x7f93187200
| sysTid=20096 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x7f84346450
| state=R schedstat=( 13814173056 6030204 1355 ) utm=1193 stm=188 core=3 HZ=100
| stack=0x7f84244000-0x7f84246000 stackSize=1037KB
| held mutexes= "mutator lock"(shared held)
at java.lang.Integer.stringSize(Integer.java:414)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:630)
at java.lang.StringBuilder.append(StringBuilder.java:220)
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
- locked <0x0f8c80b0> (a java.lang.Object)
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:99)
at android.os.AsyncTask$2.call(AsyncTask.java:316)
at java.util.concurrent.FutureTask.run(FutureTask.java:237)
at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:255)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
at java.lang.Thread.run(Thread.java:776)
关键信息
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
- locked <0x0f8c80b0> (a java.lang.Object)
正是这个AsyncTask在107行持有锁对象0x0f8c80b0,导致主线程无法获取锁而阻塞,最终导致ANR。
val resourceFirst = "resourceFirst"
val resourceSecond = "resourceSecond"
private fun mockDeadLock() {
//启动一个后台线程
thread(start = false) {
synchronized(resourceSecond) {
Log.d(TAG, "工作线程获取了锁 resourceSecond")
Thread.sleep(100)
Log.d(TAG, "工作线程尝试获取锁 resourceFirst")
synchronized(resourceFirst) {
while (true) {
Log.d(TAG, "工作线程 mockDeadLock")
}
}
}
}.start()
//主线程睡眠30ms后开始获取锁
Thread.sleep(30)
synchronized(resourceFirst) {
Log.d(TAG, "主线程获取了锁 resourceFirst")
Log.d(TAG, "主线程尝试获取锁 resourceSecond")
synchronized(resourceSecond) {
Log.d(TAG, "主线程获取了锁 resourceFirst")
while (true) {
Log.d(TAG, "主线程 mockDeadLock")
}
}
}
}
上面这段代码逻辑:
调用mockDeadLock()方法以后,多次点击返回键,制造ANR。
Logcat输出
2020-06-04 15:07:41.246 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 13626
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.53 / 6.81 / 6.4
CPU usage from 177565ms to 0ms ago (2020-06-04 15:04:39.715 to 2020-06-04 15:07:37.281):
11% 2001/system_server: 7.1% user + 4.4% kernel / faults: 68219 minor 37 major
3.4% 2661/com.android.systemui: 2.8% user + 0.6% kernel / faults: 20555 minor 29 major
2% 508/logd: 0.9% user + 1.1% kernel / faults: 76 minor
1.8% 607/surfaceflinger: 1.1% user + 0.7% kernel / faults: 82 minor 1 major
0% 24463/com.huawei.hwid.persistent: 0% user + 0% kernel / faults: 7819 minor 24 major
0.9% 2823/com.huawei.systemmanager:service: 0.6% user + 0.2% kernel / faults: 13277 minor 12 major
//...
2020-06-04 15:07:41.246 ? E/ActivityManager: CPU usage from 1714ms to 2243ms later (2020-06-04 15:07:38.994 to 2020-06-04 15:07:39.523):
12% 2001/system_server: 9% user + 3.6% kernel / faults: 8 minor
10% 2014/ActivityManager: 5.4% user + 5.4% kernel
1.8% 2399/UEventObserver: 1.8% user + 0% kernel
1.5% 13652/kworker/u16:7: 0% user + 1.5% kernel
2.3% TOTAL: 1.1% user + 1.1% kernel
上面的Logcat输出并没有关于我们进程的CUP信息,说明我们的进程CPU占用率很低。那么我们继续看一下traces.txt文件。
traces.txt部分信息
----- pid 13626 at 2020-06-04 15:07:37 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
通过进程号pid 13626搜索
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=13626 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 288564792 6939269 224 ) utm=23 stm=5 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity.mockDeadLock(StartActivity.kt:142)
- waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:70)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
主线程状态是线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0a43b5c8>是一个String对象(a java.lang.String),该对象被线程id为17的线程持有。然后我们搜索这个锁对象。
"Thread-2" prio=5 tid=17 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c89dc0 self=0x7f931cd000
| sysTid=13763 nice=0 cgrp=default sched=0/0 handle=0x7f84344450
| state=S schedstat=( 886406 280365 2 ) utm=0 stm=0 core=0 HZ=100
| stack=0x7f84242000-0x7f84244000 stackSize=1037KB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:127)
- waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1
- locked <0x0a43b5c8> (a java.lang.String)
at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:21)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
Thread-2,线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0ec26674>是一个String对象(a java.lang.String),这个对象被线程id为1的线程(也就是主线程)持有。并且当前线程持有锁对象<0x0a43b5c8>。
最终,主线程和工作线程Thread-2造成死锁,导致应用无响应。
代码实现是从客户端的两个EditText中获取两个数字,然后通过Binder调用服务端的方法计算两个数的和返回给客户端,然后客户端讲计算结果展示在界面上。
客户端部分代码
private IMyAidlInterface iMyAidlInterface;
private ServiceConnection conn = new ServiceConnection() {
@Override
public void onServiceConnected(ComponentName name, IBinder service) {
//获取Binder对象
iMyAidlInterface = IMyAidlInterface.Stub.asInterface(service);
}
//...
};
public void onClick(View view) {
switch (view.getId()) {
case R.id.btn_count:
mNum1 = Integer.parseInt(etNum1.getText().toString());
mNum2 = Integer.parseInt(etNum2.getText().toString());
try {
//在主线程进行同步binder调用
mTotal = iMyAidlInterface.add(mNum1, mNum2);
} catch (RemoteException e) {
e.printStackTrace();
Log.e(TAG, "onClick: " + e.getMessage());
}
editShowResult.setText("mTotal=" + mTotal);
break;
}
}
服务端部分代码
public class IRemoteService extends Service {
private static final String TAG = "IRemoteService";
private IBinder iBinder = new IMyAidlInterface.Stub() {
@Override
public int add(int num1, int num2) throws RemoteException {
Log.d(TAG, "remote method add: start sleep thread id =" + Thread.currentThread().getId()+"," +
"thread name = "+Thread.currentThread().getName());
try {
//睡眠一段时间,然后才进行计算
Thread.sleep(120000);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.d(TAG, "remote method add: finish sleep return calculate result");
return num1 + num2;
}
};
public IRemoteService() {
}
@Override
public IBinder onBind(Intent intent) {
return iBinder;
}
}
注意:我们需要先把Binder服务端运行起来,然后再运行Binder客户端执行相应的方法。
Logcat输出
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: ANR in com.hm.aidlclient (com.hm.aidlclient/.BaseKnowledgeActivity)
PID: 18096
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.55 / 7.26 / 6.87
CPU usage from 755516ms to 0ms ago (2020-06-04 15:37:07.545 to 2020-06-04 15:49:43.062) with 99% awake:
5.1% 2001/system_server: 3.5% user + 1.5% kernel / faults: 139606 minor 17 major
1.2% 508/logd: 0.5% user + 0.6% kernel / faults: 35 minor
0.9% 2661/com.android.systemui: 0.7% user + 0.1% kernel / faults: 13039 minor 4 major
0.8% 12442/adbd: 0.2% user + 0.6% kernel / faults: 23957 minor
0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 183 minor 2 major
0.6% 28932/com.huawei.appmarket: 0.5% user + 0.1% kernel / faults: 9311 minor 64 major
0.4% 24463/com.huawei.hwid.persistent: 0.3% user + 0% kernel / faults: 11607 minor 6 major
0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
0.3% 4128/com.google.android.gms: 0.2% user + 0% kernel / faults: 26970 minor 16 major
//...
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: CPU usage from 1701ms to 2232ms later (2020-06-04 15:49:44.762 to 2020-06-04 15:49:45.293):
28% 2001/system_server: 21% user + 7.2% kernel / faults: 38 minor
16% 2010/HeapTaskDaemon: 16% user + 0% kernel
9% 2014/ActivityManager: 1.8% user + 7.2% kernel
1.8% 2001/system_server: 0% user + 1.8% kernel
1.8% 2540/NetdConnector: 1.8% user + 0% kernel
9% 607/surfaceflinger: 9% user + 0% kernel
5.4% 607/surfaceflinger: 5.4% user + 0% kernel
1.8% 658/Binder:607_1: 0% user + 1.8% kernel
1.8% 677/EventThread: 0% user + 1.8% kernel
7.1% 2661/com.android.systemui: 5.3% user + 1.7% kernel / faults: 38 minor
8.9% 3614/RenderThread: 7.1% user + 1.7% kernel
1.3% 508/logd: 1.3% user + 0% kernel
1.3% 624/mm-pp-dpps: 1.3% user + 0% kernel
2.7% 717/ABA_THREAD: 1.3% user + 1.3% kernel
1.5% 15978/mdss_fb0: 0% user + 1.5% kernel
1.6% 18228/logcat: 0% user + 1.6% kernel
8.3% TOTAL: 5.8% user + 2.5% kernel
并没有什么有价值的信息。继续看一下traces.txt文件。
traces.txt中客户端相关信息
----- pid 18096 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlclient
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
通过进程号pid 18096搜索
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=18096 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 464662186 22498334 359 ) utm=38 stm=8 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x70/0x7c
kernel: binder_thread_read+0x4cc/0x13f0
kernel: binder_ioctl+0x53c/0xbcc
kernel: do_vfs_ioctl+0x570/0x5a8
kernel: SyS_ioctl+0x60/0x88
kernel: el0_svc_naked+0x24/0x28
native: #00 pc 000000000006ad6c /system/lib64/libc.so (__ioctl+4)
native: #01 pc 000000000001fa48 /system/lib64/libc.so (ioctl+144)
native: #02 pc 00000000000555a4 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+260)
native: #03 pc 0000000000056388 /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+352)
native: #04 pc 000000000004b250 /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
native: #05 pc 0000000000103354 /system/lib64/libandroid_runtime.so (???)
native: #06 pc 0000000000b36238 /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:617)
at com.hm.aidlserver.IMyAidlInterface$Stub$Proxy.add(IMyAidlInterface.java:90)
at com.hm.aidlclient.BaseKnowledgeActivity.onClick(BaseKnowledgeActivity.java:109)
at com.hm.aidlclient.BaseKnowledgeActivity_ViewBinding$1.doClick(BaseKnowledgeActivity_ViewBinding.java:41)
at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
看到Binder客户端主线程的状态是Native,这个状态是native线程的一个状态,对应java线程的RUNNABLE状态。更详细的对应关系可以参考VMThread.java。然后从上面的信息中我们只看到BinderProxy调用了transactNative()方法,这是一个本地方法,最终会调用服务端Binder对象的transact()方法,实现真正的跨进程通信。除了这些我们没有看到其他有用的信息了。那么我们接下来看一看服务端的一些信息,看看能不能找到一些线索。
traces.txt中服务端相关信息
----- pid 17773 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlserver
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
通过进程号pid 17773搜索
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=17773 nice=0 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 213791882 16481247 206 ) utm=18 stm=3 core=1 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x70/0x7c
kernel: SyS_epoll_wait+0x2d4/0x394
kernel: SyS_epoll_pwait+0xc4/0x150
kernel: el0_svc_naked+0x24/0x28
native: #00 pc 000000000006ac80 /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 000000000001e21c /system/lib64/libc.so (epoll_pwait+64)
native: #02 pc 00000000000181d8 /vendor/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+156)
native: #03 pc 000000000001808c /vendor/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+60)
native: #04 pc 00000000000f66dc /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
native: #05 pc 0000000000b91ec0 /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:356)
at android.os.Looper.loop(Looper.java:138)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
服务端的进程号是pid 17773,我们看到服务端的主线程中也没有什么线索,不要慌,这里我们似乎忘了一点什么,服务端的Binder对象是运行在服务端的Binder线程池中的。
那我们怎么找到具体是Binder线程池中的哪个线程呢?其实在traces.txt文件中也是输出了的。
----- binder transactions -----
18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> 17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1
----- end binder transactions -----
上面这段信息的意思就是,我们是在进程id为18096,内核线程为18096的线程(就是主线程)向进程id为17773,内核线id为17788的线程发起跨进程通信。内核线程id为17788的线程的线程名称是Binder:17773_2
。那么我们就搜索一下Binder:17773_2
。搜索结果如下所示:
"Binder:17773_2" prio=5 tid=10 Sleeping
| group="main" sCount=1 dsCount=0 obj=0x32c064c0 self=0x7f9a624800
| sysTid=17788 nice=-10 cgrp=default sched=0/0 handle=0x7fa0fc3450
| state=S schedstat=( 3077762 6086666 14 ) utm=0 stm=0 core=6 HZ=100
| stack=0x7fa0ec9000-0x7fa0ecb000 stackSize=1005KB
| held mutexes=
at java.lang.Thread.sleep!(Native method)
- sleeping on <0x05eea4a7> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:379)
- locked <0x05eea4a7> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:321)
at com.hm.aidlserver.IRemoteService$1.add(IRemoteService.java:18)
at com.hm.aidlserver.IMyAidlInterface$Stub.onTransact(IMyAidlInterface.java:55)
at android.os.Binder.execTransact(Binder.java:565)
这里我们终于发现了原因,我们看到Binder:17773_2
状态是Sleeping
,就是服务端的Binder对象的add()方法内部第18行调用了Thread.sleep方法造成长时间无法返回,从而使客户端方法执行无法结束,最终导致ANR。
当ANR发生时,先通过logcat定位ANR的类型,然后通过trace信息分析产生ANR的原因,需要重点关注主线程(main)的当前状态和CPU的使用情况!当然ANR还是以预防为主,切记不要在主线程做耗时操作,不管是主动发起还是调用系统方法都需要对耗时的地方进行评估!