slh,
I reinstalled kernel 3.3-0.slh.6 in order to reproduce the issue.
Note my tests are at home where the only network which I can test is my own ("sid"); all others are encrypted neigbour networks I cannot connect to anyway.
First, some facts about the observed behaviour:
1) After complete laptop reboot, the connection is always initialized and established properly (so when I lose connection and wpa hangs, I "cure" the issue by rebooting laptop);
2) wlan0 "hangs" upon the following situations, on the running machine (and cannot anymore reestablish connection until all rebooted):
- network is "lost" for any reason; (wpa_gui and friends "hang forever" trying to reestablish connection)
- laptop suspend (to RAM) and subsequent resume (wpa_gui and friends "hangs forever" trying to reestablish connection)
- even manually switching to different network via the dropdown in wpa_gui, and then attempt to switch back to the initial network (wpa_gui and friends "hangs forever" trying to reestablish connection)
To reproduce the issue and grab below logs, I used the last method (manually switching to "any" in wpa_gui dropdown; and then trying to reconnect to mine switching the dropdown back to "sid").
Now the data (all captured AFTER the "hang"):
uname:
Code:
Linux siddie-new 3.3-0.slh.6-aptosid-amd64 #1 SMP PREEMPT Fri Mar 30 13:01:55 UTC 2012 x86_64 GNU/Linux
lcpci -knn
Code:
03:00.0 Network controller [0280]: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) [168c:002b] (rev 01)
Subsystem: AzureWave AW-NE785 / AW-NE785H 802.11bgn Wireless Full or Half-size Mini PCIe Card [1a3b:1089]
Kernel driver in use: ath9k
lspci -vvnn
Code:
03:00.0 Network controller [0280]: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) [168c:002b] (rev 01)
Subsystem: AzureWave AW-NE785 / AW-NE785H 802.11bgn Wireless Full or Half-size Mini PCIe Card [1a3b:1089]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort>SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at f1800000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP+ BadDLLP- Rollover+ Timeout+ NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 00-15-17-ff-ff-24-14-12
Capabilities: [170 v1] Power Budgeting <?>
Kernel driver in use: ath9k
iwlist wlan0 scan (only my network filtered out of the list)
Code:
Cell 09 - Address: 00:19:5B:95:59:93
Channel:6
Frequency:2.437 GHz (Channel 6)
Quality=44/70 Signal level=-66 dBm
Encryption key:on
ESSID:"sid"
Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s
Bit Rates:6 Mb/s; 9 Mb/s; 12 Mb/s; 18 Mb/s; 24 Mb/s
36 Mb/s; 48 Mb/s; 54 Mb/s
Mode:Master
Extra:tsf=0000000b318bb181
Extra: Last beacon: 575ms ago
IE: Unknown: 0003736964
IE: Unknown: 010482848B96
IE: Unknown: 030106
IE: Unknown: 2A0102
IE: Unknown: 32080C1218243048606C
IE: IEEE 802.11i/WPA2 Version 1
Group Cipher : CCMP
Pairwise Ciphers (1) : CCMP
Authentication Suites (1) : PSK
IE: Unknown: DD180050F202010100000364B80B27A4000041435E0061322F00
IE: Unknown: DD0900037F01010005FF7F
dmesg:
Code:
......
[ 903.055274] wlan0: authentication with 00:19:5b:95:59:93 timed out
[ 909.517973] wlan0: authenticate with 00:19:5b:95:59:93 (try 1)
[ 909.717016] wlan0: authenticate with 00:19:5b:95:59:93 (try 2)
[ 909.916709] wlan0: authenticate with 00:19:5b:95:59:93 (try 3)
[ 910.116528] wlan0: authentication with 00:19:5b:95:59:93 timed out
[ 916.585113] wlan0: authenticate with 00:19:5b:95:59:93 (try 1)
[ 916.784152] wlan0: authenticate with 00:19:5b:95:59:93 (try 2)
[ 916.983913] wlan0: authenticate with 00:19:5b:95:59:93 (try 3)
[ 917.183666] wlan0: authentication with 00:19:5b:95:59:93 timed out
[ 923.652171] wlan0: authenticate with 00:19:5b:95:59:93 (try 1)
[ 923.851308] wlan0: authenticate with 00:19:5b:95:59:93 (try 2)
[ 924.051017] wlan0: authenticate with 00:19:5b:95:59:93 (try 3)
[ 924.250729] wlan0: authentication with 00:19:5b:95:59:93 timed out
[ 930.709592] wlan0: authenticate with 00:19:5b:95:59:93 (try 1)
[ 930.909480] wlan0: authenticate with 00:19:5b:95:59:93 (try 2)
[ 931.109235] wlan0: authenticate with 00:19:5b:95:59:93 (try 3)
[ 931.309005] wlan0: authentication with 00:19:5b:95:59:93 timed out
[ 937.777456] wlan0: authenticate with 00:19:5b:95:59:93 (try 1)
[ 937.976650] wlan0: authenticate with 00:19:5b:95:59:93 (try 2)
[ 938.176361] wlan0: authenticate with 00:19:5b:95:59:93 (try 3)
[ 938.376144] wlan0: authentication with 00:19:5b:95:59:93 timed out
/var/log/kern.log shows the same as dmesg
/var/log/messages:
Code:
Apr 4 22:42:14 siddie-new /usr/sbin/gpm[1635]: Started gpm successfully. Entered daemon mode.
Apr 4 22:42:15 siddie-new kernel: [ 25.014516] microcode: CPU0 sig=0x20655, pf=0x10, revision=0x2
Apr 4 22:42:15 siddie-new kernel: [ 25.051408] microcode: CPU1 sig=0x20655, pf=0x10, revision=0x2
Apr 4 22:42:15 siddie-new kernel: [ 25.054147] microcode: CPU2 sig=0x20655, pf=0x10, revision=0x2
Apr 4 22:42:15 siddie-new kernel: [ 25.057468] microcode: CPU3 sig=0x20655, pf=0x10, revision=0x2
Apr 4 22:42:15 siddie-new kernel: [ 25.060337] microcode: Microcode Update Driver: v2.00 <tigran>, Peter Oruba
Apr 4 22:42:15 siddie-new kernel: [ 25.264182] microcode: CPU0 updated to revision 0x3, date = 2011-09-01
Apr 4 22:42:15 siddie-new kernel: [ 25.264689] microcode: CPU1 updated to revision 0x3, date = 2011-09-01
Apr 4 22:42:15 siddie-new kernel: [ 25.265182] microcode: CPU2 updated to revision 0x3, date = 2011-09-01
Apr 4 22:42:15 siddie-new kernel: [ 25.265655] microcode: CPU3 updated to revision 0x3, date = 2011-09-01
Apr 4 22:54:03 siddie-new kernel: [ 732.313047] cfg80211: Calling CRDA to update world regulatory domain
Apr 4 22:54:03 siddie-new kernel: [ 732.327447] cfg80211: World regulatory domain updated:
Apr 4 22:54:03 siddie-new kernel: [ 732.327452] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Apr 4 22:54:03 siddie-new kernel: [ 732.327457] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Apr 4 22:54:03 siddie-new kernel: [ 732.327515] cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Apr 4 22:54:03 siddie-new kernel: [ 732.327520] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Apr 4 22:54:03 siddie-new kernel: [ 732.327526] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Apr 4 22:54:03 siddie-new kernel: [ 732.327531] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Apr 4 22:54:03 siddie-new wpa_action: WPA_IFACE=wlan0 WPA_ACTION=DISCONNECTED
Apr 4 22:54:03 siddie-new wpa_action: WPA_ID=0 WPA_ID_STR=sid_home WPA_CTRL_DIR=/var/run/wpa_supplicant
Apr 4 22:54:03 siddie-new wpa_action: ifdown wlan0
Apr 4 22:54:04 siddie-new wpa_action: removing sendsigs omission pidfile: /run/sendsigs.omit.d/wpasupplicant.wpa_supplicant.wlan0.pid
Apr 4 22:54:04 siddie-new kernel: [ 732.728782] ADDRCONF(NETDEV_UP): wlan0: link is not ready
It is definitely a kernel/driver related issue, because switching the kernels I boot from 3.3-0.slh.1 to 3.3-0.slh.6 (with all else the same) switches from buggy to normal behaviour and vice versa.
As I can grasp from reports of other people affected here, the critical change occured occured between kernels 3.3-0.slh.1 and 3.3-0.slh.2. (I never used 3.3-0.slh.2; and it appeared for me the first time in 3.3-0.slh.3).
With this bug, the network is usable until the first disconnection or wpa-roaming attempt occurs.
After that there is no go (well maybe some other types of wifi networks still work but I can't test it now; I can test only with mine) until the complete reboot of the laptop.
I was not sure what should I provide for "wpa_cli results".
In case more details are needed - let me know (I will be keeping temporarily with the latest kernel for the sake of this).
Thanks
UPDATE:
Leaving laptop in fruitless attempts to reestablish connection, I decided to log into the router in parallel (from my desktop) to check what router logs have to say.
Here they are:
Code:
[INFO] Wed Apr 04 23:51:15 2012 Wireless system with MAC address 74F06D7C6531 disconnected for reason: Received Deauthentication.
[INFO] Wed Apr 04 23:51:15 2012 Lease 192.168.167.133 renewed by client 74F06D7C6531
[INFO] Wed Apr 04 23:51:15 2012 Assigned new lease 192.168.167.133 to client 74F06D7C6531
[INFO] Wed Apr 04 23:50:57 2012 Wireless system with MAC address 74F06D7C6531 associated
[INFO] Wed Apr 04 23:50:26 2012 Wireless system with MAC address 74F06D7C6531 disconnected for reason: Received Deauthentication.
[INFO] Wed Apr 04 23:50:25 2012 Lease 192.168.167.133 renewed by client 74F06D7C6531
[INFO] Wed Apr 04 23:50:25 2012 Assigned new lease 192.168.167.133 to client 74F06D7C6531
[INFO] Wed Apr 04 23:50:07 2012 Wireless system with MAC address 74F06D7C6531 associated
[INFO] Wed Apr 04 23:49:20 2012 Wireless system with MAC address 74F06D7C6531 disconnected for reason: Received Deauthentication.
[INFO] Wed Apr 04 23:49:20 2012 Lease 192.168.167.133 renewed by client 74F06D7C6531
[INFO] Wed Apr 04 23:49:20 2012 Assigned new lease 192.168.167.133 to client 74F06D7C6531
[INFO] Wed Apr 04 23:49:19 2012 Wireless system with MAC address 74F06D7C6531 associated
[INFO] Wed Apr 04 23:49:10 2012 Wireless system with MAC address 74F06D7C6531 disconnected for reason: Received Deauthentication.
[INFO] Wed Apr 04 23:49:03 2012 Lease 192.168.167.133 renewed by client 74F06D7C6531
[INFO] Wed Apr 04 23:49:03 2012 Assigned new lease 192.168.167.133 to client 74F06D7C6531
[INFO] Wed Apr 04 23:49:02 2012 Wireless system with MAC address 74F06D7C6531 associated
[INFO] Wed Apr 04 23:48:40 2012 Wireless system with MAC address 74F06D7C6531 disconnected for reason: Received Deauthentication.
I.e.from router's standpoint of view, it successfully receives the lease request and leases IP to the laptop's wlan0.
For some reason however laptop does not get the lease and attempts reconnections.
After some more playing with this (switching back/forth via the wpa_gui dropdown), it looks like it may be something like timing/racing issue between wpa_supplicant infrastructure and driver (which nevertheless is related to the driver as well, as the kernel change fixes it).
What I observe at wpa_gui widget visually, is slightly variable behaviour:
1) sometimes it hangs at "trying to authenticate" stage and stops here (unless target network is manually forced to switch back/forth via the dropdown)
2) Sometimes it progresses to "completed" status; waiting (relatively long time, 7-10 seconds) for IP address which is not shown. After some timeout (obviously waiting for IP address), it tries to reestablish connection; either cycling this way (repeatin 2), and after few re-attemps it again hangs at 1)
3) Sometimes (very rare) it even goes as far as to show "completed" + IP address in wpa_gui (for a second or so). However for some reason it still thinks it is "not connected" and timeout occured, and resets/tries to reestablish connection; effectively going to 1 or 2 above.
Still it might be that the normal per se change in the kernel (which affected some timings say directly or as a side effect), actually revealed a "hidden until now" bug in wpa_supplicant stuff so they lose sync tryng to cooperate in some conditions when network needs to be reestablished.
So IMHO it is not clear whether it is a pure kernel or wpa_supplicant issue... |