muen/linux.git
3 years ago Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 16 Jun 2018 07:21:50 +0000 (16:21 +0900)]
Merge branch 'work.compat' of git://git./linux/kernel/git/viro/vfs

Pull compat updates from Al Viro:
 "Some biarch patches - getting rid of assorted (mis)uses of
  compat_alloc_user_space().

  Not much in that area this cycle..."

* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: simplify compat ioctl handling
  signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
  vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart

3 years ago Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 16 Jun 2018 07:11:40 +0000 (16:11 +0900)]
Merge branch 'work.aio' of git://git./linux/kernel/git/viro/vfs

Pull aio fixes from Al Viro:
 "Assorted AIO followups and fixes"

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  eventpoll: switch to ->poll_mask
  aio: only return events requested in poll_mask() for IOCB_CMD_POLL
  eventfd: only return events requested in poll_mask()
  aio: mark __aio_sigset::sigmask const

3 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 15 Jun 2018 22:39:34 +0000 (07:39 +0900)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Various netfilter fixlets from Pablo and the netfilter team.

 2) Fix regression in IPVS caused by lack of PMTU exceptions on local
    routes in ipv6, from Julian Anastasov.

 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.

 4) Don't crash on poll in TLS, from Daniel Borkmann.

 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
    including Avahi mDNS. From Bart Van Assche.

 6) Missing of_node_put in qcom/emac driver, from Yue Haibing.

 7) We lack checking of the TCP checksum in one special case during SYN
    receive, from Frank van der Linden.

 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.

 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.

10) Must grab HW caps before doing quirk checks in stmmac driver, from
    Jose Abreu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  net: stmmac: Run HWIF Quirks after getting HW caps
  neighbour: skip NTF_EXT_LEARNED entries during forced gc
  net: cxgb3: add error handling for sysfs_create_group
  tls: fix waitall behavior in tls_sw_recvmsg
  tls: fix use-after-free in tls_push_record
  l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
  l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  mlxsw: spectrum_router: Align with new route replace logic
  mlxsw: spectrum_router: Allow appending to dev-only routes
  ipv6: Only emit append events for appended routes
  stmmac: added support for 802.1ad vlan stripping
  cfg80211: fix rcu in cfg80211_unregister_wdev
  mac80211: Move up init of TXQs
  mac80211_hwsim: fix module init error paths
  cfg80211: initialize sinfo in cfg80211_get_station
  nl80211: fix some kernel doc tag mistakes
  hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
  rds: avoid unenecessary cong_update in loop transport
  l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
  ...

3 years ago Merge tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Fri, 15 Jun 2018 22:36:39 +0000 (07:36 +0900)]
Merge tag 'modules-for-v4.18' of git://git./linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Minor code cleanup and also allow sig_enforce param to be shown in
  sysfs with CONFIG_MODULE_SIG_FORCE"

* tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Allow to always show the status of modsign
  module: Do not access sig_enforce directly

3 years ago Merge branch 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Jun 2018 21:50:51 +0000 (06:50 +0900)]
Merge branch 'for-linus-4.18-rc1' of git://git./linux/kernel/git/rw/uml

Pull uml updates from Richard Weinberger:
 "Minor updates for UML:

   - fixes for our new vector network driver by Anton

   - initcall cleanup by Alexander

   - We have a new mailinglist, sourceforge.net sucks"

* 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Fix raw interface options
  um: Fix initialization of vector queues
  um: remove uml initcalls
  um: Update mailing list address

3 years ago Merge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 15 Jun 2018 21:42:43 +0000 (06:42 +0900)]
Merge tag 'riscv-for-linus-4.18-merge_window' of git://git./linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains some small RISC-V updates I'd like to target for 4.18.

  They are all fairly small this time. Here's a short summary, there's
  more info in the commits/merges:

   - a fix to __clear_user to respect the passed arguments.

   - enough support for the perf subsystem to work with RISC-V's ISA
     defined performance counters.

   - support for sparse and cleanups suggested by it.

   - support for R_RISCV_32 (a relocation, not the 32-bit ISA).

   - some MAINTAINERS cleanups.

   - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
     always present.

  I've given these a simple build+boot test"

* tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
  RISC-V: Handle R_RISCV_32 in modules
  riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
  riscv: add riscv-specific predefines to CHECKFLAGS
  riscv: split the declaration of __copy_user
  riscv: no __user for probe_kernel_address()
  riscv: use NULL instead of a plain 0
  perf: riscv: Add Document for Future Porting Guide
  perf: riscv: preliminary RISC-V support
  MAINTAINERS: Update Albert's email, he's back at Berkeley
  MAINTAINERS: Add myself as a maintainer for SiFive's drivers
  riscv: Fix the bug in memory access fixup code

3 years ago Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 15 Jun 2018 21:37:04 +0000 (06:37 +0900)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "Mostly the PPC part of the release, but also switching to Arnd's fix
  for the hyperv config issue and a typo fix.

  Main PPC changes:

   - reimplement the MMIO instruction emulation

   - transactional memory support for PR KVM

   - improve radix page table handling"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
  KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
  KVM: x86: fix typo at kvm_arch_hardware_setup comment
  KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
  KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
  KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
  KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
  KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
  KVM: PPC: Book3S PR: Handle additional interrupt types
  KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
  KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
  KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
  KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
  KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
  KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
  KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
  KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
  KVM: PPC: Book3S PR: Add emulation for trechkpt.
  KVM: PPC: Book3S PR: Add emulation for treclaim.
  KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
  KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
  ...

3 years ago Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Fri, 15 Jun 2018 21:35:02 +0000 (06:35 +0900)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio, vhost: features, fixes

   - PCI virtual function support for virtio

   - DMA barriers for virtio strong barriers

   - bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: update the comments for transport features
  virtio_pci: support enabling VFs
  vhost: fix info leak due to uninitialized memory
  virtio_ring: switch to dma_XX barriers for rpmsg

3 years ago net: stmmac: Run HWIF Quirks after getting HW caps
Jose Abreu [Fri, 15 Jun 2018 15:17:27 +0000 (16:17 +0100)]
net: stmmac: Run HWIF Quirks after getting HW caps

Currently we are running HWIF quirks before getting HW capabilities.
This is not right because some HWIF callbacks depend on HW caps.

Let's save the quirks callback and use it in a later stage.

This fixes Altera socfpga.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically select HW Interface")
Reported-by: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
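The ordering problem in the commit above can be illustrated in plain C outside the kernel. This is a minimal userspace sketch, not the stmmac code: struct hw, probe_caps(), demo_quirk() and hw_init() are hypothetical names invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical device state; not the real stmmac structures. */
struct hw {
	int caps_ready;   /* set once capabilities have been read */
	int has_feature;  /* a capability that the quirk depends on */
	int quirk_result; /* what the quirk observed when it ran */
};

typedef int (*quirk_fn)(struct hw *hw);

/* A quirk that only makes sense once the capabilities are known. */
int demo_quirk(struct hw *hw)
{
	if (!hw->caps_ready)
		return -1; /* ran too early: capabilities not read yet */
	hw->quirk_result = hw->has_feature;
	return 0;
}

/* Stand-in for reading the hardware capability registers. */
void probe_caps(struct hw *hw)
{
	hw->has_feature = 1;
	hw->caps_ready = 1;
}

/* Save the callback during interface selection, run it after the caps. */
int hw_init(struct hw *hw, quirk_fn quirk)
{
	quirk_fn saved = quirk; /* remembered here, deliberately not called yet */

	probe_caps(hw);
	return saved ? saved(hw) : 0;
}
```

Calling demo_quirk() on a freshly zeroed struct hw fails, while hw_init() runs the same callback successfully because the capabilities are populated first.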
3 years ago neighbour: skip NTF_EXT_LEARNED entries during forced gc
Roopa Prabhu [Wed, 13 Jun 2018 04:26:10 +0000 (21:26 -0700)]
neighbour: skip NTF_EXT_LEARNED entries during forced gc

Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
added support for NTF_EXT_LEARNED for neighbour entries.
NTF_EXT_LEARNED entries are neigh entries managed by control
plane (eg: Ethernet VPN implementation in FRR routing suite).
Periodic gc already excludes these entries. This patch extends
it to forced gc which the earlier patch missed.

Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
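The skip described in the commit above can be sketched as a userspace loop. This is an illustration under assumptions, not the kernel's gc routine: struct neigh and forced_gc() are simplified hypothetical stand-ins, and the flag value is assumed to match the uapi definition.

```c
#include <stddef.h>

#define NTF_EXT_LEARNED 0x10 /* assumed to match the uapi value; illustrative here */

/* Hypothetical, simplified neighbour entry. */
struct neigh {
	unsigned char flags;
	int refcnt;  /* 1 means only the table holds a reference */
	int removed; /* set when the collector drops the entry */
};

/*
 * Forced collection: drop unreferenced entries, but skip the ones
 * that the control plane installed. Returns the number removed.
 */
int forced_gc(struct neigh *tbl, size_t n)
{
	int shrunk = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].removed || tbl[i].refcnt != 1)
			continue;
		if (tbl[i].flags & NTF_EXT_LEARNED)
			continue; /* externally managed: leave it alone */
		tbl[i].removed = 1;
		shrunk++;
	}
	return shrunk;
}
```

With one plain entry, one externally learned entry and one entry that is still referenced, only the plain entry is collected.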
3 years ago net: cxgb3: add error handling for sysfs_create_group
Zhouyang Jia [Fri, 15 Jun 2018 03:06:17 +0000 (11:06 +0800)]
net: cxgb3: add error handling for sysfs_create_group

When sysfs_create_group fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling sysfs_create_group.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'tls-fixes'
David S. Miller [Fri, 15 Jun 2018 16:14:31 +0000 (09:14 -0700)]
Merge branch 'tls-fixes'

Daniel Borkmann says:

====================
Two tls fixes

The first one is a syzkaller-triggered UAF and the second one was noticed
while writing test code with tls ulp. For details please see
individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago tls: fix waitall behavior in tls_sw_recvmsg
Daniel Borkmann [Fri, 15 Jun 2018 01:07:46 +0000 (03:07 +0200)]
tls: fix waitall behavior in tls_sw_recvmsg

Current behavior in tls_sw_recvmsg() is to wait for incoming tls
messages and copy up to exactly len bytes of data that the user
provided. This is problematic in the sense that if no packet
is currently queued in strparser, we keep waiting until one has been
processed and pushed into the tls receive layer for tls_wait_data() to
wake up and push the decrypted bits to user space. Given that after
tls decryption we're back at streaming data, use the sock_rcvlowat()
hint from the tcp socket instead. Retain current behavior with the
MSG_WAITALL flag and otherwise use the hint target for breaking the loop
and returning to the application. This is done if currently no
ctx->recv_pkt is ready; otherwise we continue to process it from our
strparser backlog.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
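The receive target described in the commit above can be modelled in userspace. The sketch assumes the hint behaves as follows: the full requested length with MSG_WAITALL, otherwise the smaller of SO_RCVLOWAT and the requested length, and never less than one byte. recv_target() and may_return() are hypothetical helper names, not kernel functions.

```c
/*
 * Userspace model of the wake-up target for a stream receive loop.
 * rcvlowat stands for the socket's SO_RCVLOWAT value.
 */
int recv_target(int rcvlowat, int waitall, int len)
{
	int target;

	if (waitall)
		target = len; /* MSG_WAITALL: fill the whole buffer */
	else
		target = rcvlowat < len ? rcvlowat : len;

	return target > 0 ? target : 1; /* always wait for at least one byte */
}

/*
 * Decide whether the copy loop may return to the application when no
 * further record is queued.
 */
int may_return(int copied, int rcvlowat, int waitall, int len)
{
	return copied >= recv_target(rcvlowat, waitall, len);
}
```

With the default low-water mark of 1, a reader that has copied 10 of 100 requested bytes may return immediately unless it asked for MSG_WAITALL.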
3 years ago tls: fix use-after-free in tls_push_record
Daniel Borkmann [Fri, 15 Jun 2018 01:07:45 +0000 (03:07 +0200)]
tls: fix use-after-free in tls_push_record

syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to
TLS_PENDING_CLOSED_RECORD before the allocation, which could potentially
fail, has happened. That fixes the issue on my side.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
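The idea behind the fix above, allocating before any state is advanced, can be shown with a small userspace model. Everything here is hypothetical (struct record, push_record(), the alloc_fn hook used to inject a failure); it sketches the ordering only, not the tls code.

```c
#include <errno.h>
#include <stdlib.h>

enum rec_state { REC_OPEN, REC_PENDING_CLOSED };

struct record {
	enum rec_state state;
	int header_written;
};

/* Hypothetical allocator hook so a caller can inject a failure. */
typedef void *(*alloc_fn)(size_t size);

void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Allocate first; only touch the record once nothing can fail any more. */
int push_record(struct record *rec, alloc_fn alloc)
{
	void *req = alloc(64);

	if (!req)
		return -ENOMEM; /* record is left exactly as it was */

	rec->header_written = 1;
	rec->state = REC_PENDING_CLOSED;
	/* ... encryption and transmission would happen here ... */
	free(req);
	return 0;
}
```

Because the failure path returns before the record is modified, a later close never finds a half-prepared record that it would try to flush.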
3 years ago Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'
David S. Miller [Fri, 15 Jun 2018 16:12:37 +0000 (09:12 -0700)]
Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'

Guillaume Nault says:

====================
l2tp: l2tp_ppp must ignore non-PPP sessions

The original L2TP code was written for version 2 of the protocol, which
could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
it could transport different kinds of pseudo-wires. But parts of the
l2tp_ppp module still break in presence of non-PPP sessions.

Assuming L2TPv2 tunnels can only transport PPP sessions is right, but
l2tp_netlink failed to ensure that (fixed in patch 1).
When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
filter out non-PPP sessions (last occurrence fixed in patch 2).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
Guillaume Nault [Fri, 15 Jun 2018 13:39:19 +0000 (15:39 +0200)]
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()

pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
'session' may be an Ethernet pseudo-wire.

However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
assumes l2tp_session_priv() points to a pppol2tp_session structure. For
an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
structure instead, making pppol2tp_session_ioctl() access invalid
memory.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
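The type confusion described above comes from interpreting private data without checking the session type first. A minimal userspace sketch of the guard follows; the types and the session_ppp_priv() helper are hypothetical and much simpler than the l2tp structures.

```c
#include <stddef.h>

enum pw_type { PW_PPP, PW_ETH }; /* hypothetical stand-ins for pseudo-wire types */

struct ppp_priv { int sock_fd; };
struct eth_priv { char ifname[16]; };

struct session {
	enum pw_type pwtype;
	union {
		struct ppp_priv ppp;
		struct eth_priv eth;
	} priv;
};

/* Return the PPP private data, or NULL when the session is not PPP. */
struct ppp_priv *session_ppp_priv(struct session *s)
{
	if (!s || s->pwtype != PW_PPP)
		return NULL;
	return &s->priv.ppp;
}
```

A caller that receives NULL skips the session instead of reading Ethernet private data as if it were a PPP structure.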
3 years ago l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
Guillaume Nault [Fri, 15 Jun 2018 13:39:17 +0000 (15:39 +0200)]
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels

The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
found there. However, l2tp_netlink accepts creating Ethernet sessions
regardless of the underlying tunnel version.

This confuses pppol2tp_seq_session_show(), which expects that
l2tp_session_priv() returns a pppol2tp_session structure. When the
session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
instead. This leads to invalid memory access when
pppol2tp_session_get_sock() later tries to dereference ps->sk.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
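The creation-time check described above amounts to validating the session type against the tunnel version. A hedged userspace sketch, with a hypothetical check_session_create() helper and an arbitrarily chosen error code:

```c
#include <errno.h>

enum pw_kind { KIND_PPP, KIND_ETH }; /* hypothetical names for this sketch */

/*
 * Returns 0 when the session may be created. The error value is an
 * arbitrary choice for the illustration, not taken from the patch.
 */
int check_session_create(int tunnel_version, enum pw_kind kind)
{
	if (tunnel_version == 2 && kind != KIND_PPP)
		return -EINVAL; /* L2TPv2 can only carry PPP */
	return 0;
}
```

Rejecting the request at creation time means the code that later iterates over L2TPv2 tunnels can keep assuming that every session it finds is a PPP session.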
3 years ago Merge branch 'mlxsw-IPv6-and-reference-counting-fixes'
David S. Miller [Fri, 15 Jun 2018 16:11:17 +0000 (09:11 -0700)]
Merge branch 'mlxsw-IPv6-and-reference-counting-fixes'

Ido Schimmel says:

====================
mlxsw: IPv6 and reference counting fixes

The first three patches fix a mismatch between the new IPv6 behavior
introduced in commit f34436a43092 ("net/ipv6: Simplify route replace and
appending into multipath route") and mlxsw. The patches allow the driver
to support multipathing in IPv6 overlays with GRE tunnel devices. A
selftest will be submitted when net-next opens.

The last patch fixes a reference count problem of the port_vlan struct.
I plan to simplify the code in net-next, so that reference counting is
not necessary anymore.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: spectrum_switchdev: Fix port_vlan refcounting
Petr Machata [Fri, 15 Jun 2018 13:23:38 +0000 (16:23 +0300)]
mlxsw: spectrum_switchdev: Fix port_vlan refcounting

Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:

[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238]  pci_device_remove+0x31/0xb0
[13578.650244]  device_release_driver_internal+0x14f/0x220
[13578.655562]  driver_detach+0x32/0x70
[13578.659183]  bus_remove_driver+0x47/0xa0
[13578.663134]  pci_unregister_driver+0x1e/0x80
[13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
[13578.677888]  ? exit_to_usermode_loop+0x78/0x80
[13578.682374]  do_syscall_64+0x39/0xe0
[13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.

Fixes: b3529af6bb0d ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
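The imbalance described above can be reproduced with a toy reference counter. The sketch is a userspace model with hypothetical names (struct port_vlan here is not the mlxsw structure); it shows why a flag-only update has to drop the reference taken by the lookup.

```c
/* Hypothetical refcounted object; not the mlxsw structures. */
struct port_vlan {
	int vid;
	int refs;
	unsigned flags;
	int joined; /* already attached to the bridge? */
};

struct port_vlan *port_vlan_get(struct port_vlan *pv)
{
	pv->refs++;
	return pv;
}

void port_vlan_put(struct port_vlan *pv)
{
	pv->refs--;
}

/* Called for every "VLAN added" notification, including flag updates. */
void bridge_vlan_add(struct port_vlan *pv, unsigned flags)
{
	port_vlan_get(pv); /* the lookup takes a reference */
	pv->flags = flags;

	if (pv->joined) {
		port_vlan_put(pv); /* flag-only change: drop the extra reference */
		return;
	}
	pv->joined = 1; /* first join keeps the reference */
}

void bridge_vlan_del(struct port_vlan *pv)
{
	pv->joined = 0;
	port_vlan_put(pv);
}
```

Without the put in the flag-only branch, every flag change would leave one more reference behind, and the count would not return to zero when the VLAN is removed.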
3 years ago mlxsw: spectrum_router: Align with new route replace logic
Ido Schimmel [Fri, 15 Jun 2018 13:23:37 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Align with new route replace logic

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mlxsw: spectrum_router: Allow appending to dev-only routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:36 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Allow appending to dev-only routes

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago ipv6: Only emit append events for appended routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:35 +0000 (16:23 +0300)]
ipv6: Only emit append events for appended routes

Current code will emit an append event in the FIB notification chain for
any route added with NLM_F_APPEND set, even if the route was not
appended to any existing route.

This is inconsistent with IPv4 where such an event is only emitted when
the new route is appended after an existing one.

Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
handle these events.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Fri, 15 Jun 2018 16:08:26 +0000 (09:08 -0700)]
Merge tag 'mac80211-for-davem-2018-06-15' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A handful of fixes:
 * missing RCU grace period enforcement led to drivers freeing
   data structures too early; fix from Dedy Lansky.
 * hwsim module init error paths were messed up; fixed it myself
   after a report from Colin King (who had sent a partial patch)
 * kernel-doc tag errors; fix from Luca Coelho
 * initialize the on-stack sinfo data structure when getting
   station information; fix from Sven Eckelmann
 * TXQ state dumping is now done from init, and when TXQs aren't
   initialized yet at that point, bad things happen, move the
   initialization; fix from Toke Høiland-Jørgensen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago stmmac: added support for 802.1ad vlan stripping
Elad Nachman [Fri, 15 Jun 2018 06:57:39 +0000 (09:57 +0300)]
stmmac: added support for 802.1ad vlan stripping

The stmmac reception handler calls stmmac_rx_vlan() to strip the VLAN tag
before calling napi_gro_receive().

The function assumes VLAN tagged frames are always tagged with the
802.1Q protocol, and assigns ETH_P_8021Q to the skb by hard-coding
the parameter on the call to __vlan_hwaccel_put_tag().

This causes packets not to be passed to the VLAN slave if it was created
with the 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into
__vlan_hwaccel_put_tag() instead of using the hard-coded value of
ETH_P_8021Q.

A NETIF_F_HW_VLAN_STAG_RX check was added, and the strip action is now
dependent on the correct combination of features and the detected VLAN tag.

The NETIF_F_HW_VLAN_STAG_RX feature was added to be in line with the
driver's actual abilities.

Signed-off-by: Elad Nachman <eladn@gilat.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
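Reading the protocol from the frame instead of hard-coding it can be sketched on a raw Ethernet header in userspace. rx_vlan() and the FEAT_* bits are hypothetical names for this illustration; only the two TPID values are standard.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* C-VLAN (802.1Q) TPID */
#define ETH_P_8021AD 0x88A8 /* S-VLAN (802.1ad) TPID */

/* Hypothetical feature bits for this sketch only. */
#define FEAT_CTAG_RX 0x1
#define FEAT_STAG_RX 0x2

/*
 * Inspect an Ethernet frame and report the VLAN protocol and tag to strip.
 * Returns 1 and fills *proto and *tci when the tag should be stripped,
 * 0 when the frame must be passed up unchanged.
 */
int rx_vlan(const uint8_t *frame, size_t len, unsigned features,
	    uint16_t *proto, uint16_t *tci)
{
	uint16_t tpid;

	if (len < 18) /* 12 bytes of addresses + 4 of tag + 2 of type */
		return 0;

	tpid = (uint16_t)(frame[12] << 8 | frame[13]);

	if ((tpid == ETH_P_8021Q && (features & FEAT_CTAG_RX)) ||
	    (tpid == ETH_P_8021AD && (features & FEAT_STAG_RX))) {
		*proto = tpid; /* taken from the header, not hard-coded */
		*tci = (uint16_t)(frame[14] << 8 | frame[15]);
		return 1;
	}
	return 0;
}
```

An 802.1ad frame is stripped only when the S-tag feature bit is set, and the reported protocol then matches what a VLAN device created with "proto 802.1ad" expects.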
3 years ago cfg80211: fix rcu in cfg80211_unregister_wdev
Dedy Lansky [Fri, 15 Jun 2018 11:05:01 +0000 (13:05 +0200)]
cfg80211: fix rcu in cfg80211_unregister_wdev

Callers of cfg80211_unregister_wdev can free the wdev object
immediately after this function returns. This may crash the kernel
because this wdev object is still in use by other threads.
Add synchronize_rcu() after list_del_rcu to make sure wdev object can
be safely freed.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
3 years ago mac80211: Move up init of TXQs
Toke Høiland-Jørgensen [Fri, 25 May 2018 12:29:21 +0000 (14:29 +0200)]
mac80211: Move up init of TXQs

On init, ieee80211_if_add() dumps the interface. Since that now includes a
dump of the TXQ state, we need to initialise that before the dump happens.
So move up the TXQ initialisation to before the call to
ieee80211_if_add().

Fixes: 52539ca89f36 ("cfg80211: Expose TXQ stats and parameters to userspace")
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
3 years ago mac80211_hwsim: fix module init error paths
Johannes Berg [Tue, 29 May 2018 10:04:51 +0000 (12:04 +0200)]
mac80211_hwsim: fix module init error paths

We didn't free the workqueue on any errors, nor did we
correctly check for rhashtable allocation errors, nor
did we free the hashtable on error.

Reported-by: Colin King <colin.king@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
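The kind of error path the commit above repairs is usually written as goto-based unwinding, where each failure releases exactly what was set up before it. A userspace sketch with hypothetical resources and an injectable allocation failure (none of these names come from mac80211_hwsim):

```c
#include <errno.h>
#include <stdlib.h>

struct sim {
	void *wq;
	void *table;
	void *cls;
};

/* Hypothetical allocator hook: fail the Nth allocation (0 = never fail). */
int fail_at;
int alloc_count;
int live_allocs;

void *res_alloc(void)
{
	void *p;

	if (++alloc_count == fail_at)
		return NULL;
	p = malloc(1);
	if (p)
		live_allocs++;
	return p;
}

void res_free(void *p)
{
	free(p);
	live_allocs--;
}

/* Every failure unwinds everything that was acquired before it. */
int sim_init(struct sim *s)
{
	int err = -ENOMEM;

	s->wq = res_alloc();
	if (!s->wq)
		return err;

	s->table = res_alloc();
	if (!s->table)
		goto out_free_wq;

	s->cls = res_alloc();
	if (!s->cls)
		goto out_free_table;

	return 0;

out_free_table:
	res_free(s->table);
out_free_wq:
	res_free(s->wq);
	return err;
}
```

Injecting a failure at each of the three steps in turn leaves live_allocs at zero, which is the property the original init function lacked.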
3 years ago cfg80211: initialize sinfo in cfg80211_get_station
Sven Eckelmann [Wed, 6 Jun 2018 08:53:55 +0000 (10:53 +0200)]
cfg80211: initialize sinfo in cfg80211_get_station

Most of the implementations behind cfg80211_get_station will not initialize
sinfo to zero before manipulating it. For example, the member "filled",
which indicates the filled in parts of this struct, is often only modified
by enabling certain bits in the bitfield while keeping the remaining bits
in their original state. A caller without a preinitialized sinfo.filled can
then no longer decide which parts of sinfo were filled in by
cfg80211_get_station (or actually the underlying implementations).

cfg80211_get_station must therefore take care that sinfo is initialized to
zero. Otherwise, the caller may try to read information which was not
filled in and which must therefore also be considered uninitialized. In
batadv_v_elp_get_throughput's case, an invalid "random" expected throughput
may be stored for this neighbor and thus the B.A.T.M.A.N V algorithm may
switch to non-optimal neighbors for certain destinations.

Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Reported-by: Thomas Lauer <holminateur@gmail.com>
Reported-by: Marcel Schmidt <ff.z-casparistrasse@mailbox.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
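Why the wrapper has to zero the structure can be seen in a small userspace model. struct sta_info, the FILLED_* bits and both functions are hypothetical; the point is only that a callee which ORs bits into "filled" inherits whatever was there before.

```c
#include <string.h>

#define FILLED_SIGNAL     (1u << 0)
#define FILLED_THROUGHPUT (1u << 1)

struct sta_info {
	unsigned filled; /* bitmask of valid fields */
	int signal;
	unsigned throughput;
};

/* A driver callback that only ORs in the bits it knows about. */
void driver_get_station(struct sta_info *si)
{
	si->signal = -40;
	si->filled |= FILLED_SIGNAL;
}

/* Wrapper that zeroes the struct before delegating to the driver. */
void get_station(struct sta_info *si)
{
	memset(si, 0, sizeof(*si));
	driver_get_station(si);
}
```

If a caller passes stack garbage straight to driver_get_station(), the throughput bit can appear set even though nothing was filled in; going through get_station() leaves only the signal bit set.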
3 years ago nl80211: fix some kernel doc tag mistakes
Luca Coelho [Fri, 8 Jun 2018 07:04:47 +0000 (10:04 +0300)]
nl80211: fix some kernel doc tag mistakes

There is a bunch of tags marking constants with &, which means struct
or enum name.  Replace them with %, which is the correct tag for
constants.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
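The tag distinction mentioned above looks like this in a kernel-doc comment. All names are made up for the illustration; only the tag syntax (% for a constant, & for a struct or enum name, @ for a member) is the point.

```c
/* Hypothetical names throughout; only the tag syntax matters here. */
#define DEMO_ATTR_STATE 1

struct demo_state {
	int enabled;
};

/**
 * enum demo_cmd - commands of a made-up interface
 * @DEMO_CMD_GET: query the state; the reply carries %DEMO_ATTR_STATE
 * @DEMO_CMD_SET: apply the settings given in a &struct demo_state
 */
enum demo_cmd {
	DEMO_CMD_GET,
	DEMO_CMD_SET,
};
```

Writing &DEMO_ATTR_STATE instead would make the documentation tooling treat the constant as a type reference, which is the mistake the commit corrects.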
3 years ago Merge tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 15 Jun 2018 08:26:29 +0000 (17:26 +0900)]
Merge tag 'linux-kselftest-4.18-rc1-2' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull more Kselftest updates from Shuah Khan:

 - fix a signedness bug in cgroups test

 - add ppc support for kprobe args tests

* tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest/cgroup: fix a signedness bug
  selftests/ftrace: Add ppc support for kprobe args tests

3 years ago Merge tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 15 Jun 2018 08:24:40 +0000 (17:24 +0900)]
Merge tag 'sound-fix-4.18-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here is a collection of small fixes on top of the previous update.

  All small and obvious fixes. Mostly for usual suspects, USB-audio and
  HD-audio, but a few trivial error handling fixes for misc drivers as
  well"

* tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Always create the interrupt pipe for the mixer
  ALSA: usb-audio: Add insertion control for UAC3 BADD
  ALSA: usb-audio: Change in connectors control creation interface
  ALSA: usb-audio: Add bi-directional terminal types
  ALSA: lx6464es: add error handling for pci_ioremap_bar
  ALSA: sonicvibes: add error handling for snd_ctl_add
  ALSA: usb-audio: Remove explicitly listed Mytek devices
  ALSA: usb-audio: Generic DSD detection for XMOS-based implementations
  ALSA: usb-audio: Add native DSD support for Mytek DACs
  ALSA: hda/realtek - Add shutup hint
  ALSA: usb-audio: Disable the quirk for Nura headset
  ALSA: hda: add dock and led support for HP ProBook 640 G4
  ALSA: hda: add dock and led support for HP EliteBook 830 G5
  ALSA: emu10k1: add error handling for snd_ctl_add
  ALSA: fm801: add error handling for snd_ctl_add

3 years ago Merge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 15 Jun 2018 08:20:53 +0000 (17:20 +0900)]
Merge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm

Pull amd drm fixes from Dave Airlie:
 "Just a single set of AMD fixes for stuff in -next for -rc1"

* tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm: (47 commits)
  drm/amd/powerplay: Set higher SCLK&MCLK frequency than dpm7 in OD (v2)
  drm/amd/powerplay: remove uncessary extra gfxoff control call
  drm/amdgpu: fix parsing indirect register list v2
  drm/amd/include: Update df 3.6 mask and shift definition
  drm/amd/pp: Fix OD feature enable failed on Vega10 workstation cards
  drm/amd/display: Fix stale buffer object (bo) use
  drm/amd/pp: initialize result to before or'ing in data
  drm/amd/powerplay: fix wrong clock adjust sequence
  drm/amdgpu: Grab/put runtime PM references in atomic_commit_tail()
  drm/amd/powerplay: fix missed hwmgr check warning before call gfx_off_control handler
  drm/amdgpu: fix CG enabling hang with gfxoff enabled
  drm/amdgpu: fix clear_all and replace handling in the VM (v2)
  drm/amdgpu: add checking for sos version
  drm/amdgpu: fix the missed vcn fw version report
  Revert "drm/amdgpu: Add an ATPX quirk for hybrid laptop"
  drm/amdgpu/df: fix potential array out-of-bounds read
  drm/amdgpu: Fix NULL pointer when load kfd driver with PP block is disabled
  drm/gfx9: Update gc goldensetting for vega20.
  drm/amd/pp: Allow underclocking when od table is empty in vbios
  drm/amdgpu/display: check if ppfuncs exists before using it
  ...

3 years ago orangefs: simplify compat ioctl handling
Al Viro [Sun, 27 May 2018 12:52:48 +0000 (08:52 -0400)]
orangefs: simplify compat ioctl handling

no need to mess with copy_in_user(), etc...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
Al Viro [Sun, 27 May 2018 12:35:50 +0000 (08:35 -0400)]
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
Haiyang Zhang [Fri, 15 Jun 2018 01:29:09 +0000 (18:29 -0700)]
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload

These fields in struct ndis_ipsecv2_offload and struct ndis_rsc_offload
are one byte according to the specs. This patch defines them with the
right size. These structs are not in use right now, but will be used soon.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago rds: avoid unnecessary cong_update in loop transport
Santosh Shilimkar [Thu, 14 Jun 2018 18:52:34 +0000 (11:52 -0700)]
rds: avoid unnecessary cong_update in loop transport

For the loop transport, which is a self loopback, a remote port congestion
update isn't relevant. In fact the xmit path already ignores it.
The receive path needs to do the same.

Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Fri, 15 Jun 2018 01:32:23 +0000 (11:32 +1000)]
Merge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm-next

Fixes for 4.18. Highlights:
- Fixes for gfxoff on Raven
- Remove an ATPX quirk now that the root cause is fixed
- Runtime PM fixes
- Vega20 register header update
- Wattman fixes
- Misc bug fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614141428.2909-1-alexander.deucher@amd.com
3 years ago Merge branch 'l2tp-fixes'
David S. Miller [Fri, 15 Jun 2018 00:10:19 +0000 (17:10 -0700)]
Merge branch 'l2tp-fixes'

Guillaume Nault says:

====================
l2tp: pppol2tp_connect() fixes

This series fixes a few remaining issues with pppol2tp_connect().

It doesn't try to prevent invalid configurations that have no effect on
kernel's reliability. That would be work for a future patch set.

Patch 2 is the most important as it avoids an invalid pointer
dereference crashing the kernel. It depends on patch 1 for correctly
identifying L2TP session types.

Patches 3 and 4 avoid creating stale tunnels and sessions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
Guillaume Nault [Wed, 13 Jun 2018 13:09:21 +0000 (15:09 +0200)]
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path

pppol2tp_connect() may create a tunnel or a session. Remove them in
case of error.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago l2tp: prevent pppol2tp_connect() from creating kernel sockets
Guillaume Nault [Wed, 13 Jun 2018 13:09:20 +0000 (15:09 +0200)]
l2tp: prevent pppol2tp_connect() from creating kernel sockets

If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using
the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets
the relevant fields to zero, tricking l2tp_tunnel_create() into setting
up an unusable kernel socket.

We can't set 'tcfg' with the required fields because there's no way to
get them from the current connect() parameters. So let's restrict
kernel sockets creation to the netlink API, which is the original use
case.

Fixes: 789a4a2c61d8 ("l2tp: Add support for static unmanaged L2TPv3 tunnels")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago l2tp: only accept PPP sessions in pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:19 +0000 (15:09 +0200)]
l2tp: only accept PPP sessions in pppol2tp_connect()

l2tp_session_priv() returns a struct pppol2tp_session pointer only for
PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH
pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess
structure, which is much smaller than struct pppol2tp_session. This
leads to invalid memory dereference when trying to lock ps->sk_lock.
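
The type check described above can be illustrated with a small userspace sketch. All names here (struct session, session_ppp_priv(), the PW_TYPE_* values) are hypothetical stand-ins, not the kernel's definitions; the point is only that the private area is interpreted as PPP data after the session type has been verified.

```c
#include <stddef.h>

/* Hypothetical stand-ins for pseudo-wire types; values are illustrative. */
enum pw_type { PW_TYPE_ETH = 5, PW_TYPE_PPP = 7 };

struct eth_priv { int dev_id; };               /* small private area */
struct ppp_priv { int owner; long sk_lock; };  /* larger private area */

struct session {
	enum pw_type pwtype;
	void *priv;   /* eth_priv or ppp_priv, depending on pwtype */
};

/* Return the PPP private data, or NULL if this is not a PPP session. */
static struct ppp_priv *session_ppp_priv(struct session *s)
{
	if (s == NULL || s->pwtype != PW_TYPE_PPP)
		return NULL;
	return s->priv;
}
```

A caller would bail out on a NULL return instead of touching fields such as sk_lock that do not exist in the smaller structure.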

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:18 +0000 (15:09 +0200)]
l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()

Define cfg.pw_type so that the new session is created with its .pwtype
field properly set (L2TP_PWTYPE_PPP).

Not setting the pseudo-wire type had several annoying effects:

  * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when
    dumping sessions with the netlink API.

  * Impossibility to delete the session using the netlink API (because
    l2tp_nl_cmd_session_delete() gets the deletion callback function
    from an array indexed by the session's pseudo-wire type).

Also, there are several cases where we should check a session's
pseudo-wire type. For example, pppol2tp_connect() should refuse to
connect a session that is not PPPoL2TP, but that requires the session's
.pwtype field to be properly set.
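
To see why an unset pseudo-wire type makes deletion impossible, consider this minimal sketch of a callback table indexed by type. The names (delete_ops, session_delete(), PWTYPE_*) are assumptions made for illustration and do not mirror the netlink code.

```c
#include <stddef.h>

/* Hypothetical pseudo-wire types; PWTYPE_NONE models an unset .pwtype. */
enum { PWTYPE_NONE = 0, PWTYPE_PPP = 1, PWTYPE_MAX };

typedef int (*delete_fn)(int session_id);

static int ppp_delete(int session_id)
{
	(void)session_id;
	return 0;   /* pretend the session was torn down */
}

/* Only types with a registered callback can be deleted. */
static const delete_fn delete_ops[PWTYPE_MAX] = {
	[PWTYPE_PPP] = ppp_delete,
};

static int session_delete(int pwtype, int session_id)
{
	if (pwtype < 0 || pwtype >= PWTYPE_MAX || delete_ops[pwtype] == NULL)
		return -1;   /* no callback for this type */
	return delete_ops[pwtype](session_id);
}
```

With the type left at its zero default, the lookup finds no callback and the delete request fails, which matches the symptom listed above.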

Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago eventpoll: switch to ->poll_mask
Ben Noordhuis [Thu, 14 Jun 2018 22:32:07 +0000 (00:32 +0200)]
eventpoll: switch to ->poll_mask

Signed-off-by: Ben Noordhuis <info@bnoordhuis.nl>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago aio: only return events requested in poll_mask() for IOCB_CMD_POLL
Christoph Hellwig [Mon, 11 Jun 2018 06:50:10 +0000 (08:50 +0200)]
aio: only return events requested in poll_mask() for IOCB_CMD_POLL

The ->poll_mask() operation has a mask of events that the caller
is interested in, but not all implementations might take it into
account.  Mask the return value to only the requested events,
similar to what the poll and epoll code does.
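
As a rough illustration of the masking, here is a userspace sketch. filter_events() is a hypothetical helper, not a kernel function; it just intersects what an implementation reports with what the caller asked for.

```c
#include <poll.h>

/*
 * Keep only the events the caller requested. `ready` models what a
 * ->poll_mask() implementation reported, `requested` the caller's mask.
 */
static short filter_events(short ready, short requested)
{
	return (short)(ready & requested);
}
```

For example, a file that reports POLLIN | POLLOUT to a caller interested only in POLLIN yields POLLIN, and one that reports only POLLOUT yields 0, so the caller keeps waiting.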

Reported-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago Merge branch 'emaclite-fixes'
David S. Miller [Fri, 15 Jun 2018 00:08:04 +0000 (17:08 -0700)]
Merge branch 'emaclite-fixes'

Radhey Shyam Pandey says:

====================
emaclite bug fixes and code cleanup

This patch series fixes bugs in the emaclite remove and mdio_setup routines.
It also does minor code cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Remove xemaclite_mdio_setup return check
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:19 +0000 (12:05 +0530)]
net: emaclite: Remove xemaclite_mdio_setup return check

Errors are already reported in xemaclite_mdio_setup, so avoid
reporting them again.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Remove unused 'has_mdio' flag.
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:18 +0000 (12:05 +0530)]
net: emaclite: Remove unused 'has_mdio' flag.

Remove unused 'has_mdio' flag.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Fix MDIO bus unregister bug
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:17 +0000 (12:05 +0530)]
net: emaclite: Fix MDIO bus unregister bug

Since the 'has_mdio' flag is not used, the sequence insmod -> rmmod -> insmod
leads to failure, as the MDIO bus is not unregistered in .remove().
Fix it by checking the MII bus pointer instead.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: emaclite: Fix position of lp->mii_bus assignment
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:16 +0000 (12:05 +0530)]
net: emaclite: Fix position of lp->mii_bus assignment

To ensure the MDIO bus is not double freed in the remove() path,
assign lp->mii_bus only after MDIO bus registration.
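
The ordering can be sketched in plain C. This is a userspace model under stated assumptions: struct bus, bus_register(), mdio_setup() and device_remove() are hypothetical names, and malloc/free stand in for the MDIO allocation helpers.

```c
#include <stdlib.h>

struct bus { int registered; };
struct priv { struct bus *mii_bus; };

/* Hypothetical registration step; fails when `fail` is non-zero. */
static int bus_register(struct bus *b, int fail)
{
	if (fail)
		return -1;
	b->registered = 1;
	return 0;
}

static int mdio_setup(struct priv *lp, int fail)
{
	struct bus *b = calloc(1, sizeof(*b));

	if (!b)
		return -1;
	if (bus_register(b, fail) != 0) {
		free(b);          /* freed here and never published to lp */
		return -1;
	}
	lp->mii_bus = b;          /* publish only after registration succeeded */
	return 0;
}

static void device_remove(struct priv *lp)
{
	if (lp->mii_bus) {        /* the pointer itself says whether to clean up */
		free(lp->mii_bus);
		lp->mii_bus = NULL;
	}
}
```

Because the pointer is only set on success, the remove path never frees a bus that the setup error path has already released.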

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago eventfd: only return events requested in poll_mask()
Avi Kivity [Fri, 8 Jun 2018 19:12:32 +0000 (22:12 +0300)]
eventfd: only return events requested in poll_mask()

The ->poll_mask() operation has a mask of events that the caller
is interested in, but we're returning all events regardless.

Change to return only the events the caller is interested in. This
fixes aio IO_CMD_POLL returning immediately when called with POLLIN
on an eventfd, since an eventfd is almost always ready for a write.

Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago aio: mark __aio_sigset::sigmask const
Avi Kivity [Fri, 8 Jun 2018 14:55:05 +0000 (17:55 +0300)]
aio: mark __aio_sigset::sigmask const

io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years ago tcp: verify the checksum of the first data segment in a new connection
Frank van der Linden [Tue, 12 Jun 2018 23:09:37 +0000 (23:09 +0000)]
tcp: verify the checksum of the first data segment in a new connection

commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.

But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly.  These lower-level processing functions do not do any checksum
verification.

Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.
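
For readers unfamiliar with what the verification amounts to, here is a simplified sketch of the Internet checksum (RFC 1071 style). It is not the kernel's tcp_checksum_complete(): it ignores the TCP pseudo-header and checksum offload, and the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, folded to 16 bits.
 * The 32-bit accumulator is sufficient for buffers below 128 KiB.
 */
static uint16_t ones_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)(data[len - 1] << 8);   /* pad the odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Value to store in the checksum field (field zeroed while computing). */
static uint16_t checksum(const uint8_t *data, size_t len)
{
	return (uint16_t)~ones_sum(data, len);
}

/* A segment is intact when the sum over data plus checksum is all ones. */
static int checksum_ok(const uint8_t *data, size_t len)
{
	return ones_sum(data, len) == 0xffff;
}
```

A receive path in this model would drop the segment when checksum_ok() returns 0, before handing any payload further up the stack.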

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: qcom/emac: Add missing of_node_put()
YueHaibing [Mon, 11 Jun 2018 13:03:45 +0000 (21:03 +0800)]
net: qcom/emac: Add missing of_node_put()

Add missing of_node_put() call for device node returned by
of_parse_phandle().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago Merge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 14 Jun 2018 23:51:42 +0000 (08:51 +0900)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - MM remainders

 - various misc things

 - kcov updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  lib/test_printf.c: call wait_for_random_bytes() before plain %p tests
  hexagon: drop the unused variable zero_page_mask
  hexagon: fix printk format warning in setup.c
  mm: fix oom_kill event handling
  treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
  mm: use octal not symbolic permissions
  ipc: use new return type vm_fault_t
  sysvipc/sem: mitigate semnum index against spectre v1
  fault-injection: reorder config entries
  arm: port KCOV to arm
  sched/core / kcov: avoid kcov_area during task switch
  kcov: prefault the kcov_area
  kcov: ensure irq code sees a valid area
  kernel/relay.c: change return type to vm_fault_t
  exofs: avoid VLA in structures
  coredump: fix spam with zero VMA process
  fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
  proc: skip branch in /proc/*/* lookup
  mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
  mm/memblock: add missing include <linux/bootmem.h>
  ...

3 years ago lib/test_printf.c: call wait_for_random_bytes() before plain %p tests
Thierry Escande [Thu, 14 Jun 2018 22:28:15 +0000 (15:28 -0700)]
lib/test_printf.c: call wait_for_random_bytes() before plain %p tests

If the test_printf module is loaded before the crng is initialized, the
plain 'p' tests will fail because the printed address will not be hashed
and the buffer will contain '(ptrval)' instead.

This patch adds a call to wait_for_random_bytes() before plain 'p' tests
to make sure the crng is initialized.

Link: http://lkml.kernel.org/r/20180604113708.11554-1-thierry.escande@linaro.org
Signed-off-by: Thierry Escande <thierry.escande@linaro.org>
Acked-by: Tobin C. Harding <me@tobin.cc>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago hexagon: drop the unused variable zero_page_mask
Anshuman Khandual [Thu, 14 Jun 2018 22:28:12 +0000 (15:28 -0700)]
hexagon: drop the unused variable zero_page_mask

The Hexagon arch does not seem to have subscribed to the _HAVE_COLOR_ZERO_PAGE
framework.  Hence the zero_page_mask variable is not needed.

Link: http://lkml.kernel.org/r/20180517061105.30447-1-khandual@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago hexagon: fix printk format warning in setup.c
Randy Dunlap [Thu, 14 Jun 2018 22:28:09 +0000 (15:28 -0700)]
hexagon: fix printk format warning in setup.c

Fix printk format warning in hexagon/kernel/setup.c:

../arch/hexagon/kernel/setup.c: In function 'setup_arch':
../arch/hexagon/kernel/setup.c:69:2: warning: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'long unsigned int' [-Wformat]

where:
extern unsigned long __phys_offset;
#define PHYS_OFFSET __phys_offset
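
The class of fix can be shown with a userspace sketch that uses snprintf() in place of printk(). The variable value and message text below are assumptions; the point is that an unsigned long argument needs the `l` length modifier.

```c
#include <stdio.h>

/* Hypothetical value standing in for __phys_offset. */
static unsigned long phys_offset = 0x40000000UL;

static int format_offset(char *buf, size_t n)
{
	/* %lx matches unsigned long; plain %x expects unsigned int. */
	return snprintf(buf, n, "PHYS_OFFSET=0x%08lx", phys_offset);
}
```

Building a variant that uses `%x` with `-Wformat` enabled should reproduce a warning of the kind quoted above.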

Link: http://lkml.kernel.org/r/adce8db5-4b01-dc10-7fbb-6a64e0787eb5@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: fix oom_kill event handling
Roman Gushchin [Thu, 14 Jun 2018 22:28:05 +0000 (15:28 -0700)]
mm: fix oom_kill event handling

Commit e27be240df53 ("mm: memcg: make sure memory.events is uptodate
when waking pollers") converted most of memcg event counters to
per-memcg atomics, which made them less confusing for a user.  The
"oom_kill" counter remained untouched, so now it behaves differently
than other counters (including "oom").  This adds nothing but confusion.

Let's fix this by adding the MEMCG_OOM_KILL event, and follow the
MEMCG_OOM approach.

This also removes a hack from count_memcg_event_mm(), introduced earlier
specially for the OOM_KILL counter.

[akpm@linux-foundation.org: fix for droppage of memcg-replace-mm-owner-with-mm-memcg.patch]
Link: http://lkml.kernel.org/r/20180508124637.29984-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
Stefan Agner [Thu, 14 Jun 2018 22:28:02 +0000 (15:28 -0700)]
treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX

With PHYS_ADDR_MAX there is now a type safe variant for all bits set.
Make use of it.

Patch created using a semantic patch as follows:

// <smpl>
@@
typedef phys_addr_t;
@@
-(phys_addr_t)ULLONG_MAX
+PHYS_ADDR_MAX
// </smpl>
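
A small sketch of what the replacement buys, assuming a 64-bit phys_addr_t (the kernel's type can be 32 or 64 bits depending on configuration). The macro below is written for this example; addr_is_unlimited() is a hypothetical helper.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* assumption: 64-bit physical addresses */

/* All bits set, already of type phys_addr_t, so no cast at the use site. */
#define PHYS_ADDR_MAX (~(phys_addr_t)0)

/* Hypothetical helper: treat the all-ones value as "no limit". */
static int addr_is_unlimited(phys_addr_t limit)
{
	return limit == PHYS_ADDR_MAX;
}
```

Compared with `(phys_addr_t)ULLONG_MAX`, the constant carries its type with it, which is what makes the semantic patch above a mechanical substitution.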

Link: http://lkml.kernel.org/r/20180419214204.19322-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: use octal not symbolic permissions
Joe Perches [Thu, 14 Jun 2018 22:27:58 +0000 (15:27 -0700)]
mm: use octal not symbolic permissions

mm/*.c files use symbolic and octal styles for permissions.

Using octal and not symbolic permissions is preferred by many as more
readable.

https://lkml.org/lkml/2016/8/2/1945

Prefer the direct use of octal for permissions.

Done using
$ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
and some typing.

Before:  $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
44
After:  $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
86
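
To make the two styles concrete, the following userspace sketch compares a symbolic mode with its octal spelling. The kernel-only S_IRUGO macro is not available in userspace headers, so an equivalent is defined here under a different, made-up name.

```c
#include <sys/stat.h>

/* Userspace equivalent of the kernel's S_IRUGO (read for user/group/other). */
#define EXAMPLE_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Symbolic style: the reader has to decode three macros. */
static unsigned int symbolic_mode(void)
{
	return EXAMPLE_S_IRUGO | S_IWUSR;
}

/* Octal style: rw-r--r-- is visible at a glance. */
static unsigned int octal_mode(void)
{
	return 0644;
}
```

Both functions return the same value; the conversion changes spelling only, not permissions.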

Miscellanea:

o Whitespace neatening around these conversions.

Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago ipc: use new return type vm_fault_t
Souptick Joarder [Thu, 14 Jun 2018 22:27:55 +0000 (15:27 -0700)]
ipc: use new return type vm_fault_t

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Commit 1c8f422059ae ("mm: change return type to vm_fault_t")

Link: http://lkml.kernel.org/r/20180425043413.GA21467@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago sysvipc/sem: mitigate semnum index against spectre v1
Davidlohr Bueso [Thu, 14 Jun 2018 22:27:51 +0000 (15:27 -0700)]
sysvipc/sem: mitigate semnum index against spectre v1

Both smatch and coverity are reporting potential issues with spectre
variant 1 with the 'semnum' index within the sma->sems array, ie:

  ipc/sem.c:388 sem_lock() warn: potential spectre issue 'sma->sems'
  ipc/sem.c:641 perform_atomic_semop_slow() warn: potential spectre issue 'sma->sems'
  ipc/sem.c:721 perform_atomic_semop() warn: potential spectre issue 'sma->sems'

Avoid any possible speculation by using array_index_nospec() thus
ensuring the semnum value is bounded to [0, sma->sem_nsems).  With the
exception of sem_lock() all of these are slowpaths.
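
The arithmetic behind this kind of index clamping can be sketched in userspace as follows. This only illustrates the branch-free mask idea; it is not the kernel's array_index_nospec(), which may use architecture-specific code, and a plain C compiler gives no guarantee about speculation. The sketch assumes index and size do not exceed LONG_MAX and that right-shifting a negative long is arithmetic, as it is with GCC and Clang.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index))
			       >> (BITS_PER_LONG - 1));
}

/* In-range indices pass through; out-of-range ones collapse to 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

An out-of-range semnum is thus forced to slot 0, so even a mispredicted bounds check cannot steer a load outside [0, size).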

Link: http://lkml.kernel.org/r/20180423171131.njs4rfm2yzyeg6do@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago fault-injection: reorder config entries
Mikulas Patocka [Thu, 14 Jun 2018 22:27:48 +0000 (15:27 -0700)]
fault-injection: reorder config entries

Reorder Kconfig entries, so that menuconfig displays proper indentation.

Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1804251601160.30569@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago arm: port KCOV to arm
Dmitry Vyukov [Thu, 14 Jun 2018 22:27:44 +0000 (15:27 -0700)]
arm: port KCOV to arm

KCOV is a code coverage collection facility used, in particular, by the
syzkaller system call fuzzer.  There is some interest in using syzkaller
on arm devices.  So port KCOV to arm.

At the implementation level this merely declares that KCOV is supported and
disables instrumentation of 3 special cases.  The reasons for disabling are
commented in the code.

Tested with qemu-system-arm/vexpress-a15.

Link: http://lkml.kernel.org/r/20180511143248.112484-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Abbott Liu <liuwenliang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Koguchi Takuo <takuo.koguchi.sw@hitachi.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago sched/core / kcov: avoid kcov_area during task switch
Mark Rutland [Thu, 14 Jun 2018 22:27:41 +0000 (15:27 -0700)]
sched/core / kcov: avoid kcov_area during task switch

During a context switch, we first switch_mm() to the next task's mm,
then switch_to() that new task.  This means that vmalloc'd regions which
had previously been faulted in can transiently disappear in the context
of the prev task.

Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
during this window, and as the fault handling code is instrumented, this
results in a recursive fault.

We must avoid accessing any kcov_area during this window.  We can do so
with a new flag in kcov_mode, set prior to switching the mm, and cleared
once the new task is live.  Since task_struct::kcov_mode isn't always a
specific enum kcov_mode value, this is made an unsigned int.

The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers,
which are empty for !CONFIG_KCOV kernels.

The code uses macros because I can't use static inline functions without a
circular include dependency between <linux/sched.h> and <linux/kcov.h>,
since the definition of task_struct uses things defined in <linux/kcov.h>

Link: http://lkml.kernel.org/r/20180504135535.53744-4-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago kcov: prefault the kcov_area
Mark Rutland [Thu, 14 Jun 2018 22:27:37 +0000 (15:27 -0700)]
kcov: prefault the kcov_area

On many architectures the vmalloc area is lazily faulted in upon first
access.  This is problematic for KCOV, as __sanitizer_cov_trace_pc
accesses the (vmalloc'd) kcov_area, and fault handling code may be
instrumented.  If an access to kcov_area faults, this will result in
mutual recursion through the fault handling code and
__sanitizer_cov_trace_pc(), eventually leading to stack corruption
and/or overflow.

We can avoid this by faulting in the kcov_area before
__sanitizer_cov_trace_pc() is permitted to access it.  Once it has been
faulted in, it will remain present in the process page tables, and will
not fault again.
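
Prefaulting can be pictured with this userspace sketch, which touches one word per page of a buffer. fault_in_area() is a hypothetical name and the 4096-byte fallback is an assumption; the commit's own helper is referred to in the notes below as kcov_fault_in_area().

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Read one word per page so that every page backing `area` is mapped
 * before anything else depends on it. Returns the number of pages touched.
 */
static size_t fault_in_area(const volatile unsigned long *area, size_t words)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t stride, off, touched = 0;

	if (page <= 0)
		page = 4096;   /* fallback if the page size is unavailable */
	stride = (size_t)page / sizeof(unsigned long);

	for (off = 0; off < words; off += stride) {
		(void)area[off];   /* volatile read, not optimized away */
		touched++;
	}
	return touched;
}
```

Any fault is taken here, in a context where faulting is harmless, rather than later inside the instrumentation callback.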

[akpm@linux-foundation.org: code cleanup]
[akpm@linux-foundation.org: add comment explaining kcov_fault_in_area()]
[akpm@linux-foundation.org: fancier code comment from Mark]
Link: http://lkml.kernel.org/r/20180504135535.53744-3-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago kcov: ensure irq code sees a valid area
Mark Rutland [Thu, 14 Jun 2018 22:27:34 +0000 (15:27 -0700)]
kcov: ensure irq code sees a valid area

Patch series "kcov: fix unexpected faults".

These patches fix a few issues where KCOV code could trigger recursive
faults, discovered while debugging a patch enabling KCOV for arch/arm:

* On CONFIG_PREEMPT kernels, there's a small race window where
  __sanitizer_cov_trace_pc() can see a bogus kcov_area.

* Lazy faulting of the vmalloc area can cause mutual recursion between
  fault handling code and __sanitizer_cov_trace_pc().

* During the context switch, switching the mm can cause the kcov_area to
  be transiently unmapped.

These are prerequisites for enabling KCOV on arm, but the issues
themselves are generic -- we just happen to avoid them by chance rather
than design on x86-64 and arm64.

This patch (of 3):

For kernels built with CONFIG_PREEMPT, some C code may execute before or
after the interrupt handler, while the hardirq count is zero.  In these
cases, in_task() can return true.

A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
resets the task's kcov data via kcov_task_init().  Instrumented code
executed during this period will call __sanitizer_cov_trace_pc(), and as
in_task() returns true, will inspect t->kcov_mode before trying to write
to t->kcov_area.

In kcov_task_init() we update t->kcov_{mode,area,size} with plain stores,
which may be re-ordered, torn, etc.  Thus __sanitizer_cov_trace_pc() may
see bogus values for any of these fields, and may attempt to write to
memory which is not mapped.

Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
barrier() to ensure this is ordered before we clear t->kcov_{area,size}.
This ensures that any code executed while kcov_task_init() is preempted
will either see valid values for t->kcov_{area,size}, or will see that
t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
t->kcov_area.
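
The store ordering can be modelled in userspace with compiler-only primitives. The macros below imitate the kernel's WRITE_ONCE(), READ_ONCE() and barrier() using GCC/Clang syntax; struct task, task_reset(), trace_pc() and the mode values are assumptions for this sketch, and the model covers only compiler reordering on a single CPU, as in the interrupt scenario described above.

```c
#include <stddef.h>

/* Compiler-only helpers in the spirit of the kernel macros (GCC/Clang). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define barrier()          __asm__ __volatile__("" ::: "memory")

enum { MODE_DISABLED = 0, MODE_TRACE_PC = 2 };   /* illustrative values */

struct task {
	unsigned int kcov_mode;
	unsigned int kcov_size;     /* capacity of kcov_area, in words */
	unsigned long *kcov_area;   /* area[0] counts the recorded PCs */
};

static void task_reset(struct task *t)
{
	WRITE_ONCE(t->kcov_mode, MODE_DISABLED);   /* disable first ... */
	barrier();                                 /* ... then clear the rest */
	t->kcov_size = 0;
	t->kcov_area = NULL;
}

static int trace_pc(struct task *t, unsigned long ip)
{
	unsigned long pos;

	if (READ_ONCE(t->kcov_mode) != MODE_TRACE_PC)
		return 0;               /* bail out without touching the area */
	barrier();
	pos = t->kcov_area[0] + 1;
	if (pos < t->kcov_size) {
		t->kcov_area[pos] = ip;
		t->kcov_area[0] = pos;
	}
	return 1;
}
```

Code that interrupts task_reset() either runs before the mode store and sees a fully valid area, or runs after it and returns early.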

Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago kernel/relay.c: change return type to vm_fault_t
Souptick Joarder [Thu, 14 Jun 2018 22:27:31 +0000 (15:27 -0700)]
kernel/relay.c: change return type to vm_fault_t

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

commit 1c8f422059ae ("mm: change return type to vm_fault_t")

Link: http://lkml.kernel.org/r/20180510140335.GA25363@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago exofs: avoid VLA in structures
Kees Cook [Thu, 14 Jun 2018 22:27:27 +0000 (15:27 -0700)]
exofs: avoid VLA in structures

On the quest to remove all VLAs from the kernel[1] this adjusts several
cases where allocation is made after an array of structures that points
back into the allocation.  The allocations are changed to perform
explicit calculations instead of using a Variable Length Array in a
structure.

Additionally, this lets Clang compile this code now, since Clang does
not support VLAIS[2].
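
As an illustration of explicit size calculations replacing a variable length array in a structure, here is a userspace sketch. struct layout, struct dev and layout_alloc() are hypothetical and much simpler than the exofs structures; calloc() stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

struct dev { int id; };

struct layout {
	unsigned int numdevs;
	struct dev **devs;   /* points back into the same allocation */
};

/* One allocation: header, then numdevs pointers, then numdevs structs. */
static struct layout *layout_alloc(unsigned int numdevs)
{
	const size_t per_dev = sizeof(struct dev *) + sizeof(struct dev);
	struct layout *l;
	struct dev *storage;
	unsigned int i;

	if (numdevs > (SIZE_MAX - sizeof(*l)) / per_dev)
		return NULL;                     /* size would overflow */
	l = calloc(1, sizeof(*l) + (size_t)numdevs * per_dev);
	if (!l)
		return NULL;

	l->numdevs = numdevs;
	l->devs = (struct dev **)(l + 1);            /* right after the header */
	storage = (struct dev *)(l->devs + numdevs); /* after the pointer array */
	for (i = 0; i < numdevs; i++)
		l->devs[i] = &storage[i];
	return l;
}
```

The offsets are computed by hand instead of being implied by a struct that contains runtime-sized arrays, and a single free() releases everything.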

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://lkml.kernel.org/r/CA+55aFy6h1c3_rP_bXFedsTXzwW+9Q9MfJaW7GUmMBrAp-fJ9A@mail.gmail.com

[keescook@chromium.org: v2]
Link: http://lkml.kernel.org/r/20180418163546.GA45794@beast
Link: http://lkml.kernel.org/r/20180327203904.GA1151@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago coredump: fix spam with zero VMA process
Alexey Dobriyan [Thu, 14 Jun 2018 22:27:24 +0000 (15:27 -0700)]
coredump: fix spam with zero VMA process

Nobody ever tried to self destruct by unmapping whole address space at
once:

munmap((void *)0, (1ULL << 47) - 4096);

Doing this produces 2 warnings for zero-length vmalloc allocations:

  a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...

Fix is to switch to kvmalloc().

Steps to reproduce:

// vsyscall=none
#include <sys/mman.h>
#include <sys/resource.h>
int main(void)
{
	setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY});
	munmap((void *)0, (1ULL << 47) - 4096);
	return 0;
}

Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
OGAWA Hirofumi [Thu, 14 Jun 2018 22:27:21 +0000 (15:27 -0700)]
fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()

If the file size and the FAT cluster chain do not match (corrupted image), we
can hit BUG_ON(!phys) in __fat_get_block().

So, use fat_fs_error() instead.

[hirofumi@mail.parknet.co.jp: fix printk warning]
Link: http://lkml.kernel.org/r/87po12aq5p.fsf@mail.parknet.co.jp
Link: http://lkml.kernel.org/r/874lilcu67.fsf@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago proc: skip branch in /proc/*/* lookup
Alexey Dobriyan [Thu, 14 Jun 2018 22:27:17 +0000 (15:27 -0700)]
proc: skip branch in /proc/*/* lookup

Code is structured like this:

	for ( ... p < last; p++) {
		if (memcmp == 0)
			break;
	}
	if (p >= last)
		ERROR
	OK

gcc does not see that, if the lookup succeeds, the post-loop branch will
never be taken, so it fails to skip it.
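
One way to avoid the post-loop branch is to leave the function from
inside the loop on a match.  The following is a minimal user-space
sketch of that shape, not the actual patch; struct entry_sketch and
lookup_sketch() are hypothetical names, and -1 stands in for an error
code such as -ENOENT.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, standing in for a /proc/<pid>/ file entry. */
struct entry_sketch {
	const char *name;
	size_t len;
	int value;
};

/* Returns the matching entry's value, or -1 if no entry matches. */
static int lookup_sketch(const struct entry_sketch *tbl, size_t n,
			 const char *name, size_t len)
{
	const struct entry_sketch *p, *last = tbl + n;

	for (p = tbl; p < last; p++) {
		if (p->len != len)
			continue;
		if (memcmp(name, p->name, len) == 0)
			return p->value;	/* success exits from inside the loop */
	}
	return -1;	/* reached only when nothing matched */
}
```

With this layout the success path never evaluates a "p >= last" test,
because the only way to fall out of the loop is a failed lookup.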

[akpm@linux-foundation.org: proc_pident_instantiate() no longer takes an inode*]
Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
Mel Gorman [Thu, 14 Jun 2018 22:26:41 +0000 (15:26 -0700)]
mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns

Commit 5d1904204c99 ("mremap: fix race between mremap() and page
cleanning") fixed races between mremap and other operations for both
file-backed and anonymous mappings.  The file-backed was the most
critical as it allowed the possibility that data could be changed on a
physical page after page_mkclean returned which could trigger data loss
or data integrity issues.

A customer reported that the cost of the TLB flushes for anonymous
mappings was excessive, resulting in a 30-50% overall drop in
performance on a microbenchmark since this commit.  Unfortunately I
neither have access to the test case nor can I describe what it does
other than saying that mremap operations dominate heavily.

This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD
boundary instead of every 64 pages to reduce the number of TLB
shootdowns by a factor of 8 in the ideal case.  LATENCY_LIMIT was almost
certainly used originally to limit the PTL hold times but the latency
savings are likely offset by the cost of IPIs in many cases.  This patch
is not reported to completely restore performance but gets it within an
acceptable percentage.  The given metric here is simply described as
"higher is better".
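
The "factor of 8" can be checked with a little arithmetic.  The sketch
below is a user-space model, not kernel code: it assumes 4 KiB pages and
a 2 MiB PMD (as on x86-64), treats every batch as ending in one flush,
and uses hypothetical names throughout.

```c
#define SKETCH_PAGE_SIZE	4096UL				/* assumed page size */
#define SKETCH_PMD_SIZE		(512UL * SKETCH_PAGE_SIZE)	/* assumed PMD span */
#define SKETCH_LATENCY_LIMIT	(64UL * SKETCH_PAGE_SIZE)	/* 64-page cap */

/*
 * Count the batches needed to move len bytes starting at addr.  A batch
 * never crosses a PMD boundary; a non-zero limit additionally caps the
 * batch size.  Passing limit == 0 models the patched behaviour.
 */
static unsigned long count_batches_sketch(unsigned long addr,
					  unsigned long len,
					  unsigned long limit)
{
	unsigned long end = addr + len;
	unsigned long batches = 0;

	while (addr < end) {
		unsigned long next = (addr + SKETCH_PMD_SIZE) &
				     ~(SKETCH_PMD_SIZE - 1);
		unsigned long extent = next - addr;

		if (extent > end - addr)
			extent = end - addr;
		if (limit && extent > limit)
			extent = limit;
		addr += extent;
		batches++;
	}
	return batches;
}
```

For one PMD-aligned 2 MiB range this gives 8 batches with the 64-page
cap and 1 batch without it, which is the ideal-case ratio quoted above.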

Baseline that was known good
002:  Metric:       91.05
004:  Metric:      109.45
008:  Metric:       73.08
016:  Metric:       58.14
032:  Metric:       61.09
064:  Metric:       57.76
128:  Metric:       55.43

Current
001:  Metric:       54.98
002:  Metric:       56.56
004:  Metric:       41.22
008:  Metric:       35.96
016:  Metric:       36.45
032:  Metric:       35.71
064:  Metric:       35.73
128:  Metric:       34.96

With patch
001:  Metric:       61.43
002:  Metric:       81.64
004:  Metric:       67.92
008:  Metric:       51.67
016:  Metric:       50.47
032:  Metric:       52.29
064:  Metric:       50.01
128:  Metric:       49.04

So for low thread counts performance is not restored, but for larger
numbers of threads it is closer to the "known good" baseline.

Using a different mremap-intensive workload that is not representative
of the real workload there is little difference observed outside of
noise in the headline metrics.  However, the TLB shootdowns are reduced by
11% on average and at the peak, TLB shootdowns were reduced by 21%.
Interrupts were sampled every second while the workload ran to get those
figures.  It's known that the figures will vary as the
non-representative load is non-deterministic.

An alternative patch was posted that should have significantly reduced
the TLB flushes but unfortunately it does not perform as well as this
version on the customer test case.  If revisited, the two patches can
stack on top of each other.

Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm/memblock: add missing include <linux/bootmem.h>
Mathieu Malaterre [Thu, 14 Jun 2018 22:26:38 +0000 (15:26 -0700)]
mm/memblock: add missing include <linux/bootmem.h>

Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
introduced two new function definitions:

  memblock_virt_alloc_try_nid_nopanic()
  memblock_virt_alloc_try_nid()

Commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw")
introduced the following function definition:

  memblock_virt_alloc_try_nid_raw()

This commit adds an include of the header file <linux/bootmem.h> to
provide the missing function prototypes.  This silences the following
gcc warnings (W=1):

  mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
  mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
  mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

Link: http://lkml.kernel.org/r/20180606194144.16990-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: check for SIGKILL inside dup_mmap() loop
Tetsuo Handa [Thu, 14 Jun 2018 22:26:34 +0000 (15:26 -0700)]
mm: check for SIGKILL inside dup_mmap() loop

As a theoretical problem, dup_mmap() of an mm_struct with 60000+ vmas
can loop while potentially allocating memory, with mm->mmap_sem held for
write by current thread.  This is bad if current thread was selected as
an OOM victim, for current thread will continue allocations using memory
reserves while OOM reaper is unable to reclaim memory.

As an actually observable problem, it is not difficult to make OOM
reaper unable to reclaim memory if the OOM victim is blocked at
i_mmap_lock_write() in this loop.  Unfortunately, since nobody can
explain whether it is safe to use killable wait there, let's check for
SIGKILL before trying to allocate memory.  Even without an OOM event,
there is no point in continuing the loop from the beginning if the current
thread is killed.
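
The shape of such a check can be sketched in user space as follows.
This is only a model under stated assumptions: vma_sketch,
fatal_pending_sketch() and kill_after are hypothetical stand-ins, and
returning -EINTR is an assumption about the error code rather than
something stated above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping in the parent's list. */
struct vma_sketch {
	struct vma_sketch *next;
};

/* Test knob: a kill becomes pending once this many copies are done. */
static int kill_after = -1;

/* Stand-in for asking whether the current task has a fatal signal. */
static bool fatal_pending_sketch(size_t done)
{
	return kill_after >= 0 && done >= (size_t)kill_after;
}

static int copy_vmas_sketch(const struct vma_sketch *src, size_t *copied)
{
	*copied = 0;
	for (; src; src = src->next) {
		/* Check before doing any allocation for this mapping. */
		if (fatal_pending_sketch(*copied))
			return -EINTR;
		/* Allocation and copying of one mapping would go here. */
		(*copied)++;
	}
	return 0;
}
```

The caller sees an ordinary failure, so the cleanup path that already
handles allocation failures is reused.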

I tested with debug printk().  This patch should be safe because we
already fail if security_vm_enough_memory_mm() or
kmem_cache_alloc(GFP_KERNEL) fails and exit_mmap() handles it.

   ***** Aborting dup_mmap() due to SIGKILL *****
   ***** Aborting dup_mmap() due to SIGKILL *****
   ***** Aborting dup_mmap() due to SIGKILL *****
   ***** Aborting dup_mmap() due to SIGKILL *****
   ***** Aborting exit_mmap() due to NULL mmap *****

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/201804071938.CDE04681.SOFVQJFtMHOOLF@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago kexec: yield to scheduler when loading kimage segments
Jarrett Farnitano [Thu, 14 Jun 2018 22:26:31 +0000 (15:26 -0700)]
kexec: yield to scheduler when loading kimage segments

Without yielding while loading kimage segments, a large initrd will
block all other work on the CPU performing the load until it is
completed.  For example loading an initrd of 200MB on a low power single
core system will lock up the system for a few seconds.

To increase system responsiveness to other tasks at that time, call
cond_resched() in both the crash kernel and normal kernel segment
loading loops.
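
As a rough user-space analogy, a chunked copy loop can yield between
chunks.  Here sched_yield() stands in for cond_resched(); note that it
yields unconditionally, whereas cond_resched() only reschedules when
needed.  The function name and chunk size are hypothetical.

```c
#include <sched.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK 4096u	/* assumed chunk size, one page */

/* Copies len bytes in chunks and returns how many times it yielded. */
static size_t load_segment_sketch(unsigned char *dst,
				  const unsigned char *src, size_t len)
{
	size_t yields = 0;

	while (len) {
		size_t n = len < SKETCH_CHUNK ? len : SKETCH_CHUNK;

		memcpy(dst, src, n);
		dst += n;
		src += n;
		len -= n;

		sched_yield();	/* let other runnable tasks make progress */
		yields++;
	}
	return yields;
}
```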

I did run into a practical problem.  Hardware watchdogs on embedded
systems can have short timers on the order of seconds.  If the system is
locked up for a few seconds with only a single core available, the
watchdog may not be pet in a timely fashion.  If this happens, the
hardware watchdog will fire and reset the system.

This really only becomes a problem when you are working with a single
core, a decently sized initrd, and have a constrained hardware watchdog.

Link: http://lkml.kernel.org/r/1528738546-3328-1-git-send-email-jmf@amazon.com
Signed-off-by: Jarrett Farnitano <jmf@amazon.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: fix race between kmem_cache destroy, create and deactivate
Shakeel Butt [Thu, 14 Jun 2018 22:26:27 +0000 (15:26 -0700)]
mm: fix race between kmem_cache destroy, create and deactivate

The memcg kmem cache creation and deactivation (SLUB only) are
asynchronous.  If a root kmem cache is destroyed while its memcg cache is
in the process of creation or deactivation, the kernel may crash.

Example of one such crash:
general protection fault: 0000 [#1] SMP PTI
CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
...
Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
RIP: 0010:has_cpu_slab
...
Call Trace:
? on_each_cpu_cond
__kmem_cache_shrink
kmemcg_cache_deact_after_rcu
kmemcg_deactivate_workfn
process_one_work
worker_thread
kthread
ret_from_fork+0x35/0x40

To fix this race, on root kmem cache destruction, mark the cache as
dying and flush the workqueue used for memcg kmem cache creation and
deactivation.  SLUB's memcg kmem cache deactivation also includes an RCU
callback, so make sure all previously registered RCU callbacks have
completed as well.

[shakeelb@google.com: handle the RCU callbacks for SLUB deactivation]
Link: http://lkml.kernel.org/r/20180611192951.195727-1-shakeelb@google.com
[shakeelb@google.com: add more documentation, rename fields for readability]
Link: http://lkml.kernel.org/r/20180522201336.196994-1-shakeelb@google.com
[akpm@linux-foundation.org: fix build, per Shakeel]
[shakeelb@google.com: v3.  Instead of refcount, flush the workqueue]
Link: http://lkml.kernel.org/r/20180530001204.183758-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180521174116.171846-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: fix devmem_is_allowed() for sub-page System RAM intersections
Dan Williams [Thu, 14 Jun 2018 22:26:24 +0000 (15:26 -0700)]
mm: fix devmem_is_allowed() for sub-page System RAM intersections

Hussam reports:

    I was poking around and for no real reason, I did cat /dev/mem and
    strings /dev/mem.  Then I saw the following warning in dmesg. I saved it
    and rebooted immediately.

     memremap attempted on mixed range 0x000000000009c000 size: 0x1000
     ------------[ cut here ]------------
     WARNING: CPU: 0 PID: 11810 at kernel/memremap.c:98 memremap+0x104/0x170
     [..]
     Call Trace:
      xlate_dev_mem_ptr+0x25/0x40
      read_mem+0x89/0x1a0
      __vfs_read+0x36/0x170

The memremap() implementation checks for attempts to remap System RAM
with MEMREMAP_WB and instead redirects those mapping attempts to the
linear map.  However, that only works if the physical address range
being remapped is page aligned.  In low memory we have situations like
the following:

    00000000-00000fff : Reserved
    00001000-0009fbff : System RAM
    0009fc00-0009ffff : Reserved

...where System RAM intersects Reserved ranges at sub-page
granularity.

Given that devmem_is_allowed() special cases any attempt to map System
RAM in the first 1MB of memory, replace page_is_ram() with the more
precise region_intersects() to trap attempts to map disallowed ranges.
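
To illustrate why a byte-range intersection test is more precise than a
per-page test, here is a user-space sketch over the low-memory map shown
above.  It is not the kernel's region_intersects(); the enum, struct and
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum overlap_sketch { OVERLAP_DISJOINT, OVERLAP_INTERSECTS, OVERLAP_MIXED };

/* One resource; 'end' is inclusive, as in the listing above. */
struct res_sketch {
	uint64_t start, end;
	bool is_ram;
};

static const struct res_sketch low_map[] = {
	{ 0x00000000, 0x00000fff, false },	/* Reserved */
	{ 0x00001000, 0x0009fbff, true },	/* System RAM */
	{ 0x0009fc00, 0x0009ffff, false },	/* Reserved */
};

/* Classify [start, start + size) against System RAM in low_map. */
static enum overlap_sketch ram_overlap_sketch(uint64_t start, uint64_t size)
{
	uint64_t end = start + size - 1;
	bool ram = false, other = false;
	size_t i;

	for (i = 0; i < sizeof(low_map) / sizeof(low_map[0]); i++) {
		const struct res_sketch *r = &low_map[i];

		if (r->end < start || r->start > end)
			continue;	/* no overlap with this resource */
		if (r->is_ram)
			ram = true;
		else
			other = true;
	}
	if (!ram)
		return OVERLAP_DISJOINT;
	return other ? OVERLAP_MIXED : OVERLAP_INTERSECTS;
}
```

The page at 0x9f000 is classified as mixed: it holds System RAM up to
0x9fbff and a Reserved range from 0x9fc00, which is exactly the case a
page-granular check cannot express.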

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199999
Link: http://lkml.kernel.org/r/152856436164.18127.2847888121707136898.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 92281dee825f ("arch: introduce memremap()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Hussam Al-Tayeb <me@hussam.eu.org>
Tested-by: Hussam Al-Tayeb <me@hussam.eu.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm/swapfile.c: fix swap_count comment about nonexistent SWAP_HAS_CONT
Daniel Jordan [Thu, 14 Jun 2018 22:26:21 +0000 (15:26 -0700)]
mm/swapfile.c: fix swap_count comment about nonexistent SWAP_HAS_CONT

Commit 570a335b8e22 ("swap_info: swap count continuations") introduces
COUNT_CONTINUED but refers to it incorrectly as SWAP_HAS_CONT in a
comment in swap_count.  Fix it.

Link: http://lkml.kernel.org/r/20180612175919.30413-1-daniel.m.jordan@oracle.com
Fixes: 570a335b8e22 ("swap_info: swap count continuations")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm: fix null pointer dereference in mem_cgroup_protected
Roman Gushchin [Thu, 14 Jun 2018 22:26:17 +0000 (15:26 -0700)]
mm: fix null pointer dereference in mem_cgroup_protected

Shakeel reported a crash in mem_cgroup_protected(), which can be triggered
by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
  PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0
  Oops: 0000 [#4] SMP PTI
  CPU: 0 PID: 15581 Comm: bash Tainted: G      D 4.17.0-smp-clean #5
  Hardware name: ...
  RIP: 0010:mem_cgroup_protected+0x54/0x130
  Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d 8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf 00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2 74 05
  RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286
  RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000
  RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88
  R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000
  FS:  00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0
  Call Trace:
   ? shrink_node+0x194/0x510
   do_try_to_free_pages+0xfd/0x390
   try_to_free_mem_cgroup_pages+0x123/0x210
   try_charge+0x19e/0x700
   mem_cgroup_try_charge+0x10b/0x1a0
   wp_page_copy+0x134/0x5b0
   do_wp_page+0x90/0x460
   __handle_mm_fault+0x8e3/0xf30
   handle_mm_fault+0xfe/0x220
   __do_page_fault+0x262/0x500
   do_page_fault+0x28/0xd0
   ? page_fault+0x8/0x30
   page_fault+0x1e/0x30
  RIP: 0033:0x485b72

The problem happens because parent_mem_cgroup() returns a NULL pointer,
which is dereferenced later without a check.

As cgroup v1 has no memory guarantee support, let's make
mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given
cgroup has no parent (non-hierarchical mode is used).
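
The early return can be sketched as below.  This is a heavily simplified
user-space model, not the kernel function: the struct, the enum and the
way the effective limits are derived from the parent are all
hypothetical and only serve to show where the parent is dereferenced.

```c
#include <stddef.h>

enum prot_sketch { PROT_SKETCH_NONE, PROT_SKETCH_LOW, PROT_SKETCH_MIN };

/* Hypothetical cgroup: parent link plus protection settings and usage. */
struct cg_sketch {
	struct cg_sketch *parent;
	unsigned long min, low, usage;
};

static enum prot_sketch protected_sketch(const struct cg_sketch *root,
					 const struct cg_sketch *cg)
{
	const struct cg_sketch *parent;
	unsigned long emin, elow;

	if (cg == root || !cg->usage)
		return PROT_SKETCH_NONE;

	parent = cg->parent;
	if (!parent)		/* no parent: nothing to inherit from */
		return PROT_SKETCH_NONE;

	/* Simplified: the parent's settings cap the child's. */
	emin = cg->min < parent->min ? cg->min : parent->min;
	elow = cg->low < parent->low ? cg->low : parent->low;

	if (cg->usage <= emin)
		return PROT_SKETCH_MIN;
	if (cg->usage <= elow)
		return PROT_SKETCH_LOW;
	return PROT_SKETCH_NONE;
}
```

Without the parent check, the two lines that read parent->min and
parent->low would fault for a group that has no parent.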

Link: http://lkml.kernel.org/r/20180611175418.7007-2-guro@fb.com
Fixes: bf8d5d52ffe8 ("memcg: introduce memory.min")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago mm/ksm.c: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm()
Jia He [Thu, 14 Jun 2018 22:26:14 +0000 (15:26 -0700)]
mm/ksm.c: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm()

On our armv8a server (QDF2400), I noticed lots of WARN_ONs caused by
rmap_item->address not being PAGE_SIZE aligned under memory pressure
tests (start 20 guests and run memhog in the host).

  WARNING: CPU: 4 PID: 4641 at virt/kvm/arm/mmu.c:1826 kvm_age_hva_handler+0xc0/0xc8
  CPU: 4 PID: 4641 Comm: memhog Tainted: G        W 4.17.0-rc3+ #8
  Call trace:
   kvm_age_hva_handler+0xc0/0xc8
   handle_hva_to_gpa+0xa8/0xe0
   kvm_age_hva+0x4c/0xe8
   kvm_mmu_notifier_clear_flush_young+0x54/0x98
   __mmu_notifier_clear_flush_young+0x6c/0xa0
   page_referenced_one+0x154/0x1d8
   rmap_walk_ksm+0x12c/0x1d0
   rmap_walk+0x94/0xa0
   page_referenced+0x194/0x1b0
   shrink_page_list+0x674/0xc28
   shrink_inactive_list+0x26c/0x5b8
   shrink_node_memcg+0x35c/0x620
   shrink_node+0x100/0x430
   do_try_to_free_pages+0xe0/0x3a8
   try_to_free_pages+0xe4/0x230
   __alloc_pages_nodemask+0x564/0xdc0
   alloc_pages_vma+0x90/0x228
   do_anonymous_page+0xc8/0x4d0
   __handle_mm_fault+0x4a0/0x508
   handle_mm_fault+0xf8/0x1b0
   do_page_fault+0x218/0x4b8
   do_translation_fault+0x90/0xa0
   do_mem_abort+0x68/0xf0
   el0_da+0x24/0x28

In rmap_walk_ksm, rmap_item->address might still have the STABLE_FLAG
set, so the start and end in handle_hva_to_gpa might not be PAGE_SIZE
aligned.  This causes exceptions in handle_hva_to_gpa on arm64.

This patch fixes it by ignoring (not removing) the low bits of address
when doing rmap_walk_ksm.
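
"Ignoring, not removing" amounts to masking a copy of the address.  A
minimal sketch, assuming 4 KiB pages and a flag stored in bit 0x200 of
the address field (both values are assumptions, as are the names):

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	UINT64_C(4096)	/* assumed page size */
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_STABLE_FLAG	UINT64_C(0x200)	/* assumed flag bit */

/*
 * Return the page-aligned address to walk.  The caller's stored value,
 * including its flag bits, is left untouched.
 */
static uint64_t walk_addr_sketch(uint64_t item_address)
{
	return item_address & SKETCH_PAGE_MASK;
}
```

The walker then passes only aligned addresses down, while code that
relies on the flag bits in the stored field keeps working.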

IMO, it should be backported to the stable tree.  The storm of WARN_ONs is
very easy for me to reproduce.  More than that, I saw a panic (not
reproducible) as follows:

  page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0
  flags: 0x1fffc00000000000()
  raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000
  raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000
  page dumped because: nonzero _refcount
  CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W 4.14.15-5.hxt.aarch64 #1
  Hardware name: <snip for confidential issues>
  Call trace:
    dump_backtrace+0x0/0x22c
    show_stack+0x24/0x2c
    dump_stack+0x8c/0xb0
    bad_page+0xf4/0x154
    free_pages_check_bad+0x90/0x9c
    free_pcppages_bulk+0x464/0x518
    free_hot_cold_page+0x22c/0x300
    __put_page+0x54/0x60
    unmap_stage2_range+0x170/0x2b4
    kvm_unmap_hva_handler+0x30/0x40
    handle_hva_to_gpa+0xb0/0xec
    kvm_unmap_hva_range+0x5c/0xd0

I even injected a fault on purpose in kvm_unmap_hva_range by setting
size=size-0x200; the call trace is similar to the one above.  So I think
the panic has the same root cause as the WARN_ON.

Andrea said:

: It looks a straightforward safe fix, on x86 hva_to_gfn_memslot would
: zap those bits and hide the misalignment caused by the low metadata
: bits being erroneously left set in the address, but the arm code
: notices when that's the last page in the memslot and the hva_end is
: getting aligned and the size is below one page.
:
: I think the problem triggers in the addr += PAGE_SIZE of
: unmap_stage2_ptes that never matches end because end is aligned but
: addr is not.
:
:  } while (pte++, addr += PAGE_SIZE, addr != end);
:
: x86 again only works on hva_start/hva_end after converting it to
: gfn_start/end and that being in pfn units the bits are zapped before
: they risk to cause trouble.

Jia He said:

: I've tested this myself on an arm64 server (QDF2400, 46 cpus, 96G mem).
: Without this patch, the WARN_ON is very easy to reproduce.  After this
: patch, I have run the same benchmark for a whole day without any WARN_ONs.

Link: http://lkml.kernel.org/r/1525403506-6750-1-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Jia He <hejianet@gmail.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd...
Linus Torvalds [Thu, 14 Jun 2018 22:31:07 +0000 (07:31 +0900)]
Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
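
For context on "not y2038 safe": a signed 32-bit seconds counter tops
out at 2147483647, which corresponds to 2038-01-19 03:14:07 UTC.  The
sketch below uses hypothetical struct names to contrast the two widths;
it does not reproduce the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit timestamp layouts. */
struct ts32_sketch {
	int32_t tv_sec;
	long tv_nsec;
};

struct ts64_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* True if a seconds value survives storage in a 32-bit field. */
static int fits_in_32bit_seconds_sketch(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}

/* Widening is always lossless; narrowing is what breaks in 2038. */
static struct ts64_sketch widen_sketch(struct ts32_sketch t)
{
	struct ts64_sketch r = { t.tv_sec, t.tv_nsec };

	return r;
}
```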

3 years ago Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client
Linus Torvalds [Thu, 14 Jun 2018 22:24:58 +0000 (07:24 +0900)]
Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The main piece is a set of libceph changes that revamps how OSD
  requests are aborted, improving CephFS ENOSPC handling and making
  "umount -f" actually work (Zheng and myself).

  The rest is mostly mount option handling cleanups from Chengguang and
  assorted fixes from Zheng, Luis and Dongsheng."

* tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
  rbd: flush rbd_dev->watch_dwork after watch is unregistered
  ceph: update description of some mount options
  ceph: show ino32 if the value is different with default
  ceph: strengthen rsize/wsize/readdir_max_bytes validation
  ceph: fix alignment of rasize
  ceph: fix use-after-free in ceph_statfs()
  ceph: prevent i_version from going back
  ceph: fix wrong check for the case of updating link count
  libceph: allocate the locator string with GFP_NOFAIL
  libceph: make abort_on_full a per-osdc setting
  libceph: don't abort reads in ceph_osdc_abort_on_full()
  libceph: avoid a use-after-free during map check
  libceph: don't warn if req->r_abort_on_full is set
  libceph: use for_each_request() in ceph_osdc_abort_on_full()
  libceph: defer __complete_request() to a workqueue
  libceph: move more code into __complete_request()
  libceph: no need to call flush_workqueue() before destruction
  ceph: flush pending works before shutdown super
  ceph: abort osd requests on force umount
  libceph: introduce ceph_osdc_abort_requests()
  ...

3 years ago Merge tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 14 Jun 2018 22:23:00 +0000 (07:23 +0900)]
Merge tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - error handling fixup for one of the new ioctls from 1st pull

 - fix for device-replace that incorrectly uses inode pages and can mess
   up compressed extents in some cases

 - fiemap fix for reporting incorrect number of extents

 - vm_fault_t type conversion

* tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: Don't use inode pages for device replace
  btrfs: change return type of btrfs_page_mkwrite to vm_fault_t
  Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero
  btrfs: Check error of btrfs_iget in btrfs_search_path_in_tree_user

3 years ago Kbuild: rename HAVE_CC_STACKPROTECTOR config variable
Masahiro Yamada [Thu, 14 Jun 2018 10:36:45 +0000 (19:36 +0900)]
Kbuild: rename HAVE_CC_STACKPROTECTOR config variable

HAVE_CC_STACKPROTECTOR should be selected by architectures with a stack
canary implementation.  It is not about compiler support.

For consistency with commit 050e9baa9dc9 ("Kbuild: rename
CC_STACKPROTECTOR[_STRONG] config variables"), remove 'CC_' from the
config symbol.

I moved the 'select' lines to keep the alphabetical sorting.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago kconfig: tinyconfig: remove stale stack protector fixups
Masahiro Yamada [Thu, 14 Jun 2018 10:36:44 +0000 (19:36 +0900)]
kconfig: tinyconfig: remove stale stack protector fixups

Prior to commit 2a61f4747eea ("stack-protector: test compiler capability
in Kconfig and drop AUTO mode"), the stack protector was configured by
the choice of NONE, REGULAR, STRONG, AUTO.

tiny.config needed to explicitly set NONE because the default value of
choice, AUTO, did not produce the tiniest kernel.

Now that there are only two boolean symbols, STACKPROTECTOR and
STACKPROTECTOR_STRONG, they are naturally disabled by "make
allnoconfig", which "make tinyconfig" is based on.  Remove unnecessary
lines from the tiny.config fragment file.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago x86: fix dependency of X86_32_LAZY_GS
Masahiro Yamada [Thu, 14 Jun 2018 10:36:43 +0000 (19:36 +0900)]
x86: fix dependency of X86_32_LAZY_GS

Commit 2a61f4747eea ("stack-protector: test compiler capability in
Kconfig and drop AUTO mode") replaced the 'choice' with two boolean
symbols, so CC_STACKPROTECTOR_NONE no longer exists.

Prior to commit 2bc2f688fdf8 ("Makefile: move stack-protector
availability out of Kconfig"), this line was like this:

  depends on X86_32 && !CC_STACKPROTECTOR

The CC_ prefix was dropped by commit 050e9baa9dc9 ("Kbuild: rename
CC_STACKPROTECTOR[_STRONG] config variables"), so the dependency now
should be:

  depends on X86_32 && !STACKPROTECTOR

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago sctp: define sctp_packet_gso_append to build GSO frames
Xin Long [Wed, 13 Jun 2018 23:37:02 +0000 (07:37 +0800)]
sctp: define sctp_packet_gso_append to build GSO frames

Currently sctp GSO uses skb_gro_receive() to append the data to the head
skb's frag_list. However, it actually needs very little of the code in
skb_gro_receive(). Besides, NAPI_GRO_CB has to be set while most
of its members are not needed here.

This patch adds sctp_packet_gso_append() to build GSO frames
instead of using skb_gro_receive(), which avoids many unnecessary
checks and makes the code clearer.

Note that sctp will use page frags instead of frag_list to build
GSO frames in another patch. But that may take time, as sctp's GSO
frames may have different sizes. skb_segment() can only split a frame
into frags of the same size, which would break the boundaries
of sctp chunks.
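
The core of appending to a head buffer's fragment list is small, which
is the point of the dedicated helper.  The following user-space sketch
models it with a hypothetical buf_sketch structure; the field names
loosely follow sk_buff conventions but this is not the kernel code.

```c
#include <stddef.h>

/* Hypothetical buffer; the last three fields are used on the head only. */
struct buf_sketch {
	struct buf_sketch *next;	/* next fragment in the list */
	size_t len;			/* total bytes, fragments included */
	struct buf_sketch *frag_list;	/* first appended fragment */
	struct buf_sketch *last;	/* last appended, or the head itself */
	size_t data_len;		/* bytes held in fragments */
};

static void head_init_sketch(struct buf_sketch *head, size_t hdr_len)
{
	head->next = NULL;
	head->len = hdr_len;
	head->frag_list = NULL;
	head->last = head;
	head->data_len = 0;
}

/* Link b after the current tail and account for its length. */
static void gso_append_sketch(struct buf_sketch *head, struct buf_sketch *b)
{
	if (head->last == head)
		head->frag_list = b;	/* first fragment */
	else
		head->last->next = b;	/* chain after the previous tail */
	head->last = b;

	head->data_len += b->len;
	head->len += b->len;
}
```

Because fragments are linked whole, differently sized chunks keep their
boundaries.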

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
Arnd Bergmann [Fri, 25 May 2018 15:36:17 +0000 (17:36 +0200)]
KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV

Arnd had sent this patch to the KVM mailing list, but it slipped through
the cracks of maintainers hand-off, and therefore wasn't included in
the pull request.

The same issue had been fixed by Linus in commit dbee3d0 ("KVM: x86:
VMX: fix build without hyper-v", 2018-06-12) as a self-described
"quick-and-hacky build fix".  However, checking the compile-time
configuration symbol with IS_ENABLED is cleaner and it is enough to
avoid the link error, so switch to Arnd's solution.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Rewritten commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years ago ALSA: usb-audio: Always create the interrupt pipe for the mixer
Jorge Sanjuan [Thu, 14 Jun 2018 14:05:58 +0000 (15:05 +0100)]
ALSA: usb-audio: Always create the interrupt pipe for the mixer

A UAC3 BADD device may also include an interrupt status pipe
to report changes on the HEADSET ADAPTER terminals. The creation
of the status pipe is dependent on the device reporting that it
has it.

Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years ago ALSA: usb-audio: Add insertion control for UAC3 BADD
Jorge Sanjuan [Thu, 14 Jun 2018 14:05:57 +0000 (15:05 +0100)]
ALSA: usb-audio: Add insertion control for UAC3 BADD

The HEADSET ADAPTER profile for BADD devices is meant to support
Insertion Control for the Input and Output Terminals of the headset.

This patch defines the BADD inferred input and output terminals and
builds the connector controls.

Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years ago ALSA: usb-audio: Change in connectors control creation interface
Jorge Sanjuan [Thu, 14 Jun 2018 14:05:56 +0000 (15:05 +0100)]
ALSA: usb-audio: Change in connectors control creation interface

Change build_connector_control() and get_connector_control_name()
so they take `struct usb_mixer_interface` as an input argument instead
of `struct mixer_build`.

This is preliminary work to add support for connectors control
for UAC3 BADD devices. No functional change.

Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years ago ALSA: usb-audio: Add bi-directional terminal types
Jorge Sanjuan [Thu, 14 Jun 2018 14:05:55 +0000 (15:05 +0100)]
ALSA: usb-audio: Add bi-directional terminal types

Define the bi-directional USB terminal types for audio devices.

Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years ago ALSA: lx6464es: add error handling for pci_ioremap_bar
Zhouyang Jia [Thu, 14 Jun 2018 13:51:46 +0000 (21:51 +0800)]
ALSA: lx6464es: add error handling for pci_ioremap_bar

When pci_ioremap_bar fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling pci_ioremap_bar.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years ago ALSA: sonicvibes: add error handling for snd_ctl_add
Zhouyang Jia [Thu, 14 Jun 2018 11:41:37 +0000 (19:41 +0800)]
ALSA: sonicvibes: add error handling for snd_ctl_add

When snd_ctl_add fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling snd_ctl_add.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agoMerge tag 'kvm-ppc-next-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Thu, 14 Jun 2018 15:42:54 +0000 (17:42 +0200)]
Merge tag 'kvm-ppc-next-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

3 years agoKVM: x86: fix typo at kvm_arch_hardware_setup comment
Marcelo Tosatti [Mon, 11 Jun 2018 17:12:10 +0000 (14:12 -0300)]
KVM: x86: fix typo at kvm_arch_hardware_setup comment

Fix typo in sentence about min value calculation.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agopstore: Remove bogus format string definition
Arnd Bergmann [Wed, 30 May 2018 15:24:52 +0000 (17:24 +0200)]
pstore: Remove bogus format string definition

The pstore conversion to timespec64 introduces its own method of passing
seconds into sscanf() and sprintf() type functions to work around the
timespec64 definition on 64-bit systems that redefine it to 'timespec'.

That hack is now finally getting removed, but that means we get a (harmless)
warning once both patches are merged:

fs/pstore/ram.c: In function 'ramoops_read_kmsg_hdr':
fs/pstore/ram.c:39:29: error: format '%ld' expects argument of type 'long int *', but argument 3 has type 'time64_t *' {aka 'long long int *'} [-Werror=format=]
 #define RAMOOPS_KERNMSG_HDR "===="
                             ^~~~~~
fs/pstore/ram.c:167:21: note: in expansion of macro 'RAMOOPS_KERNMSG_HDR'

This removes the pstore specific workaround and uses the same method that
we have in place for all other functions that print a timespec64.

Related to this, I found that the kasprintf() output contains incorrect
nanosecond values for any number starting with zeroes, and I adapt the
format string accordingly.
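
A userspace sketch of the two points above, not the pstore code itself:
format_stamp() is a hypothetical helper, and the nine-digit field width
assumes the fraction is given in nanoseconds. Passing the seconds as
long long with %lld avoids the '%ld' type mismatch, and zero padding keeps
the leading zeroes of the fraction.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "<sec>.<nsec>" into buf.
 * With pad == 0 the fraction is printed without leading zeroes,
 * which misrepresents any value below 100000000 ns.
 */
static int format_stamp(char *buf, size_t len, long long sec, long nsec, int pad)
{
	assert(nsec >= 0 && nsec < 1000000000L);
	if (pad)
		return snprintf(buf, len, "%lld.%09ld", sec, nsec);
	return snprintf(buf, len, "%lld.%ld", sec, nsec);
}
```

For 5 seconds and 1000000 ns, the unpadded form yields "5.1000000", which
reads as 5.1 seconds; the padded form yields "5.001000000".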

Link: https://lkml.org/lkml/2018/5/19/115
Link: https://lkml.org/lkml/2018/5/16/1080
Fixes: 0f0d83b99ef7 ("pstore: Convert internal records to timespec64")
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoMerge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into vfs-timespec64
Arnd Bergmann [Thu, 14 Jun 2018 12:51:13 +0000 (14:51 +0200)]
Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into vfs-timespec64

Pull the timespec64 conversion from Deepa Dinamani:
 "The series aims to switch vfs timestamps to use
  struct timespec64. Currently vfs uses struct timespec,
  which is not y2038 safe.

  The flag patch applies cleanly. I've not seen the timestamps
  update logic change often. The series applies cleanly on 4.17-rc6
  and linux-next tip (top commit: next-20180517).

  I'm not sure how to merge this kind of a series with a flag patch.
  We are targeting 4.18 for this.
  Let me know if you have other suggestions.

  The series involves the following:
  1. Add vfs helper functions for supporting struct timespec64 timestamps.
  2. Cast prints of vfs timestamps to avoid warnings after the switch.
  3. Simplify code using vfs timestamps so that the actual
     replacement becomes easy.
  4. Convert vfs timestamps to use struct timespec64 using a script.
     This is a flag day patch.

  I've tried to keep the conversions with the script simple, to
  aid in the reviews. I've kept all the internal filesystem data
  structures and function signatures the same.

  Next steps:
  1. Convert APIs that can handle timespec64, instead of converting
     timestamps at the boundaries.
  2. Update internal data structures to avoid timestamp conversions."
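
To illustrate why a 32-bit seconds field is not y2038 safe, here is a
userspace sketch. The demo_ts32 and demo_ts64 types and both helpers are
hypothetical, not the kernel's struct timespec or struct timespec64, and
clamping on overflow is only one possible policy. A signed 32-bit counter
of seconds since the epoch ends at 2038-01-19 03:14:07 UTC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical narrow and wide timestamp layouts for the demonstration. */
struct demo_ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct demo_ts64 { int64_t tv_sec; int64_t tv_nsec; };

/* True if the seconds value is representable in a signed 32-bit field. */
static bool demo_fits_ts32(const struct demo_ts64 *ts)
{
	return ts->tv_sec >= INT32_MIN && ts->tv_sec <= INT32_MAX;
}

/* Narrow a 64-bit timestamp, clamping seconds that do not fit. */
static struct demo_ts32 demo_narrow(const struct demo_ts64 *ts)
{
	struct demo_ts32 out;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000);
	if (ts->tv_sec > INT32_MAX)
		out.tv_sec = INT32_MAX;
	else if (ts->tv_sec < INT32_MIN)
		out.tv_sec = INT32_MIN;
	else
		out.tv_sec = (int32_t)ts->tv_sec;
	out.tv_nsec = (int32_t)ts->tv_nsec;
	return out;
}
```

Every conversion at a boundary needs a check of this kind, which is one
reason the next steps aim to avoid timestamp conversions altogether.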

I've pulled it into a branch based on top of the NFS changes that
are now in mainline, so I could resolve the non-obvious conflict
between the two while merging.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agodrm/amd/powerplay: Set higher SCLK&MCLK frequency than dpm7 in OD (v2)
Kenneth Feng [Tue, 12 Jun 2018 07:07:37 +0000 (15:07 +0800)]
drm/amd/powerplay: Set higher SCLK&MCLK frequency than dpm7 in OD (v2)

Fix the issue that SCLK&MCLK can't be set higher than dpm7 when
OD is enabled in SMU7.

v2: fix warning (Alex)

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Rex Zhu <rezhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>