2 weeks ago  Linux 5.16-rc8  master
Linus Torvalds [Sun, 2 Jan 2022 22:23:25 +0000 (14:23 -0800)]
Linux 5.16-rc8

2 weeks ago  Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://
Linus Torvalds [Sun, 2 Jan 2022 22:09:03 +0000 (14:09 -0800)]
Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix TUI exit screen refresh race condition in 'perf top'.

 - Fix parsing of Intel PT VM time correlation arguments.

 - Honour CPU filtering command line request of a script's switch events
   in 'perf script'.

 - Fix printing of switch events in Intel PT python script.

 - Fix duplicate alias events list printing in 'perf list', noticed on
   heterogeneous arm64 systems.

 - Fix return value of ids__new(), users expect NULL for failure, not

* tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://
  perf top: Fix TUI exit screen refresh race condition
  perf pmu: Fix alias events list
  perf scripts python: Fix printing of switch events
  perf script: Fix CPU filtering of a script's switch events
  perf intel-pt: Fix parsing of VM time correlation arguments
  perf expr: Fix return value of ids__new()

2 weeks ago  Merge branch 'i2c/for-current' of git://
Linus Torvalds [Sun, 2 Jan 2022 18:36:09 +0000 (10:36 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Better input validation for compat ioctls and a documentation bugfix
  for 5.16"

* 'i2c/for-current' of git://
  Docs: Fixes link to I2C specification
  i2c: validate user data in compat ioctl

2 weeks ago  Merge tag 'x86_urgent_for_v5.16_rc8' of git://
Linus Torvalds [Sun, 2 Jan 2022 17:02:54 +0000 (09:02 -0800)]
Merge tag 'x86_urgent_for_v5.16_rc8' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Use the proper CONFIG symbol in a preprocessor check.

* tag 'x86_urgent_for_v5.16_rc8' of git://
  x86/build: Use the proper name CONFIG_FW_LOADER

2 weeks ago  perf top: Fix TUI exit screen refresh race condition
yaowenbin [Wed, 29 Dec 2021 08:55:19 +0000 (16:55 +0800)]
perf top: Fix TUI exit screen refresh race condition

When the following command is executed several times, a coredump file is
generated:

$ timeout -k 9 5 perf top -e task-clock
0.01%  [kernel]                  [k] __do_softirq
0.01%        [.] __pthread_mutex_lock
0.01%  [kernel]                  [k] __ll_sc_atomic64_sub_return
double free or corruption (!prev) perf top --sort comm,dso
timeout: the monitored command dumped core

When "perf top" is terminated by a signal, SLsmg_reset_smg() is called.
SLsmg_reset_smg() resets the SLsmg screen management routines by freeing
all memory allocated while it was active.

However, SLsmg_reinit_smg() may be called by another thread at the same
time.

SLsmg_reinit_smg() frees the same memory that SLsmg_reset_smg() accesses,
which results in a double free.

SLsmg_reinit_smg() is already called with ui__lock held, so we fix the
problem by adding a pthread_mutex_trylock() of ui__lock when calling
SLsmg_reset_smg().
Signed-off-by: Wenyu Liu <>
Tested-by: Arnaldo Carvalho de Melo <>
Cc: Alexander Shishkin <>
Cc: Jiri Olsa <>
Cc: Mark Rutland <>
Cc: Namhyung Kim <>
Cc: Peter Zijlstra <>
Signed-off-by: Hewenliang <>
Signed-off-by: yaowenbin <>
Signed-off-by: Arnaldo Carvalho de Melo <>

2 weeks ago  perf pmu: Fix alias events list
John Garry [Tue, 21 Dec 2021 16:11:30 +0000 (00:11 +0800)]
perf pmu: Fix alias events list

Commit 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu
type") changes the event list for uncore PMUs or arm64 heterogeneous CPU
systems, such that duplicate aliases are incorrectly listed per PMU
(which they should not be), like:

  # perf list
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]

Notice how the events are listed twice.

The named commit changed how we remove duplicate events, in that events
for different PMUs are not treated as duplicates. I suppose this is to
handle how "Each hybrid pmu event has been assigned with a pmu name".

Fix PMU alias listing by restoring behaviour to remove duplicates for
non-hybrid PMUs.
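
The restored behaviour can be pictured with a small model. This is only a
sketch under assumptions: the (event name, PMU name) pairs and the "hybrid"
flag are illustrative and do not mirror the data structures used by perf.

```python
def dedupe_aliases(aliases, hybrid):
    """Drop duplicate aliases from a list of (event_name, pmu_name) pairs.

    For hybrid PMUs the PMU name is part of the identity of an event;
    otherwise the same event name on several PMUs is listed only once.
    """
    seen = set()
    result = []
    for name, pmu in aliases:
        key = (name, pmu) if hybrid else name
        if key not in seen:
            seen.add(key)
            result.append((name, pmu))
    return result
```

For example, a hypothetical event exposed by two uncore PMUs is listed once
with hybrid=False, while the same name on two hybrid PMUs is kept twice.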

Fixes: 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu type")
Signed-off-by: John Garry <>
Tested-by: Zhengjun Xing <>
Cc: Alexander Shishkin <>
Cc: Ian Rogers <>
Cc: Ingo Molnar <>
Cc: Jiri Olsa <>
Cc: Kan Liang <>
Cc: Mark Rutland <>
Cc: Namhyung Kim <>
Cc: Peter Zijlstra <>
Signed-off-by: Arnaldo Carvalho de Melo <>

2 weeks ago  Merge branch 'for-linus' of git://
Linus Torvalds [Sat, 1 Jan 2022 18:21:49 +0000 (10:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "Two small fixups for spaceball joystick driver and appletouch touchpad

* 'for-linus' of git://
  Input: spaceball - fix parsing of movement data packets
  Input: appletouch - initialize work before device registration

3 weeks ago  mm: vmscan: reduce throttling due to a failure to make progress -fix
Mel Gorman [Fri, 31 Dec 2021 21:10:09 +0000 (13:10 -0800)]
mm: vmscan: reduce throttling due to a failure to make progress -fix

Hugh Dickins reported the following

My tmpfs swapping load (tweaked to use huge pages more heavily
than in real life) is far from being a realistic load: but it was
notably slowed down by your throttling mods in 5.16-rc, and this
patch makes it well again - thanks.

But: it very quickly hit NULL pointer until I changed that last
line to

        if (first_pgdat)
                consider_reclaim_throttle(first_pgdat, sc);

The likely issue is that huge pages are a major component of the test
workload.  When this is the case, first_pgdat may never get set if
compaction is ready to continue due to this check

            sc->order > PAGE_ALLOC_COSTLY_ORDER &&
            compaction_ready(zone, sc)) {
                sc->compaction_ready = true;

If this was true for every zone in the zonelist, first_pgdat would never
get set resulting in a NULL pointer exception.
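
The failure mode can be reproduced with a toy model. This is a sketch, not
the kernel code: Zone, shrink_zones() and the throttled list are hypothetical
names, and compaction_ready here is a plain flag standing in for the check
quoted above.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    pgdat: str              # name of the node the zone belongs to (illustrative)
    compaction_ready: bool  # models the costly-order compaction check


def shrink_zones(zonelist, throttled):
    """Walk the zonelist and throttle at most once, on the first node scanned."""
    first_pgdat = None
    for zone in zonelist:
        if zone.compaction_ready:
            continue  # zone is skipped, so first_pgdat is not updated
        if first_pgdat is None:
            first_pgdat = zone.pgdat
    # Guard from the report: only throttle when a node was actually recorded.
    if first_pgdat is not None:
        throttled.append(first_pgdat)
    return first_pgdat
```

When every zone is skipped, first_pgdat stays None and the guard prevents the
throttle call from receiving it.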

Fixes: 1b4e3f26f9f75 ("mm: vmscan: Reduce throttling due to a failure to make progress")
Signed-off-by: Mel Gorman <>
Reported-by: Hugh Dickins <>
Cc: Michal Hocko <>
Cc: Vlastimil Babka <>
Cc: Rik van Riel <>
Cc: Mike Galbraith <>
Cc: Darrick J. Wong <>
Cc: Shakeel Butt <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

3 weeks ago  mm: vmscan: Reduce throttling due to a failure to make progress
Mel Gorman [Thu, 2 Dec 2021 15:06:14 +0000 (15:06 +0000)]
mm: vmscan: Reduce throttling due to a failure to make progress

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before going OOM.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.

Alexey's original case was the most straightforward:

for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily.  After the patch, the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a YouTube video while tail runs
10 times.  On 5.15, playback only jitters slightly; 5.16-rc1 stalls a
lot, with lots of frames missing and numerous audio glitches.  With this
patch applied, the video plays similarly to 5.15.

[ Fix W=1 build warning]

Reported-and-tested-by: Alexey Avramov <>
Reported-and-tested-by: Mike Galbraith <>
Reported-and-tested-by: Darrick J. Wong <>
Reported-by: kernel test robot <>
Acked-by: Hugh Dickins <>
Tracked-by: Thorsten Leemhuis <>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <>
Signed-off-by: Linus Torvalds <>

3 weeks ago  Merge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 31 Dec 2021 17:28:48 +0000 (09:28 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc mm fixes from Andrew Morton:
 "2 patches.

  Subsystems affected by this patch series: mm (userfaultfd and damon)"

* akpm:
  mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
  userfaultfd/selftests: fix hugetlb area allocations

3 weeks ago  Merge tag 'scsi-fixes' of git://
Linus Torvalds [Fri, 31 Dec 2021 17:22:25 +0000 (09:22 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes, all in drivers. The lpfc one doesn't look exploitable,
  but nasty things could happen in string operations if mybuf ends up
  with an on stack unterminated string"

* tag 'scsi-fixes' of git://
  scsi: vmw_pvscsi: Set residual data length conditionally
  scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
  scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()

3 weeks ago  mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
SeongJae Park [Fri, 31 Dec 2021 04:12:34 +0000 (20:12 -0800)]
mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'

DAMON debugfs interface increases the reference counts of 'struct pid's
for targets from the 'target_ids' file write callback
('dbgfs_target_ids_write()'), but decreases the counts only in DAMON
monitoring termination callback ('dbgfs_before_terminate()').

Therefore, when 'target_ids' file is repeatedly written without DAMON
monitoring start/termination, the reference count is not decreased and
therefore memory for the 'struct pid' cannot be freed.  This commit
fixes this issue by decreasing the reference counts when 'target_ids' is
written.
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <>
Cc: <> [5.15+]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

3 weeks ago  userfaultfd/selftests: fix hugetlb area allocations
Mike Kravetz [Fri, 31 Dec 2021 04:12:31 +0000 (20:12 -0800)]
userfaultfd/selftests: fix hugetlb area allocations

Currently, userfaultfd selftest for hugetlb as run from
or any environment where there are 'just enough' hugetlb pages will
always fail with:

  testing events (fork, remap, remove):
ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved.  There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently.  The issue was uncovered by commit
8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each

For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file.  This means that at mmap time, pages are
reserved for the src and dst areas.  At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area.  Then, a child is forked to fault in the dst area.  Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping.  The child has normal access to
the dst area, but cannot use the reserves created/owned by the parent.
Thus, if there are no other huge pages available, allocation of a page
for the dst by the child will fail.

Fix by not creating reserves for the dst area.  In this way the child
can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy.  The test does not do
this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping.  There is no need to create a hugetlb file in the
non-shared case.

Signed-off-by: Mike Kravetz <>
Cc: Axel Rasmussen <>
Cc: Peter Xu <>
Cc: Andrea Arcangeli <>
Cc: Mina Almasry <>
Cc: Shuah Khan <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

3 weeks ago  Docs: Fixes link to I2C specification
Deep Majumder [Fri, 19 Nov 2021 06:14:01 +0000 (11:44 +0530)]
Docs: Fixes link to I2C specification

The link to the I2C specification is broken. Although
"" hosts Rev 7 (2021) of this specification, it is
behind a login-wall. Thus, an additional link has been added (which
doesn't require a login) and the NXP official docs link has been

Signed-off-by: Deep Majumder <>
[wsa: minor updates to text and commit message]
Signed-off-by: Wolfram Sang <>

3 weeks ago  i2c: validate user data in compat ioctl
Pavel Skripkin [Thu, 30 Dec 2021 22:47:50 +0000 (01:47 +0300)]
i2c: validate user data in compat ioctl

Wrong user data, e.g. zero msgs, may cause a warning in i2c_transfer().
Userspace should not be able to trigger warnings, so this patch adds
validation checks for user data in the compat ioctl to prevent the reported
warnings.
Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()")
Signed-off-by: Pavel Skripkin <>
Signed-off-by: Wolfram Sang <>

3 weeks ago  Input: spaceball - fix parsing of movement data packets
Leo L. Schwab [Fri, 31 Dec 2021 05:05:00 +0000 (21:05 -0800)]
Input: spaceball - fix parsing of movement data packets

The spaceball.c module was not properly parsing the movement reports
coming from the device.  The code read axis data as signed 16-bit
little-endian values starting at offset 2.

In fact, axis data in Spaceball movement reports are signed 16-bit
big-endian values starting at offset 3.  This was determined first by
visually inspecting the data packets, and later verified by consulting:

If this ever worked properly, it was in the time before Git...
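
The difference between the two interpretations is easy to demonstrate. This
is a sketch under assumptions: the axis count of six and the sample packet
layout used in the usage note are illustrative, only the offsets and byte
orders come from the text above.

```python
import struct

AXIS_OFFSET = 3  # offset of the first axis value, per the commit message
AXIS_COUNT = 6   # assumption: six axes (three translations, three rotations)


def parse_axes(packet: bytes):
    """Decode the axis values as signed 16-bit big-endian integers."""
    return struct.unpack_from(f">{AXIS_COUNT}h", packet, AXIS_OFFSET)


def parse_axes_old(packet: bytes):
    """The previous interpretation: little-endian values from offset 2."""
    return struct.unpack_from(f"<{AXIS_COUNT}h", packet, 2)
```

Feeding both functions a packet built as three header bytes followed by
struct.pack(">6h", ...) shows that only parse_axes() returns the packed
values.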

Signed-off-by: Leo L. Schwab <>
Signed-off-by: Dmitry Torokhov <>

3 weeks ago  Input: appletouch - initialize work before device registration
Pavel Skripkin [Fri, 31 Dec 2021 04:57:46 +0000 (20:57 -0800)]
Input: appletouch - initialize work before device registration

Syzbot has reported warning in __flush_work(). This warning is caused by
work->func == NULL, which means missing work initialization.

This may happen, since input_dev->close() calls
cancel_work_sync(&dev->work), but dev->work initialization happens _after_
the input_register_device() call.

So this patch moves dev->work initialization before registering the input
device.
Fixes: 5a6eb676d3bc ("Input: appletouch - improve powersaving for Geyser3 devices")
Signed-off-by: Pavel Skripkin <>
Signed-off-by: Dmitry Torokhov <>

3 weeks ago  Merge tag 'drm-fixes-2021-12-31' of git://
Linus Torvalds [Fri, 31 Dec 2021 02:25:43 +0000 (18:25 -0800)]
Merge tag 'drm-fixes-2021-12-31' of git://

Pull drm fixes from Dave Airlie:
 "This is a bit bigger than I'd like, however it has two weeks of amdgpu
  fixes in it, since they missed last week, which was very small.

  The nouveau regression is probably the biggest fix in here, and it
  needs to go into 5.15 as well, two i915 fixes, and then a scattering
  of amdgpu fixes. The biggest fix in there is for a fencing NULL
  pointer dereference, the rest are pretty minor.

  For the misc team, I've pulled the two misc fixes manually since I'm
  not sure what is happening at this time of year!

  The amdgpu maintainers have the outstanding runpm regression to fix
  still, they are just working through the last bits of it now.


  nouveau:
   - fencing regression fix

  i915:
   - Fix possible uninitialized variable
   - Fix composite fence seqno increment on each fence creation

  amdgpu:
   - Fencing fix
   - XGMI fix
   - VCN regression fix
   - IP discovery regression fixes
   - Fix runpm documentation
   - Suspend/resume fixes
   - Yellow Carp display fixes
   - MCLK power management fix
   - dma-buf fix"

* tag 'drm-fixes-2021-12-31' of git://
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
  drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
  drm/amd/display: Set optimize_pwr_state for DCN31
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Added power down for DCN10
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue
  drm/amdgpu: no DC support for headless chips
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
  drm/i915: Increment composite fence seqno
  drm/i915: Fix possible uninitialized variable in parallel extension
  drm/amdgpu: fix runpm documentation
  drm/nouveau: wait for the exclusive fence after the shared ones v2
  drm/amdgpu: add support for IP discovery gc_info table v2
  drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  drm/amd/pm: Fix xgmi link control on aldebaran
  drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
  drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify

3 weeks ago  Merge branch 'drm-misc-fixes' of ssh:// into...
Dave Airlie [Fri, 31 Dec 2021 01:40:29 +0000 (11:40 +1000)]
Merge branch 'drm-misc-fixes' of ssh:// into drm-fixes

This merges two fixes that haven't been sent to me yet, but I wanted to get in.

One amdgpu fix, and one nouveau regression fix.

Signed-off-by: Dave Airlie <>

3 weeks ago  fs/mount_setattr: always cleanup mount_kattr
Christian Brauner [Thu, 30 Dec 2021 19:23:09 +0000 (20:23 +0100)]
fs/mount_setattr: always cleanup mount_kattr

Make sure that finish_mount_kattr() is called after mount_kattr was
successfully built in both the success and failure case to prevent
leaking any references we took when we built it.  We returned early if
path lookup failed, thereby risking leaking an additional reference we
took when building mount_kattr when an idmapped mount was requested.

Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
Signed-off-by: Christian Brauner <>
Signed-off-by: Linus Torvalds <>

3 weeks ago  Merge tag 'net-5.16-rc8' of git://
Linus Torvalds [Thu, 30 Dec 2021 19:12:12 +0000 (11:12 -0800)]
Merge tag 'net-5.16-rc8' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from.. Santa?

  No regressions on our radar at this point. The igc problem fixed here
  was the last one I was tracking but it was broken in previous
  releases, anyway. Mostly driver fixes and a couple of largish SMC
  fixes.

  Current release - regressions:

   - xsk: initialise xskb free_list_node, fixup for a -rc7 fix

  Current release - new code bugs:

   - mlx5: handful of minor fixes:

      - use first online CPU instead of hard coded CPU

      - fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

      - fix skb memory leak when TC classifier action offloads are disabled

      - fix memory leak with rules with internal OvS port

  Previous releases - regressions:

   - igc: do not enable crosstimestamping for i225-V models

  Previous releases - always broken:

   - udp: use datalen to cap ipv6 udp max gso segments

   - fix use-after-free in tw_timer_handler due to early free of stats

   - smc: fix kernel panic caused by race of smc_sock

   - smc: don't send CDC/LLC message if link not ready, avoid timeouts

   - sctp: use call_rcu to free endpoint, avoid UAF in sock diag

   - bridge: mcast: add and enforce query interval minimum

   - usb: pegasus: do not drop long Ethernet frames

   - mlx5e: fix ICOSQ recovery flow for XSK

   - nfc: uapi: use kernel size_t to fix user-space builds"

* tag 'net-5.16-rc8' of git:// (47 commits)
  fsl/fman: Fix missing put_device() call in fman_port_probe
  selftests: net: using ping6 for IPv6 in
  Documentation: fix outdated interpretation of ip_no_pmtu_disc
  net/ncsi: check for error return from call to nla_put_u32
  net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
  net: fix use-after-free in tw_timer_handler
  selftests: net: Fix a typo in
  selftests/net: udpgso_bench_tx: fix dst ip argument
  net: bridge: mcast: add and enforce startup query interval minimum
  net: bridge: mcast: add and enforce query interval minimum
  ipv6: raw: check passed optlen before reading
  xsk: Initialise xskb free_list_node
  net/mlx5e: Fix wrong features assignment in case of error
  net/mlx5e: TC, Fix memory leak with rules with internal port
  ionic: Initialize the 'lif->dbid_inuse' bitmap
  igc: Fix TX timestamp support for non-MSI-X platforms
  igc: Do not enable crosstimestamping for i225-V models
  net/smc: fix kernel panic caused by race of smc_sock
  net/smc: don't send CDC/LLC message if link not ready
  NFC: st21nfca: Fix memory leak in device probe and remove

3 weeks ago  Merge tag 'char-misc-5.16' of git://
Linus Torvalds [Thu, 30 Dec 2021 17:52:32 +0000 (09:52 -0800)]
Merge tag 'char-misc-5.16' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are two misc driver fixes for 5.16-final:

   - binder accounting fix to resolve reported problem

   - nitro_enclaves fix for mmap assert warning output

  Both of these have been in linux-next for over a week with no reported issues"

* tag 'char-misc-5.16' of git://
  nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert
  binder: fix async_free_space accounting for empty parcels

3 weeks ago  Merge tag 'usb-5.16' of git://
Linus Torvalds [Thu, 30 Dec 2021 17:49:54 +0000 (09:49 -0800)]
Merge tag 'usb-5.16' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.16 to resolve some reported

   - mtu3 driver fixes

   - typec ucsi driver fix

   - xhci driver quirk added

   - usb gadget f_fs fix for reported crash

  All of these have been in linux-next for a while with no reported

* tag 'usb-5.16' of git://
  usb: typec: ucsi: Only check the contract if there is a connection
  xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
  usb: mtu3: set interval of FS intr and isoc endpoint
  usb: mtu3: fix list_head check warning
  usb: mtu3: add memory barrier before set GPD's HWO
  usb: mtu3: fix interval value for intr and isoc
  usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.

3 weeks ago  fsl/fman: Fix missing put_device() call in fman_port_probe
Miaoqian Lin [Thu, 30 Dec 2021 12:26:27 +0000 (12:26 +0000)]
fsl/fman: Fix missing put_device() call in fman_port_probe

The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the error handling paths.

Fixes: 18a6c85fcc78 ("fsl/fman: Add FMan Port Support")
Signed-off-by: Miaoqian Lin <>
Signed-off-by: David S. Miller <>

3 weeks ago  selftests: net: using ping6 for IPv6 in
Jianguo Wu [Thu, 30 Dec 2021 10:40:29 +0000 (18:40 +0800)]
selftests: net: using ping6 for IPv6 in

The script outputs the following message:
  ping: 2001:db8:1::100: Address family for hostname not supported

Use ping6 when pinging IPv6 addresses.

Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Jianguo Wu <>
Signed-off-by: David S. Miller <>

3 weeks ago  Documentation: fix outdated interpretation of ip_no_pmtu_disc
xu xin [Thu, 30 Dec 2021 03:28:56 +0000 (03:28 +0000)]
Documentation: fix outdated interpretation of ip_no_pmtu_disc

The way the PMTU is updated has changed, but the documentation still
describes the old behaviour. So this patch updates the interpretation of
ip_no_pmtu_disc and

See commit 28d35bcdd3925 ("net: ipv4: don't let PMTU updates increase
route MTU")

Reported-by: Zeal Robot <>
Signed-off-by: xu xin <>
Signed-off-by: David S. Miller <>

3 weeks ago  Merge tag 'amd-drm-fixes-5.16-2021-12-29' of
Dave Airlie [Thu, 30 Dec 2021 03:55:47 +0000 (13:55 +1000)]
Merge tag 'amd-drm-fixes-5.16-2021-12-29' of into drm-fixes


- Fencing fix
- XGMI fix
- VCN regression fix
- IP discovery regression fixes
- Fix runpm documentation
- Suspend/resume fixes
- Yellow Carp display fixes
- MCLK power management fix

Signed-off-by: Dave Airlie <>
From: Alex Deucher <>

3 weeks ago  Merge tag 'mlx5-fixes-2021-12-28' of git://
Jakub Kicinski [Thu, 30 Dec 2021 02:19:01 +0000 (18:19 -0800)]
Merge tag 'mlx5-fixes-2021-12-28' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

mlx5 fixes 2021-12-28

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2021-12-28' of git://
  net/mlx5e: Fix wrong features assignment in case of error
  net/mlx5e: TC, Fix memory leak with rules with internal port

Signed-off-by: Jakub Kicinski <>

3 weeks ago  Merge tag 'drm-intel-fixes-2021-12-29' of git://
Dave Airlie [Thu, 30 Dec 2021 02:12:40 +0000 (12:12 +1000)]
Merge tag 'drm-intel-fixes-2021-12-29' of git:// into drm-fixes

drm/i915 fixes for v5.16:
- Fix possible uninitialized variable
- Fix composite fence seqno increment on each fence creation

Signed-off-by: Dave Airlie <>
From: Jani Nikula <>

3 weeks ago  net/ncsi: check for error return from call to nla_put_u32
Jiasheng Jiang [Wed, 29 Dec 2021 03:21:18 +0000 (11:21 +0800)]
net/ncsi: check for error return from call to nla_put_u32

As the comment on nla_put() shows, it can return -EMSGSIZE if the
tailroom of the skb is insufficient.
Therefore, it is better to check the return value of nla_put_u32() and
return the error code if an error occurs.
Also, many other functions have the same problem, and if this
patch is correct, I will commit a new version to fix all.

Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Jiasheng Jiang <>
Signed-off-by: Jakub Kicinski <>

3 weeks ago  net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
Nikolay Aleksandrov [Tue, 28 Dec 2021 15:31:42 +0000 (17:31 +0200)]
net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper

We need to first check if the context is a vlan one, then we need to
check the global bridge multicast vlan snooping flag, and finally the
vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast
processing (e.g. querier timers).

Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control")
Signed-off-by: Nikolay Aleksandrov <>
Signed-off-by: Jakub Kicinski <>

3 weeks ago  net: fix use-after-free in tw_timer_handler
Muchun Song [Tue, 28 Dec 2021 10:41:45 +0000 (18:41 +0800)]
net: fix use-after-free in tw_timer_handler

A real world panic issue was found as follows in Linux 5.4.

    BUG: unable to handle page fault for address: ffffde49a863de28
    PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0
    RIP: 0010:tw_timer_handler+0x20/0x40
    Call Trace:

This issue has been reported since 2017 in the thread [1]. Unfortunately,
the issue can still be reproduced after fixing

ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net
namespace is destroyed, because tcp_sk_ops is registered before
ipv4_mib_ops and therefore sits ahead of ipv4_mib_ops in pernet_list
(exit handlers run in reverse registration order). There will be a
use-after-free on net->mib.net_statistics in tw_timer_handler after
ipv4_mib_exit_net if there are some inflight time-wait timers.

This bug was not introduced by commit f2bf415cfed7 ("mib: add net to
NET_ADD_STATS_BH"), since net_statistics was then a global variable
rather than being dynamically allocated and freed. Actually, commit
61a7e26028b9 ("mib: put net statistics on struct net") introduced
the bug, since it put the net statistics on struct net and frees them
when the net namespace is destroyed.

Move init_ipv4_mibs() ahead of tcp_init() to fix this bug, and
replace pr_crit() with panic() since continuing is meaningless
when init_ipv4_mibs() fails.
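
The ordering argument can be checked with a small model. This is a sketch
that assumes exit handlers run in reverse registration order; the two lists
and the helper names are illustrative and not kernel code.

```python
def run_exit_handlers(registration_order):
    """Return the order in which exit handlers run at namespace teardown.

    Assumption: exit handlers run in reverse registration order.
    """
    return list(reversed(registration_order))


def stats_freed_before_tcp_exit(registration_order):
    """True if the MIB statistics are freed while TCP may still use them."""
    exit_order = run_exit_handlers(registration_order)
    return exit_order.index("ipv4_mib_ops") < exit_order.index("tcp_sk_ops")


before_fix = ["tcp_sk_ops", "ipv4_mib_ops"]  # tcp_init() ran first
after_fix = ["ipv4_mib_ops", "tcp_sk_ops"]   # init_ipv4_mibs() moved ahead
```

With before_fix the statistics are torn down first, which is the window in
which an inflight time-wait timer can touch freed memory; with after_fix the
TCP exit handler runs first.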


Fixes: 61a7e26028b9 ("mib: put net statistics on struct net")
Signed-off-by: Muchun Song <>
Cc: Cong Wang <>
Cc: Fam Zheng <>
Cc: <>
Signed-off-by: Jakub Kicinski <>

3 weeks ago  selftests: net: Fix a typo in
Jianguo Wu [Wed, 29 Dec 2021 07:27:30 +0000 (15:27 +0800)]
selftests: net: Fix a typo in

$rvs -> $rcv

Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Jianguo Wu <>
Signed-off-by: Jakub Kicinski <>

3 weeks ago  selftests/net: udpgso_bench_tx: fix dst ip argument
wujianguo [Wed, 29 Dec 2021 10:58:10 +0000 (18:58 +0800)]
selftests/net: udpgso_bench_tx: fix dst ip argument

udpgso_bench_tx calls setup_sockaddr() for the destination address before
all arguments are parsed. If "-p ${dst_port}" is specified after
"-D ${dst_ip}", ${dst_port} is ignored and the default cfg_port of 8000 is
used.

This causes the test case "multiple GRO socks" to fail in

Setup sockaddr after parsing all arguments.
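
The ordering problem can be reproduced with a tiny option parser. This is a
sketch, not the benchmark's code: parse_buggy() and parse_fixed() are
hypothetical helpers, and only the "-D" and "-p" options and the default port
8000 are taken from the text above.

```python
DEFAULT_PORT = 8000  # default cfg_port named in the commit message


def parse_buggy(argv):
    """Builds the destination address as soon as -D is seen."""
    port, addr = DEFAULT_PORT, None
    args = iter(argv)
    for opt in args:
        if opt == "-p":
            port = int(next(args))
        elif opt == "-D":
            addr = (next(args), port)  # a later -p is not reflected here
    return addr


def parse_fixed(argv):
    """Builds the destination address after every option has been parsed."""
    port, host = DEFAULT_PORT, None
    args = iter(argv)
    for opt in args:
        if opt == "-p":
            port = int(next(args))
        elif opt == "-D":
            host = next(args)
    return (host, port) if host is not None else None
```

Deferring the address setup until the loop finishes makes the result
independent of the order in which the two options appear.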

Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Jianguo Wu <>
Reviewed-by: Willem de Bruijn <>
Signed-off-by: Jakub Kicinski <>

3 weeks ago  x86/build: Use the proper name CONFIG_FW_LOADER
Lukas Bulwahn [Wed, 29 Dec 2021 11:15:53 +0000 (12:15 +0100)]
x86/build: Use the proper name CONFIG_FW_LOADER

Commit in Fixes intends to add the expression regex only when FW_LOADER
is enabled - not FW_LOADER_BUILTIN. Latter is a leftover from a previous
patchset and not a valid config item.

So, adjust the condition to the actual name of the config.

  [ bp: Cleanup commit message. ]

Fixes: c8dcf655ec81 ("x86/build: Tuck away built-in firmware under FW_LOADER")
Signed-off-by: Lukas Bulwahn <>
Signed-off-by: Borislav Petkov <>
Reviewed-by: Greg Kroah-Hartman <>

3 weeks ago  Merge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum'
Jakub Kicinski [Wed, 29 Dec 2021 20:59:43 +0000 (12:59 -0800)]
Merge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum'

Nikolay Aleksandrov says:

net: bridge: mcast: add and enforce query interval minimum

This set adds and enforces a 1 second minimum value for the bridge
multicast query and startup query intervals, in order to avoid rearming the
timers too often, which could lock up and crash the host. I doubt anyone is
using such low values or anything lower than 1 second, so it seems like a
good minimum. To stay compatible, a lower value is overwritten and a log
message is emitted, since we can't return an error at this point.

Eric, I looked for the syzbot reports in its dashboard but couldn't find
them so I've added you as the reporter.

I've prepared a global bridge igmp rate limiting patch but wasn't
sure if it's ok for -net. It adds a static limit of 32k packets per
second, I plan to send it for net-next with added drop counters for
each bridge so it can be easily debugged.

Original report can be seen at:

Signed-off-by: Jakub Kicinski <>

3 weeks ago  net: bridge: mcast: add and enforce startup query interval minimum
Nikolay Aleksandrov [Mon, 27 Dec 2021 17:21:16 +0000 (19:21 +0200)]
net: bridge: mcast: add and enforce startup query interval minimum

As reported[1] if startup query interval is set too low in combination with
large number of startup queries and we have multiple bridges or even a
single bridge with multiple querier vlans configured we can crash the
machine. Add a 1 second minimum which must be enforced by overwriting the
value if set lower (i.e. without returning an error) to avoid breaking
user-space. If that happens a log message is emitted to let the admin know
that the startup interval has been set to the minimum. It doesn't make
sense to make the startup interval lower than the normal query interval
so use the same value of 1 second. The issue has been present since these
intervals could be user-controlled.


Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <>
Signed-off-by: Nikolay Aleksandrov <>
Signed-off-by: Jakub Kicinski <>
3 weeks agonet: bridge: mcast: add and enforce query interval minimum
Nikolay Aleksandrov [Mon, 27 Dec 2021 17:21:15 +0000 (19:21 +0200)]
net: bridge: mcast: add and enforce query interval minimum

As reported[1] if query interval is set too low and we have multiple
bridges or even a single bridge with multiple querier vlans configured
we can crash the machine. Add a 1 second minimum which must be enforced
by overwriting the value if set lower (i.e. without returning an error) to
avoid breaking user-space. If that happens a log message is emitted to let
the administrator know that the interval has been set to the minimum.
The issue has been present since these intervals could be user-controlled.
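
The clamping behaviour described above can be sketched as follows. This is a Python illustration only, not the kernel patch; the names QUERY_INTERVAL_MIN_S and clamp_query_interval are hypothetical.

```python
import logging

# Hypothetical constant: the 1 second minimum named in the commit message.
QUERY_INTERVAL_MIN_S = 1.0

def clamp_query_interval(requested_s, log=None):
    """Return the interval to use, overwriting too-low values instead of failing."""
    log = log or logging.getLogger("bridge")
    if requested_s < QUERY_INTERVAL_MIN_S:
        # No error is returned, so existing user-space keeps working;
        # the administrator learns about the override from the log.
        log.warning("query interval %.3fs too low, using minimum %.3fs",
                    requested_s, QUERY_INTERVAL_MIN_S)
        return QUERY_INTERVAL_MIN_S
    return requested_s
```

Overwriting rather than rejecting is the design choice the message argues for: a caller that used to succeed with a tiny value still succeeds, only with a safer interval.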


Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <>
Signed-off-by: Nikolay Aleksandrov <>
Signed-off-by: Jakub Kicinski <>
3 weeks agoipv6: raw: check passed optlen before reading
Tamir Duberstein [Wed, 29 Dec 2021 20:09:47 +0000 (15:09 -0500)]
ipv6: raw: check passed optlen before reading

Add a check that the user-provided option is at least as long as the
number of bytes we intend to read. Before this patch we would blindly
read sizeof(int) bytes even in cases where the user passed
optlen<sizeof(int), which would potentially read garbage or fault.
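
A rough Python sketch of the added length check, for illustration only. The helper name read_int_option is hypothetical, and the choice of EINVAL as the error is an assumption rather than something stated in this message.

```python
import errno
import struct

INT_SIZE = struct.calcsize("i")  # stands in for sizeof(int)

def read_int_option(optval: bytes) -> int:
    """Read an int option only if the caller supplied enough bytes."""
    if len(optval) < INT_SIZE:
        # Assumed error code; the point is to refuse before reading.
        raise OSError(errno.EINVAL, "option shorter than sizeof(int)")
    return struct.unpack_from("i", optval)[0]
```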

Discovered by new tests in .

The original get_user call predates history in the git repo.

Signed-off-by: Tamir Duberstein <>
Signed-off-by: Willem de Bruijn <>
Signed-off-by: Jakub Kicinski <>
3 weeks agoMerge tag 's390-5.16-6' of git://
Linus Torvalds [Wed, 29 Dec 2021 18:07:20 +0000 (10:07 -0800)]
Merge tag 's390-5.16-6' of git://git./linux/kernel/git/s390/linux

Pull s390 fix from Heiko Carstens:

 - fix s390 mcount regex typo in

* tag 's390-5.16-6' of git://
  fix typo in s390 mcount regex

3 weeks agoxsk: Initialise xskb free_list_node
Ciara Loftus [Mon, 20 Dec 2021 15:52:50 +0000 (15:52 +0000)]
xsk: Initialise xskb free_list_node

This commit initialises the xskb's free_list_node when the xskb is
allocated. This prevents a potential false negative returned from a call
to list_empty for that node, such as the one introduced in commit
199d983bc015 ("xsk: Fix crash on double free in buffer pool")

In my environment this issue caused packets to not be received by
the xdpsock application if the traffic was running prior to application
launch. This happened when the first batch of packets failed the xskmap
lookup and XDP_PASS was returned from the bpf program. This action is
handled in the i40e zc driver (and others) by allocating an skbuff,
freeing the xdp_buff and adding the associated xskb to the
xsk_buff_pool's free_list if it hadn't been added already. Without this
fix, the xskb is not added to the free_list because the check to determine
if it was added already returns an invalid positive result. Later, this
caused allocation errors in the driver and the failure to receive packets.

Fixes: 199d983bc015 ("xsk: Fix crash on double free in buffer pool")
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Ciara Loftus <>
Acked-by: Magnus Karlsson <>
Signed-off-by: Jakub Kicinski <>
3 weeks agonet/mlx5e: Fix wrong features assignment in case of error
Gal Pressman [Mon, 29 Nov 2021 09:08:41 +0000 (11:08 +0200)]
net/mlx5e: Fix wrong features assignment in case of error

In case of an error in mlx5e_set_features(), 'netdev->features' must be
updated with the correct state of the device to indicate which features
were updated successfully.
To do that we maintain a copy of 'netdev->features' and update it after
successful feature changes, so we can assign it back to
'netdev->features' if needed.

However, since not all netdev features are handled by the driver (e.g.
GRO/TSO/etc), some features may not be updated correctly in case of an
error updating another feature.

For example, while requesting to disable TSO (feature which is not
handled by the driver) and enable HW-GRO, if an error occurs during
HW-GRO enable, 'oper_features' will be assigned with 'netdev->features'
and HW-GRO turned off. TSO will remain enabled in such case, which is a

To solve that, instead of using 'netdev->features' as the baseline of
'oper_features' and changing it on set feature success, use 'features'
instead and update it in case of errors.
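
The baseline change can be sketched in Python with feature sets. This is a simplified model, not driver code: set_features, the handler callables and the feature names used as strings are all hypothetical.

```python
def set_features(current, requested, handlers):
    """Return (resulting_features, ok).

    current, requested: sets of feature names.
    handlers: maps a driver-handled feature name to a callable(enable)
              that returns True on success.
    """
    oper = set(requested)  # baseline is the requested state, not the current one
    ok = True
    for feature, handler in handlers.items():
        enable = feature in requested
        if enable == (feature in current):
            continue  # no change requested for this feature
        if not handler(enable):
            ok = False
            # Undo only the change that failed; other requests stay applied.
            if enable:
                oper.discard(feature)
            else:
                oper.add(feature)
    return oper, ok
```

With current = {"TSO"}, requested = {"HW-GRO"} and a failing HW-GRO handler, the result is an empty set: HW-GRO stays off and the TSO disable request is still honoured.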

Fixes: 75b81ce719b7 ("net/mlx5e: Don't override netdev features field unless in error flow")
Signed-off-by: Gal Pressman <>
Signed-off-by: Saeed Mahameed <>
3 weeks agonet/mlx5e: TC, Fix memory leak with rules with internal port
Roi Dayan [Wed, 22 Dec 2021 07:20:58 +0000 (09:20 +0200)]
net/mlx5e: TC, Fix memory leak with rules with internal port

Fix a memory leak with decap rule with internal port as destination
device. The driver allocates a modify hdr action but doesn't set
the flow attr modify hdr action which results in skipping releasing
the modify hdr action when releasing the flow.

    [<000000005f8c651c>] krealloc+0x83/0xd0
    [<000000009f59b143>] alloc_mod_hdr_actions+0x156/0x310 [mlx5_core]
    [<000000002257f342>] mlx5e_tc_match_to_reg_set_and_get_id+0x12a/0x360 [mlx5_core]
    [<00000000b44ea75a>] mlx5e_tc_add_fdb_flow+0x962/0x1470 [mlx5_core]
    [<0000000003e384a0>] __mlx5e_add_fdb_flow+0x54c/0xb90 [mlx5_core]
    [<00000000ed8b22b6>] mlx5e_configure_flower+0xe45/0x4af0 [mlx5_core]
    [<00000000024f4ab5>] mlx5e_rep_indr_offload.isra.0+0xfe/0x1b0 [mlx5_core]
    [<000000006c3bb494>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
    [<00000000d3dac2ea>] tc_setup_cb_add+0x1d2/0x420

Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Roi Dayan <>
Signed-off-by: Saeed Mahameed <>
3 weeks agoMerge branch '1GbE' of git://
Jakub Kicinski [Wed, 29 Dec 2021 00:19:09 +0000 (16:19 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

Intel Wired LAN Driver Updates 2021-12-28

This series contains updates to igc driver only.

Vinicius disables support for crosstimestamp on i225-V as lockups are being

James McLaughlin fixes Tx timestamping support on non-MSI-X platforms.

* '1GbE' of git://
  igc: Fix TX timestamp support for non-MSI-X platforms
  igc: Do not enable crosstimestamping for i225-V models

Signed-off-by: Jakub Kicinski <>
3 weeks agoionic: Initialize the 'lif->dbid_inuse' bitmap
Christophe JAILLET [Sun, 26 Dec 2021 14:06:17 +0000 (15:06 +0100)]
ionic: Initialize the 'lif->dbid_inuse' bitmap

When allocated, this bitmap is not initialized. Only the first bit is set a
few lines below.

Use bitmap_zalloc() to make sure that it is cleared before being used.

Fixes: 6461b446f2a0 ("ionic: Add interrupts and doorbells")
Signed-off-by: Christophe JAILLET <>
Signed-off-by: Shannon Nelson <>
Signed-off-by: Jakub Kicinski <>
3 weeks agodrm/amd/display: Changed pipe split policy to allow for multi-display pipe split
Angus Wang [Thu, 9 Dec 2021 22:27:01 +0000 (17:27 -0500)]
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at

Changed the pipe split policies so that pipe split is allowed for
multi-display configurations


Note this is a backport of this commit from amdgpu drm-next for 5.16.

Tested-by: Daniel Wheeler <>
Reviewed-by: Aric Cyr <>
Acked-by: Rodrigo Siqueira <>
Signed-off-by: Angus Wang <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
Nicholas Kazlauskas [Fri, 17 Dec 2021 19:18:59 +0000 (14:18 -0500)]
drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config

A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.

More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:

config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;

That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.

Fixes: 986430446c917b ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <>
Signed-off-by: Nicholas Kazlauskas <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amd/display: Set optimize_pwr_state for DCN31
Nicholas Kazlauskas [Thu, 9 Dec 2021 21:05:36 +0000 (16:05 -0500)]
drm/amd/display: Set optimize_pwr_state for DCN31

We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.

This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.

Hook up the handler like DCN21. It was also missed like the
exit_optimized_pwr_state callback.

Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")
Tested-by: Daniel Wheeler <>
Reviewed-by: Eric Yang <>
Acked-by: Rodrigo Siqueira <>
Signed-off-by: Nicholas Kazlauskas <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
Nicholas Kazlauskas [Thu, 9 Dec 2021 18:53:36 +0000 (13:53 -0500)]
drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization

Otherwise SMU won't mark Display as idle when trying to perform s2idle.

Mark the bit in the DCN31 codepath; it doesn't apply to older ASICs.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")
Tested-by: Daniel Wheeler <>
Reviewed-by: Eric Yang <>
Acked-by: Rodrigo Siqueira <>
Signed-off-by: Nicholas Kazlauskas <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amd/display: Added power down for DCN10
Lai, Derek [Mon, 6 Dec 2021 09:10:59 +0000 (17:10 +0800)]
drm/amd/display: Added power down for DCN10

The change that sets a timer callback on boot for 10 seconds still
works; it just lacked power down for DCN10.

Added power down for DCN10.

Tested-by: Daniel Wheeler <>
Reviewed-by: Anthony Koo <>
Acked-by: Rodrigo Siqueira <>
Signed-off-by: Derek Lai <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amd/display: fix B0 TMDS deepcolor no dislay issue
Charlene Liu [Mon, 6 Dec 2021 02:19:30 +0000 (21:19 -0500)]
drm/amd/display: fix B0 TMDS deepcolor no dislay issue

On B0, PHY C maps to F and D maps to G. The driver uses the logical
instance and DMUB does the remap, but the driver still needs to use the
right PHY instance to access the right HW.

Use the physical instance when programming PHY registers.

resync_control programming could move to DMUB next.

Tested-by: Daniel Wheeler <>
Reviewed-by: Dmytro Laktyushkin <>
Reviewed-by: Jun Lei <>
Acked-by: Rodrigo Siqueira <>
Signed-off-by: Charlene Liu <>
Signed-off-by: Alex Deucher <>
3 weeks agoMerge tag 'selinux-pr-20211228' of git://
Linus Torvalds [Tue, 28 Dec 2021 21:33:06 +0000 (13:33 -0800)]
Merge tag 'selinux-pr-20211228' of git://git./linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "One more small SELinux patch to address an uninitialized stack

* tag 'selinux-pr-20211228' of git://
  selinux: initialize proto variable in selinux_ip_postroute_compat()

3 weeks agoperf scripts python: Fix printing of switch events
Adrian Hunter [Wed, 15 Dec 2021 08:06:36 +0000 (10:06 +0200)]
perf scripts python: Fix printing of switch events

The script displays only the last of consecutive switch
statements but that may not be the last switch event for the CPU. Fix by
keeping a dictionary of last context switch keyed by CPU, and make it
possible to see all switch events by adding option --all-switch-events.
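
The per-CPU bookkeeping can be sketched in a few lines of Python. This illustrates the idea only and is not the script's code; record_switch, take_last_switch and the event values are hypothetical.

```python
# Last context switch seen on each CPU, keyed by CPU number.
last_switch_by_cpu = {}

def record_switch(cpu, event):
    """Remember the most recent switch event for this CPU only."""
    last_switch_by_cpu[cpu] = event

def take_last_switch(cpu):
    """Return and forget the pending switch event for this CPU, if any."""
    return last_switch_by_cpu.pop(cpu, None)
```

Keying by CPU means a switch on one CPU can no longer hide the last switch on another.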

Fixes: a92bf335fd82eeee ("perf scripts python: Add branches to script")
Signed-off-by: Adrian Hunter <>
Cc: Jiri Olsa <>
Cc: Namhyung Kim <>
Cc: Riccardo Mancini <>
Signed-off-by: Arnaldo Carvalho de Melo <>
3 weeks agoperf script: Fix CPU filtering of a script's switch events
Adrian Hunter [Wed, 15 Dec 2021 08:06:35 +0000 (10:06 +0200)]
perf script: Fix CPU filtering of a script's switch events

CPU filtering was not being applied to a script's switch events.

Fixes: 5bf83c29a0ad2e78 ("perf script: Add scripting operation process_switch()")
Signed-off-by: Adrian Hunter <>
Acked-by: Namhyung Kim <>
Cc: Jiri Olsa <>
Cc: Riccardo Mancini <>
Signed-off-by: Arnaldo Carvalho de Melo <>
3 weeks agoperf intel-pt: Fix parsing of VM time correlation arguments
Adrian Hunter [Wed, 15 Dec 2021 08:06:34 +0000 (10:06 +0200)]
perf intel-pt: Fix parsing of VM time correlation arguments

Parser did not take ':' into account.



  $ perf record -e intel_pt//u uname
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.026 MB ]
  $ perf inject -i --vm-time-correlation="dry-run 123"
  $ perf inject -i --vm-time-correlation="dry-run 123:456"
  Failed to parse VM Time Correlation options
  0x620 [0x98]: failed to process type: 70 [Invalid argument]


  $ perf inject -i --vm-time-correlation="dry-run 123:456"

Fixes: e3ff42bdebcfeb5f ("perf intel-pt: Parse VM Time Correlation options and set up decoding")
Signed-off-by: Adrian Hunter <>
Acked-by: Namhyung Kim <>
Cc: Jiri Olsa <>
Cc: Riccardo Mancini <>
Signed-off-by: Arnaldo Carvalho de Melo <>
3 weeks agoperf expr: Fix return value of ids__new()
Miaoqian Lin [Tue, 14 Dec 2021 01:10:27 +0000 (01:10 +0000)]
perf expr: Fix return value of ids__new()

Callers of the ids__new() function only do NULL checking for the return
value. ids__new() calls hashmap__new(), which may return

Instead of changing the checking one-by-one return NULL instead of
ERR_PTR(-ENOMEM) to keep it consistent.

Signed-off-by: Miaoqian Lin <>
Reviewed-by: German Gomez <>
Tested-by: German Gomez <>
Acked-by: Jiri Olsa <>
Cc: Alexander Shishkin <>
Cc: Andi Kleen <>
Cc: Ian Rogers <>
Cc: Mark Rutland <>
Cc: Namhyung Kim <>
Cc: Peter Zijlstra <>
Signed-off-by: Arnaldo Carvalho de Melo <>
3 weeks agoMerge tag 'auxdisplay-for-linus-v5.16' of git://
Linus Torvalds [Tue, 28 Dec 2021 19:46:15 +0000 (11:46 -0800)]
Merge tag 'auxdisplay-for-linus-v5.16' of git://

Pull auxdisplay fixes from Miguel Ojeda:
 "A couple of improvements for charlcd:

   - check pointer before dereferencing

   - fix coding style issue"

* tag 'auxdisplay-for-linus-v5.16' of git://
  auxdisplay: charlcd: checking for pointer reference before dereferencing
  auxdisplay: charlcd: fixing coding style issue

3 weeks agoMerge tag 'powerpc-5.16-5' of git://
Linus Torvalds [Tue, 28 Dec 2021 19:42:01 +0000 (11:42 -0800)]
Merge tag 'powerpc-5.16-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Fix DEBUG_WX never reporting any WX mappings, due to use of an
  incorrect config symbol since we converted to using generic ptdump"

* tag 'powerpc-5.16-5' of git://
  powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion

3 weeks agoigc: Fix TX timestamp support for non-MSI-X platforms
James McLaughlin [Fri, 17 Dec 2021 23:49:33 +0000 (16:49 -0700)]
igc: Fix TX timestamp support for non-MSI-X platforms

Time synchronization was not properly enabled on non-MSI-X platforms.

Fixes: 2c344ae24501 ("igc: Add support for TX timestamping")
Signed-off-by: James McLaughlin <>
Reviewed-by: Vinicius Costa Gomes <>
Tested-by: Nechama Kraus <>
Signed-off-by: Tony Nguyen <>
3 weeks agoigc: Do not enable crosstimestamping for i225-V models
Vinicius Costa Gomes [Tue, 14 Dec 2021 00:39:49 +0000 (16:39 -0800)]
igc: Do not enable crosstimestamping for i225-V models

It was reported that when PCIe PTM is enabled, some lockups could
be observed with some integrated i225-V models.

While the issue is investigated, we can disable crosstimestamp for
those models and see no loss of functionality, because those models
don't have any support for time synchronization.

Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Reported-by: Stefan Dietrich <>
Signed-off-by: Vinicius Costa Gomes <>
Tested-by: Nechama Kraus <>
Signed-off-by: Tony Nguyen <>
3 weeks agodrm/amdgpu: no DC support for headless chips
Alex Deucher [Thu, 23 Dec 2021 19:13:02 +0000 (14:13 -0500)]
drm/amdgpu: no DC support for headless chips

Chips with no display hardware should return false for
DC support.

v2: drop Arcturus and Aldebaran

Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <>
Reviewed-by: Guchun Chen <>
Reported-by: Tareque Md.Hanif <>
Signed-off-by: Alex Deucher <>
3 weeks agoMerge branch 'smc-fixes'
David S. Miller [Tue, 28 Dec 2021 12:42:46 +0000 (12:42 +0000)]
Merge branch 'smc-fixes'

Dust Li says:

net/smc: fix kernel panic caused by race of smc_sock

This patchset fixes the race between smc_release triggered by
close(2) and cdc_handle triggered by the underlying RDMA device.

The race occurs because the smc_connection may have been released
before the pending tx CDC messages get their CQEs. To fix this, I
add a counter to track how many pending WRs we have posted through
the smc_connection, and only release the smc_connection once there
are no pending WRs on the connection.

The first patch prevents posting WR on a QP that is not in RTS
state. This patch is needed because if we post WR on a QP that
is not in RTS state, ib_post_send() may succeed but no CQE will
return, and that will confuse the counter tracking the pending

The second patch adds a counter to track how many WRs were posted
through the smc_connection, and doesn't reset the QP on link
destruction, to prevent leaking the counter.

Signed-off-by: David S. Miller <>
3 weeks agonet/smc: fix kernel panic caused by race of smc_sock
Dust Li [Tue, 28 Dec 2021 09:03:25 +0000 (17:03 +0800)]
net/smc: fix kernel panic caused by race of smc_sock

A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
[ 4570.711446] Call Trace:
[ 4570.711746]  <IRQ>
[ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083]  __do_softirq+0x123/0x2f4
[ 4570.714521]  irq_exit_rcu+0xc4/0xf0
[ 4570.714934]  common_interrupt+0xba/0xe0

Though smc_cdc_tx_handler() checked the existence of smc connection,
smc_release() may have already dismissed and released the smc socket
before smc_cdc_tx_handler() further visits it.

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |      smc_cdc_tx_dismisser()
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the
smc_sock, add a refcount on the smc_connection for inflight CDC
messages (posted to the QP but with no related CQE received yet), and
don't release the smc_connection until all the inflight CDC messages
have completed, whether they succeeded or failed.
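
The deferred release can be modelled with a small counter. The Python below is a single-threaded sketch of the idea, not the SMC code; the class and method names are hypothetical.

```python
class Connection:
    """Toy model: free the connection only once no CDC message is inflight."""

    def __init__(self):
        self.inflight = 0    # messages posted whose completion has not arrived
        self.closing = False
        self.freed = False

    def post(self):
        self.inflight += 1

    def complete(self):
        # Called for every completion, successful or failed.
        self.inflight -= 1
        self._maybe_free()

    def release(self):
        self.closing = True
        self._maybe_free()

    def _maybe_free(self):
        if self.closing and self.inflight == 0:
            self.freed = True
```

A release() issued while a message is still inflight leaves freed at False until the matching complete() arrives.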

Using a refcount on CDC messages brings another problem: when the link
is going to be destroyed, smcr_link_clear() will reset the QP, which
then removes all the pending CQEs related to the QP in the CQ. To make
sure all the CQEs will always come back so the refcount on the
smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
by smc_ib_modify_qp_error().
Also remove the timeout in smc_wr_tx_wait_no_pending_sends(), since we
need to wait for all pending WQEs to complete, or we may encounter a
use-after-free when handling CQEs.

For the IB device removal routine, we need to wait for all the QPs on that
device to be destroyed before we can destroy the CQs on the device, or
the refcount on smc_connection won't reach 0 and smc_sock cannot be
Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Reported-by: Wen Gu <>
Signed-off-by: Dust Li <>
Signed-off-by: David S. Miller <>
3 weeks agonet/smc: don't send CDC/LLC message if link not ready
Dust Li [Tue, 28 Dec 2021 09:03:24 +0000 (17:03 +0800)]
net/smc: don't send CDC/LLC message if link not ready

We found that smc_llc_send_link_delete_all() sometimes waits
for the 2s timeout when testing with RDMA link up/down.
It is possible that while a smc_link is in ACTIVATING state,
the underlying QP is still in RESET or RTR state, which
cannot send any messages out.

smc_llc_send_link_delete_all() uses smc_link_usable() to
check whether the link is usable. If the QP is still in
RESET or RTR state but the smc_link is in ACTIVATING, this
LLC message will always fail without any CQE entering the
CQ, and we will always wait 2s before timing out.

Since we cannot send any messages through the QP before
the QP enters RTS, I add a wrapper smc_link_sendable()
which checks the state of the QP along with the link state,
and replace smc_link_usable() with smc_link_sendable()
in all LLC & CDC message sending routines.
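
The relationship between the two checks can be sketched in Python. The state names come from the text above, but the set of link states treated as usable is an assumption, and the functions are simplified stand-ins rather than the kernel helpers.

```python
# Assumption: ACTIVATING and ACTIVE count as usable link states.
USABLE_LINK_STATES = {"ACTIVATING", "ACTIVE"}

def link_usable(link_state):
    return link_state in USABLE_LINK_STATES

def link_sendable(link_state, qp_state):
    # A usable link can only send once its QP has reached RTS.
    return link_usable(link_state) and qp_state == "RTS"
```

An ACTIVATING link whose QP is still in RESET or RTR passes the first check but fails the second, which is the case that used to end in the 2s timeout.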

Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Signed-off-by: Dust Li <>
Signed-off-by: David S. Miller <>
3 weeks agoNFC: st21nfca: Fix memory leak in device probe and remove
Wei Yongjun [Tue, 28 Dec 2021 12:48:11 +0000 (12:48 +0000)]
NFC: st21nfca: Fix memory leak in device probe and remove

'phy->pending_skb' is allocated at device probe but is never freed
in the error handling path or the remove path, which causes a memory
leak as follows:

unreferenced object 0xffff88800bc06800 (size 512):
  comm "8", pid 11775, jiffies 4295159829 (age 9.032s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [<00000000d66c09ce>] __kmalloc_node_track_caller+0x1ed/0x450
    [<00000000c93382b3>] kmalloc_reserve+0x37/0xd0
    [<000000005fea522c>] __alloc_skb+0x124/0x380
    [<0000000019f29f9a>] st21nfca_hci_i2c_probe+0x170/0x8f2

Fix it by freeing 'pending_skb' in error and remove.

Fixes: 68957303f44a ("NFC: ST21NFCA: Add driver for STMicroelectronics ST21NFCA NFC Chip")
Reported-by: Hulk Robot <>
Signed-off-by: Wei Yongjun <>
Signed-off-by: David S. Miller <>
3 weeks agonet: lantiq_xrx200: fix statistics of received bytes
Aleksander Jan Bajkowski [Mon, 27 Dec 2021 16:22:03 +0000 (17:22 +0100)]
net: lantiq_xrx200: fix statistics of received bytes

Received frames have FCS truncated. There is no need
to subtract FCS length from the statistics.

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <>
Signed-off-by: David S. Miller <>
3 weeks agonet: ag71xx: Fix a potential double free in error handling paths
Christophe JAILLET [Sun, 26 Dec 2021 17:51:44 +0000 (18:51 +0100)]
net: ag71xx: Fix a potential double free in error handling paths

'ndev' is a managed resource allocated with devm_alloc_etherdev(), so there
is no need to call free_netdev() explicitly or there will be a double

Simplify all error handling paths accordingly.

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Christophe JAILLET <>
Signed-off-by: David S. Miller <>
3 weeks agomISDN: change function names to avoid conflicts
wolfgang huang [Tue, 28 Dec 2021 08:01:20 +0000 (16:01 +0800)]
mISDN: change function names to avoid conflicts

When building for mips, we hit the following error: a multiple
definition of l1_init. Some architectures use names such as l1, l2,
lxx in their start-up code, so change the mISDN function names to
align with Isdnl2_xxx.

mips-linux-gnu-ld: drivers/isdn/mISDN/layer1.o: in function `l1_init':
(.text+0x890): multiple definition of `l1_init'; \
arch/mips/kernel/bmips_5xxx_init.o:(.text+0xf0): first defined here
make[1]: *** [home/mips/kernel-build/linux/Makefile:1161: vmlinux] Error 1

Signed-off-by: wolfgang huang <>
Reported-by: k2ci <>
Signed-off-by: David S. Miller <>
3 weeks agodrm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
Evan Quan [Fri, 17 Dec 2021 11:05:06 +0000 (19:05 +0800)]
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform

By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and
put itself into a state ready for PNP. That can work around some random resume
failures observed on BOCO capable platforms.

Signed-off-by: Evan Quan <>
Acked-by: Alex Deucher <>
Reviewed-by: Guchun Chen <>
Reviewed-by: Lijo Lazar <>
Signed-off-by: Alex Deucher <>
3 weeks agodrm/amdgpu: always reset the asic in suspend (v2)
Alex Deucher [Fri, 12 Nov 2021 16:25:30 +0000 (11:25 -0500)]
drm/amdgpu: always reset the asic in suspend (v2)

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

v2: handle s0ix

Acked-by: Luben Tuikov <>
Acked-by: Evan Quan <>
Signed-off-by: Alex Deucher <>
3 weeks agoMerge tag 'efi-urgent-for-v5.16-2' of git://
Linus Torvalds [Mon, 27 Dec 2021 16:58:35 +0000 (08:58 -0800)]
Merge tag 'efi-urgent-for-v5.16-2' of git://git./linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Another EFI fix for v5.16:

   - Prevent missing prototype warning from breaking the build under

* tag 'efi-urgent-for-v5.16-2' of git://
  efi: Move efifb_setup_from_dmi() prototype from arch headers

3 weeks agodrm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
Prike Liang [Mon, 13 Dec 2021 08:17:02 +0000 (16:17 +0800)]
drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume

On s0ix entry, gfx needs to be retained in the gfxoff state, so there is no need
to set gfx cgpg in the S0ix suspend-resume process. Moreover, moving the S0ix
check into SMU12 simplifies the code condition check.

Signed-off-by: Prike Liang <>
Reviewed-by: Evan Quan <>
Signed-off-by: Alex Deucher <>
3 weeks agoselinux: initialize proto variable in selinux_ip_postroute_compat()
Tom Rix [Fri, 24 Dec 2021 15:07:39 +0000 (07:07 -0800)]
selinux: initialize proto variable in selinux_ip_postroute_compat()

Clang static analysis reports this warning

hooks.c:5765:6: warning: 4th function call argument is an uninitialized
        if (selinux_xfrm_postroute_last(sksec->sid, skb, &ad, proto))

selinux_parse_skb() can return ok without setting proto.  The later call
to selinux_xfrm_postroute_last() does an early check of proto and can
return ok if the garbage proto value matches.  So initialize proto.

Fixes: eef9b41622f2 ("selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()")
Signed-off-by: Tom Rix <>
[PM: typo/spelling and description fixes]
Signed-off-by: Paul Moore <>
3 weeks agonfc: uapi: use kernel size_t to fix user-space builds
Krzysztof Kozlowski [Sun, 26 Dec 2021 12:03:47 +0000 (13:03 +0100)]
nfc: uapi: use kernel size_t to fix user-space builds

Fix user-space builds that include /usr/include/linux/nfc.h before
some other headers:

  /usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’
    281 |         size_t service_name_len;
        |         ^~~~~~

Fixes: d646960f7986 ("NFC: Initial LLCP support")
Cc: <>
Signed-off-by: Krzysztof Kozlowski <>
Signed-off-by: David S. Miller <>
3 weeks agouapi: fix linux/nfc.h userspace compilation errors
Dmitry V. Levin [Sun, 26 Dec 2021 13:01:27 +0000 (16:01 +0300)]
uapi: fix linux/nfc.h userspace compilation errors

Replace sa_family_t with __kernel_sa_family_t to fix the following
linux/nfc.h userspace compilation errors:

/usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;
/usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;

Fixes: 23b7869c0fd0 ("NFC: add the NFC socket raw protocol")
Fixes: d646960f7986 ("NFC: Initial LLCP support")
Cc: <>
Signed-off-by: Dmitry V. Levin <>
Reviewed-by: Krzysztof Kozlowski <>
Signed-off-by: David S. Miller <>
3 weeks agonet: usb: pegasus: Do not drop long Ethernet frames
Matthias-Christian Ott [Sun, 26 Dec 2021 22:12:08 +0000 (23:12 +0100)]
net: usb: pegasus: Do not drop long Ethernet frames

The D-Link DSB-650TX (2001:4002) is unable to receive Ethernet frames
that are longer than 1518 octets, for example, Ethernet frames that
contain 802.1Q VLAN tags.

The frames are sent to the pegasus driver via USB but the driver
discards them because they have the Long_pkt field set to 1 in the
received status report. The function read_bulk_callback of the pegasus
driver treats such received "packets" (in the terminology of the
hardware) as errors but the field simply indicates that the
Ethernet frame (MAC destination to FCS) is longer than 1518 octets.

It seems that in the 1990s there was a distinction between
"giant" (> 1518) and "runt" (< 64) frames and the hardware includes
flags to indicate this distinction. It seems that the purpose of the
"giant" frame distinction was to not allow infinitely long frames due
to transmission errors and to allow hardware to have an upper limit of
the frame size. However, the hardware already has such a limit with its
2048 octet receive buffer and, therefore, Long_pkt is merely a
convention and should not be treated as a receive error.

Actually, the hardware is even able to receive Ethernet frames of 2048
octets, which exceeds the driver's claimed frame size limit of
1536 octets (PEGASUS_MTU).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Matthias-Christian Ott <>
Reviewed-by: Andrew Lunn <>
Signed-off-by: David S. Miller <>
3 weeks agoatlantic: Fix buff_ring OOB in aq_ring_rx_clean
Zekun Shen [Mon, 27 Dec 2021 02:32:45 +0000 (21:32 -0500)]
atlantic: Fix buff_ring OOB in aq_ring_rx_clean

The function obtains the next buffer without a boundary check.
We should return with an I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.
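
A minimal Python sketch of the missing bounds check, for illustration only. The helper next_buffer and its return convention are hypothetical and do not mirror the driver's actual function signatures.

```python
import errno

def next_buffer(buff_ring, next_index):
    """Return (buffer, 0), or (None, -EIO) when next_index is outside the ring."""
    if not 0 <= next_index < len(buff_ring):
        # Refuse to index past the ring instead of reading unrelated memory.
        return None, -errno.EIO
    return buff_ring[next_index], 0
```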

[    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[    4.806505]
[    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
[    4.809030] Call Trace:
[    4.809343]  dump_stack+0x76/0xa0
[    4.809755]  print_address_description.constprop.0+0x16/0x200
[    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.813183]  __kasan_report.cold+0x37/0x7c
[    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.814393]  kasan_report+0xe/0x20
[    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
[    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[    4.817746]  ? __next_timer_interrupt+0xba/0xf0
[    4.818322]  net_rx_action+0x363/0xbd0
[    4.818803]  ? call_timer_fn+0x240/0x240
[    4.819302]  ? __switch_to_asm+0x40/0x70
[    4.819809]  ? napi_busy_loop+0x520/0x520
[    4.820324]  __do_softirq+0x18c/0x634
[    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
[    4.821343]  run_ksoftirqd+0x15/0x20
[    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
[    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.823041]  ? __kthread_parkme+0x80/0x100
[    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.824301]  kthread+0x2b5/0x3b0
[    4.824723]  ? kthread_create_on_node+0xd0/0xd0
[    4.825304]  ret_from_fork+0x35/0x40
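
As a rough illustration of the missing check (not the driver's actual
code), the sketch below validates a device-supplied next-buffer index
against the ring size before using it and returns -EIO otherwise. The
names struct rx_buff and ring_next_checked(), and the field layout, are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical descriptor: `next` is an index supplied by the device. */
struct rx_buff {
    unsigned int next;
};

/*
 * Look up the buffer that follows `cur` in a ring of `ring_size` entries.
 * Returns 0 and stores the index in *out, or -EIO if the device-provided
 * index falls outside the ring.
 */
static int ring_next_checked(const struct rx_buff *ring, size_t ring_size,
                             size_t cur, size_t *out)
{
    assert(ring != NULL && out != NULL && cur < ring_size);

    size_t next = ring[cur].next;

    if (next >= ring_size)
        return -EIO;    /* untrusted index: refuse instead of reading OOB */

    *out = next;
    return 0;
}
```

The caller is expected to stop processing the current packet when the
helper reports an error, rather than dereferencing the bad index.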

Signed-off-by: Zekun Shen <>
Signed-off-by: David S. Miller <>
3 weeks agonet: udp: fix alignment problem in udp4_seq_show()
yangxingwu [Mon, 27 Dec 2021 08:29:51 +0000 (16:29 +0800)]
net: udp: fix alignment problem in udp4_seq_show()

$ cat /proc/net/udp

Before:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

After:

   sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

Signed-off-by: yangxingwu <>
Signed-off-by: David S. Miller <>
3 weeks agonet/smc: fix using of uninitialized completions
Karsten Graul [Mon, 27 Dec 2021 13:35:30 +0000 (14:35 +0100)]
net/smc: fix using of uninitialized completions

In smc_wr_tx_send_wait() the completion at the index specified by
pend->idx is initialized, and the wait for completion starts after
smc_wr_tx_send() has been called. pend->idx is used to get the correct
index for the wait, but the pend structure could already have been
cleared by then.
Introduce pnd_idx to hold and use a local copy of the correct index.
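
The pattern can be sketched in plain C as follows. This is only an
illustration under simplifying assumptions: struct tx_pend,
send_and_complete() and both wait_slot_*() helpers are hypothetical, and
the "completion handler" is simulated by clearing the structure inside
the send call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pending-send record; only the index matters here. */
struct tx_pend {
    unsigned int idx;
};

/*
 * Stand-in for the send path: by the time it returns, the completion
 * handler may already have run and cleared the pending record.
 */
static void send_and_complete(struct tx_pend *pend)
{
    memset(pend, 0, sizeof(*pend));
}

/* Buggy variant: reads pend->idx after the send. */
static unsigned int wait_slot_buggy(struct tx_pend *pend)
{
    assert(pend != NULL);
    send_and_complete(pend);
    return pend->idx;           /* may already be cleared */
}

/* Fixed variant: takes a local copy of the index before the send. */
static unsigned int wait_slot_fixed(struct tx_pend *pend)
{
    assert(pend != NULL);
    unsigned int pnd_idx = pend->idx;

    send_and_complete(pend);
    return pnd_idx;             /* still the slot that was initialized */
}
```

With idx set to 5, the fixed variant waits on slot 5 while the buggy one
ends up looking at slot 0 in this simulation.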

Fixes: 09c61d24f96d ("net/smc: wait for departure of an IB message")
Signed-off-by: Karsten Graul <>
Signed-off-by: David S. Miller <>
3 weeks agoip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
William Zhao [Thu, 23 Dec 2021 17:33:16 +0000 (12:33 -0500)]
ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate

The "__ip6_tnl_parm" struct was left uninitialized causing an invalid
load of random data when the "__ip6_tnl_parm" struct was used elsewhere.
As an example, the function "ip6_tnl_xmit_ctl()" tries to access
the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and
containing random data, UBSAN detected that "collect_md" held a
non-boolean value.

The UBSAN issue is as follows:
UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14
load of value 30 is not a valid value for type '_Bool'
CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
? __cpuhp_setup_state+0x1d3/0x210
ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel]
vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? vti6_rcv+0x100/0x100 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? rcu_read_lock_bh_held+0xc0/0xc0
? lock_acquired+0x262/0xb10
? mark_lock.part.52+0xf7/0x1050
? netdev_core_pick_tx+0x290/0x290
? kvm_clock_read+0x14/0x30
? kvm_sched_clock_read+0x5/0x10
? sched_clock_cpu+0x15/0x200
? find_held_lock+0x3a/0x1c0
? lock_release+0x42f/0xc90
? lock_downgrade+0x6b0/0x6b0
? mark_held_locks+0xb7/0x120
? neigh_connected_output+0x31f/0x470
? lockdep_hardirqs_on+0x79/0x100
? neigh_connected_output+0x31f/0x470
? ip6_finish_output2+0x9b0/0x1d90
? rcu_read_lock_bh_held+0x62/0xc0
? ip6_finish_output2+0x9b0/0x1d90
? ip6_append_data+0x330/0x330
? ip6_mtu+0x166/0x370
? __ip6_finish_output+0x1ad/0xfb0
? nf_hook_slow+0xa6/0x170
? nf_hook.constprop.32+0x317/0x430
? ip6_finish_output+0x180/0x180
? __ip6_finish_output+0xfb0/0xfb0
? lock_is_held_type+0xd9/0x130
? __sk_mem_raise_allocated+0x11cf/0x1560
? dst_output+0x4a0/0x4a0
? ndisc_send_rs+0x432/0x610
? addrconf_rs_timer+0x650/0x650
? addrconf_dad_work+0x73c/0x10e0
? addrconf_dad_completed+0xbb0/0xbb0
? rcu_read_lock_sched_held+0xaf/0xe0
? rcu_read_lock_bh_held+0xc0/0xc0
? pwq_dec_nr_in_flight+0x270/0x270
? process_one_work+0x1740/0x1740
? set_kthread_struct+0x100/0x100

The solution is to initialize "__ip6_tnl_parm" struct to zeros in the
"vti6_siocdevprivate()" function.

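A minimal userspace sketch of the idea, assuming a simplified parameter
struct: zero the whole structure first, then fill in only the fields
that are actually provided, so that members such as "collect_md" never
hold stack garbage. struct tnl_parm and tnl_parm_from_user() are
hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a tunnel parameter struct. */
struct tnl_parm {
    char name[16];
    int link;
    bool collect_md;
};

/* Fill `p` from caller-provided values; untouched members end up zero. */
static void tnl_parm_from_user(struct tnl_parm *p, const char *name, int link)
{
    assert(p != NULL && name != NULL);

    memset(p, 0, sizeof(*p));   /* the fix: start from all zeros */
    strncpy(p->name, name, sizeof(p->name) - 1);
    p->link = link;
    /* collect_md is deliberately not assigned; it stays false */
}
```
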
Signed-off-by: William Zhao <>
Signed-off-by: David S. Miller <>
3 weeks agodrm/i915: Increment composite fence seqno
Matthew Brost [Tue, 14 Dec 2021 19:59:13 +0000 (11:59 -0800)]
drm/i915: Increment composite fence seqno

Increment composite fence seqno on each fence creation.

Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <>
Reviewed-by: John Harrison <>
Signed-off-by: John Harrison <>
(cherry picked from commit 62eeb9ae1364cd96991ccc6e3c5c69d66b8c64df)
Signed-off-by: Jani Nikula <>
3 weeks agodrm/i915: Fix possible uninitialized variable in parallel extension
Matthew Brost [Sun, 19 Dec 2021 00:19:09 +0000 (16:19 -0800)]
drm/i915: Fix possible uninitialized variable in parallel extension

'prev_engine' was declared inside the outer loop and checked in the
inner loop after at least one pass of either loop. The variable should
be declared outside both loops, as it needs to persist across the
entire loop structure.
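
The scoping issue can be illustrated with a generic nested loop; this is
not the i915 code, and all_consecutive() is a hypothetical helper. The
"previous" value must live outside both loops, otherwise it would be
re-declared (and left uninitialized) at the start of every outer
iteration while still being compared in the inner loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Check that a rows x cols matrix, read row by row, holds strictly
 * consecutive values. `prev` is declared outside both loops so that the
 * last element of one row can be compared with the first of the next.
 */
static bool all_consecutive(const int *v, size_t rows, size_t cols)
{
    assert(v != NULL);

    int prev = 0;   /* persists across the entire loop structure */

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            int cur = v[r * cols + c];

            /* skip the comparison only for the very first element */
            if ((r != 0 || c != 0) && cur != prev + 1)
                return false;
            prev = cur;
        }
    }
    return true;
}
```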

Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Matthew Brost <>
Reviewed-by: Lucas De Marchi <>
Signed-off-by: John Harrison <>
(cherry picked from commit cbffbac9c14220b8716b0a9c29d72243f6b14ef3)
Signed-off-by: Jani Nikula <>
3 weeks agoLinux 5.16-rc7
Linus Torvalds [Sun, 26 Dec 2021 21:17:17 +0000 (13:17 -0800)]
Linux 5.16-rc7

3 weeks agoMerge tag 'x86_urgent_for_v5.16_rc7' of git://
Linus Torvalds [Sun, 26 Dec 2021 18:28:55 +0000 (10:28 -0800)]
Merge tag 'x86_urgent_for_v5.16_rc7' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent potential undefined behavior due to shifting pkey constants
   into the sign bit

 - Move the EFI memory reservation code *after* the efi= cmdline parsing
   has happened

 - Revert two commits which turned out to be the wrong direction to
   chase when accommodating early memblock reservations consolidation
   and command line parameters parsing

* tag 'x86_urgent_for_v5.16_rc7' of git://
  x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
  x86/boot: Move EFI range reservation after cmdline parsing
  Revert "x86/boot: Pull up cmdline preparation and early param parsing"
  Revert "x86/boot: Mark prepare_command_line() __init"
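
To illustrate the pkey item above: shifting a signed int constant so
that a bit lands in the sign bit is undefined behaviour in C, while the
same shift on an unsigned constant is well defined. The sketch below
assumes two PKRU bits per protection key and 16 keys; apart from
PKRU_WD_BIT, the macro and function names are assumptions made for this
example.

```c
#include <assert.h>
#include <stdint.h>

#define PKRU_AD_BIT        0x1u  /* access-disable bit, unsigned on purpose */
#define PKRU_WD_BIT        0x2u  /* write-disable bit, unsigned on purpose */
#define PKRU_BITS_PER_PKEY 2

/* Mask of the write-disable bit for a given protection key (0..15). */
static uint32_t pkru_wd_mask(unsigned int pkey)
{
    assert(pkey < 16);
    /*
     * For pkey 15 the shift count is 30, so bit 31 is set. With a plain
     * `0x2` (signed int) this would overflow into the sign bit.
     */
    return PKRU_WD_BIT << (pkey * PKRU_BITS_PER_PKEY);
}
```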

3 weeks agoMerge tag 'objtool_urgent_for_v5.16_rc7' of git://
Linus Torvalds [Sun, 26 Dec 2021 18:19:40 +0000 (10:19 -0800)]
Merge tag 'objtool_urgent_for_v5.16_rc7' of git://git./linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Prevent clang from reordering the reachable annotation in
   an inline asm statement without inputs

 - Fix objtool builds on non-glibc systems due to undefined
   __always_inline

* tag 'objtool_urgent_for_v5.16_rc7' of git://
  compiler.h: Fix annotation macro misplacement with Clang
  uapi: Fix undefined __always_inline on non-glibc systems

3 weeks agoMerge tag 'pinctrl-v5.16-3' of git://
Linus Torvalds [Sun, 26 Dec 2021 04:00:09 +0000 (20:00 -0800)]
Merge tag 'pinctrl-v5.16-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Some hopefully final pin control fixes for the v5.16 kernel:

   - Fix an out-of-bounds bug in the Mediatek driver

   - Fix an init order bug in the Broadcom BCM2835 driver

   - Fix a GPIO offset bug in the STM32 driver"

* tag 'pinctrl-v5.16-3' of git://
  pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines
  pinctrl: bcm2835: Change init order for gpio hogs
  pinctrl: mediatek: fix global-out-of-bounds issue

3 weeks agoMerge tag 'hwmon-for-v5.16-rc7' of git://
Linus Torvalds [Sat, 25 Dec 2021 21:08:22 +0000 (13:08 -0800)]
Merge tag 'hwmon-for-v5.16-rc7' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "A couple of lm90 driver fixes. None of them are critical, but they
  should nevertheless be fixed"

* tag 'hwmon-for-v5.16-rc7' of git://
  hwmon: (lm90) Do not report 'busy' status bit as alarm
  hwmom: (lm90) Fix citical alarm status for MAX6680/MAX6681
  hwmon: (lm90) Drop critical attribute support for MAX6654
  hwmon: (lm90) Prevent integer overflow/underflow in hysteresis calculations
  hwmon: (lm90) Fix usage of CONFIG2 register in detect function

3 weeks agoMerge branch 'for-linus' of git://
Linus Torvalds [Sat, 25 Dec 2021 21:00:14 +0000 (13:00 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "A few small updates to drivers.

  Of note we are now deferring probes of i8042 on some Asus devices as
  the controller is not ready to respond to queries first time around
  when the driver is compiled into the kernel"

* 'for-linus' of git://
  Input: elants_i2c - do not check Remark ID on eKTH3900/eKTH5312
  Input: atmel_mxt_ts - fix double free in mxt_read_info_block
  Input: goodix - fix memory leak in goodix_firmware_upload
  Input: goodix - add id->model mapping for the "9111" model
  Input: goodix - try not to touch the reset-pin on x86/ACPI devices
  Input: i8042 - enable deferred probe quirk for ASUS UM325UA
  Input: elantech - fix stack out of bound access in elantech_change_report_id()
  Input: iqs626a - prohibit inlining of channel parsing functions
  Input: i8042 - add deferred probe support

3 weeks agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 25 Dec 2021 20:30:03 +0000 (12:30 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "9 patches.

  Subsystems affected by this patch series: mm (kfence, mempolicy,
  memory-failure, pagemap, pagealloc, damon, and memory-failure),
  core-kernel, and MAINTAINERS"

* emailed patches from Andrew Morton <>:
  mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
  mm/damon/dbgfs: protect targets destructions with kdamond_lock
  mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
  mm: delete unsafe BUG from page_cache_add_speculative()
  mm, hwpoison: fix condition in free hugetlb page path
  MAINTAINERS: mark more list instances as moderated
  kernel/crash_core: suppress unknown crashkernel parameter warning
  mm: mempolicy: fix THP allocations escaping mempolicy restrictions
  kfence: fix memory leak when cat kfence objects

3 weeks agomm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
Liu Shixin [Sat, 25 Dec 2021 05:12:58 +0000 (21:12 -0800)]
mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()

Hulk Robot reported a panic in put_page_testzero() when testing
madvise() with MADV_SOFT_OFFLINE.  The BUG() is triggered when retrying
get_any_page().  This is because we keep the MF_COUNT_INCREASED flag in
the second try although the refcnt is not increased.

    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:737!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 5 PID: 2135 Comm: sshd Tainted: G    B             5.16.0-rc6-dirty #373
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: release_pages+0x53f/0x840
    Call Trace:
    Modules linked in:
    ---[ end trace e99579b570fe0649 ]---
    RIP: 0010:release_pages+0x53f/0x840

Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Liu Shixin <>
Reported-by: Hulk Robot <>
Reviewed-by: Oscar Salvador <>
Acked-by: Naoya Horiguchi <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agomm/damon/dbgfs: protect targets destructions with kdamond_lock
SeongJae Park [Sat, 25 Dec 2021 05:12:54 +0000 (21:12 -0800)]
mm/damon/dbgfs: protect targets destructions with kdamond_lock

DAMON debugfs interface iterates current monitoring targets in
'dbgfs_target_ids_read()' while holding the corresponding
'kdamond_lock'.  However, it also destructs the monitoring targets in
'dbgfs_before_terminate()' without holding the lock.  This can result in
a use_after_free bug.  This commit avoids the race by protecting the
destruction with the corresponding 'kdamond_lock'.

Reported-by: Sangwoo Bae <>
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <>
Cc: <> [5.15.x]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agomm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
Thibaut Sautereau [Sat, 25 Dec 2021 05:12:51 +0000 (21:12 -0800)]
mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid

The second parameter of alloc_pages_exact_nid is the one indicating the
size of memory pointed to by the returned pointer.
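
For readers unfamiliar with the attribute, here is a small userspace
sketch using the GCC/Clang alloc_size function attribute. The index
passed to the attribute must name the parameter that carries the size;
alloc_exact_on_node() is a hypothetical stand-in whose first argument is
a node id and whose second argument is the size.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * alloc_size(2): the returned pointer refers to `size` bytes, i.e. the
 * SECOND parameter. Using alloc_size(1) here would wrongly tell the
 * compiler that `nid` is the allocation size.
 */
__attribute__((alloc_size(2)))
static void *alloc_exact_on_node(int nid, size_t size)
{
    assert(nid >= 0);
    (void)nid;      /* the node id is ignored in this userspace sketch */
    return malloc(size);
}
```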

Fixes: abd58f38dfb4 ("mm/page_alloc: add __alloc_size attributes for better bounds checking")
Signed-off-by: Thibaut Sautereau <>
Acked-by: Kees Cook <>
Cc: Daniel Micay <>
Cc: Levente Polyak <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agomm: delete unsafe BUG from page_cache_add_speculative()
Hugh Dickins [Sat, 25 Dec 2021 05:12:48 +0000 (21:12 -0800)]
mm: delete unsafe BUG from page_cache_add_speculative()

It is not easily reproducible, but on 5.16-rc I have several times hit
the VM_BUG_ON_PAGE(PageTail(page), page) in
page_cache_add_speculative(): usually from filemap_get_read_batch() for
an ext4 read, yesterday from next_uptodate_page() from
filemap_map_pages() for a shmem fault.

That BUG used to be placed where page_ref_add_unless() had succeeded,
but now it is placed before folio_ref_add_unless() is attempted: that is
not safe, since it is only the acquired reference which makes the page
safe from racing THP collapse or split.

We could keep the BUG, checking PageTail only when
folio_ref_try_add_rcu() has succeeded; but I don't think it adds much
value - just delete it.

Fixes: 020853b6f5ea ("mm: Add folio_try_get_rcu()")
Signed-off-by: Hugh Dickins <>
Acked-by: Kirill A. Shutemov <>
Reviewed-by: Matthew Wilcox (Oracle) <>
Cc: Vlastimil Babka <>
Cc: William Kucharski <>
Cc: Christoph Hellwig <>
Cc: Mike Rapoport <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agomm, hwpoison: fix condition in free hugetlb page path
Naoya Horiguchi [Sat, 25 Dec 2021 05:12:45 +0000 (21:12 -0800)]
mm, hwpoison: fix condition in free hugetlb page path

When a memory error hits a tail page of a free hugepage,
__page_handle_poison() is expected to be called to isolate the error in
a 4kB unit, but it's not called due to the outdated if-condition in
memory_failure_hugetlb().  This loses the chance to isolate the error in
the finer unit, so it's not optimal.  Drop the condition.

This "(p != head && TestSetPageHWPoison(head))" condition is based on the
old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was
set on the subpage), so it's not necessary any more.  Now that
PG_hwpoison is set on the head page for hugepages, concurrent error
events on different subpages in a single hugepage can be prevented by
TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb().
So dropping the condition should not reopen the race window originally
mentioned in commit b985194c8c0a ("hwpoison, hugetlb:
lock_page/unlock_page does not match for handling a free hugepage")

[ fix "HardwareCorrupted" counter]
Signed-off-by: Naoya Horiguchi <>
Reported-by: Fei Luo <>
Reviewed-by: Mike Kravetz <>
Cc: <> [5.14+]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agoMAINTAINERS: mark more list instances as moderated
Randy Dunlap [Sat, 25 Dec 2021 05:12:42 +0000 (21:12 -0800)]
MAINTAINERS: mark more list instances as moderated

Some lists that are moderated are not marked as moderated consistently,
so mark them all as moderated.

Signed-off-by: Randy Dunlap <>
Cc: Miquel Raynal <>
Cc: Conor Culhane <>
Cc: Ryder Lee <>
Cc: Jianjun Wang <>
Cc: Alexandre Belloni <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agokernel/crash_core: suppress unknown crashkernel parameter warning
Philipp Rudo [Sat, 25 Dec 2021 05:12:39 +0000 (21:12 -0800)]
kernel/crash_core: suppress unknown crashkernel parameter warning

When booting with crashkernel= on the kernel command line a warning
similar to

    Kernel command line: ro console=ttyS0 crashkernel=256M
    Unknown kernel command line parameters "crashkernel=256M", will be passed to user space.

is printed.

This comes from crashkernel= being parsed independently of the kernel
parameter handling mechanism.  So the code in init/main.c doesn't know
that crashkernel= is a valid kernel parameter and prints this incorrect
warning.

Suppress the warning by adding a dummy early_param handler for
crashkernel=.
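
The mechanism can be mimicked in userspace as below. This is a sketch
under stated assumptions, not the kernel's parameter code: the handler
table, struct param_handler and param_is_known() are hypothetical. The
point is that a do-nothing handler is enough to make a name count as
"known" even though the real parsing happens elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: a parameter name and its handler. */
struct param_handler {
    const char *name;
    int (*fn)(const char *val);
};

static int parse_console(const char *val)
{
    (void)val;
    return 0;
}

/* Dummy handler: exists only so that the name is recognised. */
static int parse_crashkernel_dummy(const char *val)
{
    (void)val;
    return 0;
}

static const struct param_handler handlers[] = {
    { "console", parse_console },
    { "crashkernel", parse_crashkernel_dummy },
};

/* Return true if "name=value" (or bare "name") has a registered handler. */
static bool param_is_known(const char *arg)
{
    assert(arg != NULL);

    size_t len = strcspn(arg, "=");     /* length of the name part */

    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        if (strlen(handlers[i].name) == len &&
            strncmp(arg, handlers[i].name, len) == 0) {
            const char *val = arg[len] ? arg + len + 1 : "";

            return handlers[i].fn(val) == 0;
        }
    }
    return false;   /* would be reported as an unknown parameter */
}
```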

Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters")
Signed-off-by: Philipp Rudo <>
Acked-by: Baoquan He <>
Cc: Andrew Halaney <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agomm: mempolicy: fix THP allocations escaping mempolicy restrictions
Andrey Ryabinin [Sat, 25 Dec 2021 05:12:35 +0000 (21:12 -0800)]
mm: mempolicy: fix THP allocations escaping mempolicy restrictions

alloc_pages_vma() may try to allocate a THP page on the local NUMA node
first:

    page = __alloc_pages_node(hpage_node,
                gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

    if (!page && (gfp & __GFP_DIRECT_RECLAIM))
        page = __alloc_pages_node(hpage_node,
                    gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared in a slightly different form in the commit 76e654cc91bb
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised").

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

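    #define SIZE (2ULL << 30)
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        /* O_CREAT requires an explicit mode argument */
        fd = open("/dev/shm/test", O_RDWR|O_CREAT, 0600);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);
        assert(addr != MAP_FAILED);

        /* bind the mapping to node 0 only (bit 0 of nodemask) */
        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        /* touch every 4kB page so the memory is actually allocated */
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        /* show on which nodes the pages ended up */
        system(buf);

        return 0;
    }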
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <>
Acked-by: Michal Hocko <>
Acked-by: Mel Gorman <>
Acked-by: David Rientjes <>
Cc: Andrea Arcangeli <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agokfence: fix memory leak when cat kfence objects
Baokun Li [Sat, 25 Dec 2021 05:12:32 +0000 (21:12 -0800)]
kfence: fix memory leak when cat kfence objects

Hulk robot reported a kmemleak problem:

    unreferenced object 0xffff93d1d8cc02e8 (size 248):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00  .@..............
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    unreferenced object 0xffff93d419854000 (size 4096):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30  kfence-#250: 0x0
        30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d  0000000754bda12-

I find that we can easily reproduce this problem with the following
commands:

cat /sys/kernel/debug/kfence/objects
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

The leaked memory is allocated in the stack below:

            seq_open            ---> alloc seq_file
              traverse          ---> alloc seq_buf

And it should have been released in the following process:

                full_proxy_release  ---> free here

However, the release function corresponding to file_operations is not
implemented in kfence.  As a result, a memory leak occurs.  Therefore,
the solution to this problem is to implement the corresponding release
function.
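
A userspace analogy of the leak, with hypothetical names throughout
(struct file_ops, objects_open(), objects_release(), open_then_close()):
the open hook allocates a buffer, and only a matching release hook frees
it. If the operations table leaves the release hook unset, every
open/close cycle leaks one buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of buffers currently allocated by objects_open(). */
static int live_bufs;

/* Hypothetical operations table with paired open/release hooks. */
struct file_ops {
    int (*open)(void **priv);
    int (*release)(void **priv);    /* NULL means nothing gets freed */
};

static int objects_open(void **priv)
{
    *priv = malloc(4096);           /* per-open buffer */
    if (*priv == NULL)
        return -1;
    live_bufs++;
    return 0;
}

static int objects_release(void **priv)
{
    free(*priv);                    /* the counterpart that was missing */
    *priv = NULL;
    live_bufs--;
    return 0;
}

/* Table with the release hook wired up, as in the fixed version. */
static const struct file_ops objects_fops = {
    .open = objects_open,
    .release = objects_release,
};

/* Simulate one open/close cycle and report how many buffers are live. */
static int open_then_close(const struct file_ops *ops)
{
    void *priv = NULL;

    assert(ops != NULL && ops->open != NULL);
    if (ops->open(&priv) != 0)
        return -1;
    if (ops->release != NULL)
        ops->release(&priv);
    return live_bufs;
}
```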

Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Baokun Li <>
Reported-by: Hulk Robot <>
Acked-by: Marco Elver <>
Reviewed-by: Kefeng Wang <>
Cc: Alexander Potapenko <>
Cc: Dmitry Vyukov <>
Cc: Yu Kuai <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
3 weeks agoselftests: mptcp: Remove the deprecated config NFT_COUNTER
Ma Xinjian [Fri, 24 Dec 2021 09:59:28 +0000 (17:59 +0800)]
selftests: mptcp: Remove the deprecated config NFT_COUNTER

NFT_COUNTER was removed in commit
390ad4295aa ("netfilter: nf_tables: make counter support built-in").
LKP/0Day checks whether all configs listed under selftests can be
enabled properly.

For the missing configs, it will report something like:
LKP WARN miss config CONFIG_NFT_COUNTER= of net/mptcp/config

- it's not reasonable to keep the deprecated configs.
- configs under kselftests are recommended by corresponding tests.
So if some configs are missing, it will impact the testing results

Reported-by: kernel test robot <>
Signed-off-by: Ma Xinjian <>
Signed-off-by: David S. Miller <>
3 weeks agosctp: use call_rcu to free endpoint
Xin Long [Thu, 23 Dec 2021 18:04:30 +0000 (13:04 -0500)]
sctp: use call_rcu to free endpoint

This patch is to delay the endpoint free by calling call_rcu() to fix
another use-after-free issue in sctp_sock_dump():

  BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
  Call Trace:
    __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
    lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
    _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
    spin_lock_bh include/linux/spinlock.h:334 [inline]
    __lock_sock+0x203/0x350 net/core/sock.c:2253
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
    lock_sock include/net/sock.h:1492 [inline]
    sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
    sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
    sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
    __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
    inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
    netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
    __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
    netlink_dump_start include/linux/netlink.h:216 [inline]
    inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
    __sock_diag_cmd net/core/sock_diag.c:232 [inline]
    sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
    sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274

This issue occurs when asoc is peeled off and the old sk is freed after
getting it by asoc-> and before calling lock_sock(sk).

To prevent the sk free, as a holder of the sk, ep should be alive when
calling lock_sock(). This patch uses call_rcu() and moves sock_put and
ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
hold the ep under rcu_read_lock in sctp_transport_traverse_process().

If sctp_endpoint_hold() returns true, it means this ep is still alive
and we have held it and can continue to dump it; if it returns false,
it means this ep is dead and can be freed after rcu_read_unlock, and
we should skip it.

In sctp_sock_dump(), after locking the sk, if this ep is different from
tsp->asoc->ep, it means that during this dump the asoc was peeled off
before calling lock_sock(), and the sk should be skipped; if this ep is
the same as tsp->asoc->ep, it means no peeloff happened on this asoc,
and due to lock_sock, no peeloff will happen either until release_sock.

Note that delaying endpoint free won't delay the port release, as the
port release happens in sctp_endpoint_destroy() before calling call_rcu().
Also, freeing endpoint by call_rcu() makes it safe to access the sk by
asoc-> in sctp_assocs_seq_show() and sctp_rcv().

Thanks to Jones for bringing this issue up.

  - improve the changelog.
  - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.

Reported-by: Lee Jones <>
Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump")
Signed-off-by: Xin Long <>
Signed-off-by: David S. Miller <>