muen/linux.git
3 years agohfsplus: drop ACL support
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:23 +0000 (21:59 -0700)]
hfsplus: drop ACL support

The HFS+ Access Control Lists have not worked at all for the past five
years, and nobody seems to have noticed.  Besides, POSIX draft ACLs are
not compatible with MacOS.  Drop the feature entirely.

Link: http://lkml.kernel.org/r/20180714190608.wtnmmtjqeyladkut@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohfsplus: fix decomposition of Hangul characters
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:19 +0000 (21:59 -0700)]
hfsplus: fix decomposition of Hangul characters

Files created under macOS cannot be opened under linux if their names
contain Korean characters, and vice versa.

The Korean alphabet is special because its normalization is done without a
table.  The module deals with it correctly when composing, but forgets
about it for the decomposition.

Fix this using the Hangul decomposition function provided in the Unicode
Standard.  The code fits a bit awkwardly because it requires a buffer,
while all the other normalizations are returned as pointers to the
decomposition table.  This is actually also a bug because reordering may
still be needed, but for now leave it as it is.

The patch will cause trouble for Hangul filenames already created by the
module in the past.  This shouldn't really be concern because its main
purpose was always sharing with macOS.  If a user actually needs to access
such a file the nodecompose mount option should be enough.

Link: http://lkml.kernel.org/r/20180717220951.p6qqrgautc4pxvzu@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Ting-Chang Hou <tchou@synology.com>
Tested-by: Ting-Chang Hou <tchou@synology.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohfsplus: avoid deadlock on file truncation
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:16 +0000 (21:59 -0700)]
hfsplus: avoid deadlock on file truncation

After an extent is removed from the extent tree, the corresponding bits
are also cleared from the block allocation file.  This is currently done
without releasing the tree lock.

The problem is that the allocation file has extents of its own; if it is
fragmented enough, some of them may be in the extent tree as well, and
hfsplus_get_block() will try to take the lock again.

To avoid deadlock, only hold the extent tree lock during the actual tree
operations.

Link: http://lkml.kernel.org/r/20180709202549.auxwkb6memlegb4a@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohfsplus: don't return 0 when fill_super() failed
Tetsuo Handa [Wed, 22 Aug 2018 04:59:12 +0000 (21:59 -0700)]
hfsplus: don't return 0 when fill_super() failed

syzbot is reporting NULL pointer dereference at mount_fs() [1].  This is
because hfsplus_fill_super() is by error returning 0 when
hfsplus_fill_super() detected invalid filesystem image, and mount_bdev()
is returning NULL because dget(s->s_root) == NULL if s->s_root == NULL,
and mount_fs() is accessing root->d_sb because IS_ERR(root) == false if
root == NULL.  Fix this by returning -EINVAL when hfsplus_fill_super()
detected invalid filesystem image.

[1] https://syzkaller.appspot.com/bug?id=21acb6850cecbc960c927229e597158cf35f33d0

Link: http://lkml.kernel.org/r/d83ce31a-874c-dd5b-f790-41405983a5be@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+01ffaf5d9568dd1609f7@syzkaller.appspotmail.com>
Reviewed-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/nilfs2/file.c: use new return type vm_fault_t
Souptick Joarder [Wed, 22 Aug 2018 04:59:08 +0000 (21:59 -0700)]
fs/nilfs2/file.c: use new return type vm_fault_t

Use new return type vm_fault_t for page_mkwrite handler.

Link: http://lkml.kernel.org/r/1529555928-2411-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agonilfs2: use 64-bit superblock timstamps
Arnd Bergmann [Wed, 22 Aug 2018 04:59:05 +0000 (21:59 -0700)]
nilfs2: use 64-bit superblock timstamps

The mount time field in the superblock uses a 64-bit timestamp, but
calling get_seconds() may truncate the current time to 32 bits.

This changes it to ktime_get_real_seconds() to avoid the potential
overflow.

Link: http://lkml.kernel.org/r/20180620075041.4154396-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: add AUTOFS_EXP_FORCED flag
Ian Kent [Wed, 22 Aug 2018 04:59:01 +0000 (21:59 -0700)]
autofs: add AUTOFS_EXP_FORCED flag

The userspace automount(8) daemon is meant to perform a forced expire when
sent a SIGUSR2.

But since the expiration is routed through the kernel and the kernel
doesn't send an expire request if the mount is busy this hasn't worked at
least since autofs version 5.

Add an AUTOFS_EXP_FORCED flag to allow implemention of the feature and
bump the protocol version so user space can check if it's implemented if
needed.

Link: http://lkml.kernel.org/r/152937734715.21213.6594007182776598970.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: make expire flags usage consistent with v5 params
Ian Kent [Wed, 22 Aug 2018 04:58:58 +0000 (21:58 -0700)]
autofs: make expire flags usage consistent with v5 params

Make the usage of the expire flags consistent by naming the expire flags
the same as it is named in the version 5 miscelaneous ioctl parameters and
only check the bit flags when needed.

Link: http://lkml.kernel.org/r/152937734046.21213.9454131988766280028.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: make autofs_expire_indirect() static
Ian Kent [Wed, 22 Aug 2018 04:58:54 +0000 (21:58 -0700)]
autofs: make autofs_expire_indirect() static

autofs_expire_indirect() isn't used outside of fs/autofs/expire.c so make
it static.

Link: http://lkml.kernel.org/r/152937733512.21213.10509996499623738446.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: make autofs_expire_direct() static
Ian Kent [Wed, 22 Aug 2018 04:58:51 +0000 (21:58 -0700)]
autofs: make autofs_expire_direct() static

autofs_expire_direct() isn't used outside of fs/autofs/expire.c so make it
static.

Link: http://lkml.kernel.org/r/152937732944.21213.11821977712410930973.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: fix clearing AUTOFS_EXP_LEAVES in autofs_expire_indirect()
Ian Kent [Wed, 22 Aug 2018 04:58:48 +0000 (21:58 -0700)]
autofs: fix clearing AUTOFS_EXP_LEAVES in autofs_expire_indirect()

The expire flag AUTOFS_EXP_LEAVES is cleared before the second call to
should_expire() in autofs_expire_indirect() but the parameter passed in
the second call is incorrect.

Fortunately AUTOFS_EXP_LEAVES expire flag has not been used for a long
time but might be needed in the future so fix it rather than remove the
expire leaves functionality.

Link: http://lkml.kernel.org/r/152937732410.21213.7447294898147765076.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: fix inconsistent use of now variable
Ian Kent [Wed, 22 Aug 2018 04:58:44 +0000 (21:58 -0700)]
autofs: fix inconsistent use of now variable

The global variable "now" in fs/autofs/expire.c is used in an inconsistent
way, sometimes using jiffies directly, and sometimes using the "now"
variable, and setting it isn't done consistently either.

But the autofs dentry info last_used field is only updated during path
walks or during expire so jiffies can be used directly and the global
variable "now" removed.

Link: http://lkml.kernel.org/r/152937731702.21213.7371321165189170865.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoautofs: fix directory and symlink access
Ian Kent [Wed, 22 Aug 2018 04:58:41 +0000 (21:58 -0700)]
autofs: fix directory and symlink access

Depending on how it is configured the autofs user space daemon can leave
in use mounts mounted at exit and re-connect to them at start up.  But for
this to work best the state of the autofs file system needs to be left
intact over the restart.

Also, at system shutdown, mounts in an autofs file system might be
umounted exposing a mount point trigger for which subsequent access can
lead to a hang.  So recent versions of automount(8) now does its best to
set autofs file system mounts catatonic at shutdown.

When autofs file system mounts are catatonic it's currently possible to
create and remove directories and symlinks which can be a problem at
restart, as described above.

So return EACCES in the directory, symlink and unlink methods if the
autofs file system is catatonic.

Link: http://lkml.kernel.org/r/152902119090.4144.9561910674530214291.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinit/main.c: log init process file name
Paul Menzel [Wed, 22 Aug 2018 04:58:37 +0000 (21:58 -0700)]
init/main.c: log init process file name

Add a log message to `run_init_process()`.

This log message serves two purposes.

1.  If the init process is not specified on the Linux Kernel command
    line, the user sees, what file was chosen.

2.  The time stamps shows exactly, when the Linux kernel handed over
    control to the init process.

Link: http://lkml.kernel.org/r/b1fc97fa-4aa9-1904-ddb5-859e78995c41@molgen.mpg.de
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinit/Kconfig: fix its typos
Randy Dunlap [Wed, 22 Aug 2018 04:58:34 +0000 (21:58 -0700)]
init/Kconfig: fix its typos

Correct typos of "it's" to "its.

Link: http://lkml.kernel.org/r/0ac627b6-5527-55f4-0489-1631aa34fc11@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinit/: remove ineffective sparse disabling
Luc Van Oostenryck [Wed, 22 Aug 2018 04:58:30 +0000 (21:58 -0700)]
init/: remove ineffective sparse disabling

Sparse checking used to be disabled on init/do_mounts.c and a few related
files because "Many of the syscalls used in this file expect some of the
arguments to be __user pointers not __kernel pointers".

However since 28128c61e ("kconfig.h: Include compiler types to avoid
missed struct attributes") the checks are, in fact, not disabled anymore
because of the more early include of "linux/compiler_types.h"

So remove the now ineffective #undefery that was done to disable these
warnings, as well as the associated comment.

Link: http://lkml.kernel.org/r/20180617115355.53799-1-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/eventpoll.c: simplify ep_is_linked() callers
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:26 +0000 (21:58 -0700)]
fs/eventpoll.c: simplify ep_is_linked() callers

Instead of having each caller pass the rdllink explicitly, just have
ep_is_linked() pass it while the callers just need the epi pointer.  This
helper is all about the rdllink, and this change, furthermore, improves
the function's self documentation.

Link: http://lkml.kernel.org/r/20180727053432.16679-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/eventpoll.c: loosen irq safety in ep_poll()
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:23 +0000 (21:58 -0700)]
fs/eventpoll.c: loosen irq safety in ep_poll()

Similar to other calls, ep_poll() is not called with interrupts disabled,
and we can therefore avoid the irq save/restore dance and just disable
local irqs.  In fact, the call should never be called in irq context at
all, considering that the only path is

epoll_wait(2) -> do_epoll_wait() -> ep_poll().

When running on a 2 socket 40-core (ht) IvyBridge a common pipe based
epoll_wait(2) microbenchmark, the following performance improvements are
seen:

    # threads       vanilla         dirty
 1          1805587     2106412
 2          1854064     2090762
 4          1805484     2017436
 8          1751222     1974475
 16         1725299     1962104
 32         1378463     1571233
 64          787368      900784

Which is a pretty constantly near 15%.

Also add a lockdep check such that we detect any mischief before
deadlocking.

Link: http://lkml.kernel.org/r/20180727053432.16679-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/eventpoll.c: simply CONFIG_NET_RX_BUSY_POLL ifdefery
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:19 +0000 (21:58 -0700)]
fs/eventpoll.c: simply CONFIG_NET_RX_BUSY_POLL ifdefery

... 'tis easier on the eye.

[akpm@linux-foundation.org: use inlines rather than macros]
Link: http://lkml.kernel.org/r/20180725185620.11020-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: DT bindings should be a separate patch
Rob Herring [Wed, 22 Aug 2018 04:58:16 +0000 (21:58 -0700)]
checkpatch: DT bindings should be a separate patch

Devicetree bindings should be their own patch as documented in
Documentation/devicetree/bindings/submitting-patches.txt section I.1.
This is because bindings are logically independent from a driver
implementation, they have a different maintainer (even though they often
are applied via the same tree), and it makes for a cleaner history in the
DT only tree created with git-filter-branch.

[robh@kernel.org: add doc pointer to warning, simplify logic]
Link: http://lkml.kernel.org/r/20180810170513.26284-1-robh@kernel.org
[robh@kernel.org: v3]
Link: http://lkml.kernel.org/r/20180810225049.20452-1-robh@kernel.org
Link: http://lkml.kernel.org/r/20180809205032.22205-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: warn on unnecessary int declarations
Joe Perches [Wed, 22 Aug 2018 04:58:12 +0000 (21:58 -0700)]
checkpatch: warn on unnecessary int declarations

On Sun, 2018-08-05 at 08:52 -0700, Linus Torvalds wrote:
> "long unsigned int" isn't _technically_ wrong. But we normally
> call that type "unsigned long".

So add a checkpatch test for it.

Link: http://lkml.kernel.org/r/7bbd97dc0a1e5896a0251fada7bb68bb33643f77.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: check for space after "else" keyword
Michal Zylowski [Wed, 22 Aug 2018 04:58:08 +0000 (21:58 -0700)]
checkpatch: check for space after "else" keyword

Current checkpatch implementation permits notation like

} else{

in kernel code.  It looks like oversight and inconsistency in checkpatch
rules (e.g.  instruction like 'do' is tested).

Add regex for checking space after 'else' keyword and trigger error if
space is not present.

Link: http://lkml.kernel.org/r/1533545753-8870-1-git-send-email-michal.zylowski@intel.com
Signed-off-by: Michal Zylowski <michal.zylowski@intel.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: fix SPDX license check with --root=<path>
Joe Perches [Wed, 22 Aug 2018 04:58:04 +0000 (21:58 -0700)]
checkpatch: fix SPDX license check with --root=<path>

checkpatch uses the in-kernel script spdxcheck.py to validate the specific
license in a file or script.

This check can currently fail for a couple reasons:

o spdxcheck.py assumes the existence of git tree that may not
  exist for a bare source tree from something like a tarball
o the spdxcheck.py must be run from the top level root directory

So add a git existence test and set the subprocess subdirectory.

Link: http://lkml.kernel.org/r/2b32864324ae9c92948b002ec4c0c22409ed98f1.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Charlemagne Lasse <charlemagnelasse@gmail.com>
Tested-by: Charlemagne Lasse <charlemagnelasse@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: warn when a patch doesn't have a description
Joe Perches [Wed, 22 Aug 2018 04:58:01 +0000 (21:58 -0700)]
checkpatch: warn when a patch doesn't have a description

Potential patches should have a commit description.  Emit a warning when
there isn't one.

[akpm@linux-foundation.org: s/else if/elsif/]
Link: http://lkml.kernel.org/r/1b099f4d8373aa583a17011992676bf0f3f09eee.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Prakruthi Deepak Heragu <pheragu@codeaurora.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: check for #if 0/#if 1
Prakruthi Deepak Heragu [Wed, 22 Aug 2018 04:57:57 +0000 (21:57 -0700)]
checkpatch: check for #if 0/#if 1

The #if 0 or #if 1 is used to toggle features. Warn if #if 0 or #if 1
is present and suggest that they can be removed.

[akpm@linux-foundation.org: fix spacing around periods, per Joe\
Link: http://lkml.kernel.org/r/1532625218-24321-1-git-send-email-pheragu@codeaurora.org
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Prakruthi Deepak Heragu <pheragu@codeaurora.org>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: fix krealloc reuse test
Joe Perches [Wed, 22 Aug 2018 04:57:50 +0000 (21:57 -0700)]
checkpatch: fix krealloc reuse test

The current krealloc test does not function correctly when the temporary
pointer return name contains the original pointer name.

Fix that by maximally matching the return pointer name and the original
pointer name and doing a separate comparison of the both names.

Link: http://lkml.kernel.org/r/e617ecb8c019a9c4c56540a1bec16c8aed43a4e4.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: validate SPDX license with spdxcheck.py
Joe Perches [Wed, 22 Aug 2018 04:57:47 +0000 (21:57 -0700)]
checkpatch: validate SPDX license with spdxcheck.py

Use the existing scripts/spdxcheck.py to validate any
SPDX-License-Identifier found in line 1 or 2 of patches or files.

Miscellanea:

o Properly indent the existing SPDX-License-Identifier block.

Link: http://lkml.kernel.org/r/05b832407b24e0a27e419906187cd863bc1617c7.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: fix macro argument reuse test
Joe Perches [Wed, 22 Aug 2018 04:57:43 +0000 (21:57 -0700)]
checkpatch: fix macro argument reuse test

Multiple line macro definitions where the arguments are separated by line
continuations can cause checkpatch to emit invalid syntax regex tests.

This can occur when a single argument is modified in a part of a patch.

For example: (to not add a diff in the commit message)

$ ./scripts/checkpatch.pl --git db023296f0115d2fe01fdabad54678f2b806da23
Unterminated \g... pattern in regex; <very long regex omitted>

And, the test does not work correctly when these arguments are all new as
the initial patch line addition "+" is used in the argument name.

Fix this by stripping the line continuations and any "+" from the list of
arguments.

Link: http://lkml.kernel.org/r/86cdb43a4db70670c102020093f7fb4eb3003e01.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: warn if missing author Signed-off-by
Geert Uytterhoeven [Wed, 22 Aug 2018 04:57:40 +0000 (21:57 -0700)]
checkpatch: warn if missing author Signed-off-by

Print a warning if none of the Signed-off-by lines cover the patch author.

Non-ASCII quoted printable encoding in From: headers and (lack of) double
quotes are handled.  Split From: headers are not fully handled: only the
first part is compared.

[geert+renesas@glider.be: only encode UTF-8 quoted printable mail headers]
Link: http://lkml.kernel.org/r/20180718145254.4770-1-geert+renesas@glider.be
Link: http://lkml.kernel.org/r/20180712100323.26684-1-geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Joe Perches <joe@perches.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: update section keywords
Geert Uytterhoeven [Wed, 22 Aug 2018 04:57:36 +0000 (21:57 -0700)]
checkpatch: update section keywords

As of commit bd721ea73e1f ("treewide: replace obsolete _refok by
__ref"), __init_refok no longer exists, so it can be removed.  While at
it, add the modern variants that were still missing.

Link: http://lkml.kernel.org/r/20180706084205.26367-1-geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: improve runtime execution speed a little
Joe Perches [Wed, 22 Aug 2018 04:57:33 +0000 (21:57 -0700)]
checkpatch: improve runtime execution speed a little

checkpatch repeatedly uses a runtime minimum version check that validates
the minimum perl version required for a regex match by using a "$^V ge
5.10.0" runtime string match.

Only perform that minimum version test once and store the result to reduce
string matching time.

This reduces runtime execution time for patches or files with high line
counts.

An example runtime improvement:

new: $ time ./scripts/checkpatch.pl -f drivers/net/ethernet/intel/i40e/i40e_main.c > /dev/null

real 0m11.856s
user 0m11.831s
sys 0m0.025s

old: $ time ./scripts/checkpatch.pl -f drivers/net/ethernet/intel/i40e/i40e_main.c > /dev/null

real 0m13.330s
user 0m13.282s
sys 0m0.049s

Link: http://lkml.kernel.org/r/db21aa9703833bad65ab70cc4e8a78da5b399138.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: add --fix for CONCATENATED_STRING and STRING_FRAGMENTS
Joe Perches [Wed, 22 Aug 2018 04:57:29 +0000 (21:57 -0700)]
checkpatch: add --fix for CONCATENATED_STRING and STRING_FRAGMENTS

Add the ability to --fix these string issues.

e.g.:
printk(KERN_INFO"bar" "baz"QUX);
converts to
printk(KERN_INFO "barbaz" QUX);

Link: http://lkml.kernel.org/r/a9fb505ccfedffc5869d08832a7ff05a21d85621.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: add a --strict test for structs with bool member definitions
Joe Perches [Wed, 22 Aug 2018 04:57:26 +0000 (21:57 -0700)]
checkpatch: add a --strict test for structs with bool member definitions

A struct with a bool member can have different sizes on various
architectures because neither bool size nor alignment is standardized.

So emit a message on the use of bool in structs only in .h files and not
.c files.

There is the real possibility that this test could have a false positive
when a bool is declared as an automatic, so limit the test to .h files
where the only false positive is for declarations in static inline
functions.

Link: http://lkml.kernel.org/r/95477c93db187bab6da8a8ba7c57836868446179.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/test_hexdump.c: fix failure on big endian cpu
Christophe Leroy [Wed, 22 Aug 2018 04:57:22 +0000 (21:57 -0700)]
lib/test_hexdump.c: fix failure on big endian cpu

On a big endian cpu, test_hexdump fails as follows.  The logs show that
bytes are expected in reversed order.

  [...]
  test_hexdump: Len: 24 buflen: 130 strlen: 97
  test_hexdump: Result: 97 'be32db7b 0a1893b2 70bac424 7d83349b a69c31ad 9c0face9                    .2.{....p..$}.4...1.....'
  test_hexdump: Expect: 97 '7bdb32be b293180a 24c4ba70 9b34837d ad319ca6 e9ac0f9c                    .2.{....p..$}.4...1.....'
  test_hexdump: Len: 8 buflen: 130 strlen: 77
  test_hexdump: Result: 77 'be32db7b0a1893b2                                                     .2.{....'
  test_hexdump: Expect: 77 'b293180a7bdb32be                                                     .2.{....'
  test_hexdump: Len: 6 buflen: 131 strlen: 87
  test_hexdump: Result: 87 'be32 db7b 0a18                                                                   .2.{..'
  test_hexdump: Expect: 87 '32be 7bdb 180a                                                                   .2.{..'
  test_hexdump: Len: 24 buflen: 131 strlen: 97
  test_hexdump: Result: 97 'be32db7b 0a1893b2 70bac424 7d83349b a69c31ad 9c0face9                    .2.{....p..$}.4...1.....'
  test_hexdump: Expect: 97 '7bdb32be b293180a 24c4ba70 9b34837d ad319ca6 e9ac0f9c                    .2.{....p..$}.4...1.....'
  test_hexdump: Len: 32 buflen: 131 strlen: 101
  test_hexdump: Result: 101 'be32db7b0a1893b2 70bac4247d83349b a69c31ad9c0face9 4cd1199943b1af0c  .2.{....p..$}.4...1.....L...C...'
  test_hexdump: Expect: 101 'b293180a7bdb32be 9b34837d24c4ba70 e9ac0f9cad319ca6 0cafb1439919d14c  .2.{....p..$}.4...1.....L...C...'
  test_hexdump: failed 801 out of 1184 tests

This patch fixes it.

Link: http://lkml.kernel.org/r/f3112437f62c2f48300535510918e8be1dceacfb.1533610877.git.christophe.leroy@c-s.fr
Fixes: 64d1d77a44697 ("hexdump: introduce test suite")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: rashmica <rashmicy@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/Kconfig: remove 'default n' for tests
Andy Shevchenko [Wed, 22 Aug 2018 04:57:18 +0000 (21:57 -0700)]
lib/Kconfig: remove 'default n' for tests

It seems contributors follow the style of Kconfig entries where explicit
'default n' is present.  The default 'default' is 'n' already, thus, drop
these lines from Kconfig to make it more clear.

Link: http://lkml.kernel.org/r/20180719085131.79541-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobcache: use routines from lib/crc64.c for CRC64 calculation
Coly Li [Wed, 22 Aug 2018 04:57:15 +0000 (21:57 -0700)]
bcache: use routines from lib/crc64.c for CRC64 calculation

Now we have crc64 calculation in lib/crc64.c, it is unnecessary for
bcache to use its own version.  This patch changes bcache code to use
crc64 routines in lib/crc64.c.

Link: http://lkml.kernel.org/r/20180718165545.1622-3-colyli@suse.de
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Noah Massey <noah.massey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib: add crc64 calculation routines
Coly Li [Wed, 22 Aug 2018 04:57:11 +0000 (21:57 -0700)]
lib: add crc64 calculation routines

Patch series "add crc64 calculation as kernel library", v5.

This patchset adds basic implementation of crc64 calculation as a Linux
kernel library.  Since bcache already does crc64 by itself, this patchset
also modifies bcache code to use the new crc64 library routine.

Currently bcache is the only user of crc64 calculation, another potential
user is bcachefs which is on the way to be in mainline kernel.  Therefore
it makes sense to make crc64 calculation to be a public library.

bcache uses crc64 as storage checksum, if a change of crc lib routines
results an inconsistent result, the unmatched checksum may make bcache
'think' the on-disk is corrupted, such a change should be avoided or
detected as early as possible.  Therefore a patch is being prepared which
adds a crc test framework, to check consistency of different calculations.

This patch (of 2):

Add the re-write crc64 calculation routines for Linux kernel.  The CRC64
polynomical arithmetic follows ECMA-182 specification, inspired by CRC
paper of Dr.  Ross N.  Williams (see
http://www.ross.net/crc/download/crc_v3.txt) and other public domain
implementations.

All the changes work in this way,
- When Linux kernel is built, host program lib/gen_crc64table.c will be
  compiled to lib/gen_crc64table and executed.
- The output of gen_crc64table execution is an array called as lookup
  table (a.k.a POLY 0x42f0e1eba9ea369) which contain 256 64-bit long
  numbers, this table is dumped into header file lib/crc64table.h.
- Then the header file is included by lib/crc64.c for normal 64bit crc
  calculation.
- Function declaration of the crc64 calculation routines is placed in
  include/linux/crc64.h

Currently bcache is the only user of crc64_be(), another potential user is
bcachefs which is on the way to be in mainline kernel.  Therefore it makes
sense to move crc64 calculation into lib/crc64.c as public code.

[colyli@suse.de: fix review comments from v4]
Link: http://lkml.kernel.org/r/20180726053352.2781-2-colyli@suse.de
Link: http://lkml.kernel.org/r/20180718165545.1622-2-colyli@suse.de
Signed-off-by: Coly Li <colyli@suse.de>
Co-developed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Noah Massey <noah.massey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/test_debug_virtual.c: make struct pointer foo static
Colin Ian King [Wed, 22 Aug 2018 04:57:07 +0000 (21:57 -0700)]
lib/test_debug_virtual.c: make struct pointer foo static

The pointer foo is local to the source and does not need to be
in global scope, so make it static.

Cleans up sparse warning:
symbol 'foo' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20180624112206.5722-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinclude/linux/bitops.h: introduce BITS_PER_TYPE
Chris Wilson [Wed, 22 Aug 2018 04:57:03 +0000 (21:57 -0700)]
include/linux/bitops.h: introduce BITS_PER_TYPE

net_dim.h has a rather useful extension to BITS_PER_BYTE to compute the
number of bits in a type (BITS_PER_BYTE * sizeof(T)), so promote the macro
to bitops.h, alongside BITS_PER_BYTE, for wider usage.

Link: http://lkml.kernel.org/r/20180706094458.14116-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andy Gospodarek <gospo@broadcom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/bitmap.c: drop unnecessary 0 check for u32 array operations
Andy Shevchenko [Wed, 22 Aug 2018 04:56:59 +0000 (21:56 -0700)]
lib/bitmap.c: drop unnecessary 0 check for u32 array operations

nbits == 0 is safe to be supplied to the function body, so remove
unnecessary checks in bitmap_to_arr32() and bitmap_from_arr32().

Link: http://lkml.kernel.org/r/20180531131914.44352-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoget_maintainer: allow option --mpath <directory> to read all files in <directory>
Joe Perches [Wed, 22 Aug 2018 04:56:55 +0000 (21:56 -0700)]
get_maintainer: allow option --mpath <directory> to read all files in <directory>

There is an external use case for multiple private MAINTAINER style files
in a separate directory.  Allow it.

--mpath has a default of "./MAINTAINERS".

The value entered can be either a file or a directory.

The behaviors are now:

--mpath <file>          Read only the specific file as <MAINTAINER_TYPE> file
--mpath <directory>     Read all files in <directory> as <MAINTAINER_TYPE> files
--mpath <directory> --find-maintainer-files
                        Recurse through <directory> and read all files named MAINTAINERS

Link: http://lkml.kernel.org/r/991b2f20112d53863cd79e61d908f1d26d3e1971.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoget_maintainer.pl: add -mpath=<path or file> for MAINTAINERS file location
Joe Perches [Wed, 22 Aug 2018 04:56:52 +0000 (21:56 -0700)]
get_maintainer.pl: add -mpath=<path or file> for MAINTAINERS file location

Add the ability to have an override for the location of the MAINTAINERS
file.

Miscellanea:

o Properly indent a few lines with leading spaces

Link: http://lkml.kernel.org/r/a86e69195076ed3c4c526fddc76b86c28e0a1e37.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoget_maintainer: allow usage outside of kernel tree
Antonio Nino Diaz [Wed, 22 Aug 2018 04:56:48 +0000 (21:56 -0700)]
get_maintainer: allow usage outside of kernel tree

Add option '--no-tree' to get_maintainer.pl script to allow using this
script in projects that aren't the Linux kernel if they use the same
format for their MAINTAINERS file.  This command is also available in
checkpatch.pl, for example.

Link: http://lkml.kernel.org/r/04452ac6-1575-f612-72c6-6ea88e70a9d5@arm.com
Signed-off-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agos/epoll: robustify irq safety with lockdep_assert_irqs_enabled()
Davidlohr Bueso [Wed, 22 Aug 2018 04:56:45 +0000 (21:56 -0700)]
s/epoll: robustify irq safety with lockdep_assert_irqs_enabled()

Sprinkle lockdep_assert_irqs_enabled() checks in the functions that do not
save and restore interrupts when dealing with the ep->wq.lock.  These are
ep_scan_ready_list() and those called by epoll_ctl(): ep_insert, ep_modify
and ep_remove.

[akpm@linux-foundation.org: remove too-obvious comments]
Link: http://lkml.kernel.org/r/20180721183127.3busfa335zlcjeox@linux-r8p5
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/epoll: loosen irq safety in epoll_insert() and epoll_remove()
Davidlohr Bueso [Wed, 22 Aug 2018 04:56:41 +0000 (21:56 -0700)]
fs/epoll: loosen irq safety in epoll_insert() and epoll_remove()

Both functions are similar to the context of ep_modify(), called via
epoll_ctl(2).  Just like ep_modify(), saving and restoring interrupts is
an overkill in these calls as it will never be called with irqs disabled.
While ep_remove() can be called directly from EPOLL_CTL_DEL, it can also
be called when releasing the file, but this also complies with the above.

Link: http://lkml.kernel.org/r/20180720172956.2883-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/epoll: loosen irq safety in ep_scan_ready_list()
Davidlohr Bueso [Wed, 22 Aug 2018 04:56:38 +0000 (21:56 -0700)]
fs/epoll: loosen irq safety in ep_scan_ready_list()

Patch series "fs/epoll: loosen irq safety when possible".

Both patches replace saving+restoring interrupts when taking the ep->lock
(now the waitqueue lock), with just disabling local irqs.  This shows
immediate performance benefits in patch 1 for an epoll workload running on
Xen.  The main concern we need to have with this sort of changes in epoll
is the ep_poll_callback() which is passed to the wait queue wakeup and is
done very often under irq context, this patch does not touch this call.

Patches have been tested pretty heavily with the customer workload,
microbenchmarks, ltp testcases and two high level workloads that use epoll
under the hood: nginx and libevent benchmarks.

This patch (of 2):

Saving and restoring interrupts in ep_scan_ready_list() is an
overkill as it is never called with interrupts disabled. Loosen
this to simply disabling local irqs such that archs where managing
irqs is expensive or virtual environments. This patch yields
some throughput improvements on a workload that is epoll intensive
running on a single Xen DomU.

1 Job  7500 -->    8800 enq/s  (+17%)
2 Jobs 14000   -->   15200 enq/s  (+8%)
3 Jobs 20500 -->   22300 enq/s  (+8%)
4 Jobs 25000   -->   28000 enq/s  (+8-12)%

On bare metal:

For a 2-socket 40-core (ht) IvyBridge on a few workloads, unfortunately I
don't have a xen environment and the results for Xen I do have (which
numbers are in patch 1) I don't have the actual workload, so cannot
compare them directly.

1) Different configurations were used for a epoll_wait (pipes io)
   microbench (http://linux-scalability.org/epoll/epoll-test.c) and shows
   around a 7-10% improvement in overall total number of times the
   epoll_wait() loops when using both regular and nested epolls, so very
   raw numbers, but measurable nonetheless.

# threads vanilla dirty
     1 1677717 1805587
     2 1660510 1854064
     4 1610184 1805484
     8 1577696 1751222
     16 1568837 1725299
     32 1291532 1378463
     64  752584  787368

   Note that stddev is pretty small.

2) Another pipe test, which shows no real measurable improvement.
   (http://www.xmailserver.org/linux-patches/pipetest.c)

Link: http://lkml.kernel.org/r/20180720172956.2883-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agosched/wait: assert the wait_queue_head lock is held in __wake_up_common
Christoph Hellwig [Wed, 22 Aug 2018 04:56:34 +0000 (21:56 -0700)]
sched/wait: assert the wait_queue_head lock is held in __wake_up_common

Better ensure we actually hold the lock using lockdep than just commenting
on it.  Due to the various exported _locked interfaces it is far too easy
to get the locking wrong.

Link: http://lkml.kernel.org/r/20171214152344.6880-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agouserfaultfd: use fault_wqh lock
Matthew Wilcox [Wed, 22 Aug 2018 04:56:30 +0000 (21:56 -0700)]
userfaultfd: use fault_wqh lock

The userfaultfd code currently uses the unlocked waitqueue helpers for
managing fault_wqh, but instead of holding the waitqueue lock for this
waitqueue around these calls, it the waitqueue lock of
fault_pending_wq, which is a different waitqueue instance.  Given that
the waitqueue is not exposed to the rest of the kernel this actually
works ok at the moment, but prevents the userfaultfd locking rules from
being enforced using lockdep.

Switch to the internally locked waitqueue helpers instead.  This means
that the lock inside fault_wqh now nests inside the fault_pending_wqh
lock, but that's not a problem since it was entirely unused before.

[hch@lst.de: slight changelog updates]
[rppt@linux.vnet.ibm.com: spotted changelog spellos]
Link: http://lkml.kernel.org/r/20171214152344.6880-3-hch@lst.de
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoepoll: use the waitqueue lock to protect ep->wq
Christoph Hellwig [Wed, 22 Aug 2018 04:56:26 +0000 (21:56 -0700)]
epoll: use the waitqueue lock to protect ep->wq

Patch series "waitqueue lockdep annotation", v3.

This series adds a strategic lockdep_assert_held to __wake_up_common to
ensure callers really do hold the wait_queue_head lock when calling the
unlocked wake_up variants.  It turns out epoll did not do this for a
fairly common path (hit all the time by systemd during bootup), so the
second patch fixed this instance as well.

This patch (of 3):

The epoll code currently uses the unlocked waitqueue helpers for managing
ep->wq, but instead of holding the waitqueue lock around these calls, it
uses its own ep->lock spinlock.  Given that the waitqueue is not exposed
to the rest of the kernel this actually works ok at the moment, but
prevents the epoll locking rules from being enforced using lockdep.
Remove ep->lock and use the waitqueue lock to not only reduce the size of
struct eventpoll but also to make sure we can assert locking invariants in
the waitqueue code.

Link: http://lkml.kernel.org/r/20171214152344.6880-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel: tracepoints: add support for relative references
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:22 +0000 (21:56 -0700)]
kernel: tracepoints: add support for relative references

To avoid the need for relocating absolute references to tracepoint
structures at boot time when running relocatable kernels (which may
take a disproportionate amount of space), add the option to emit
these tables as relative references instead.

Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoPCI: Add support for relative addressing in quirk tables
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:18 +0000 (21:56 -0700)]
PCI: Add support for relative addressing in quirk tables

Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR)

Link: http://lkml.kernel.org/r/20180704083651.24360-6-ard.biesheuvel@linaro.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinit: allow initcall tables to be emitted using relative references
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:13 +0000 (21:56 -0700)]
init: allow initcall tables to be emitted using relative references

Allow the initcall tables to be emitted using relative references that
are only half the size on 64-bit architectures and don't require fixups
at runtime on relocatable kernels.

Link: http://lkml.kernel.org/r/20180704083651.24360-5-ard.biesheuvel@linaro.org
Acked-by: James Morris <james.morris@microsoft.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomodule: use relative references for __ksymtab entries
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:09 +0000 (21:56 -0700)]
module: use relative references for __ksymtab entries

An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
each consisting of two 64-bit fields containing absolute references, to
the symbol itself and to a char array containing its name, respectively.

When we build the same configuration with KASLR enabled, we end up with an
additional ~192 KB of relocations in the .init section, i.e., one 24 byte
entry for each absolute reference, which all need to be processed at boot
time.

Given how the struct kernel_symbol that describes each entry is completely
local to module.c (except for the references emitted by EXPORT_SYMBOL()
itself), we can easily modify it to contain two 32-bit relative references
instead.  This reduces the size of the __ksymtab section by 50% for all
64-bit architectures, and gets rid of the runtime relocations entirely for
architectures implementing KASLR, either via standard PIE linking (arm64)
or using custom host tools (x86).

Note that the binary search involving __ksymtab contents relies on each
section being sorted by symbol name.  This is implemented based on the
input section names, not the names in the ksymtab entries, so this patch
does not interfere with that.

Given that the use of place-relative relocations requires support both in
the toolchain and in the module loader, we cannot enable this feature for
all architectures.  So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.

Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomodule: allow symbol exports to be disabled
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:04 +0000 (21:56 -0700)]
module: allow symbol exports to be disabled

To allow existing C code to be incorporated into the decompressor or the
UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports are
undesirable.  Note that this gets rid of a rather dodgy redefine of
linux/export.h's header guard.

Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoarch: enable relative relocations for arm64, power and x86
Ard Biesheuvel [Wed, 22 Aug 2018 04:56:00 +0000 (21:56 -0700)]
arch: enable relative relocations for arm64, power and x86

Patch series "add support for relative references in special sections", v10.

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references.  This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)

Patch #3 was sent out before as a single patch.  This series supersedes
the previous submission.  This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.

Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
means we save about 28 KB of vmlinux space for each of these patches.

[From the v7 series blurb, which included the jump_label patches as well]:

  For the arm64 kernel, all patches combined reduce the memory footprint
  of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
  KASLR enabled), of which ~1 MB is the size reduction of the RELA section
  in .init, and the remaining 300 KB is reduction of .text/.data.

This patch (of 6):

Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.

Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agospelling.txt: add more spellings to spelling.txt
Colin Ian King [Wed, 22 Aug 2018 04:55:56 +0000 (21:55 -0700)]
spelling.txt: add more spellings to spelling.txt

Here are some of the more common spelling mistakes and typos that I've
found while fixing up spelling mistakes in the kernel over the past 6
months.

Link: http://lkml.kernel.org/r/20180629150603.1159-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/hung_task.c: allow to set checking interval separately from timeout
Dmitry Vyukov [Wed, 22 Aug 2018 04:55:52 +0000 (21:55 -0700)]
kernel/hung_task.c: allow to set checking interval separately from timeout

Currently task hung checking interval is equal to timeout, as the result
hung is detected anywhere between timeout and 2*timeout.  This is fine for
most interactive environments, but this hurts automated testing setups
(syzbot).  In an automated setup we need to strictly order CPU lockup <
RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall
is not detected as task hung and task hung is not detected as silent
machine loss.  The large variance in task hung detection timeout requires
setting silent machine loss timeout to a very large value (e.g.  if task
hung is 3 mins, then silent loss need to be set to ~7 mins).  The
additional 3 minutes significantly reduce testing efficiency because
usually we crash kernel within a minute, and this can add hours to bug
localization process as it needs to do dozens of tests.

Allow setting checking interval separately from timeout.  This allows to
set timeout to, say, 3 minutes, but checking interval to 10 secs.

The interval is controlled via a new hung_task_check_interval_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl.  The default value
of 0 results in the current behavior: checking interval is equal to
timeout.

[akpm@linux-foundation.org: update hung_task_timeout_max's comment]
Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/crash_core.c: print timestamp using time64_t
Arnd Bergmann [Wed, 22 Aug 2018 04:55:49 +0000 (21:55 -0700)]
kernel/crash_core.c: print timestamp using time64_t

The get_seconds() call returns a 32-bit timestamp on some architectures,
and will overflow in the future.  The newer ktime_get_real_seconds()
always returns a 64-bit timestamp that does not suffer from this problem.

Link: http://lkml.kernel.org/r/20180618150329.941903-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Marc-Andr Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolinux/compiler.h: don't use bool
Rasmus Villemoes [Wed, 22 Aug 2018 04:55:45 +0000 (21:55 -0700)]
linux/compiler.h: don't use bool

Appararently, it's possible to have a non-trivial TU include a few
headers, including linux/build_bug.h, without ending up with
linux/types.h.  So the 0day bot sent me

config: um-x86_64_defconfig (attached as .config)

>> include/linux/compiler.h:316:3: error: unknown type name 'bool'; did you mean '_Bool'?
      bool __cond = !(condition);    \

for something I'm working on.

Rather than contributing to the #include madness and including
linux/types.h from compiler.h, just use int.

Link: http://lkml.kernel.org/r/20180817101036.20969-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christopher Li <sparse@chrisli.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agouserns: use irqsave variant of refcount_dec_and_lock()
Anna-Maria Gleixner [Wed, 22 Aug 2018 04:55:42 +0000 (21:55 -0700)]
userns: use irqsave variant of refcount_dec_and_lock()

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock.  With this variant the call of
local_irq_save/restore is no longer required.

[bigeasy@linutronix.de: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
Link: http://lkml.kernel.org/r/20180703200141.28415-7-bigeasy@linutronix.de
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agouserns: use refcount_t for reference counting instead atomic_t
Sebastian Andrzej Siewior [Wed, 22 Aug 2018 04:55:38 +0000 (21:55 -0700)]
userns: use refcount_t for reference counting instead atomic_t

refcount_t type and corresponding API should be used instead of atomic_t
wh en the variable is used as a reference counter.  This avoids accidental
refcounter overflows that might lead to use-after-free situations.

Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobdi: use irqsave variant of refcount_dec_and_lock()
Anna-Maria Gleixner [Wed, 22 Aug 2018 04:55:35 +0000 (21:55 -0700)]
bdi: use irqsave variant of refcount_dec_and_lock()

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock.  With this variant the call of
local_irq_save/restore is no longer required.

[bigeasy@linutronix.de: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
Link: http://lkml.kernel.org/r/20180703200141.28415-5-bigeasy@linutronix.de
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobdi: use refcount_t for reference counting instead atomic_t
Sebastian Andrzej Siewior [Wed, 22 Aug 2018 04:55:31 +0000 (21:55 -0700)]
bdi: use refcount_t for reference counting instead atomic_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This permits avoiding
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/20180703200141.28415-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel.h: documentation for roundup() vs round_up()
Kees Cook [Wed, 22 Aug 2018 04:55:28 +0000 (21:55 -0700)]
kernel.h: documentation for roundup() vs round_up()

Things like 3619dec5103d ("dh key: fix rounding up KDF output length")
expose the lack of explicit documentation for roundup() vs round_up().  At
least we can try to document it better if anyone goes looking.

Link: http://lkml.kernel.org/r/20180703041950.GA43464@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinclude/asm-generic/bug.h: clarify valid uses of WARN()
Dmitry Vyukov [Wed, 22 Aug 2018 04:55:24 +0000 (21:55 -0700)]
include/asm-generic/bug.h: clarify valid uses of WARN()

Explicitly state that WARN*() should be used only for recoverable kernel
issues/bugs and that it should not be used for any kind of invalid
external inputs or transient conditions.

Motivation: it's a very useful capability to be able to understand if a
particular kernel splat means a kernel bug or simply an invalid user-space
program.  For the former one wants to notify kernel developers, while
notifying kernel developers for the latter is annoying.  Even a kernel
developer may not know what to do with a WARNING in an unfamiliar
subsystem.  This is especially critical for any automated testing systems
that may use panic_on_warn and mail kernel developers.

The clear separation also serves as an additional documentation: is it a
condition that must never occur because of additional checks/logic
elsewhere?  or is it simply a check for invalid inputs or unfortunate
conditions?

Use of pr_err() for user messages also leads to better error messages.
"Something is wrong in file foo on line X" is not particularly useful
message for end user.  pr_err() forces developers to write more meaningful
error messages for user.

As of now we are almost there.  We are doing systematic kernel testing
with panic_on_warn and are not seeing massive amounts of false positives.
But every now and then another WARN on ENOMEM or invalid inputs pops up
and leads to a lengthy argument each time.  The goal of this change is to
officially document the rules.

Link: http://lkml.kernel.org/r/20180620103716.61636-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: add vmcoreinfo note to /proc/kcore
Omar Sandoval [Wed, 22 Aug 2018 04:55:20 +0000 (21:55 -0700)]
proc/kcore: add vmcoreinfo note to /proc/kcore

The vmcoreinfo information is useful for runtime debugging tools, not just
for crash dumps.  A lot of this information can be determined by other
means, but this is much more convenient, and it only adds a page at most
to the file.

Link: http://lkml.kernel.org/r/fddbcd08eed76344863303878b12de1c1e2a04b6.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocrash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir
Omar Sandoval [Wed, 22 Aug 2018 04:55:17 +0000 (21:55 -0700)]
crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir

This is preparation for allowing CRASH_CORE to be enabled for any
architecture.

swapper_pg_dir is always either an array or a macro expanding to NULL.
In the latter case, VMCOREINFO_SYMBOL() won't work, as it tries to take
the address of the given symbol:

#define VMCOREINFO_SYMBOL(name) \
vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)

Instead, use VMCOREINFO_SYMBOL_ARRAY(), which uses the value:

#define VMCOREINFO_SYMBOL_ARRAY(name) \
vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)

This is the same thing for the array case but isn't an error for the macro
case.

Link: http://lkml.kernel.org/r/c05f9781ec204f40fc96f95086e7b6de6a3eb2c3.1532563124.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: optimize multiple page reads
Omar Sandoval [Wed, 22 Aug 2018 04:55:13 +0000 (21:55 -0700)]
proc/kcore: optimize multiple page reads

The current code does a full search of the segment list every time for
every page.  This is wasteful, since it's almost certain that the next
page will be in the same segment.  Instead, check if the previous segment
covers the current page before doing the list search.

Link: http://lkml.kernel.org/r/fd346c11090cf93d867e01b8d73a6567c5ac6361.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: clean up ELF header generation
Omar Sandoval [Wed, 22 Aug 2018 04:55:09 +0000 (21:55 -0700)]
proc/kcore: clean up ELF header generation

Currently, the ELF file header, program headers, and note segment are
allocated all at once, in some icky code dating back to 2.3.  Programs
tend to read the file header, then the program headers, then the note
segment, all separately, so this is a waste of effort.  It's cleaner and
more efficient to handle the three separately.

Link: http://lkml.kernel.org/r/19c92cbad0e11f6103ff3274b2e7a7e51a1eb74b.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: hold lock during read
Omar Sandoval [Wed, 22 Aug 2018 04:55:06 +0000 (21:55 -0700)]
proc/kcore: hold lock during read

Now that we're using an rwsem, we can hold it during the entirety of
read_kcore() and have a common return path.  This is preparation for the
next change.

[akpm@linux-foundation.org: fix locking bug reported by Tetsuo Handa]
Link: http://lkml.kernel.org/r/d7cfbc1e8a76616f3b699eaff9df0a2730380534.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: fix memory hotplug vs multiple opens race
Omar Sandoval [Wed, 22 Aug 2018 04:55:02 +0000 (21:55 -0700)]
proc/kcore: fix memory hotplug vs multiple opens race

There's a theoretical race condition that will cause /proc/kcore to miss
a memory hotplug event:

CPU0                              CPU1
// hotplug event 1
kcore_need_update = 1

open_kcore()                      open_kcore()
    kcore_update_ram()                kcore_update_ram()
        // Walk RAM                       // Walk RAM
        __kcore_update_ram()              __kcore_update_ram()
            kcore_need_update = 0

// hotplug event 2
kcore_need_update = 1
                                              kcore_need_update = 0

Note that CPU1 set up the RAM kcore entries with the state after hotplug
event 1 but cleared the flag for hotplug event 2.  The RAM entries will
therefore be stale until there is another hotplug event.

This is an extremely unlikely sequence of events, but the fix makes the
synchronization saner, anyways: we serialize the entire update sequence,
which means that whoever clears the flag will always succeed in replacing
the kcore list.

Link: http://lkml.kernel.org/r/6106c509998779730c12400c1b996425df7d7089.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: replace kclist_lock rwlock with rwsem
Omar Sandoval [Wed, 22 Aug 2018 04:54:59 +0000 (21:54 -0700)]
proc/kcore: replace kclist_lock rwlock with rwsem

Now we only need kclist_lock from user context and at fs init time, and
the following changes need to sleep while holding the kclist_lock.

Link: http://lkml.kernel.org/r/521ba449ebe921d905177410fee9222d07882f0d.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: don't grab lock for memory hotplug notifier
Omar Sandoval [Wed, 22 Aug 2018 04:54:55 +0000 (21:54 -0700)]
proc/kcore: don't grab lock for memory hotplug notifier

The memory hotplug notifier kcore_callback() only needs kclist_lock to
prevent races with __kcore_update_ram(), but we can easily eliminate that
race by using an atomic xchg() in __kcore_update_ram().  This is
preparation for converting kclist_lock to an rwsem.

Link: http://lkml.kernel.org/r/0a4bc89f4dbde8b5b2ea309f7b4fb6a85fe29df2.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/kcore: don't grab lock for kclist_add()
Omar Sandoval [Wed, 22 Aug 2018 04:54:51 +0000 (21:54 -0700)]
proc/kcore: don't grab lock for kclist_add()

Patch series "/proc/kcore improvements", v4.

This series makes a few improvements to /proc/kcore.  It fixes a couple of
small issues in v3 but is otherwise the same.  Patches 1, 2, and 3 are
prep patches.  Patch 4 is a fix/cleanup.  Patch 5 is another prep patch.
Patches 6 and 7 are optimizations to ->read().  Patch 8 makes it possible
to enable CRASH_CORE on any architecture, which is needed for patch 9.
Patch 9 adds vmcoreinfo to /proc/kcore.

This patch (of 9):

kclist_add() is only called at init time, so there's no point in grabbing
any locks.  We're also going to replace the rwlock with a rwsem, which we
don't want to try grabbing during early boot.

While we're here, mark kclist_add() with __init so that we'll get a
warning if it's called from non-init code.

Link: http://lkml.kernel.org/r/98208db1faf167aa8b08eebfa968d95c70527739.1531953780.git.osandov@fb.com
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list entries
James Morse [Wed, 22 Aug 2018 04:54:48 +0000 (21:54 -0700)]
fs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list entries

elf_kcore_store_hdr() uses __pa() to find the physical address of
KCORE_RAM or KCORE_TEXT entries exported as program headers.

This trips CONFIG_DEBUG_VIRTUAL's checks, as the KCORE_TEXT entries are
not in the linear map.

Handle these two cases separately, using __pa_symbol() for the KCORE_TEXT
entries.

Link: http://lkml.kernel.org/r/20180711131944.15252-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/proc/vmcore.c: use new typedef vm_fault_t
Souptick Joarder [Wed, 22 Aug 2018 04:54:44 +0000 (21:54 -0700)]
fs/proc/vmcore.c: use new typedef vm_fault_t

Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.  For now, this is just documenting that the function
returns a VM_FAULT value rather than an errno.  Once all instances are
converted, vm_fault_t will become a distinct type.

See 1c8f422059ae ("mm: change return type to vm_fault_t") for reference.

Link: http://lkml.kernel.org/r/20180702153325.GA3875@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: use "unsigned int" in /proc/stat hook
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:41 +0000 (21:54 -0700)]
proc: use "unsigned int" in /proc/stat hook

Number of CPUs is never high enough to force 64-bit arithmetic.
Save couple of bytes on x86_64.

Link: http://lkml.kernel.org/r/20180627200710.GC18434@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: spread "const" a bit
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:37 +0000 (21:54 -0700)]
proc: spread "const" a bit

Link: http://lkml.kernel.org/r/20180627200614.GB18434@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: use macro in /proc/latency hook
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:34 +0000 (21:54 -0700)]
proc: use macro in /proc/latency hook

->latency_record is defined as

struct latency_record[LT_SAVECOUNT];

so use the same macro whie iterating.

Link: http://lkml.kernel.org/r/20180627200534.GA18434@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: save 2 atomic ops on write to "/proc/*/attr/*"
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:30 +0000 (21:54 -0700)]
proc: save 2 atomic ops on write to "/proc/*/attr/*"

Code checks if write is done by current to its own attributes.
For that get/put pair is unnecessary as it can be done under RCU.

Note: rcu_read_unlock() can be done even earlier since pointer to a task
is not dereferenced. It depends if /proc code should look scary or not:

rcu_read_lock();
task = pid_task(...);
rcu_read_unlock();
if (!task)
return -ESRCH;
if (task != current)
return -EACCESS:

P.S.: rename "length" variable. Code like this

length = -EINVAL;

should not exist.

Link: http://lkml.kernel.org/r/20180627200218.GF18113@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: put task earlier in /proc/*/fail-nth
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:27 +0000 (21:54 -0700)]
proc: put task earlier in /proc/*/fail-nth

Link: http://lkml.kernel.org/r/20180627195427.GE18113@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: smaller readlock section in readdir("/proc")
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:23 +0000 (21:54 -0700)]
proc: smaller readlock section in readdir("/proc")

Readdir context is thread local, so ->pos is thread local,
move it out of readlock.

Link: http://lkml.kernel.org/r/20180627195339.GD18113@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: test /proc/thread-self symlink
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:20 +0000 (21:54 -0700)]
proc: test /proc/thread-self symlink

Same story: I have WIP patch to make it faster, so better have a test
as well.

Link: http://lkml.kernel.org/r/20180627195209.GC18113@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: test /proc/self symlink
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:16 +0000 (21:54 -0700)]
proc: test /proc/self symlink

There are plans to change how /proc/self result is calculated,
for that a test is necessary.

Use direct system call because of this whole getpid caching story.

Link: http://lkml.kernel.org/r/20180627195103.GB18113@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/proc/uptime.c: use ktime_get_boottime_ts64
Arnd Bergmann [Wed, 22 Aug 2018 04:54:13 +0000 (21:54 -0700)]
fs/proc/uptime.c: use ktime_get_boottime_ts64

get_monotonic_boottime() is deprecated and uses the old timespec type.
Let's convert /proc/uptime to use ktime_get_boottime_ts64().

Link: http://lkml.kernel.org/r/20180620081746.282742-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: fixup PDE allocation bloat
Alexey Dobriyan [Wed, 22 Aug 2018 04:54:09 +0000 (21:54 -0700)]
proc: fixup PDE allocation bloat

24074a35c5c975 ("proc: Make inline name size calculation automatic")
started to put PDE allocations into kmalloc-256 which is unnecessary as
~40 character names are very rare.

Put allocation back into kmalloc-192 cache for 64-bit non-debug builds.

Put BUILD_BUG_ON to know when PDE size has gotten out of control.

[adobriyan@gmail.com: fix BUILD_BUG_ON breakage on powerpc64]
Link: http://lkml.kernel.org/r/20180703191602.GA25521@avx2
Link: http://lkml.kernel.org/r/20180617215732.GA24688@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: fix comment for NODEMASK_ALLOC
Oscar Salvador [Wed, 22 Aug 2018 04:54:06 +0000 (21:54 -0700)]
mm: fix comment for NODEMASK_ALLOC

Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when
NODES_SHIFT is higher than 8, otherwise it declares it within the stack.

The comment says that the reasoning behind this, is that nodemask_t will
be 256 bytes when NODES_SHIFT is higher than 8, but this is not true.  For
example, NODES_SHIFT = 9 will give us a 64 bytes nodemask_t.  Let us fix
up the comment for that.

Another thing is that it might make sense to let values lower than
128bytes be allocated in the stack.  Although this all depends on the
depth of the stack (and this changes from function to function), I think
that 64 bytes is something we can easily afford.  So we could even bump
the limit by 1 (from > 8 to > 9).

Link: http://lkml.kernel.org/r/20180820085516.9687-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agodrivers/block/zram/zram_drv.c: fix bug storing backing_dev
Peter Kalauskas [Wed, 22 Aug 2018 04:54:02 +0000 (21:54 -0700)]
drivers/block/zram/zram_drv.c: fix bug storing backing_dev

The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer.  Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:

echo /dev/sdX > /sys/block/zram0/backing_dev

The reason it worked before was the fact that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing new line symbol.  Which also means that "echo -n
/dev/sdX" most likely was broken.

Signed-off-by: Peter Kalauskas <peskal@google.com>
Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago/proc/meminfo: add percpu populated pages count
Dennis Zhou (Facebook) [Wed, 22 Aug 2018 04:53:58 +0000 (21:53 -0700)]
/proc/meminfo: add percpu populated pages count

Currently, percpu memory only exposes allocation and utilization
information via debugfs.  This more or less is only really useful for
understanding the fragmentation and allocation information at a per-chunk
level with a few global counters.  This is also gated behind a config.
BPF and cgroup, for example, have seen an increase in use causing
increased use of percpu memory.  Let's make it easier for someone to
identify how much memory is being used.

This patch adds the "Percpu" stat to meminfo to more easily look up how
much percpu memory is in use.  This number includes the cost for all
allocated backing pages and not just insight at the per a unit, per chunk
level.  Metadata is excluded.  I think excluding metadata is fair because
the backing memory scales with the numbere of cpus and can quickly
outweigh the metadata.  It also makes this calculation light.

Link: http://lkml.kernel.org/r/20180807184723.74919-1-dennisszhou@gmail.com
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm, oom: introduce memory.oom.group
Roman Gushchin [Wed, 22 Aug 2018 04:53:54 +0000 (21:53 -0700)]
mm, oom: introduce memory.oom.group

For some workloads an intervention from the OOM killer can be painful.
Killing a random task can bring the workload into an inconsistent state.

Historically, there are two common solutions for this
problem:
1) enabling panic_on_oom,
2) using a userspace daemon to monitor OOMs and kill
   all outstanding processes.

Both approaches have their downsides: rebooting on each OOM is an obvious
waste of capacity, and handling all in userspace is tricky and requires a
userspace agent, which will monitor all cgroups for OOMs.

In most cases an in-kernel after-OOM cleaning-up mechanism can eliminate
the necessity of enabling panic_on_oom.  Also, it can simplify the cgroup
management for userspace applications.

This commit introduces a new knob for cgroup v2 memory controller:
memory.oom.group.  The knob determines whether the cgroup should be
treated as an indivisible workload by the OOM killer.  If set, all tasks
belonging to the cgroup or to its descendants (if the memory cgroup is not
a leaf cgroup) are killed together or not at all.

To determine which cgroup has to be killed, we do traverse the cgroup
hierarchy from the victim task's cgroup up to the OOMing cgroup (or root)
and looking for the highest-level cgroup with memory.oom.group set.

Tasks with the OOM protection (oom_score_adj set to -1000) are treated as
an exception and are never killed.

This patch doesn't change the OOM victim selection algorithm.

Link: http://lkml.kernel.org/r/20180802003201.817-4-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm, oom: refactor oom_kill_process()
Roman Gushchin [Wed, 22 Aug 2018 04:53:50 +0000 (21:53 -0700)]
mm, oom: refactor oom_kill_process()

Patch series "introduce memory.oom.group", v2.

This is a tiny implementation of cgroup-aware OOM killer, which adds an
ability to kill a cgroup as a single unit and so guarantee the integrity
of the workload.

Although it has only a limited functionality in comparison to what now
resides in the mm tree (it doesn't change the victim task selection
algorithm, doesn't look at memory stas on cgroup level, etc), it's also
much simpler and more straightforward.  So, hopefully, we can avoid having
long debates here, as we had with the full implementation.

As it doesn't prevent any futher development, and implements an useful and
complete feature, it looks as a sane way forward.

This patch (of 2):

oom_kill_process() consists of two logical parts: the first one is
responsible for considering task's children as a potential victim and
printing the debug information.  The second half is responsible for
sending SIGKILL to all tasks sharing the mm struct with the given victim.

This commit splits oom_kill_process() with an intention to re-use the the
second half: __oom_kill_process().

The cgroup-aware OOM killer will kill multiple tasks belonging to the
victim cgroup.  We don't need to print the debug information for the each
task, as well as play with task selection (considering task's children),
so we can't use the existing oom_kill_process().

Link: http://lkml.kernel.org/r/20171130152824.1591-2-guro@fb.com
Link: http://lkml.kernel.org/r/20180802003201.817-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools/testing/selftests/vm/: add MAP_POPULATE test
Dmitry Safonov [Wed, 22 Aug 2018 04:53:47 +0000 (21:53 -0700)]
tools/testing/selftests/vm/: add MAP_POPULATE test

As with many other projects, we use some shmalloc allocator.  At some
point we need to make a part of allocated pages back private to process.
And it should be populated straight away.  Check that (MAP_PRIVATE |
MAP_POPULATE) actually copies the private page.

[akpm@linux-foundation.org: change message, per review discussion]
Link: http://lkml.kernel.org/r/20180801233636.29354-1-dima@arista.com
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Hua Zhong <hzhong@arista.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stuart Ritchie <sritchie@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/page_alloc: Introduce free_area_init_core_hotplug
Oscar Salvador [Wed, 22 Aug 2018 04:53:43 +0000 (21:53 -0700)]
mm/page_alloc: Introduce free_area_init_core_hotplug

Currently, whenever a new node is created/re-used from the memhotplug
path, we call free_area_init_node()->free_area_init_core().  But there is
some code that we do not really need to run when we are coming from such
path.

free_area_init_core() performs the following actions:

1) Initializes pgdat internals, such as spinlock, waitqueues and more.
2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on
   when creating hash tables.
3) Account number of managed_pages per zone, substracting dma_reserved and
   memmap pages.
4) Initializes some fields of the zone structure data
5) Calls init_currently_empty_zone to initialize all the freelists
6) Calls memmap_init to initialize all pages belonging to certain zone

When called from memhotplug path, free_area_init_core() only performs
actions #1 and #4.

Action #2 is pointless as the zones do not have any pages since either the
node was freed, or we are re-using it, eitherway all zones belonging to
this node should have 0 pages.  For the same reason, action #3 results
always in manages_pages being 0.

Action #5 and #6 are performed later on when onlining the pages:
 online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
 online_pages()->move_pfn_range_to_zone()->memmap_init_zone()

This patch does two things:

First, moves the node/zone initializtion to their own function, so it
allows us to create a small version of free_area_init_core, where we only
perform:

1) Initialization of pgdat internals, such as spinlock, waitqueues and more
4) Initialization of some fields of the zone structure data

These two functions are: pgdat_init_internals() and zone_init_internals().

The second thing this patch does, is to introduce
free_area_init_core_hotplug(), the memhotplug version of
free_area_init_core():

Currently, we call free_area_init_node() from the memhotplug path.  In
there, we set some pgdat's fields, and call calculate_node_totalpages().
calculate_node_totalpages() calculates the # of pages the node has.

Since the node is either new, or we are re-using it, the zones belonging
to this node should not have any pages, so there is no point to calculate
this now.

Actually, we re-set these values to 0 later on with the calls to:

reset_node_managed_pages()
reset_node_present_pages()

The # of pages per node and the # of pages per zone will be calculated when
onlining the pages:

online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range()
online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range()

Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
__paginginit with __init, so their code gets freed up.

[osalvador@techadventures.net: fix section usage]
Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net
[osalvador@suse.de: v6]
Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/page_alloc: inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
Oscar Salvador [Wed, 22 Aug 2018 04:53:39 +0000 (21:53 -0700)]
mm/page_alloc: inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT

Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT to an inline
function.  Not having an ifdef in the function makes the code more
readable.

Link: http://lkml.kernel.org/r/20180730101757.28058-4-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: remove __paginginit
Pavel Tatashin [Wed, 22 Aug 2018 04:53:36 +0000 (21:53 -0700)]
mm: remove __paginginit

__paginginit is the same thing as __meminit except for platforms without
sparsemem, there it is defined as __init.

Remove __paginginit and use __meminit.  Use __ref in one single function
that merges __meminit and __init sections: setup_usemap().

Link: http://lkml.kernel.org/r/20180801122348.21588-4-osalvador@techadventures.net
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: access zone->node via zone_to_nid() and zone_set_nid()
Pavel Tatashin [Wed, 22 Aug 2018 04:53:32 +0000 (21:53 -0700)]
mm: access zone->node via zone_to_nid() and zone_set_nid()

zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid ifdef's in c
files.

Link: http://lkml.kernel.org/r/20180730101757.28058-3-osalvador@techadventures.net
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/page_alloc.c: move ifdefery out of free_area_init_core
Oscar Salvador [Wed, 22 Aug 2018 04:53:29 +0000 (21:53 -0700)]
mm/page_alloc.c: move ifdefery out of free_area_init_core

Patch series "Refactor free_area_init_core and add
free_area_init_core_hotplug", v6.

This patchset does three things:

 1) Clean up/refactor free_area_init_core/free_area_init_node
    by moving the ifdefery out of the functions.
 2) Move the pgdat/zone initialization in free_area_init_core to its
    own function.
 3) Introduce free_area_init_core_hotplug, a small subset of
    free_area_init_core, which is only called from memhotlug code path. In this
    way, we have:

    free_area_init_core: called during early initialization
    free_area_init_core_hotplug: called whenever a new node is allocated/re-used (memhotplug path)

This patch (of 5):

Moving the #ifdefs out of the function makes it easier to follow.

Link: http://lkml.kernel.org/r/20180730101757.28058-2-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: remove zone_id() and make use of zone_idx() in is_dev_zone()
Oscar Salvador [Wed, 22 Aug 2018 04:53:24 +0000 (21:53 -0700)]
mm: remove zone_id() and make use of zone_idx() in is_dev_zone()

is_dev_zone() is using zone_id() to check if the zone is ZONE_DEVICE.
zone_id() looks pretty much the same as zone_idx(), and while the use of
zone_idx() is quite spread in the kernel, zone_id() is only being used by
is_dev_zone().

This patch removes zone_id() and makes is_dev_zone() use zone_idx() to
check the zone, so we do not have two things with the same functionality
around.

Link: http://lkml.kernel.org/r/20180730133718.28683-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoDocumentation/sysctl/vm.txt: update __vm_enough_memory()'s path
juviliu [Wed, 22 Aug 2018 04:53:20 +0000 (21:53 -0700)]
Documentation/sysctl/vm.txt: update __vm_enough_memory()'s path

__vm_enough_memory has moved to mm/util.c.

Link: http://lkml.kernel.org/r/E18EDF4A4FA4A04BBFA824B6D7699E532A7E5913@EXMBX-SZMAIL013.tencent.com
Signed-off-by: Juvi Liu <juviliu@tencent.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomemcg: reduce memcg tree traversals for stats collection
Shakeel Butt [Wed, 22 Aug 2018 04:53:17 +0000 (21:53 -0700)]
memcg: reduce memcg tree traversals for stats collection

Currently cgroup-v1's memcg_stat_show traverses the memcg tree ~17 times
to collect the stats while cgroup-v2's memory_stat_show traverses the
memcg tree thrice.  On a large machine, a couple thousand memcgs is very
normal and if the churn is high and memcgs stick around during to several
reasons, tens of thousands of nodes in memcg tree can exist.  This patch
has refactored and shared the stat collection code between cgroup-v1 and
cgroup-v2 and has reduced the tree traversal to just one.

I ran a simple benchmark which reads the root_mem_cgroup's stat file
1000 times in the presense of 2500 memcgs on cgroup-v1. The results are:

Without the patch:
$ time ./read-root-stat-1000-times

real    0m1.663s
user    0m0.000s
sys     0m1.660s

With the patch:
$ time ./read-root-stat-1000-times

real    0m0.468s
user    0m0.000s
sys     0m0.467s

Link: http://lkml.kernel.org/r/20180724224635.143944-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Bruce Merry <bmerry@ska.ac.za>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>