muen/linux.git
4 years agoslab: make usercopy region 32-bit
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:31 +0000 (16:21 -0700)]
slab: make usercopy region 32-bit

If kmem case sizes are 32-bit, then usecopy region should be too.

Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan: make kasan_cache_create() work with 32-bit slab cache sizes
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:28 +0000 (16:21 -0700)]
kasan: make kasan_cache_create() work with 32-bit slab cache sizes

If SLAB doesn't support 4GB+ kmem caches (it never did), KASAN should
not do it as well.

Link: http://lkml.kernel.org/r/20180305200730.15812-20-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make kmem_cache_flags accept 32-bit object size
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:24 +0000 (16:21 -0700)]
slab: make kmem_cache_flags accept 32-bit object size

Now that all sizes are properly typed, propagate "unsigned int" down the
callgraph.

Link: http://lkml.kernel.org/r/20180305200730.15812-19-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->size unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:20 +0000 (16:21 -0700)]
slub: make ->size unsigned int

Linux doesn't support negative length objects (including meta data).

Link: http://lkml.kernel.org/r/20180305200730.15812-18-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->object_size unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:17 +0000 (16:21 -0700)]
slub: make ->object_size unsigned int

Linux doesn't support negative length objects.

Link: http://lkml.kernel.org/r/20180305200730.15812-17-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->offset unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:13 +0000 (16:21 -0700)]
slub: make ->offset unsigned int

->offset is free pointer offset from the start of the object, can't be
negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-16-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->cpu_partial unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:10 +0000 (16:21 -0700)]
slub: make ->cpu_partial unsigned int

/*
 * cpu_partial determined the maximum number of objects
 * kept in the per cpu partial lists of a processor.
 */

Can't be negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->inuse unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:06 +0000 (16:21 -0700)]
slub: make ->inuse unsigned int

->inuse is "the number of bytes in actual use by the object",
can't be negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-14-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->align unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:02 +0000 (16:21 -0700)]
slub: make ->align unsigned int

Kmem cache alignment can't be negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-13-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->reserved unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:58 +0000 (16:20 -0700)]
slub: make ->reserved unsigned int

->reserved is either 0 or sizeof(struct rcu_head), can't be negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-12-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->red_left_pad unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:55 +0000 (16:20 -0700)]
slub: make ->red_left_pad unsigned int

Padding length can't be negative.

Link: http://lkml.kernel.org/r/20180305200730.15812-11-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->max_attr_size unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:51 +0000 (16:20 -0700)]
slub: make ->max_attr_size unsigned int

->max_attr_size is maximum length of every SLAB memcg attribute
ever written. VFS limits those to INT_MAX.

Link: http://lkml.kernel.org/r/20180305200730.15812-10-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslub: make ->remote_node_defrag_ratio unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:48 +0000 (16:20 -0700)]
slub: make ->remote_node_defrag_ratio unsigned int

->remote_node_defrag_ratio is in range 0..1000.

This also adds a check and modifies the behavior to return an error
code.  Before this patch invalid values were ignored.

Link: http://lkml.kernel.org/r/20180305200730.15812-9-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make size_index_elem() unsigned int
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:44 +0000 (16:20 -0700)]
slab: make size_index_elem() unsigned int

size_index_elem() always works with small sizes (kmalloc caches are
32-bit) and returns small indexes.

Link: http://lkml.kernel.org/r/20180305200730.15812-8-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make size_index[] array u8
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:40 +0000 (16:20 -0700)]
slab: make size_index[] array u8

All those small numbers are reverse indexes into kmalloc caches array
and can't be negative.

On x86_64 "unsigned int = fls()" can drop CDQE instruction:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-2 (-2)
Function                                     old     new   delta
kmalloc_slab                                 101      99      -2

Link: http://lkml.kernel.org/r/20180305200730.15812-7-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make kmem_cache_create() work with 32-bit sizes
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:37 +0000 (16:20 -0700)]
slab: make kmem_cache_create() work with 32-bit sizes

struct kmem_cache::size and ::align were always 32-bit.

Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
kmem_cache_create(1UL<<32+1) created 1-byte cache as expected.

size_t doesn't work and never did.

Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make create_boot_cache() work with 32-bit sizes
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:33 +0000 (16:20 -0700)]
slab: make create_boot_cache() work with 32-bit sizes

struct kmem_cache::size has always been "int", all those
"size_t size" are fake.

Link: http://lkml.kernel.org/r/20180305200730.15812-5-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make create_kmalloc_cache() work with 32-bit sizes
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:29 +0000 (16:20 -0700)]
slab: make create_kmalloc_cache() work with 32-bit sizes

KMALLOC_MAX_CACHE_SIZE is 32-bit so is the largest kmalloc cache size.

Christoph said:
:
: Ok SLABs maximum allocation size is limited to 32M (see
: include/linux/slab.h:
:
: #define KMALLOC_SHIFT_HIGH      ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
:                                 (MAX_ORDER + PAGE_SHIFT - 1) : 25)
:
: And SLUB/SLOB pass all larger requests to the page allocator anyways.

Link: http://lkml.kernel.org/r/20180305200730.15812-4-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make kmalloc_size() return "unsigned int"
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:26 +0000 (16:20 -0700)]
slab: make kmalloc_size() return "unsigned int"

kmalloc_size() derives size of kmalloc cache from internal index, which
can't be negative.

Propagate unsignedness a bit.

Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: make kmalloc_index() return "unsigned int"
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:22 +0000 (16:20 -0700)]
slab: make kmalloc_index() return "unsigned int"

kmalloc_index() return index into an array of kmalloc kmem caches,
therefore should be unsigned.

Space savings with SLUB on trimmed down .config:

add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472)
Function                                     old     new   delta
calculate_sizes                              924     983     +59
on_freelist                                  589     604     +15
init_cache_random_seq                        122     127      +5
ext4_mb_init                                1206    1210      +4
slab_pad_check.part                          270     271      +1
cpu_partial_store                            112     113      +1
usersize_show                                 28      27      -1
...
new_slab                                    1871    1837     -34
slab_order                                   204       -    -204

This patch start a series of converting SLUB (mostly) to "unsigned int".

1) Most integers in the code are in fact unsigned entities: array
   indexes, lengths, buffer sizes, allocation orders. It is therefore
   better to use unsigned variables

2) Some integers in the code are either "size_t" or "unsigned long" for
   no reason.

   size_t usually comes from people trying to maintain type correctness
   and figuring out that "sizeof" operator returns size_t or
   memset/memcpy takes size_t so should everything passed to it.

   However the number of 4GB+ objects in the kernel is very small. Most,
   if not all, dynamically allocated objects with kmalloc() or
   kmem_cache_create() aren't actually big. Maintaining wide types
   doesn't do anything.

   64-bit ops are bigger than 32-bit on our beloved x86_64,
   so try to not use 64-bit where it isn't necessary
   (read: everywhere where integers are integers not pointers)

3) in case of SLAB allocators, there are additional limitations
   *) page->inuse, page->objects are only 16-/15-bit,
   *) cache size was always 32-bit
   *) slab orders are small, order 20 is needed to go 64-bit on x86_64
      (PAGE_SIZE << order)

Basically everything is 32-bit except kmalloc(1ULL<<32) which gets
shortcut through page allocator.

Christoph said:
:
: That changes with large base page size on power and ARM64 f.e. but then
: we do not want to encourage larger allocations through slab anyways.

Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoslab: fixup calculate_alignment() argument type
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:18 +0000 (16:20 -0700)]
slab: fixup calculate_alignment() argument type

Link: http://lkml.kernel.org/r/20180305200730.15812-1-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/slub.c: use jitter-free reference while printing age
Chintan Pandya [Thu, 5 Apr 2018 23:20:15 +0000 (16:20 -0700)]
mm/slub.c: use jitter-free reference while printing age

When SLUB_DEBUG catches some issues, it prints all the required debug
info.  However, in a few cases where allocation and free of the object
has happened in a very short time, 'age' might be misleading.  See the
example below:

  =============================================================================
  BUG kmalloc-256 (Tainted: G        W  O   ): Poison overwritten
  -----------------------------------------------------------------------------
  ...
  INFO: Allocated in binder_transaction+0x4b0/0x2448 age=731 cpu=3 pid=5314
  ...
  INFO: Freed in binder_free_transaction+0x2c/0x58 age=735 cpu=6 pid=2079
  ...
  Object fffffff14956a870: 6b 6b 6b 6b 6b 6b 6b 6b 67 6b 6b 6b 6b 6b 6b a5  kkkkkkkkgkkkk

In this case, object got freed later but 'age' shows otherwise.  This
could be because, while printing this info, we print allocation traces
first and free traces thereafter.  In between, if we get schedule out or
jiffies increment, (jiffies - t->when) could become meaningless.

Use the jitter free reference to calculate age.

New output will exactly be same.  'age' is still staying with single
jiffies ref in both prints.

Change-Id: I0846565807a4229748649bbecb1ffb743d71fcd8
Link: http://lkml.kernel.org/r/1520492010-19389-1-git-send-email-cpandya@codeaurora.org
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/slab_common.c: mark kmalloc machinery as __ro_after_init
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:11 +0000 (16:20 -0700)]
mm/slab_common.c: mark kmalloc machinery as __ro_after_init

kmalloc caches aren't relocated after being set up neither does
"size_index" array.

Link: http://lkml.kernel.org/r/20180226203519.GA6886@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs: don't flush pagecache when expanding block device
shunki-fujita [Thu, 5 Apr 2018 23:20:07 +0000 (16:20 -0700)]
fs: don't flush pagecache when expanding block device

When changing the size of a block device, its all caches are freed.
It's necessary on shrinking to prevent spurious I/Os to the disappeared
region.  However, on expanding, such kind of I/Os doesn't happen.

Similar things can be considered for btrfs filesystem resize and
resize2fs, but they are designed not to drop caches when expanding.
Therefore this patch removes unnecessary cache drop.

Link: http://lkml.kernel.org/r/1521457240-153390-1-git-send-email-shunki-fujita@cybozu.co.jp
Signed-off-by: Shunki Fujita <shunki-fujita@cybozu.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agonet/9p/client.c: fix potential refcnt problem of trans module
Chengguang Xu [Thu, 5 Apr 2018 23:20:01 +0000 (16:20 -0700)]
net/9p/client.c: fix potential refcnt problem of trans module

When specifying trans_mod multiple times in a mount, it will cause an
inaccurate refcount of the trans module.  Also, in the error case of
option parsing, we should put the trans module if we have already got
it.

Link: http://lkml.kernel.org/r/1522154942-57339-1-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/9p: don't set SB_NOATIME by default
Yiwen Jiang [Thu, 5 Apr 2018 23:19:57 +0000 (16:19 -0700)]
fs/9p: don't set SB_NOATIME by default

When the user uses some syscall, for example mmap(v9fs_file_mmap), it
will not update atime even if user's was set mnt_flags without
MNT_NOATIME, because v9fs defaults to settine SB_NOATIME in
v9fs_set_super.

For supporting access time updating when the user mounts with relatime,
we should not set SB_NOATIME by default.

Link: http://lkml.kernel.org/r/5AB9A377.6080906@huawei.com
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years ago9p: check memory allocation result for cachetag
Chengguang Xu [Thu, 5 Apr 2018 23:19:53 +0000 (16:19 -0700)]
9p: check memory allocation result for cachetag

Check memory allocation result for cachetag in mount option parsing and
fix potential memory leak in the error case.

Link: http://lkml.kernel.org/r/1521614889-73446-1-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years ago9p: don't maintain dir i_nlink if the exported fs doesn't either
Eryu Guan [Thu, 5 Apr 2018 23:19:49 +0000 (16:19 -0700)]
9p: don't maintain dir i_nlink if the exported fs doesn't either

If the exported filesystem dir on 9p server doesn't maintain accurate
i_nlink count, e.g.  always reports i_nlink as 1, then 9p should not
maintain nlink count either, otherwise drop_link would report warning
with i_nlink being zero.

For example:

 - overlayfs sets nlink to 1 for merged dir

 - ext4 (with dir_nlink feature enabled) sets nlink to 1 if a dir has
   more than EXT4_LINK_MAX (65000) links.

In this case, everytime a stat(2) call (getattr) on such exported dirs
on 9p client side, the i_nlink gets reset to 1, then operations like
rmdir(2), unlink(2) and rename(2) would cause the dir nlink to go to
zero (then negative), which results in warnings in drop_nlink() and/or
inc_nlink() calls.

This can be reproduced easily as the following steps:

 - export a merged overlayfs dir via qemu virtfs to guest

 - mount the exported virtfs in guest

 - create two sub-directories in the root dir of the mounted 9pfs

 - stat the root dir of 9pfs, this resets nlink to 1

 - remove all subdirs, the second unlink/rmdir would trigger warning

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 1284 at fs/inode.c:282 drop_nlink+0x3e/0x50
  ...
  Call Trace:
    dump_stack+0x63/0x81
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    drop_nlink+0x3e/0x50
    v9fs_remove+0xaa/0x130 [9p]
    v9fs_vfs_rmdir+0x13/0x20 [9p]
    vfs_rmdir+0xb7/0x130
    do_rmdir+0x1b8/0x230
    SyS_unlinkat+0x22/0x30
    do_syscall_64+0x67/0x180
  ---[ end trace 43758d8ba91e603b ]---

Fix it by leaving i_nlink to be 1 and don't drop nlink if a directory
has nlink <= 2, which indicates that the underlying exported fs doesn't
maintain nlink count accurately.  This follows what ext4 does in
ext4_dec_count().

Link: http://lkml.kernel.org/r/20180312053829.4367-1-eguan@linux.alibaba.com
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Tested-by: Roman Kapl <code@rkapl.cz>
Cc: Caspar Zhang <caspar@linux.alibaba.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agonet/9p: avoid -ERESTARTSYS leak to userspace
Greg Kurz [Thu, 5 Apr 2018 23:19:44 +0000 (16:19 -0700)]
net/9p: avoid -ERESTARTSYS leak to userspace

If it was interrupted by a signal, the 9p client may need to send some
more requests to the server for cleanup before returning to userspace.

To avoid such a last minute request to be interrupted right away, the
client memorizes if a signal is pending, clears TIF_SIGPENDING, handles
the request and calls recalc_sigpending() before returning.

Unfortunately, if the transmission of this cleanup request fails for any
reason, the transport returns an error and the client propagates it
right away, without calling recalc_sigpending().

This ends up with -ERESTARTSYS from the initially interrupted request
crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
request.  The specific signal handling code, which is responsible for
converting -ERESTARTSYS to -EINTR is not called, and userspace receives
the confusing errno value:

  open: Unknown error 512 (512)

This is really hard to hit in real life.  I discovered the issue while
working on hot-unplug of a virtio-9p-pci device with an instrumented
QEMU allowing to control request completion.

Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
error path actually.  Their code flow is a bit obscure and the best
thing to do would probably be a full rewrite: to really ensure this
situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
never happen.

But given the general lack of interest for the 9p code, I won't risk
breaking more things.  So this patch simply fixes the buggy paths in
both functions with a trivial label+goto.

Thanks to Laurent Dufour for his help and suggestions on how to find the
root cause and how to fix it.

Link: http://lkml.kernel.org/r/152062809886.10599.7361006774123053312.stgit@bahia.lan
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: David Miller <davem@davemloft.net>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/dlm: clean up unused variable in dlm_process_recovery_data
Changwei Ge [Thu, 5 Apr 2018 23:19:40 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused variable in dlm_process_recovery_data

Link: http://lkml.kernel.org/r/1522734135-7933-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio
piaojun [Thu, 5 Apr 2018 23:19:36 +0000 (16:19 -0700)]
ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio

We need to check len for bio_add_page() to make sure the bio has been
set up correctly, otherwise we may submit incorrect data to device.

Link: http://lkml.kernel.org/r/5ABC3EBE.5020807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: add duplicated ino number check
Gang He [Thu, 5 Apr 2018 23:19:32 +0000 (16:19 -0700)]
ocfs2: add duplicated ino number check

Add duplicated ino number check, to avoid adding a file into the file
check list when this file is being checked.

Link: http://lkml.kernel.org/r/1495611866-27360-5-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: add kobject for online file check
Gang He [Thu, 5 Apr 2018 23:19:29 +0000 (16:19 -0700)]
ocfs2: add kobject for online file check

Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data, meanwhile, reduce the code lines and make the code logic
clear.  The changed code is based on Goldwyn Rodrigues's patches and
ext4 fs code.

Link: http://lkml.kernel.org/r/1495611866-27360-4-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: fix some small problems
Gang He [Thu, 5 Apr 2018 23:19:25 +0000 (16:19 -0700)]
ocfs2: fix some small problems

First, move setting fe_done = 1 in spin lock, avoid bring any potential
race condition.

Second, tune mlog message level from ERROR to NOTICE, since the message
should not belong to error message.

Third, tune errno to -EAGAIN when file check queue is full, this errno
is more appropriate in the case.

Link: http://lkml.kernel.org/r/1495611866-27360-3-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: move some definitions to header file
Gang He [Thu, 5 Apr 2018 23:19:22 +0000 (16:19 -0700)]
ocfs2: move some definitions to header file

Patch series "ocfs2: use kobject for online file check", v3.

Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data.  The changed code is based on Goldwyn Rodrigues's patches
and ext4 fs code, there is not any new features added, except some very
small fixes during this code refactoring.  Second, the code change does
not affect the underlying file check code.  Thank Goldwyn very much.

Compare with second version, add more comments in the patch
descriptions, to make sure each modification is mentioned.  Compare with
first version, split the code change into four patches, make sure each
patch will not bring ocfs2 kernel modules compiling errors.

This patch (of 3):

Move some definitions to header file, which will be referenced by other
source files when kobject mechanism is introduced.

Link: http://lkml.kernel.org/r/1495611866-27360-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: correct spelling mistake for migratable for all
Changwei Ge [Thu, 5 Apr 2018 23:19:18 +0000 (16:19 -0700)]
ocfs2: correct spelling mistake for migratable for all

Inspired by the ocfs2 patch to fix the spelling of migrateable to
migratable, I checked all ocfs2 files and found more spelling mistakes.
So correct them all.

Link: http://lkml.kernel.org/r/1521525734-19576-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: fix spelling mistake: "Migrateable" -> "Migratable"
Colin Ian King [Thu, 5 Apr 2018 23:19:14 +0000 (16:19 -0700)]
ocfs2: fix spelling mistake: "Migrateable" -> "Migratable"

Trivial fix to spelling mistake in mlog message text

Link: http://lkml.kernel.org/r/20180319114101.2051-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/dlm: wait for dlm recovery done when migrating all lock resources
piaojun [Thu, 5 Apr 2018 23:19:11 +0000 (16:19 -0700)]
ocfs2/dlm: wait for dlm recovery done when migrating all lock resources

Wait for dlm recovery done when migrating all lock resources in case that
new lock resource left after leaving dlm domain.  And the left lock
resource will cause other nodes BUG.

        NodeA                       NodeB                NodeC

  umount:
    dlm_unregister_domain()
      dlm_migrate_all_locks()

                                   NodeB down

  do recovery for NodeB
  and collect a new lockres
  form other live nodes:

    dlm_do_recovery
      dlm_remaster_locks
        dlm_request_all_locks:

    dlm_mig_lockres_handler
      dlm_new_lockres
        __dlm_insert_lockres

  at last NodeA become the
  master of the new lockres
  and leave domain:
    dlm_leave_domain()

                                                    mount:
                                                      dlm_join_domain()

                                                    touch file and request
                                                    for the owner of the new
                                                    lockres, but all the
                                                    other nodes said 'NO',
                                                    so NodeC decide to be
                                                    the owner, and send do
                                                    assert msg to other
                                                    nodes:
                                                    dlmlock()
                                                      dlm_get_lock_resource()
                                                        dlm_do_assert_master()

                                                    other nodes receive the msg
                                                    and found two masters exist.
                                                    at last cause BUG in
                                                    dlm_assert_master_handler()
                                                    -->BUG();

Link: http://lkml.kernel.org/r/5AAA6E25.7090303@huawei.com
Fixes: bc9838c4d44a ("dlm: allow dlm do recovery during shutdown")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/dlm: clean up unused stack variable in dlm_do_local_ast()
Changwei Ge [Thu, 5 Apr 2018 23:19:07 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused stack variable in dlm_do_local_ast()

Link: http://lkml.kernel.org/r/1521116681-14602-2-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/dlm: clean up unused argument for dlm_destroy_recovery_area()
Changwei Ge [Thu, 5 Apr 2018 23:19:03 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused argument for dlm_destroy_recovery_area()

Link: http://lkml.kernel.org/r/1521116681-14602-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/ocfs2/dlm/dlmrecovery.c: remove unrelated comment
Changwei Ge [Thu, 5 Apr 2018 23:19:00 +0000 (16:19 -0700)]
fs/ocfs2/dlm/dlmrecovery.c: remove unrelated comment

Obviously, the comment before dlm_do_local_recovery_cleanup() has nothing
to do with it.  So remove it.

Link: http://lkml.kernel.org/r/1519371054-4648-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: remove two unused functions from suballoc.c
Changwei Ge [Thu, 5 Apr 2018 23:18:56 +0000 (16:18 -0700)]
ocfs2: remove two unused functions from suballoc.c

The two functions are no longer used.

Link: http://lkml.kernel.org/r/1519609595-26229-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: remove unnecessary null pointer check before kmem_cache_destroy()
piaojun [Thu, 5 Apr 2018 23:18:52 +0000 (16:18 -0700)]
ocfs2: remove unnecessary null pointer check before kmem_cache_destroy()

As kmem_cache_destroy() already handles null pointers, so we can remove
the conditional test entirely.

Link: http://lkml.kernel.org/r/5A9EB21D.3000209@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2/dlm: don't handle migrate lockres if already in shutdown
Jun Piao [Thu, 5 Apr 2018 23:18:48 +0000 (16:18 -0700)]
ocfs2/dlm: don't handle migrate lockres if already in shutdown

We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain.  At last other nodes will get stuck into infinite loop when
requsting lock from us.

The problem is caused by concurrency umount between nodes.  Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target.  So N2 will continue sending lockres to N1 even though
N1 has left domain.

        N1                             N2 (owner)
                                       touch file

    access the file,
    and get pr lock

                                       begin leave domain and
                                       pick up N1 as new owner

    begin leave domain and
    migrate all lockres done

                                       begin migrate lockres to N1

    end leave domain, but
    the lockres left
    unexpectedly, because
    migrate task has passed

[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: keep the trace point consistent with the function name
Jia Guo [Thu, 5 Apr 2018 23:18:45 +0000 (16:18 -0700)]
ocfs2: keep the trace point consistent with the function name

Keep the trace point consistent with the function name.

Link: http://lkml.kernel.org/r/02609aba-84b2-a22d-3f3b-bc1944b94260@huawei.com
Fixes: 3ef045c3d8ae ("ocfs2: switch to ->write_iter()")
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Acked-by: Gang He <ghe@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: remove some unused function declarations
piaojun [Thu, 5 Apr 2018 23:18:41 +0000 (16:18 -0700)]
ocfs2: remove some unused function declarations

Remove some unused function declarations in dlmcommon.h.

Link: http://lkml.kernel.org/r/5A7D1034.7050807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: use 'oi' instead of 'OCFS2_I()'
piaojun [Thu, 5 Apr 2018 23:18:37 +0000 (16:18 -0700)]
ocfs2: use 'oi' instead of 'OCFS2_I()'

We could use 'oi' instead of 'OCFS2_I()' to make code more elegant.

Link: http://lkml.kernel.org/r/5A7020FE.5050906@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: use 'osb' instead of 'OCFS2_SB()'
piaojun [Thu, 5 Apr 2018 23:18:33 +0000 (16:18 -0700)]
ocfs2: use 'osb' instead of 'OCFS2_SB()'

We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant.

Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoscripts/faddr2line: show the code context
Changbin Du [Thu, 5 Apr 2018 23:18:29 +0000 (16:18 -0700)]
scripts/faddr2line: show the code context

Inspired by gdb command 'list', show the code context of target lines.
Here is a example:

$ scripts/faddr2line vmlinux native_write_msr+0x6
native_write_msr+0x6/0x20:
arch_static_branch at arch/x86/include/asm/msr.h:105
100             return EAX_EDX_VAL(val, low, high);
101     }
102
103     static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high)
104     {
105             asm volatile("1: wrmsr\n"
106                          "2:\n"
107                          _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
108                          : : "c" (msr), "a"(low), "d" (high) : "memory");
109     }
110
(inlined by) static_key_false at include/linux/jump_label.h:142
137     #define JUMP_TYPE_LINKED        2UL
138     #define JUMP_TYPE_MASK          3UL
139
140     static __always_inline bool static_key_false(struct static_key *key)
141     {
142             return arch_static_branch(key, false);
143     }
144
145     static __always_inline bool static_key_true(struct static_key *key)
146     {
147             return !arch_static_branch(key, true);
(inlined by) native_write_msr at arch/x86/include/asm/msr.h:150
145     static inline void notrace
146     native_write_msr(unsigned int msr, u32 low, u32 high)
147     {
148             __wrmsr(msr, low, high);
149
150             if (msr_tracepoint_active(__tracepoint_write_msr))
151                     do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
152     }
153
154     /* Can be uninlined because referenced by paravirt */
155     static inline int notrace

Link: http://lkml.kernel.org/r/1521444205-2259-1-git-send-email-changbin.du@intel.com
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib: fix stall in __bitmap_parselist()
Yury Norov [Thu, 5 Apr 2018 23:18:25 +0000 (16:18 -0700)]
lib: fix stall in __bitmap_parselist()

syzbot is catching stalls at __bitmap_parselist()
(https://syzkaller.appspot.com/bug?id=ad7e0351fbc90535558514a71cd3edc11681997a).
The trigger is

  unsigned long v = 0;
  bitmap_parselist("7:,", &v, BITS_PER_LONG);

which results in hitting infinite loop at

    while (a <= b) {
    off = min(b - a + 1, used_size);
    bitmap_set(maskp, a, off);
    a += group_size;
    }

due to used_size == group_size == 0.

Link: http://lkml.kernel.org/r/20180404162647.15763-1-ynorov@caviumnetworks.com
Fixes: 0a5ce0831d04382a ("lib/bitmap.c: make bitmap_parselist() thread-safe and much faster")
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+6887cbb011c8054e8a3d@syzkaller.appspotmail.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agohugetlbfs: fix bug in pgoff overflow checking
Mike Kravetz [Thu, 5 Apr 2018 23:18:21 +0000 (16:18 -0700)]
hugetlbfs: fix bug in pgoff overflow checking

This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup.  The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.

On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset.  Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).

Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com
Fixes: 63489f8e8211 ("hugetlbfs: check for pgoff value overflow")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nic Losby <blurbdust@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agozboot: fix stack protector in compressed boot phase
Huacai Chen [Thu, 5 Apr 2018 23:18:18 +0000 (16:18 -0700)]
zboot: fix stack protector in compressed boot phase

Calling __stack_chk_guard_setup() in decompress_kernel() is too late
that stack checking always fails for decompress_kernel() itself.  So
remove __stack_chk_guard_setup() and initialize __stack_chk_guard before
we call decompress_kernel().

Original code comes from ARM but also used for MIPS and SH, so fix them
together.  If without this fix, compressed booting of these archs will
fail because stack checking is enabled by default (>=4.16).

Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
Fixes: 8779657d29c0 ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Acked-by: James Hogan <jhogan@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Rich Felker <dalias@libc.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Thu, 5 Apr 2018 03:07:20 +0000 (20:07 -0700)]
Merge tag 'char-misc-4.17-rc1' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here is the big set of char/misc driver patches for 4.17-rc1.

  There are a lot of little things in here, nothing huge, but all
  important to the different hardware types involved:

   -  thunderbolt driver updates

   -  parport updates (people still care...)

   -  nvmem driver updates

   -  mei updates (as always)

   -  hwtracing driver updates

   -  hyperv driver updates

   -  extcon driver updates

   -  ... and a handful of even smaller driver subsystem and individual
      driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
  hwtracing: Add HW tracing support menu
  intel_th: Add ACPI glue layer
  intel_th: Allow forcing host mode through drvdata
  intel_th: Pick up irq number from resources
  intel_th: Don't touch switch routing in host mode
  intel_th: Use correct method of finding hub
  intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  stm class: Make dummy's master/channel ranges configurable
  stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
  hv: add SPDX license id to Kconfig
  hv: add SPDX license to trace
  Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
  Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
  /dev/mem: Avoid overwriting "err" in read_mem()
  eeprom: at24: use SPDX identifier instead of GPL boiler-plate
  eeprom: at24: simplify the i2c functionality checking
  eeprom: at24: fix a line break
  eeprom: at24: tweak newlines
  eeprom: at24: refactor at24_probe()
  ...

4 years agoMerge tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 5 Apr 2018 02:41:45 +0000 (19:41 -0700)]
Merge tag 'driver-core-4.17-rc1' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core patches for 4.17-rc1.

  There's really not much here, just a bunch of firmware code
  refactoring from Luis as he attempts to wrangle that codebase into
  something that is managable, along with a bunch of userspace tests for
  it. Other than that, a handful of small bugfixes and reverts of things
  that didn't work out.

  Full details are in the shortlog, it's not all that much.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
  drivers: base: remove check for callback in coredump_store()
  mt7601u: use firmware_request_cache() to address cache on reboot
  firmware: add firmware_request_cache() to help with cache on reboot
  firmware: fix typo on pr_info_once() when ignore_sysfs_fallback is used
  firmware: explicitly include vmalloc.h
  firmware: ensure the firmware cache is not used on incompatible calls
  test_firmware: modify custom fallback tests to use unique files
  firmware: add helper to check to see if fw cache is setup
  firmware: fix checking for return values for fw_add_devm_name()
  rename: _request_firmware_load() fw_load_sysfs_fallback()
  test_firmware: test three firmware kernel configs using a proc knob
  test_firmware: expand on library with shared helpers
  firmware: enable to force disable the fallback mechanism at run time
  firmware: enable run time change of forcing fallback loader
  firmware: move firmware loader into its own directory
  firmware: split firmware fallback functionality into its own file
  firmware: move loading timeout under struct firmware_fallback_config
  firmware: use helpers for setting up a temporary cache timeout
  firmware: simplify CONFIG_FW_LOADER_USER_HELPER_FALLBACK further
  drivers: base: add description for .coredump() callback
  ...

4 years agoMerge tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Thu, 5 Apr 2018 01:56:27 +0000 (18:56 -0700)]
Merge tag 'staging-4.17-rc1' of git://git./linux/kernel/git/gregkh/staging

Pull staging/IIO updates from Greg KH:
 "Here is the big set of Staging/IIO driver patches for 4.17-rc1.

  It is a lot, over 500 changes, but not huge by previous kernel release
  standards. We deleted more lines than we added again (27k added vs.
  91k remvoed), thanks to finally being able to delete the IRDA drivers
  and networking code.

  We also deleted the ccree crypto driver, but that's coming back in
  through the crypto tree to you, in a much cleaned-up form.

  Added this round is at lot of "mt7621" device support, which is for an
  embedded device that Neil Brown cares about, and of course a handful
  of new IIO drivers as well.

  And finally, the fsl-mc core code moved out of the staging tree to the
  "real" part of the kernel, which is nice to see happen as well.

  Full details are in the shortlog, which has all of the tiny cleanup
  patches described.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (579 commits)
  staging: rtl8723bs: Remove yield call, replace with cond_resched()
  staging: rtl8723bs: Replace yield() call with cond_resched()
  staging: rtl8723bs: Remove unecessary newlines from 'odm.h'.
  staging: rtl8723bs: Rework 'struct _ODM_Phy_Status_Info_' coding style.
  staging: rtl8723bs: Rework 'struct _ODM_Per_Pkt_Info_' coding style.
  staging: rtl8723bs: Replace NULL pointer comparison with '!'.
  staging: rtl8723bs: Factor out rtl8723bs_recv_tasklet() sections.
  staging: rtl8723bs: Fix function signature that goes over 80 characters.
  staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().
  staging: rtl8723bs: Remove unnecessary blank lines in 'rtl8723bs_recv.c'.
  staging: rtl8723bs: Change camel case to snake case in 'rtl8723bs_recv.c'.
  staging: rtl8723bs: Add missing braces in else statement.
  staging: rtl8723bs: Add spaces around ternary operators.
  staging: rtl8723bs: Fix lines with trailing open parentheses.
  staging: rtl8723bs: Remove unnecessary length #define's.
  staging: rtl8723bs: Fix IEEE80211 authentication algorithm constants.
  staging: rtl8723bs: Fix alignment in rtw_wx_set_auth().
  staging: rtl8723bs: Remove braces from single statement conditionals.
  staging: rtl8723bs: Remove unecessary braces from switch statement.
  staging: rtl8723bs: Fix newlines in rtw_wx_set_auth().
  ...

4 years agoMerge tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Thu, 5 Apr 2018 01:43:49 +0000 (18:43 -0700)]
Merge tag 'tty-4.17-rc1' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of tty and serial driver patches for 4.17-rc1

  Not all that big really, most are just small fixes and additions to
  existing drivers. There's a bunch of work on the imx serial driver
  recently for some reason, and a new embedded serial driver added as
  well.

  Full details are in the shortlog.

  All of these have been in the linux-next tree for a while with no
  reported issues"

* tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
  serial: expose buf_overrun count through proc interface
  serial: mvebu-uart: fix tx lost characters
  tty: serial: msm_geni_serial: Fix return value check in qcom_geni_serial_probe()
  tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
  8250-men-mcb: add support for 16z025 and 16z057
  powerpc: Mark the variable earlycon_acpi_spcr_enable maybe_unused
  serial: stm32: fix initialization of RS485 mode
  ARM: dts: STi: Remove "console=ttyASN" from bootargs for STi boards
  vt: change SGR 21 to follow the standards
  serdev: Fix typo in serdev_device_alloc
  ARM: dts: STi: Fix aliases property name for STi boards
  tty: st-asc: Update tty alias
  serial: stm32: add support for RS485 hardware control mode
  dt-bindings: serial: stm32: add RS485 optional properties
  selftests: add devpts selftests
  devpts: comment devpts_mntget()
  devpts: resolve devpts bind-mounts
  devpts: hoist out check for DEVPTS_SUPER_MAGIC
  serial: 8250: Add Nuvoton NPCM UART
  serial: mxs-auart: disable clks of Alphascale ASM9260
  ...

4 years agoMerge tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Thu, 5 Apr 2018 00:55:35 +0000 (17:55 -0700)]
Merge tag 'usb-4.17-rc1' of git://git./linux/kernel/git/gregkh/usb

Pull USB/PHY updates from Greg KH:
 "Here is the big set of USB and PHY driver patches for 4.17-rc1.

  Lots of USB typeC work happened this round, with code moving from the
  staging directory into the "real" part of the kernel, as well as new
  infrastructure being added to be able to handle the different types of
  "roles" that typeC requires.

  There is also the normal huge set of USB gadget controller and driver
  updates, along with XHCI changes, and a raft of other tiny fixes all
  over the USB tree. And the PHY driver updates are merged in here as
  well as they interacted with the USB drivers in some places.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (250 commits)
  Revert "USB: serial: ftdi_sio: add Id for Physik Instrumente E-870"
  usb: musb: gadget: misplaced out of bounds check
  usb: chipidea: imx: Fix ULPI on imx53
  usb: chipidea: imx: Cleanup ci_hdrc_imx_platform_flag
  usb: chipidea: usbmisc: small clean up
  usb: chipidea: usbmisc: evdo can be set e/o reset
  usb: chipidea: usbmisc: evdo is only specific to OTG port
  USB: serial: ftdi_sio: add Id for Physik Instrumente E-870
  usb: dwc3: gadget: never call ->complete() from ->ep_queue()
  usb: gadget: udc: core: update usb_ep_queue() documentation
  usb: host: Remove the deprecated ATH79 USB host config options
  usb: roles: Fix return value check in intel_xhci_usb_probe()
  USB: gadget: f_midi: fixing a possible double-free in f_midi
  usb: core: Add USB_QUIRK_DELAY_CTRL_MSG to usbcore quirks
  usb: core: Copy parameter string correctly and remove superfluous null check
  USB: announce bcdDevice as well as idVendor, idProduct.
  USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
  usb: hub: Reduce warning to notice on power loss
  USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator
  USB: serial: cp210x: add ELDAT Easywave RX09 id
  ...

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 5 Apr 2018 00:42:38 +0000 (17:42 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "This fixes some fallout from the net-next merge the other day, plus
  some non-merge-window-related bug fixes:

  1) Fix sparse warnings in bcmgenet, systemport, b53, and mt7530
     (Florian Fainelli)

  2) pptp does a bogus dst_release() on a route we have a single
     refcount on, and attached to a socket, which needs that refcount
     (Eric Dumazet)

  3) UDP connected sockets on ipv6 can race with route update handling,
     resulting in a pre-PMTU update route still stuck on the socket and
     thus continuing to get ICMPV6_PKT_TOOBIG errors. We end up never
     seeing the updated route. (Alexey Kodanev)

  4) Missing list initializer(s) in TIPC (Jon Maloy)

  5) Connect phy early to prevent crashes in lan78xx driver (Alexander
     Graf)

  6) Fix build with modular NVMEM (Arnd Bergmann)

  7) netdevsim canot mark nsim_devlink_net_ops and nsim_fib_net_ops as
     __net_initdata, as these are references from module unload
     unconditionally (Arnd Bergmann)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  netdevsim: remove incorrect __net_initdata annotations
  sfc: remove ctpio_dmabuf_start from stats
  inet: frags: fix ip6frag_low_thresh boundary
  tipc: Fix namespace violation in tipc_sk_fill_sock_diag
  net: avoid unneeded atomic operation in ip*_append_data()
  nvmem: disallow modular CONFIG_NVMEM
  net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
  nfp: use full 40 bits of the NSP buffer address
  lan78xx: Connect phy early
  nfp: add a separate counter for packets with CHECKSUM_COMPLETE
  tipc: Fix missing list initializations in struct tipc_subscription
  ipv6: udp: set dst cache for a connected sk if current not valid
  ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()
  ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
  ipv6: add a wrapper for ip6_dst_store() with flowi6 checks
  net: phy: marvell10g: add thermal hwmon device
  pptp: remove a buggy dst release in pptp_connect()
  net: dsa: mt7530: Use NULL instead of plain integer
  net: dsa: b53: Fix sparse warnings in b53_mmap.c
  af_unix: remove redundant lockdep class
  ...

4 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 5 Apr 2018 00:11:08 +0000 (17:11 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - add AEAD support to crypto engine

   - allow batch registration in simd

  Algorithms:

   - add CFB mode

   - add speck block cipher

   - add sm4 block cipher

   - new test case for crct10dif

   - improve scheduling latency on ARM

   - scatter/gather support to gcm in aesni

   - convert x86 crypto algorithms to skcihper

  Drivers:

   - hmac(sha224/sha256) support in inside-secure

   - aes gcm/ccm support in stm32

   - stm32mp1 support in stm32

   - ccree driver from staging tree

   - gcm support over QI in caam

   - add ks-sa hwrng driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (212 commits)
  crypto: ccree - remove unused enums
  crypto: ahash - Fix early termination in hash walk
  crypto: brcm - explicitly cast cipher to hash type
  crypto: talitos - don't leak pointers to authenc keys
  crypto: qat - don't leak pointers to authenc keys
  crypto: picoxcell - don't leak pointers to authenc keys
  crypto: ixp4xx - don't leak pointers to authenc keys
  crypto: chelsio - don't leak pointers to authenc keys
  crypto: caam/qi - don't leak pointers to authenc keys
  crypto: caam - don't leak pointers to authenc keys
  crypto: lrw - Free rctx->ext with kzfree
  crypto: talitos - fix IPsec cipher in length
  crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array()
  crypto: doc - clarify hash callbacks state machine
  crypto: api - Keep failed instances alive
  crypto: api - Make crypto_alg_lookup static
  crypto: api - Remove unused crypto_type lookup function
  crypto: chelsio - Remove declaration of static function from header
  crypto: inside-secure - hmac(sha224) support
  crypto: inside-secure - hmac(sha256) support
  ..

4 years agoMerge tag 'riscv-for-linus-4.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 4 Apr 2018 23:43:47 +0000 (16:43 -0700)]
Merge tag 'riscv-for-linus-4.17-mw0' of git://git./linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains the new features we'd like to incorporate into the
  RISC-V port for 4.17. We might have a bit more stuff land later in the
  merge window, but I wanted to get this out earlier just so everyone
  can see where we currently stand.

  A short summary of the changes is:

   - We've added support for dynamic ftrace on RISC-V targets.

   - There have been a handful of cleanups to our atomic and locking
     routines. They now more closely match the released RISC-V memory
     model draft.

   - Our module loading support has been cleaned up and is now enabled
     by default, despite some limitations still existing.

   - A patch to define COMMANDLINE_FORCE instead of COMMANDLINE_OVERRIDE
     so the generic device tree code picks up handling all our command
     line stuff.

  There's more information in the merge commits for each patch set"

* tag 'riscv-for-linus-4.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (21 commits)
  RISC-V: Rename CONFIG_CMDLINE_OVERRIDE to CONFIG_CMDLINE_FORCE
  RISC-V: Add definition of relocation types
  RISC-V: Enable module support in defconfig
  RISC-V: Support SUB32 relocation type in kernel module
  RISC-V: Support ADD32 relocation type in kernel module
  RISC-V: Support ALIGN relocation type in kernel module
  RISC-V: Support RVC_BRANCH/JUMP relocation type in kernel modulewq
  RISC-V: Support HI20/LO12_I/LO12_S relocation type in kernel module
  RISC-V: Support CALL relocation type in kernel module
  RISC-V: Support GOT_HI20/CALL_PLT relocation type in kernel module
  RISC-V: Add section of GOT.PLT for kernel module
  RISC-V: Add sections of PLT and GOT for kernel module
  riscv/atomic: Strengthen implementations with fences
  riscv/spinlock: Strengthen implementations with fences
  riscv/barrier: Define __smp_{store_release,load_acquire}
  riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support
  riscv/ftrace: Add DYNAMIC_FTRACE_WITH_REGS support
  riscv/ftrace: Add ARCH_SUPPORTS_FTRACE_OPS support
  riscv/ftrace: Add dynamic function graph tracer support
  riscv/ftrace: Add dynamic function tracer support
  ...

4 years agoMerge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64...
Linus Torvalds [Wed, 4 Apr 2018 23:01:43 +0000 (16:01 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Nothing particularly stands out here, probably because people were
  tied up with spectre/meltdown stuff last time around. Still, the main
  pieces are:

   - Rework of our CPU features framework so that we can whitelist CPUs
     that don't require kpti even in a heterogeneous system

   - Support for the IDC/DIC architecture extensions, which allow us to
     elide instruction and data cache maintenance when writing out
     instructions

   - Removal of the large memory model which resulted in suboptimal
     codegen by the compiler and increased the use of literal pools,
     which could potentially be used as ROP gadgets since they are
     mapped as executable

   - Rework of forced signal delivery so that the siginfo_t is
     well-formed and handling of show_unhandled_signals is consolidated
     and made consistent between different fault types

   - More siginfo cleanup based on the initial patches from Eric
     Biederman

   - Workaround for Cortex-A55 erratum #1024718

   - Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi

   - Misc cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
  arm64: uaccess: Fix omissions from usercopy whitelist
  arm64: fpsimd: Split cpu field out from struct fpsimd_state
  arm64: tlbflush: avoid writing RES0 bits
  arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
  arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
  arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
  arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
  arm64: fpsimd: include <linux/init.h> in fpsimd.h
  drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
  perf: arm_spe: include linux/vmalloc.h for vmap()
  Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
  arm64: cpufeature: Avoid warnings due to unused symbols
  arm64: Add work around for Arm Cortex-A55 Erratum 1024718
  arm64: Delay enabling hardware DBM feature
  arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
  arm64: capabilities: Handle shared entries
  arm64: capabilities: Add support for checks based on a list of MIDRs
  arm64: Add helpers for checking CPU MIDR against a range
  arm64: capabilities: Clean up midr range helpers
  arm64: capabilities: Change scope of VHE to Boot CPU feature
  ...

4 years agoMerge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 4 Apr 2018 22:19:26 +0000 (15:19 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "The usual pile of boring changes:

   - Consolidate tasklet functions to share code instead of duplicating
     it

   - The first step for making the low level entry handler management on
     multi-platform kernels generic

   - A new sysfs file which allows to retrieve the wakeup state of
     interrupts.

   - Ensure that the interrupt thread follows the effective affinity and
     not the programmed affinity to avoid cross core wakeups.

   - Two new interrupt controller drivers (Microsemi Ocelot and Qualcomm
     PDC)

   - Fix the wakeup path clock handling for Reneasas interrupt chips.

   - Rework the boot time register reset for ARM GIC-V2/3

   - Better suspend/resume support for ARM GIV-V3/ITS

   - Add missing locking to the ARM GIC set_type() callback

   - Small fixes for the irq simulator code

   - SPDX identifiers for the irq core code and removal of boiler plate

   - Small cleanups all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  openrisc: Set CONFIG_MULTI_IRQ_HANDLER
  arm64: Set CONFIG_MULTI_IRQ_HANDLER
  genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
  irqchip/gic: Take lock when updating irq type
  irqchip/gic: Update supports_deactivate static key to modern api
  irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling
  irqchip: Add a driver for the Microsemi Ocelot controller
  dt-bindings: interrupt-controller: Add binding for the Microsemi Ocelot interrupt controller
  irqchip/gic-v3: Probe for SCR_EL3 being clear before resetting AP0Rn
  irqchip/gic-v3: Don't try to reset AP0Rn
  irqchip/gic-v3: Do not check trigger configuration of partitionned LPIs
  genirq: Remove license boilerplate/references
  genirq: Add missing SPDX identifiers
  genirq/matrix: Cleanup SPDX identifier
  genirq: Cleanup top of file comments
  genirq: Pass desc to __irq_free instead of irq number
  irqchip/gic-v3: Loudly complain about the use of IRQ_TYPE_NONE
  irqchip/gic: Loudly complain about the use of IRQ_TYPE_NONE
  RISC-V: Move to the new GENERIC_IRQ_MULTI_HANDLER handler
  genirq: Add CONFIG_GENERIC_IRQ_MULTI_HANDLER
  ...

4 years agoMerge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 4 Apr 2018 21:50:29 +0000 (14:50 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip

Pull time(r) updates from Thomas Gleixner:
 "A small set of updates for timers and timekeeping:

   - The most interesting change is the consolidation of clock MONOTONIC
     and clock BOOTTIME.

     Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
     not longer ignore the time spent in suspend. A new clock
     MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
     kernels before this change. This allows applications to
     programmatically check for the clock MONOTONIC behaviour.

     As discussed in the review thread, this has the potential of
     breaking user space and we might have to revert this. Knock on wood
     that we can avoid that exercise.

   - Updates to the NTP mechanism to improve accuracy

   - A new kernel internal data structure to aid the ongoing Y2038 work.

   - Cleanups and simplifications of the clocksource code.

   - Make the alarmtimer code play nicely with debugobjects"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Init nanosleep alarm timer on stack
  y2038: Introduce struct __kernel_old_timeval
  tracing: Unify the "boot" and "mono" tracing clocks
  hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
  posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
  timekeeping: Remove boot time specific code
  Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
  timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
  timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
  timekeeping/ntp: Determine the multiplier directly from NTP tick length
  timekeeping/ntp: Don't align NTP frequency adjustments to ticks
  clocksource: Use ATTRIBUTE_GROUPS
  clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
  clocksource: Don't walk the clocksource list for empty override

4 years agoMerge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Wed, 4 Apr 2018 21:31:53 +0000 (14:31 -0700)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random

Pull /dev/random updates from Ted Ts'o:
 "A few random (cough, cough) cleanups for the /dev/random driver"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  drivers/char/random.c: remove unused dont_count_entropy
  random: optimize add_interrupt_randomness
  random: always fill buffer in get_random_bytes_wait
  random: use a tighter cap in credit_entropy_bits_safe()

4 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Wed, 4 Apr 2018 21:19:24 +0000 (14:19 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Cleanups and bugfixes for ext4, including some fixes to make ext4 more
  robust against maliciously crafted file system images.

  (I still don't recommend that container folks hold any delusions that
  mounting arbitary images that can be crafted by malicious attackers
  should be considered sane thing to do, though!)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
  ext4: force revalidation of directory pointer after seekdir(2)
  ext4: add extra checks to ext4_xattr_block_get()
  ext4: add bounds checking to ext4_xattr_find_entry()
  ext4: move call to ext4_error() into ext4_xattr_check_block()
  ext4: don't show data=<mode> option if defaulted
  ext4: omit init_itable=n in procfs when disabled
  ext4: show more binary mount options in procfs
  ext4: simplify kobject usage
  ext4: remove unused parameters in sysfs code
  ext4: null out kobject* during sysfs cleanup
  ext4: don't allow r/w mounts if metadata blocks overlap the superblock
  ext4: always initialize the crc32c checksum driver
  ext4: fail ext4_iget for root directory if unallocated
  ext4: limit xattr size to INT_MAX
  ext4: add validity checks for bitmap block numbers
  ext4: fix comments in ext4_swap_extents()
  ext4: use generic_writepages instead of __writepage/write_cache_pages
  ext4: don't complain about incorrect features when probing
  ext4: remove EXT4_STATE_DIOREAD_LOCK flag
  ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()
  ...

4 years agoMerge tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Wed, 4 Apr 2018 21:09:27 +0000 (14:09 -0700)]
Merge tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "Includes SMB3.11 security improvements, as well as various fixes for
  stable and some debugging improvements"

* tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add minor debug message during negprot
  smb3: Fix root directory when server returns inode number of zero
  cifs: fix sparse warning on previous patch in a few printks
  cifs: add server->vals->header_preamble_size
  cifs: smbd: disconnect transport on RDMA errors
  cifs: smbd: avoid reconnect lockup
  Don't log confusing message on reconnect by default
  Don't log expected error on DFS referral request
  fs: cifs: Replace _free_xid call in cifs_root_iget function
  SMB3.1.1 dialect is no longer experimental
  Tree connect for SMB3.1.1 must be signed for non-encrypted shares
  fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
  CIFS: fix sha512 check in cifs_crypto_secmech_release
  CIFS: implement v3.11 preauth integrity
  CIFS: add sha512 secmech
  CIFS: refactor crypto shash/sdesc allocation&free
  Update README file for cifs.ko
  Update TODO list for cifs.ko
  cifs: fix memory leak in SMB2_open()
  CIFS: SMBD: fix spelling mistake: "faield" and "legnth"

4 years agoMerge tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2...
Linus Torvalds [Wed, 4 Apr 2018 20:09:42 +0000 (13:09 -0700)]
Merge tag 'gfs2-4.17.fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've only got nine GFS2 patches for this merge window:

   - report journal recovery times more accurately during journal replay
     (Abhi Das)

   - fix fallocate chunk size (Andreas Gruenbacher)

   - correctly dirty inodes during rename (Andreas Gruenbacher)

   - improve the comment for function gfs2_block_map (Andreas
     Gruenbacher)

   - improve kernel trace point iomap end: The physical block address
     was added (Andreas Gruenbacher)

   - fix a nasty file system corruption bug that surfaced in xfstests
     476 in punch-hole/truncate (Andreas Gruenbacher)

   - fix a problem Christoph Helwig pointed out, namely, that GFS2 was
     misusing the IOMAP_ZERO flag. The zeroing of new blocks was moved
     to the proper fallocate code (Andreas Gruenbacher)

   - declare function gfs2_remove_from_ail as static (Bob Peterson)

   - only set PageChecked for jdata page writes (Bob Peterson)"

* tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: time journal recovery steps accurately
  gfs2: Zero out fallocated blocks in fallocate_chunk
  gfs2: Check for the end of metadata in punch_hole
  gfs2: gfs2_iomap_end tracepoint: log block address
  gfs2: Improve gfs2_block_map comment
  GFS2: Only set PageChecked for jdata pages
  GFS2: Make function gfs2_remove_from_ail static
  gfs2: Dirty source inode during rename
  gfs2: Fix fallocate chunk size

4 years agoMerge tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Wed, 4 Apr 2018 20:03:38 +0000 (13:03 -0700)]
Merge tag 'for-4.17-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "There are a several user visible changes, the rest is mostly invisible
  and continues to clean up the whole code base.

  User visible changes:
   - new mount option nossd_spread (pair for ssd_spread)

   - mount option subvolid will detect junk after the number and fail
     the mount

   - add message after cancelled device replace

   - direct module dependency on libcrc32, removed own crc wrappers

   - removed user space transaction ioctls

   - use lighter locking when reading /proc/self/mounts, RCU instead of
     mutex to avoid unnecessary contention

  Enhancements:
   - skip writeback of last page when truncating file to same size

   - send: do not issue unnecessary truncate operations

   - mount option token specifiers: use %u for unsigned values, more
     validation

   - selftests: more tree block validations

  qgroups:
   - preparatory work for splitting reservation types for data and
     metadata, this should allow for more accurate tracking and fix some
     issues with underflows or do further enhancements

   - split metadata reservations for started and joined transaction so
     they do not get mixed up and are accounted correctly at commit time

   - with the above, it's possible to revert patch that potentially
     deadlocks when trying to make more space by explicitly committing
     when the quota limit is hit

   - fix root item corruption when multiple same source snapshots are
     created with quota enabled

  RAID56:
   - make sure target is identical to source when raid56 rebuild fails
     after dev-replace

   - faster rebuild during scrub, batch by stripes and not
     block-by-block

   - make more use of cached data when rebuilding from a missing device

  Fixes:
   - null pointer deref when device replace target is missing

   - fix fsync after hole punching when using no-holes feature

   - fix lockdep splat when allocating percpu data with wrong GFP flags

  Cleanups, refactoring, core changes:
   - drop redunant parameters from various functions

   - kill and opencode trivial helpers

   - __cold/__exit function annotations

   - dead code removal

   - continued audit and documentation of memory barriers

   - error handling: handle removal from uuid tree

   - error handling: remove handling of impossible condtitons

   - more debugging or error messages

   - updated tracepoints

   - one VLA use removal (and one still left)"

* tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
  btrfs: lift errors from add_extent_changeset to the callers
  Btrfs: print error messages when failing to read trees
  btrfs: user proper type for btrfs_mask_flags flags
  btrfs: split dev-replace locking helpers for read and write
  btrfs: remove stale comments about fs_mutex
  btrfs: use RCU in btrfs_show_devname for device list traversal
  btrfs: update barrier in should_cow_block
  btrfs: use lockdep_assert_held for mutexes
  btrfs: use lockdep_assert_held for spinlocks
  btrfs: Validate child tree block's level and first key
  btrfs: tests/qgroup: Fix wrong tree backref level
  Btrfs: fix copy_items() return value when logging an inode
  Btrfs: fix fsync after hole punching when using no-holes feature
  btrfs: use helper to set ulist aux from a qgroup
  Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
  btrfs: qgroup: Update trace events for metadata reservation
  btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
  btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
  btrfs: qgroup: Use separate meta reservation type for delalloc
  btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
  ...

4 years agoMerge tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Wed, 4 Apr 2018 19:44:02 +0000 (12:44 -0700)]
Merge tag 'xfs-4.17-merge-1' of git://git./fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Here's the first round of fixes for XFS for 4.17.

  The biggest new features this time around are the addition of lazytime
  support, further enhancement of the on-disk inode metadata verifiers,
  and a patch to smooth over some of the AGFL padding problems that have
  intermittently plagued users since 4.5. I forsee sending a second pull
  request next week with further bug fixes and speedups in the online
  scrub code and elsewhere.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Summary of changes for this release:

   - Various cleanups and code fixes

   - Implement lazytime as a mount option

   - Convert various on-disk metadata checks from asserts to -EFSCORRUPTED

   - Fix accounting problems with the rmap per-ag reservations

   - Refactorings and cleanups for xfs_log_force

   - Various bugfixes for the reflink code

   - Work around v5 AGFL padding problems to prevent fs shutdowns

   - Establish inode fork verifiers to inspect on-disk metadata
     correctness

   - Various online scrub fixes

   - Fix v5 swapext blowing up on deleted inodes"

* tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (49 commits)
  xfs: do not log/recover swapext extent owner changes for deleted inodes
  xfs: clean up xfs_mount allocation and dynamic initializers
  xfs: remove dead inode version setting code
  xfs: catch inode allocation state mismatch corruption
  xfs: xfs_scrub_iallocbt_xref_rmap_inodes should use xref_set_corrupt
  xfs: flag inode corruption if parent ptr doesn't get us a real inode
  xfs: don't accept inode buffers with suspicious unlinked chains
  xfs: move inode extent size hint validation to libxfs
  xfs: record inode buf errors as a xref error in inobt scrubber
  xfs: remove xfs_buf parameter from inode scrub methods
  xfs: inode scrubber shouldn't bother with raw checks
  xfs: bmap scrubber should do rmap xref with bmap for sparse files
  xfs: refactor inode buffer verifier error logging
  xfs: refactor inode verifier error logging
  xfs: refactor bmap record validation
  xfs: sanity-check the unused space before trying to use it
  xfs: detect agfl count corruption and reset agfl
  xfs: unwind the try_again loop in xfs_log_force
  xfs: refactor xfs_log_force_lsn
  xfs: minor cleanup for xfs_reflink_end_cow
  ...

4 years agoMerge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 4 Apr 2018 19:05:25 +0000 (12:05 -0700)]
Merge branch 'work.dcache' of git://git./linux/kernel/git/viro/vfs

Pull vfs dcache updates from Al Viro:
 "Part of this is what the trylock loop elimination series has turned
  into, part making d_move() preserve the parent (and thus the path) of
  victim, plus some general cleanups"

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
  d_genocide: move export to definition
  fold dentry_lock_for_move() into its sole caller and clean it up
  make non-exchanging __d_move() copy ->d_parent rather than swap them
  oprofilefs: don't oops on allocation failure
  lustre: get rid of pointless casts to struct dentry *
  debugfs_lookup(): switch to lookup_one_len_unlocked()
  fold lookup_real() into __lookup_hash()
  take out orphan externs (empty_string/slash_string)
  split d_path() and friends into a separate file
  dcache.c: trim includes
  fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
  get rid of trylock loop around dentry_kill()
  handle move to LRU in retain_dentry()
  dput(): consolidate the "do we need to retain it?" into an inlined helper
  split the slow part of lock_parent() off
  now lock_parent() can't run into killed dentry
  get rid of trylock loop in locking dentries on shrink list
  d_delete(): get rid of trylock loop
  fs/dcache: Move dentry_kill() below lock_parent()
  fs/dcache: Remove stale comment from dentry_kill()
  ...

4 years agonetdevsim: remove incorrect __net_initdata annotations
Arnd Bergmann [Wed, 4 Apr 2018 12:12:39 +0000 (14:12 +0200)]
netdevsim: remove incorrect __net_initdata annotations

The __net_initdata section cannot currently be used for structures that
get cleaned up in an exitcall using unregister_pernet_operations:

WARNING: vmlinux.o(.text+0x868c34): Section mismatch in reference from the function nsim_devlink_exit() to the (unknown reference) .init.data:(unknown)
The function nsim_devlink_exit() references
the (unknown reference) __initdata (unknown).
This is often because nsim_devlink_exit lacks a __initdata
annotation or the annotation of (unknown) is wrong.
WARNING: vmlinux.o(.text+0x868c64): Section mismatch in reference from the function nsim_devlink_init() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x8692bc): Section mismatch in reference from the function nsim_fib_exit() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x869300): Section mismatch in reference from the function nsim_fib_init() to the (unknown reference) .init.data:(unknown)

As that warning tells us, discarding the structure after a module is
loaded would lead to a undefined behavior when that module is removed.

It might be possible to change that annotation so it has no effect for
loadable modules, but I have not figured out exactly how to do that, and
we want this to be fixed in -rc1.

This just removes the annotations, just like we do for all other such
modules.

Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosfc: remove ctpio_dmabuf_start from stats
Bert Kenward [Wed, 4 Apr 2018 15:40:30 +0000 (16:40 +0100)]
sfc: remove ctpio_dmabuf_start from stats

The ctpio_dmabuf_start entry is not actually a stat and shouldn't
be exposed to ethtool.

Fixes: 2c0b6ee837db ("sfc: expose CTPIO stats on NICs that support them")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoinet: frags: fix ip6frag_low_thresh boundary
Eric Dumazet [Wed, 4 Apr 2018 15:35:10 +0000 (08:35 -0700)]
inet: frags: fix ip6frag_low_thresh boundary

Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.

ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.

Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.

Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej ┼╗enczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: Fix namespace violation in tipc_sk_fill_sock_diag
GhantaKrishnamurthy MohanKrishna [Wed, 4 Apr 2018 12:49:47 +0000 (14:49 +0200)]
tipc: Fix namespace violation in tipc_sk_fill_sock_diag

To fetch UID info for socket diagnostics, we determine the
namespace of user context using tipc socket instance. This
may cause namespace violation, as the kernel will remap based
on UID.

We fix this by fetching namespace info using the calling userspace
netlink socket.

Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: avoid unneeded atomic operation in ip*_append_data()
Paolo Abeni [Wed, 4 Apr 2018 12:30:01 +0000 (14:30 +0200)]
net: avoid unneeded atomic operation in ip*_append_data()

After commit 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates
done by __ip_append_data()") and commit 1f4c6eb24029 ("ipv6:
factorize sk_wmem_alloc updates done by __ip6_append_data()"),
when transmitting sub MTU datagram, an addtional, unneeded atomic
operation is performed in ip*_append_data() to update wmem_alloc:
in the above condition the delta is 0.

The above cause small but measurable performance regression in UDP
xmit tput test with packet size below MTU.

This change avoids such overhead updating wmem_alloc only if
wmem_alloc_delta is non zero.

The error path is left intentionally unmodified: it's a slow path
and simplicity is preferred to performances.

Fixes: 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
Fixes: 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonvmem: disallow modular CONFIG_NVMEM
Arnd Bergmann [Wed, 4 Apr 2018 10:38:40 +0000 (12:38 +0200)]
nvmem: disallow modular CONFIG_NVMEM

The new of_get_nvmem_mac_address() helper function causes a link error
with CONFIG_NVMEM=m:

drivers/of/of_net.o: In function `of_get_nvmem_mac_address':
of_net.c:(.text+0x168): undefined reference to `of_nvmem_cell_get'
of_net.c:(.text+0x19c): undefined reference to `nvmem_cell_read'
of_net.c:(.text+0x1a8): undefined reference to `nvmem_cell_put'

I could not come up with a good solution for this, as the code is always
built-in. Using an #if IS_REACHABLE() check around it would solve the
link time issue but then stop it from working in that configuration.
Making of_nvmem_cell_get() an inline function could also solve that, but
seems a bit ugly since it's somewhat larger than most inline functions,
and it would just bring that problem into the callers.  Splitting the
function into a separate file might be an alternative.

This uses the big hammer by making CONFIG_NVMEM itself a 'bool' symbol,
which avoids the problem entirely but makes the vmlinux larger for anyone
that might use NVMEM support but doesn't need it built-in otherwise.

Fixes: 9217e566bdee ("of_net: Implement of_get_nvmem_mac_address helper")
Cc: Mike Looijmans <mike.looijmans@topic.nl>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mike Looijmans
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
Tan Xiaojun [Wed, 4 Apr 2018 09:40:48 +0000 (17:40 +0800)]
net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES

When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length is u16, it will overflow. So change it
to u32.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfp: use full 40 bits of the NSP buffer address
Dirk van der Merwe [Wed, 4 Apr 2018 00:24:23 +0000 (17:24 -0700)]
nfp: use full 40 bits of the NSP buffer address

The NSP default buffer is a piece of NFP memory where additional
command data can be placed.  Its format has been copied from
host buffer, but the PCIe selection bits do not make sense in
this case.  If those get masked out from a NFP address - writes
to random place in the chip memory may be issued and crash the
device.

Even in the general NSP buffer case, it doesn't make sense to have the
PCIe selection bits there anymore. These are unused at the moment, and
when it becomes necessary, the PCIe selection bits should rather be
moved to another register to utilise more bits for the buffer address.

This has never been an issue because the buffer used to be
allocated in memory with less-than-38-bit-long address but that
is about to change.

Fixes: 1a64821c6af7 ("nfp: add support for service processor access")
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agolan78xx: Connect phy early
Alexander Graf [Tue, 3 Apr 2018 22:19:35 +0000 (00:19 +0200)]
lan78xx: Connect phy early

When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:

    Unable to handle kernel NULL pointer dereference at virtual address 0000039c
    pgd = ffff800035b30000
    [0000039c] *pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Modules linked in: [...]
    Supported: Yes
    CPU: 3 PID: 638 Comm: wickedd Tainted: G            E      4.12.14-0-default #1
    Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
    task: ffff800035e74180 task.stack: ffff800036718000
    PC is at phy_ethtool_ksettings_get+0x20/0x98
    LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    pc : [<ffff0000086f7f30>] lr : [<ffff000000dcca84>] pstate: 20000005
    sp : ffff80003671bb20
    x29: ffff80003671bb20 x28: ffff800035e74180
    x27: ffff000008912000 x26: 000000000000001d
    x25: 0000000000000124 x24: ffff000008f74d00
    x23: 0000004000114809 x22: 0000000000000000
    x21: ffff80003671bbd0 x20: 0000000000000000
    x19: ffff80003671bbd0 x18: 000000000000040d
    x17: 0000000000000001 x16: 0000000000000000
    x15: 0000000000000000 x14: ffffffffffffffff
    x13: 0000000000000000 x12: 0000000000000020
    x11: 0101010101010101 x10: fefefefefefefeff
    x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
    x7 : 0000000080808080 x6 : ffff80003671bc9c
    x5 : ffff80003671b9f8 x4 : ffff80002c296190
    x3 : 0000000000000000 x2 : 0000000000000000
    x1 : ffff80003671bbd0 x0 : ffff80003671bc00
    Process wickedd (pid: 638, stack limit = 0xffff800036718000)
    Call trace:
    Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
    b9e0: ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
    ba00: ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
    ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
    ba40: 0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
    ba60: 0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
    ba80: 0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
    baa0: ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
    bac0: ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
    bae0: ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
    bb00: 0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
    [<ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
    [<ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    [<ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
    [<ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
    [<ffff0000087e5008>] dev_ioctl+0x400/0x630
    [<ffff00000879dd00>] sock_do_ioctl+0x70/0x88
    [<ffff00000879f5f8>] sock_ioctl+0x208/0x368
    [<ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
    [<ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
    Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
    bec0: 0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
    bee0: 0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
    bf00: 000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
    bf20: 0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
    bf40: 0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
    bf60: 0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
    bf80: 0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
    bfa0: 0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
    bfc0: 0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
    bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.

The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.

With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfp: add a separate counter for packets with CHECKSUM_COMPLETE
Jakub Kicinski [Mon, 2 Apr 2018 20:31:20 +0000 (13:31 -0700)]
nfp: add a separate counter for packets with CHECKSUM_COMPLETE

We are currently counting packets with CHECKSUM_COMPLETE as
"hw_rx_csum_ok".  This is confusing.  Add a new counter.
To make sure it fits in the same cacheline move the less used
error counter to a different location.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: Fix missing list initializations in struct tipc_subscription
Jon Maloy [Tue, 3 Apr 2018 17:11:19 +0000 (19:11 +0200)]
tipc: Fix missing list initializations in struct tipc_subscription

When an item of struct tipc_subscription is created, we fail to
initialize the two lists aggregated into the struct. This has so far
never been a problem, since the items are just added to a root
object by list_add(), which does not require the addee list to be
pre-initialized. However, syzbot is provoking situations where this
addition fails, whereupon the attempted removal if the item from
the list causes a crash.

This problem seems to always have been around, despite that the code
for creating this object was rewritten in commit 242e82cc95f6 ("tipc:
collapse subscription creation functions"), which is still in net-next.

We fix this for that commit by initializing the two lists properly.

Fixes: 242e82cc95f6 ("tipc: collapse subscription creation functions")
Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ipv6-udp-set-dst-cache-for-a-connected-sk-if-current-not-valid'
David S. Miller [Wed, 4 Apr 2018 15:31:58 +0000 (11:31 -0400)]
Merge branch 'ipv6-udp-set-dst-cache-for-a-connected-sk-if-current-not-valid'

Alexey Kodanev says:

====================
ipv6: udp: set dst cache for a connected sk if current not valid

A new RTF_CACHE route can be created with the socket's dst cache
update between the below calls in udpv6_sendmsg(), when datagram
sending results to ICMPV6_PKT_TOOBIG error:

   dst = ip6_sk_dst_lookup_flow(...)
   ...
release_dst:
    if (dst) {
        if (connected) {
            ip6_dst_store(sk, dst)

Therefore, the new socket's dst cache reset to the old one on
"release_dst:".

The first three patches prepare the code to store dst cache
with ip6_sk_dst_lookup_flow():

  * the first patch adds ip6_sk_dst_store_flow() function with
    commonly used source and destiantion addresses checks using
    the flow information.

  * the second patch adds a new argument to ip6_sk_dst_lookup_flow()
    and ability to store dst in the socket's cache. Also, the two
    users of the function are updated without enabling the new
    behavior: pingv6_sendmsg() and udpv6_sendmsg().

  * the third patch makes 'connected' variable in udpv6_sendmsg()
    to be consistent with ip6_sk_dst_store_flow(), changes its type
    from int to bool.

The last patch contains the actual fix that removes sk dst cache
update in the end of udpv6_sendmsg(), and allows to do it in
ip6_sk_dst_lookup_flow().

v6: * use bool type for a new parameter in ip_sk_dst_lookup_flow()
    * add one more patch to convert 'connected' variable in
      udpv6_sendmsg() from int to bool type. If it shouldn't be
      here I will resend it when the net-next is opened.

v5: * relocate ip6_sk_dst_store_flow() to net/ipv6/route.c and
      rename ip6_dst_store_flow() to ip6_sk_dst_store_flow() as
      suggested by Martin

v4: * fix the error in the build of ip_dst_store_flow() reported by
      kbuild test robot due to missing checks for CONFIG_IPV6: add
      new function to ip6_output.c instead of ip6_route.h
    * add 'const' to struct flowi6 in ip6_dst_store_flow()
    * minor commit messages fixes

v3: * instead of moving ip6_dst_store() above udp_v6_send_skb(),
      update socket's dst cache inside ip6_sk_dst_lookup_flow()
      if the current one is invalid
    * the issue not reproduced in 4.1, but starting from 4.2. Add
      one more 'Fixes:' commit that creates new RTF_CACHE route.
      Though, it is also mentioned in the first one
====================

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: udp: set dst cache for a connected sk if current not valid
Alexey Kodanev [Tue, 3 Apr 2018 12:00:10 +0000 (15:00 +0300)]
ipv6: udp: set dst cache for a connected sk if current not valid

A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow()
and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending
results to ICMPV6_PKT_TOOBIG error:

    udp_v6_send_skb(), for example with vti6 tunnel:
        vti6_xmit(), get ICMPV6_PKT_TOOBIG error
            skb_dst_update_pmtu(), can create a RTF_CACHE clone
            icmpv6_send()
    ...
    udpv6_err()
        ip6_sk_update_pmtu()
           ip6_update_pmtu(), can create a RTF_CACHE clone
           ...
           ip6_datagram_dst_update()
                ip6_dst_store()

And after commit 33c162a980fe ("ipv6: datagram: Update dst cache of
a connected datagram sk during pmtu update"), the UDPv6 error handler
can update socket's dst cache, but it can happen before the update in
the end of udpv6_sendmsg(), preventing getting the new dst cache on
the next udpv6_sendmsg() calls.

In order to fix it, save dst in a connected socket only if the current
socket's dst cache is invalid.

The previous patch prepared ip6_sk_dst_lookup_flow() to do that with
the new argument, and this patch enables it in udpv6_sendmsg().

Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update")
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()
Alexey Kodanev [Tue, 3 Apr 2018 12:00:09 +0000 (15:00 +0300)]
ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()

This should make it consistent with ip6_sk_dst_lookup_flow()
that is accepting the new 'connected' parameter of type bool.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
Alexey Kodanev [Tue, 3 Apr 2018 12:00:08 +0000 (15:00 +0300)]
ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()

Add 'connected' parameter to ip6_sk_dst_lookup_flow() and update
the cache only if ip6_sk_dst_check() returns NULL and a socket
is connected.

The function is used as before, the new behavior for UDP sockets
in udpv6_sendmsg() will be enabled in the next patch.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: add a wrapper for ip6_dst_store() with flowi6 checks
Alexey Kodanev [Tue, 3 Apr 2018 12:00:07 +0000 (15:00 +0300)]
ipv6: add a wrapper for ip6_dst_store() with flowi6 checks

Move commonly used pattern of ip6_dst_store() usage to a separate
function - ip6_sk_dst_store_flow(), which will check the addresses
for equality using the flow information, before saving them.

There is no functional changes in this patch. In addition, it will
be used in the next patch, in ip6_sk_dst_lookup_flow().

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: marvell10g: add thermal hwmon device
Russell King [Tue, 3 Apr 2018 09:31:45 +0000 (10:31 +0100)]
net: phy: marvell10g: add thermal hwmon device

Add a thermal monitoring device for the Marvell 88x3310, which updates
once a second.  We also need to hook into the suspend/resume mechanism
to ensure that the thermal monitoring is reconfigured when we resume.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopptp: remove a buggy dst release in pptp_connect()
Eric Dumazet [Tue, 3 Apr 2018 01:48:37 +0000 (18:48 -0700)]
pptp: remove a buggy dst release in pptp_connect()

Once dst has been cached in socket via sk_setup_caps(),
it is illegal to call ip_rt_put() (or dst_release()),
since sk_setup_caps() did not change dst refcount.

We can still dereference it since we hold socket lock.

Caugth by syzbot :

BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185
Write of size 4 at addr ffff8801c54dc040 by task syz-executor4/20088

CPU: 1 PID: 20088 Comm: syz-executor4 Not tainted 4.16.0+ #376
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1a7/0x27d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
 atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
 dst_release+0x27/0xa0 net/core/dst.c:185
 sk_dst_set include/net/sock.h:1812 [inline]
 sk_dst_reset include/net/sock.h:1824 [inline]
 sock_setbindtodevice net/core/sock.c:610 [inline]
 sock_setsockopt+0x431/0x1b20 net/core/sock.c:707
 SYSC_setsockopt net/socket.c:1845 [inline]
 SyS_setsockopt+0x2ff/0x360 net/socket.c:1828
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4552d9
RSP: 002b:00007f4878126c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007f48781276d4 RCX: 00000000004552d9
RDX: 0000000000000019 RSI: 0000000000000001 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000010 R09: 0000000000000000
R10: 00000000200010c0 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000526 R14: 00000000006fac30 R15: 0000000000000000

Allocated by task 20088:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3542
 dst_alloc+0x11f/0x1a0 net/core/dst.c:104
 rt_dst_alloc+0xe9/0x540 net/ipv4/route.c:1520
 __mkroute_output net/ipv4/route.c:2265 [inline]
 ip_route_output_key_hash_rcu+0xa49/0x2c60 net/ipv4/route.c:2493
 ip_route_output_key_hash+0x20b/0x370 net/ipv4/route.c:2322
 __ip_route_output_key include/net/route.h:126 [inline]
 ip_route_output_flow+0x26/0xa0 net/ipv4/route.c:2577
 ip_route_output_ports include/net/route.h:163 [inline]
 pptp_connect+0xa84/0x1170 drivers/net/ppp/pptp.c:453
 SYSC_connect+0x213/0x4a0 net/socket.c:1639
 SyS_connect+0x24/0x30 net/socket.c:1620
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 20082:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
 __cache_free mm/slab.c:3486 [inline]
 kmem_cache_free+0x83/0x2a0 mm/slab.c:3744
 dst_destroy+0x266/0x380 net/core/dst.c:140
 dst_destroy_rcu+0x16/0x20 net/core/dst.c:153
 __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
 rcu_do_batch kernel/rcu/tree.c:2675 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2930 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2897 [inline]
 rcu_process_callbacks+0xd6c/0x17b0 kernel/rcu/tree.c:2914
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285

The buggy address belongs to the object at ffff8801c54dc000
 which belongs to the cache ip_dst_cache of size 168
The buggy address is located 64 bytes inside of
 168-byte region [ffff8801c54dc000ffff8801c54dc0a8)
The buggy address belongs to the page:
page:ffffea0007153700 count:1 mapcount:0 mapping:ffff8801c54dc000 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801c54dc000 0000000000000000 0000000100000010
raw: ffffea0006b34b20 ffffea0006b6c1e0 ffff8801d674a1c0 0000000000000000
page dumped because: kasan: bad access detected

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mt7530: Use NULL instead of plain integer
Florian Fainelli [Mon, 2 Apr 2018 23:24:14 +0000 (16:24 -0700)]
net: dsa: mt7530: Use NULL instead of plain integer

We would be passing 0 instead of NULL as the rsp argument to
mt7530_fdb_cmd(), fix that.

Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: b53: Fix sparse warnings in b53_mmap.c
Florian Fainelli [Mon, 2 Apr 2018 23:17:01 +0000 (16:17 -0700)]
net: dsa: b53: Fix sparse warnings in b53_mmap.c

sparse complains about the following warnings:

drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in
initializer (different address spaces)
drivers/net/dsa/b53/b53_mmap.c:33:31:    expected unsigned char
[noderef] [usertype] <asn:2>*regs
drivers/net/dsa/b53/b53_mmap.c:33:31:    got void *priv

and indeed, while what we are doing is functional, we are dereferencing
a void * pointer into a void __iomem * which is not great. Just use the
defined b53_mmap_priv structure which holds our register base and use
that.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoaf_unix: remove redundant lockdep class
Cong Wang [Mon, 2 Apr 2018 18:01:27 +0000 (11:01 -0700)]
af_unix: remove redundant lockdep class

After commit 581319c58600 ("net/socket: use per af lockdep classes for sk queues")
sock queue locks now have per-af lockdep classes, including unix socket.
It is no longer necessary to workaround it.

I noticed this while looking at a syzbot deadlock report, this patch
itself doesn't fix it (this is why I don't add Reported-by).

Fixes: 581319c58600 ("net/socket: use per af lockdep classes for sk queues")
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-Broadcom-drivers-sparse-fixes'
David S. Miller [Wed, 4 Apr 2018 15:07:21 +0000 (11:07 -0400)]
Merge branch 'net-Broadcom-drivers-sparse-fixes'

Florian Fainelli says:

====================
net: Broadcom drivers sparse fixes

This patch series fixes the same warning reported by sparse in bcmsysport and
bcmgenet in the code that deals with inserting the TX checksum pointers:

drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26:    expected unsigned short [unsigned] [usertype] val
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26:    got restricted __be16 [usertype] protocol

This patch fixes both issues by using the same construct and not swapping
skb->protocol but instead the values we are checking against.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()
Florian Fainelli [Mon, 2 Apr 2018 22:58:56 +0000 (15:58 -0700)]
net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()

skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
Florian Fainelli [Mon, 2 Apr 2018 22:58:55 +0000 (15:58 -0700)]
net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()

skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agorxrpc: Fix undefined packet handling
David Howells [Mon, 2 Apr 2018 22:51:39 +0000 (23:51 +0100)]
rxrpc: Fix undefined packet handling

By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
should just be discarded rather than being aborted like other undefined
packet types.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoopenrisc: Set CONFIG_MULTI_IRQ_HANDLER
Palmer Dabbelt [Wed, 4 Apr 2018 04:31:29 +0000 (21:31 -0700)]
openrisc: Set CONFIG_MULTI_IRQ_HANDLER

arm has an optional MULTI_IRQ_HANDLER, which openrisc copied but didn't
make optional.  The multi irq handler infrastructure has been copied to
generic code selectable with a new config symbol. That symbol can be
selected by randconfig builds and can cause build breakage.

Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents
the core config symbol from being selected. The openrisc local config
symbol will be removed once openrisc gets converted to the generic code.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20180404043130.31277-3-palmer@sifive.com
4 years agoarm64: Set CONFIG_MULTI_IRQ_HANDLER
Palmer Dabbelt [Wed, 4 Apr 2018 04:31:28 +0000 (21:31 -0700)]
arm64: Set CONFIG_MULTI_IRQ_HANDLER

arm has an optional MULTI_IRQ_HANDLER, which arm64 copied but didn't make
optional.  The multi irq handler infrastructure has been copied to generic
code selectable with a new config symbol. That symbol can be selected by
randconfig builds and can cause build breakage.

Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents
the core config symbol from being selected. The arm64 local config symbol
will be removed once arm64 gets converted to the generic code.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20180404043130.31277-2-palmer@sifive.com
4 years agogenirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
Palmer Dabbelt [Wed, 4 Apr 2018 04:31:30 +0000 (21:31 -0700)]
genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER

These config switches enable the same code in the core and the not yet
converted architecture code. They can be selected both by randconfig builds
and cause linker error because the same symbols are defined twice.

Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to
prevent that. The dependency will be removed once all architectures are
converted over.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20180404043130.31277-4-palmer@sifive.com
4 years agoMerge branch 'old.dcache' into work.dcache
Al Viro [Wed, 4 Apr 2018 04:40:19 +0000 (00:40 -0400)]
Merge branch 'old.dcache' into work.dcache

4 years agoMerge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Wed, 4 Apr 2018 02:15:32 +0000 (19:15 -0700)]
Merge branch 'userns-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull namespace updates from Eric Biederman:
 "There was a lot of work this cycle fixing bugs that were discovered
  after the merge window and getting everything ready where we can
  reasonably support fully unprivileged fuse. The bug fixes you already
  have and much of the unprivileged fuse work is coming in via other
  trees.

  Still left for fully unprivileged fuse is figuring out how to cleanly
  handle .set_acl and .get_acl in the legacy case, and properly handling
  of evm xattrs on unprivileged mounts.

  Included in the tree is a cleanup from Alexely that replaced a linked
  list with a statically allocated fix sized array for the pid caches,
  which simplifies and speeds things up.

  Then there is are some cleanups and fixes for the ipc namespace. The
  motivation was that in reviewing other code it was discovered that
  access ipc objects from different pid namespaces recorded pids in such
  a way that when asked the wrong pids were returned. In the worst case
  there has been a measured 30% performance impact for sysvipc
  semaphores. Other test cases showed no measurable performance impact.
  Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
  performance both gave the nod that this is good enough.

  Casey Schaufler and James Morris have given their approval to the LSM
  side of the changes.

  I simplified the types and the code dealing with sysvipc to pass just
  kern_ipc_perm for all three types of ipc. Which reduced the header
  dependencies throughout the kernel and simplified the lsm code.

  Which let me work on the pid fixes without having to worry about
  trivial changes causing complete kernel recompiles"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ipc/shm: Fix pid freeing.
  ipc/shm: fix up for struct file no longer being available in shm.h
  ipc/smack: Tidy up from the change in type of the ipc security hooks
  ipc: Directly call the security hook in ipc_ops.associate
  ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
  ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
  ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
  ipc/util: Helpers for making the sysvipc operations pid namespace aware
  ipc: Move IPCMNI from include/ipc.h into ipc/util.h
  msg: Move struct msg_queue into ipc/msg.c
  shm: Move struct shmid_kernel into ipc/shm.c
  sem: Move struct sem and struct sem_array into ipc/sem.c
  msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
  shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
  sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
  pidns: simpler allocation of pid_* caches