mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()
author    David Hildenbrand <>    Fri, 22 Nov 2019 01:53:56 +0000 (17:53 -0800)
committer Linus Torvalds <>       Fri, 22 Nov 2019 17:11:18 +0000 (09:11 -0800)
Let's limit shrinking to !ZONE_DEVICE so we can fix the current code.
We should never try to touch the memmap of offline sections where we
could have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.
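Why poisoned pages are dangerous can be shown with a small userspace sketch. It is a hypothetical model, not kernel code: the `model_*` names are made up, and the field layout (a 10-bit node id in the top bits of a 64-bit flags word) is an assumption, since the real NODES_PGSHIFT/NODES_MASK depend on the kernel configuration. It mirrors the idea behind PAGE_POISON_PATTERN, where an uninitialized memmap is filled with all-ones bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code. The field layout is an
 * assumption: the real NODES_PGSHIFT/NODES_MASK depend on the kernel config.
 */
#define MODEL_NODES_SHIFT    10
#define MODEL_NODES_PGSHIFT  54
#define MODEL_NODES_MASK     ((UINT64_C(1) << MODEL_NODES_SHIFT) - 1)
/* All-ones word, mirroring the idea behind PAGE_POISON_PATTERN. */
#define MODEL_POISON_PATTERN UINT64_MAX

struct model_page {
	uint64_t flags; /* node id sits in the top MODEL_NODES_SHIFT bits */
};

/* Like PagePoisoned(): the whole flags word equals the poison pattern. */
static bool model_page_poisoned(const struct model_page *page)
{
	return page->flags == MODEL_POISON_PATTERN;
}

/* Like page_to_nid() minus any debug check: blindly decodes the bits. */
static unsigned int model_page_to_nid(const struct model_page *page)
{
	return (unsigned int)((page->flags >> MODEL_NODES_PGSHIFT) &
			      MODEL_NODES_MASK);
}

/* Debug variant: fails an assert (think VM_BUG_ON) instead of decoding garbage. */
static unsigned int model_page_to_nid_checked(const struct model_page *page)
{
	assert(!model_page_poisoned(page));
	return model_page_to_nid(page);
}
```

A zeroed `struct model_page` decodes to node 0, which may be the wrong node but is at least a plausible value. A struct filled with 0xff bytes is reported as poisoned, and the unchecked decoder returns the all-ones field value (1023 in this layout), i.e. garbage.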

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE: ZONE_DEVICE has no
equivalent of SECTION_IS_ONLINE, which pfn_to_online_section() relies on
for !ZONE_DEVICE memory.
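The difference between the old and the new check can be sketched in userspace. In this hypothetical model (all `model_*` names are made up), a pfn_valid()-style check only says that a memmap exists, while the online check additionally needs a per-section bit standing in for SECTION_IS_ONLINE. A ZONE_DEVICE section looks exactly like the "memmap present, not online" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; all model_* names are made up. */
#define MODEL_PAGES_PER_SECTION 8UL
#define MODEL_NR_SECTIONS       4UL

struct model_section {
	bool has_memmap; /* memmap allocated: enough for pfn_valid() */
	bool online;     /* stands in for SECTION_IS_ONLINE */
};

static struct model_section model_sections[MODEL_NR_SECTIONS];

static struct model_section *model_pfn_to_section(unsigned long pfn)
{
	unsigned long nr = pfn / MODEL_PAGES_PER_SECTION;

	return nr < MODEL_NR_SECTIONS ? &model_sections[nr] : NULL;
}

/* Like pfn_valid(): true once a memmap exists, initialized or not. */
static bool model_pfn_valid(unsigned long pfn)
{
	const struct model_section *ms = model_pfn_to_section(pfn);

	return ms != NULL && ms->has_memmap;
}

/*
 * Like pfn_to_online_page() != NULL: also requires the online bit. A
 * ZONE_DEVICE section never gets that bit, so it is indistinguishable
 * from an offline section with an uninitialized memmap.
 */
static bool model_pfn_online(unsigned long pfn)
{
	const struct model_section *ms = model_pfn_to_section(pfn);

	return model_pfn_valid(pfn) && ms->online;
}
```

With section 1 marked as having a memmap but not online, pfn 8 passes the pfn_valid()-style test yet fails the online test, which is precisely the case where the old shrinking code went on to read the memmap.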

E.g., set_zone_contiguous() similarly relies on pfn_to_online_section()
and will therefore never mark a ZONE_DEVICE zone as contiguous.  No
longer shrinking the ZONE_DEVICE zone therefore results in no observable
changes, besides /proc/zoneinfo indicating different boundaries - something
we can totally live with.
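The argument can be illustrated with a simplified model: if a zone counts as contiguous only when every pageblock in its span is backed by an online section, a ZONE_DEVICE zone can never qualify, no matter how its span changes. This is only a sketch under stated assumptions. The pageblock size, the `offline_mask` field and all `model_*` names are hypothetical, and the real set_zone_contiguous() is only loosely mirrored here (it also walks the zone pageblock by pageblock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; model_* names and sizes are assumptions. */
#define MODEL_PAGEBLOCK_PAGES 4UL

struct model_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;      /* exclusive */
	bool is_device;             /* stands in for zone_idx(zone) == ZONE_DEVICE */
	unsigned long offline_mask; /* bit i set: pageblock i is a hole/offline;
				       assumes spans of at most 64 pageblocks */
	bool contiguous;
};

/* ZONE_DEVICE sections are never marked online, so they never pass. */
static bool model_block_online(const struct model_zone *zone, unsigned long pfn)
{
	unsigned long block = (pfn - zone->start_pfn) / MODEL_PAGEBLOCK_PAGES;

	if (zone->is_device)
		return false;
	return !(zone->offline_mask & (1UL << block));
}

/* Walk the span pageblock by pageblock, loosely like set_zone_contiguous(). */
static void model_set_zone_contiguous(struct model_zone *zone)
{
	zone->contiguous = false;
	for (unsigned long pfn = zone->start_pfn; pfn < zone->end_pfn;
	     pfn += MODEL_PAGEBLOCK_PAGES)
		if (!model_block_online(zone, pfn))
			return; /* hole or offline block found */
	zone->contiguous = true;
}
```

Shrinking the device zone's `end_pfn` and recomputing leaves `contiguous` false either way, which is the "no observable change" claimed above (apart from the reported boundaries).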

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the right
value.  So the zone might be wrong but not garbage.  After that commit,
both the zone and the node will be garbage when touching uninitialized
memmaps.

Toshiki reported a BUG (race between delayed initialization of
ZONE_DEVICE memmaps without holding the memory hotplug lock and
concurrent zone shrinking).

"Iteration of create and destroy namespace causes the panic as below:

      kernel BUG at mm/page_alloc.c:535!
      CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014
      RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
      Call Trace:
       pmem_attach_disk+0x16b/0x600 [nd_pmem]

  While creating a namespace and initializing memmap, if you destroy the
  namespace and shrink the zone, it will initialize the memmap outside
  the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page),
  pfn), page) in set_pfnblock_flags_mask()."

This BUG is also mitigated by this commit, as we stop shrinking the
ZONE_DEVICE zone for now, until we can do so in a safe and clean way.
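The reported race can be approximated by a single-threaded replay: memmap initialization walks a ZONE_DEVICE range while, halfway through, the tail of the range is removed from the zone span, so the remaining pfns fail a zone_spans_pfn()-style check. This is a sequential sketch, not a reproduction of the concurrent kernel paths. `model_remove_zone()` only trims the end of the span, and its `skip_device` flag is a hypothetical stand-in for the early return added to __remove_zone().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model; model_* names are made up. */
struct model_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	bool is_device; /* stands in for zone_idx(zone) == ZONE_DEVICE */
};

/* Same condition as zone_spans_pfn(): start <= pfn < start + spanned. */
static bool model_zone_spans_pfn(const struct model_zone *zone, unsigned long pfn)
{
	return pfn >= zone->zone_start_pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/*
 * Simplified removal: only trims a range at the end of the span.
 * skip_device models the early return this commit adds for ZONE_DEVICE.
 */
static void model_remove_zone(struct model_zone *zone, unsigned long start_pfn,
			      unsigned long nr_pages, bool skip_device)
{
	if (skip_device && zone->is_device)
		return;
	if (start_pfn + nr_pages == zone->zone_start_pfn + zone->spanned_pages)
		zone->spanned_pages -= nr_pages;
}

/*
 * Replays "memmap init runs while the namespace is destroyed": after half
 * of the pfns are initialized, the second half is removed from the zone.
 * Returns how many pfns would trip VM_BUG_ON_PAGE(!zone_spans_pfn(...)).
 */
static unsigned long model_init_during_destroy(bool skip_device)
{
	struct model_zone zone = { .zone_start_pfn = 0, .spanned_pages = 16,
				   .is_device = true };
	unsigned long bad = 0;

	for (unsigned long pfn = 0; pfn < 16; pfn++) {
		if (pfn == 8) /* the concurrent destroy, replayed inline */
			model_remove_zone(&zone, 8, 8, skip_device);
		if (!model_zone_spans_pfn(&zone, pfn))
			bad++;
	}
	return bad;
}
```

Without the skip, the last 8 pfns fall outside the shrunk span; with it, the span is left alone and none do.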

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <>
Reported-by: Aneesh Kumar K.V <>
Reported-by: Toshiki Fukasawa <>
Cc: Oscar Salvador <>
Cc: David Hildenbrand <>
Cc: Michal Hocko <>
Cc: Pavel Tatashin <>
Cc: Dan Williams <>
Cc: Alexander Duyck <>
Cc: Alexander Potapenko <>
Cc: Andy Lutomirski <>
Cc: Anshuman Khandual <>
Cc: Benjamin Herrenschmidt <>
Cc: Borislav Petkov <>
Cc: Catalin Marinas <>
Cc: Christian Borntraeger <>
Cc: Christophe Leroy <>
Cc: Damian Tometzki <>
Cc: Dave Hansen <>
Cc: Fenghua Yu <>
Cc: Gerald Schaefer <>
Cc: Greg Kroah-Hartman <>
Cc: Halil Pasic <>
Cc: Heiko Carstens <>
Cc: "H. Peter Anvin" <>
Cc: Ingo Molnar <>
Cc: Ira Weiny <>
Cc: Jason Gunthorpe <>
Cc: Jun Yao <>
Cc: Logan Gunthorpe <>
Cc: Mark Rutland <>
Cc: Masahiro Yamada <>
Cc: "Matthew Wilcox (Oracle)" <>
Cc: Mel Gorman <>
Cc: Michael Ellerman <>
Cc: Mike Rapoport <>
Cc: Pankaj Gupta <>
Cc: Paul Mackerras <>
Cc: Pavel Tatashin <>
Cc: Peter Zijlstra <>
Cc: Qian Cai <>
Cc: Rich Felker <>
Cc: Robin Murphy <>
Cc: Steve Capper <>
Cc: Thomas Gleixner <>
Cc: Tom Lendacky <>
Cc: Tony Luck <>
Cc: Vasily Gorbik <>
Cc: Vlastimil Babka <>
Cc: Wei Yang <>
Cc: Wei Yang <>
Cc: Will Deacon <>
Cc: Yoshinori Sato <>
Cc: Yu Zhao <>
Cc: <> [4.13+]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

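diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3b62a9f..f307bd8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c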
@@ -331,7 +331,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
                                     unsigned long end_pfn)
 {
        for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
-               if (unlikely(!pfn_valid(start_pfn)))
+               if (unlikely(!pfn_to_online_page(start_pfn)))
                        continue;

                if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -356,7 +356,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
        /* pfn is the end pfn of a memory section. */
        pfn = end_pfn - 1;
        for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
-               if (unlikely(!pfn_valid(pfn)))
+               if (unlikely(!pfn_to_online_page(pfn)))
                        continue;

                if (unlikely(pfn_to_nid(pfn) != nid))
@@ -415,7 +415,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
        pfn = zone_start_pfn;
        for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-               if (unlikely(!pfn_valid(pfn)))
+               if (unlikely(!pfn_to_online_page(pfn)))
                        continue;

                if (page_zone(pfn_to_page(pfn)) != zone)
@@ -471,6 +471,16 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
        struct pglist_data *pgdat = zone->zone_pgdat;
        unsigned long flags;

+#ifdef CONFIG_ZONE_DEVICE
+       /*
+        * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
+        * we will not try to shrink the zones - which is okay as
+        * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+        */
+       if (zone_idx(zone) == ZONE_DEVICE)
+               return;
+#endif
+
        pgdat_resize_lock(zone->zone_pgdat, &flags);
        shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
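For illustration, here is a userspace model of what the switch from pfn_valid() to pfn_to_online_page() buys in find_smallest_section_pfn(): offline subsections are skipped before their memmap is read. The per-subsection table, the garbage-read counter and all `model_*` names are hypothetical. The counter stands in for the places where the unpatched code would call pfn_to_nid()/page_zone() on an uninitialized memmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; one table entry per subsection. */
#define MODEL_PAGES_PER_SUBSECTION 4UL
#define MODEL_NR_SUBSECTIONS       8UL

struct model_subsection {
	bool online;  /* stands in for pfn_to_online_page() != NULL */
	bool garbage; /* memmap is uninitialized (e.g. poisoned) */
	int nid;      /* only trustworthy when !garbage */
	int zone;     /* only trustworthy when !garbage */
};

static struct model_subsection model_map[MODEL_NR_SUBSECTIONS];
static unsigned long model_garbage_reads; /* would-be BUG sites */

/* Reading nid/zone means touching the memmap, like pfn_to_nid()/page_zone(). */
static const struct model_subsection *model_read_memmap(unsigned long pfn)
{
	const struct model_subsection *s = &model_map[pfn / MODEL_PAGES_PER_SUBSECTION];

	if (s->garbage)
		model_garbage_reads++;
	return s;
}

/*
 * Shaped like find_smallest_section_pfn(): check_online=true models the
 * patched pfn_to_online_page() test, false models the old pfn_valid() one
 * (every pfn in the table counts as valid). Callers keep end_pfn within
 * MODEL_NR_SUBSECTIONS * MODEL_PAGES_PER_SUBSECTION. Returns 0 if no match.
 */
static unsigned long model_find_smallest(int nid, int zone, unsigned long start_pfn,
					 unsigned long end_pfn, bool check_online)
{
	for (; start_pfn < end_pfn; start_pfn += MODEL_PAGES_PER_SUBSECTION) {
		const struct model_subsection *s;

		/* The online bit lives outside the memmap, so this read is safe. */
		if (check_online && !model_map[start_pfn / MODEL_PAGES_PER_SUBSECTION].online)
			continue;
		s = model_read_memmap(start_pfn);
		if (s->nid != nid || s->zone != zone)
			continue;
		return start_pfn;
	}
	return 0;
}
```

With three offline, uninitialized subsections in front of an online one, both variants find the same pfn, but only the pfn_valid()-style variant reads the three garbage memmaps on the way.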