commit f19f5c49bb upstream.
It turns out that we should *not* invert all not-present mappings,
because the all zeroes case is obviously special.
clear_page() does not undergo the XOR logic to invert the address bits,
i.e. PTE, PMD and PUD entries that have not been individually written
will have val=0 and so will trigger __pte_needs_invert(). As a result,
{pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
(adjusted by the max PFN mask) instead of zero. A zeroed entry is ok
because the page at physical address 0 is reserved early in boot
specifically to mitigate L1TF, so explicitly exempt them from the
inversion when reading the PFN.
Manifested as an unexpected mprotect(..., PROT_NONE) failure when called
on a VMA that has VM_PFNMAP and was mmap'd to as something other than
PROT_NONE but never used. mprotect() sends the PROT_NONE request down
prot_none_walk(), which walks the PTEs to check the PFNs.
prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
-EACCES because it thinks mprotect() is trying to adjust a high MMIO
address.
[ This is a very modified version of Sean's original patch, but all
credit goes to Sean for doing this and also pointing out that
sometimes the __pte_needs_invert() function only gets the protection
bits, not the full eventual pte. But zero remains special even in
just protection bits, so that's ok. - Linus ]
Fixes: f22cc87f6c ("x86/speculation/l1tf: Invert all not present mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e0fb5df2e upstream.
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: cpandya@codeaurora.org
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Joerg Roedel <joro@8bytes.org>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bbda5a386 upstream.
If the ts3a227e audio accessory detection hardware is present and its
driver probed, the jack needs to be created before enabling jack
detection in the ts3a227e driver. With this patch, the jack is
instantiated in the max98090 headset init function if the ts3a227e is
present. This fixes a null pointer dereference as the jack detection
enabling function in the ts3a driver was called before the jack is
created.
[minor correction to keep error handling on jack creation the same
as before by Pierre Bossart]
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 318abdfbe7 upstream.
Like the skcipher_walk and blkcipher_walk cases:
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
ablkcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing ablkcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
Reported-by: Liu Chao <liuchao741@huawei.com>
Fixes: bf06099db1 ("crypto: skcipher - Add ablkcipher_walk interfaces")
Cc: <stable@vger.kernel.org> # v2.6.35+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0868def3e4 upstream.
Like the skcipher_walk case:
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
blkcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing blkcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "ecb(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741@huawei.com>
Fixes: 5cde0af2a9 ("[CRYPTO] cipher: Added block cipher type")
Cc: <stable@vger.kernel.org> # v2.6.19+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb29648102 upstream.
syzbot reported a crash in vmac_final() when multiple threads
concurrently use the same "vmac(aes)" transform through AF_ALG. The bug
is pretty fundamental: the VMAC template doesn't separate per-request
state from per-tfm (per-key) state like the other hash algorithms do,
but rather stores it all in the tfm context. That's wrong.
Also, vmac_final() incorrectly zeroes most of the state including the
derived keys and cached pseudorandom pad. Therefore, only the first
VMAC invocation with a given key calculates the correct digest.
Fix these bugs by splitting the per-tfm state from the per-request state
and using the proper init/update/final sequencing for requests.
Reproducer for the crash:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int fd;
struct sockaddr_alg addr = {
.salg_type = "hash",
.salg_name = "vmac(aes)",
};
char buf[256] = { 0 };
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16);
fork();
fd = accept(fd, NULL, NULL);
for (;;)
write(fd, buf, 256);
}
The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds
VMAC_NHBYTES, causing vmac_final() to memset() a negative length.
Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com
Fixes: f1939f7c56 ("crypto: vmac - New hash algorithm for intel_txt support")
Cc: <stable@vger.kernel.org> # v2.6.32+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73bf20ef3d upstream.
The VMAC template assumes the block cipher has a 128-bit block size, but
it failed to check for that. Thus it was possible to instantiate it
using a 64-bit block size cipher, e.g. "vmac(cast5)", causing
uninitialized memory to be used.
Add the needed check when instantiating the template.
Fixes: f1939f7c56 ("crypto: vmac - New hash algorithm for intel_txt support")
Cc: <stable@vger.kernel.org> # v2.6.32+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a957467c5 upstream.
i8259.h uses inb/outb and thus needs to include asm/io.h to avoid the
following build error, as seen with x86_64:defconfig and CONFIG_SMP=n.
In file included from drivers/rtc/rtc-cmos.c:45:0:
arch/x86/include/asm/i8259.h: In function 'inb_pic':
arch/x86/include/asm/i8259.h:32:24: error:
implicit declaration of function 'inb'
arch/x86/include/asm/i8259.h: In function 'outb_pic':
arch/x86/include/asm/i8259.h:45:2: error:
implicit declaration of function 'outb'
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Suggested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Fixes: 447ae31667 ("x86: Don't include linux/irq.h from asm/hardirq.h")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b89b41d0b8 upstream.
Current cpu_core_id fixup causes downcored F17h configurations to be
incorrect:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 0
NODE: 1
processor 6 core id : 2
processor 7 core id : 3
processor 8 core id : 4
processor 9 core id : 0
processor 10 core id : 1
processor 11 core id : 2
Code that relies on the cpu_core_id, like match_smt(), for example,
which builds the thread siblings masks used by the scheduler, is
mislead.
So, limit the fixup to pre-F17h machines. The new value for cpu_core_id
for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId],
which is guaranteed to be unique for each core within a socket.
This way we have:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 6
NODE: 1
processor 6 core id : 8
processor 7 core id : 9
processor 8 core id : 10
processor 9 core id : 12
processor 10 core id : 13
processor 11 core id : 14
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[ Heavily massaged. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c26fcd2ab upstream.
pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
!__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
ia64) and breaks build:
include/asm-generic/pgtable.h: Assembler messages:
include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
Move those two static inlines into the !__ASSEMBLY__ section so that they
don't confuse the asm build pass.
Fixes: 42e4089c78 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[groeck: Context changes]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 792adb90fa upstream.
The introduction of generic_max_swapfile_size and arch-specific versions has
broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
'generic_max_swapfile_size'. Fix it by compiling the x86-specific
max_swapfile_size() only with CONFIG_SWAP=y.
Reported-by: Tomas Pruzina <pruzinat@gmail.com>
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 269777aa53 upstream.
Commit 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
breaks non-SMP builds.
[ I suspect the 'bool' fields should just be made to be bitfields and be
exposed regardless of configuration, but that's a separate cleanup
that I'll leave to the owners of this file for later. - Linus ]
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0055f351e upstream.
The function has an inline "return false;" definition with CONFIG_SMP=n
but the "real" definition is also visible leading to "redefinition of
‘apic_id_is_primary_thread’" compiler error.
Guard it with #ifdef CONFIG_SMP
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07d981ad4c upstream
The kernel unnecessarily prevents late microcode loading when SMT is
disabled. It should be safe to allow it if all the primary threads are
online.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1063711b57 upstream
The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
without inverting the address bits, which makes the PTE entry vulnerable
for L1TF.
Make it use the right low level macros to actually invert the address bits
to protect against L1TF.
In principle this could be avoided because MMIO tracing is not likely to be
enabled on production machines, but the fix is straigt forward and for
consistency sake it's better to get rid of the open coded PTE manipulation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 958f79b9ee upstream
set_memory_np() is used to mark kernel mappings not present, but it has
it's own open coded mechanism which does not have the L1TF protection of
inverting the address bits.
Replace the open coded PTE manipulation with the L1TF protecting low level
PTE routines.
Passes the CPA self test.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0768f91530 upstream
Some cases in THP like:
- MADV_FREE
- mprotect
- split
mark the PMD non present for temporarily to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness sake.
Use the proper low level functions for pmd/pud_mknotpresent() to address
this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f22cc87f6c upstream
For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
mapping, but the inversion logic explicitely checks for !PRESENT and
PROT_NONE.
Remove the PROT_NONE check and make the inversion unconditional for all not
present mappings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc2d8d262c upstream
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.
Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.
SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.
Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b76a3cff0 upstream
When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry. Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.
Add the relevant Documentation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e0b2b9166 upstream
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed. Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea156d192f upstream
Three changes to the content of the sysfs file:
- If EPT is disabled, L1TF cannot be exploited even across threads on the
same core, and SMT is irrelevant.
- If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
instead of "vulnerable, SMT vulnerable"
- Reorder the two parts so that the main vulnerability state comes first
and the detail on SMT is second.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd28325249 upstream
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 518e7b9481 upstream
Linux (among the others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.
By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.
Suggested-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1d93fa90f upstream
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked. This patch will add support to allow a guest to
properly make this determination.
Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features. If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see. Support is also added to write (hypervisor only)
and read the MSR value for the guest. A write by the guest will result in
a #GP. A read by the guest will return the value as set by the host. In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 801e459a6f upstream
Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18b57ce2eb upstream
For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.
It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.
The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.
Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffcba43ff6 upstream
The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.
Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 447ae31667 upstream
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().
Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/topology.h
linux/smp.h
asm/smp.h
or
linux/gfp.h
linux/mmzone.h
asm/mmzone.h
asm/mmzone_64.h
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/irqdesc.h
linux/kobject.h
linux/sysfs.h
linux/kernfs.h
linux/idr.h
linux/gfp.h
and others.
This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.
A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.
However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.
Remove the linux/irq.h #include from x86' asm/hardirq.h.
Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.
Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[dwmw2: More fixes for EFI and Xen in 4.9]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45b575c00d upstream
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.
L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.
If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.
The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
L1d flush based on its value and reset that flag again.
Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus, the vmx code is unable to tell if
any such has happened.
As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.
A later patch will make interrupt handlers set it.
For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.
Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate.
Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9aee5f8a7e upstream
An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.
In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.
Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b6ccc6c3b upstream
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.
This test is unncessary for the "always flush L1D" mode.
Move the check to vmx_l1d_flush()'s conditional mode code path.
Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
extra function call.
- This inverts the (static) branch prediction, but there hadn't been any
explicit likely()/unlikely() annotations before and so it stays as is.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 427362a142 upstream
The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".
The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.
Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.
There is no change in functionality.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 379fd0c7e6 upstream
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73d5e2b472 upstream
If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
vulnerabilities file shows SMT as vulnerable.
Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
case. Unfortunately the detection can only be done after bringing all
the CPUs online, so we have to overwrite any previous writes to the
variable.
Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Fixes: f048c399e0 ("x86/topology: Provide topology_smt_supported()")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 288d152c23 upstream
The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
to evict the L1d cache.
However, these pages are never cleared and, in theory, their data could be
leaked.
More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.
Fix this by initializing the individual vmx_l1d_flush_pages with a
different pattern each.
Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
"flush_pages" to reflect this change.
Fixes: a47dd5f067 ("x86/KVM/VMX: Add L1D flush algorithm")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d90a7a0ec8 upstream
Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.
The possible values are:
full
Provides all available mitigations for the L1TF vulnerability. Disables
SMT and enables all mitigations in the hypervisors. SMT control via
/sys/devices/system/cpu/smt/control is still possible after boot.
Hypervisors will issue a warning when the first VM is started in
a potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.
full,force
Same as 'full', but disables SMT control. Implies the 'nosmt=force'
command line option. sysfs control of SMT and the hypervisor flush
control is disabled.
flush
Leaves SMT enabled and enables the conditional hypervisor mitigation.
Hypervisors will issue a warning when the first VM is started in a
potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.
flush,nosmt
Disables SMT and enables the conditional hypervisor mitigation. SMT
control via /sys/devices/system/cpu/smt/control is still possible
after boot. If SMT is reenabled or flushing disabled at runtime
hypervisors will issue a warning.
flush,nowarn
Same as 'flush', but hypervisors will not warn when
a VM is started in a potentially insecure configuration.
off
Disables hypervisor mitigations and doesn't emit any warnings.
Default is 'flush'.
Let KVM adhere to these semantics, which means:
- 'lt1f=full,force' : Performe L1D flushes. No runtime control
possible.
- 'l1tf=full'
- 'l1tf-flush'
- 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if
SMT has been runtime enabled or L1D flushing
has been run-time enabled
- 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted.
- 'l1tf=off' : L1D flushes are not performed and no warnings
are emitted.
KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.
This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.
Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fee0aede6f upstream
The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.
That was fine so far as this was only required to make the output of the
control file correct and to prevent writes in that case.
With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7db92e165a upstream
In preparation of allowing run time control for L1D flushing, move the
setup code to the module parameter handler.
In case of pre module init parsing, just store the value and let vmx_init()
do the actual setup after running kvm_init() so that enable_ept is having
the correct state.
During run-time invoke it directly from the parameter setter to prepare for
run-time control.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f055947ae upstream
The VMX module parameter to control the L1D flush should become
writeable.
The MSR list is set up at VM init per guest VCPU, but the run time
switching is based on a static key which is global. Toggling the MSR list
at run time might be feasible, but for now drop this optimization and use
the regular MSR write to make run-time switching possible.
The default mitigation is the conditional flush anyway, so for extra
paranoid setups this will add some small overhead, but the extra code
executed is in the noise compared to the flush itself.
Aside of that the EPT disabled case is not handled correctly at the moment
and the MSR list magic is in the way for fixing that as well.
If it's really providing a significant advantage, then this needs to be
revisited after the code is correct and the control is writable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 215af5499d upstream
Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
siblings. Writing 'on' merily enables the abilify to online them, but does
not online them automatically.
Make 'on' more useful by onlining all offline siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 390d975e0c upstream
If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD
MSR is available, optimize the VMENTER code with the MSR save list.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 989e3992d2 upstream
The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend
add_atomic_switch_msr() with an entry_only parameter to allow storing the
MSR only in the guest (ENTRY) MSR array.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33966dd6b2 upstream
There is no semantic change but this change allows an unbalanced amount of
MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or
restore on VMEXIT or VMENTER may be different.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 83bafef1a1 upstream
When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit
MSR load lists, the change should affect the dormant VMCS as well as the
current VMCS. Moreover, the vmcs02 MSR-load addresses should be
initialized.
[ dwmw2: Pulled in to 4.9 backports for L1TF ]
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c595ceee45 upstream
Add the logic for flushing L1D on VMENTER. The flush depends on the static
key being enabled and the new l1tf_flush_l1d flag being set.
The flags is set:
- Always, if the flush module parameter is 'always'
- Conditionally at:
- Entry to vcpu_run(), i.e. after executing user space
- From the sched_in notifier, i.e. when switching to a vCPU thread.
- From vmexit handlers which are considered unsafe, i.e. where
sensitive data can be brought into L1D:
- The emulator, which could be a good target for other speculative
execution-based threats,
- The MMU, which can bring host page tables in the L1 cache.
- External interrupts
- Nested operations that require the MMU (see above). That is
vmptrld, vmptrst, vmclear,vmwrite,vmread.
- When handling invept,invvpid
[ tglx: Split out from combo patch and reduced to a single flag ]
[ dwmw2: Backported to 4.9, set l1tf_flush_l1d in svm/vmx code ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fa045be4c upstream
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
MSRs defined in the document.
The semantics of this MSR is to allow "finer granularity invalidation of
caching structures than existing mechanisms like WBINVD. It will writeback
and invalidate the L1 data cache, including all cachelines brought in by
preceding instructions, without invalidating all caches (eg. L2 or
LLC). Some processors may also invalidate the first level level instruction
cache on a L1D_FLUSH command. The L1 data and instruction caches may be
shared across the logical processors of a core."
Use it instead of the loop based L1 flush algorithm.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Avoid allocating pages when the MSR is available ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a47dd5f067 upstream
To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D
on VMENTER to prevent rogue guests from snooping host memory.
CPUs will have a new control MSR via a microcode update to flush L1D with a
single MSR write, but in the absence of microcode a fallback to a software
based flush algorithm is required.
Add a software flush loop which is based on code from Intel.
[ tglx: Split out from combo patch ]
[ bpetkov: Polish the asm code ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a399477e52 upstream
Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:
- "always" L1D cache flush on every VMENTER.
- "cond" Conditional L1D cache flush, explained below
- "never" Disable the L1D cache flush mitigation
"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.
[ tglx: Split out from a larger patch ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26acfb666a upstream
If the L1TF CPU bug is present we allow the KVM module to be loaded as the
major of users that use Linux and KVM have trusted guests and do not want a
broken setup.
Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
such they are the ones that should set nosmt to one.
Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4ac1 ("cpu/hotplug: Provide knobs to control SMT").
Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cc3cd2165 upstream
Due to the way Machine Check Exceptions work on X86 hyperthreads it's
required to boot up _all_ logical cores at least once in order to set the
CR4.MCE bit.
So instead of ignoring the sibling threads right away, let them boot up
once so they can configure themselves. After they came out of the initial
boot stage check whether its a "secondary" sibling and cancel the operation
which puts the CPU back into offline state.
[dwmw2: Backport to 4.9]
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 506a66f374 upstream
Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.
The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:
Because the logical processors within a physical package are tightly
coupled with respect to shared hardware resources, both logical
processors are notified of machine check errors that occur within a
given physical processor. If machine-check exceptions are enabled when
a fatal error is reported, all the logical processors within a physical
package are dispatched to the machine-check exception handler. If
machine-check exceptions are disabled, the logical processors enter the
shutdown state and assert the IERR# signal. When enabling machine-check
exceptions, the MCE flag in control register CR4 should be set for each
logical processor.
Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.
This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:
MCE is enabled on the boot CPU:
[ 0.244017] mce: CPU supports 22 MCE banks
The corresponding sibling #72 boots:
[ 1.008005] .... node #0, CPUs: #72
That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.
It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.
Adjust the nosmt kernel parameter documentation as well.
Reverts: 2207def700 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e14d7dfb41 upstream
Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
pte_val reps. shift left would get truncated. Fix this up by using proper
types.
[dwmw2: Backport to 4.9]
Fixes: 6b28baca9b ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d0f624905 upstream
The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
offset. The lower word is zeroed, thus systems with less than 4GB memory are
safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
to L1TF; with even more memory, also the swap offfset influences the address.
This might be a problem with 32bit PAE guests running on large 64bit hosts.
By continuing to keep the whole swap entry in either high or low 32bit word of
PTE we would limit the swap size too much. Thus this patch uses the whole PAE
PTE with the same layout as the 64bit version does. The macros just become a
bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce2f0393e upstream
The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.
Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().
[dwmw2: Backport to 4.9]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11e34e64e4 upstream
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.
This new MSR "gives software a way to invalidate structures with finer
granularity than other architectual methods like WBINVD."
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a7ed1ba4b upstream
The previous patch has limited swap file size so that large offsets cannot
clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.
It assumed that offsets are encoded starting with bit 12, same as pfn. But
on x86_64, offsets are encoded starting with bit 9.
Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
and 256TB with 46bit MAX_PA.
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2207def700 upstream
nosmt on the kernel command line merely prevents the onlining of the
secondary SMT siblings.
nosmt=force makes the APIC detection code ignore the secondary SMT siblings
completely, so they even do not show up as possible CPUs. That reduces the
amount of memory allocations for per cpu variables and saves other
resources from being allocated too large.
This is not fully equivalent to disabling SMT in the BIOS because the low
level SMT enabling in the BIOS can result in partitioning of resources
between the siblings, which is not undone by just ignoring them. Some CPUs
can use the full resources when their sibling is not onlined, but this is
depending on the CPU family and model and it's not well documented whether
this applies to all partitioned resources. That means depending on the
workload disabling SMT in the BIOS might result in better performance.
Linus analysis of the Intel manual:
The intel optimization manual is not very clear on what the partitioning
rules are.
I find:
"In general, the buffers for staging instructions between major pipe
stages are partitioned. These buffers include µop queues after the
execution trace cache, the queues after the register rename stage, the
reorder buffer which stages instructions for retirement, and the load
and store buffers.
In the case of load and store buffers, partitioning also provided an
easier implementation to maintain memory ordering for each logical
processor and detect memory ordering violations"
but some of that partitioning may be relaxed if the HT thread is "not
active":
"In Intel microarchitecture code name Sandy Bridge, the micro-op queue
is statically partitioned to provide 28 entries for each logical
processor, irrespective of software executing in single thread or
multiple threads. If one logical processor is not active in Intel
microarchitecture code name Ivy Bridge, then a single thread executing
on that processor core can use the 56 entries in the micro-op queue"
but I do not know what "not active" means, and how dynamic it is. Some of
that partitioning may be entirely static and depend on the early BIOS
disabling of HT, and even if we park the cores, the resources will just be
wasted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e1d7e25fd upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().
[dwmw2: Backport to 4.9]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 119bff8a9c upstream
Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU. The three
functions called there now do not depend on that leaf anymore so the check
can go.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1910ad5624 upstream
Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95f3d39ccf upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545401f444 upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55e6d279ab upstream
The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.
Remove it completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05736e4ac1 upstream
Provide a command line and a sysfs knob to control SMT.
The command line options are:
'nosmt': Enumerate secondary threads, but do not online them
'nosmt=force': Ignore secondary threads completely during enumeration
via MP table and ACPI/MADT.
The sysfs control file has the following states (read/write):
'on': SMT is enabled. Secondary threads can be freely onlined
'off': SMT is disabled. Secondary threads, even if enumerated
cannot be onlined
'forceoff': SMT is permanentely disabled. Writes to the control
file are rejected.
'notsupported': SMT is not supported by the CPU
The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.
The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.
When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.
When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.
When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.
When the control status is 'notsupported' then writes to the control file
are rejected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4de65696d upstream
The asymmetry caused a warning to trigger if the bootup was stopped in state
CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
now be invoked on already or still parked threads. But there is still no
reason to have this be asymmetric.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a4d2657e0 upstream
If the CPU is supporting SMT then the primary thread can be found by
checking the lower APIC ID bits for zero. smp_num_siblings is used to build
the mask for the APIC ID bits which need to be taken into account.
This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
than the initial APIC ID in CPUID. But according to AMD the lower bits have
to be consistent. Intel gave a tentative confirmation as well.
Preparatory patch to support disabling SMT at boot/runtime.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56563f53d3 upstream
The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".
Move the function to be past SSBD and also define the pr_fmt.
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 377eeaa8e1 upstream
For the L1TF workaround its necessary to limit the swap file size to below
MAX_PA/2, so that the higher bits of the swap offset inverted never point
to valid memory.
Add a mechanism for the architecture to override the swap file size check
in swapfile.c and add a x86 specific max swapfile check function that
enforces that limit.
The check is only enabled if the CPU is vulnerable to L1TF.
In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system
with 46bit PA it is 32TB. The limit is only per individual swap file, so
it's always possible to exceed these limits with multiple swap files or
partitions.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42e4089c78 upstream
For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure to point to not point an unmapped entry to valid cached memory.
Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such an high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.
To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
Valid page mappings are allowed because the system is then unsafe anyways.
It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
the check for root.
For mmaps this is straight forward and can be handled in vm_insert_pfn and
in remap_pfn_range().
For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is a uncommon case use a separate early page talk
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.
[dwmw2: Backport to 4.9]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17dbca1193 upstream
L1TF core kernel workarounds are cheap and normally always enabled, However
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.
- Extend the existing checks for Meltdowns to determine if the system is
vulnerable. All CPUs which are not vulnerable to Meltdown are also not
vulnerable to L1TF
- Check for 32bit non PAE and emit a warning as there is no practical way
for mitigation due to the limited physical address bits
- If the system has more than MAX_PA/2 physical memory the invert page
workarounds don't protect the system against the L1TF attack anymore,
because an inverted physical address will also point to valid
memory. Print a warning in this case and report that the system is
vulnerable.
Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.
[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
[ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10a70416e1 upstream
The L1TF workaround doesn't make any attempt to mitigate speculate accesses
to the first physical page for zeroed PTEs. Normally it only contains some
data from the early real mode BIOS.
It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g. with the standard reservations)
it's likely a nop.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b28baca9b upstream
When PTEs are set to PROT_NONE the kernel just clears the Present bit and
preserves the PFN, which creates attack surface for L1TF speculation
speculation attacks.
This is important inside guests, because L1TF speculation bypasses physical
page remapping. While the host has its own migitations preventing leaking
data from other VMs into the guest, this would still risk leaking the wrong
page inside the current guest.
This uses the same technique as Linus' swap entry patch: while an entry is
is in PROTNONE state invert the complete PFN part part of it. This ensures
that the the highest bit will point to non existing memory.
The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
pte/pmd/pud_pfn undo it.
This assume that no code path touches the PFN part of a PTE directly
without using these primitives.
This doesn't handle the case that MMIO is on the top of the CPU physical
memory. If such an MMIO region was exposed by an unpriviledged driver for
mmap it would be possible to attack some real memory. However this
situation is all rather unlikely.
For 32bit non PAE the inversion is not done because there are really not
enough bits to protect anything.
Q: Why does the guest need to be protected when the HyperVisor already has
L1TF mitigations?
A: Here's an example:
Physical pages 1 2 get mapped into a guest as
GPA 1 -> PA 2
GPA 2 -> PA 1
through EPT.
The L1TF speculation ignores the EPT remapping.
Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
they belong to different users and should be isolated.
A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
gets read access to the underlying physical page. Which in this case
points to PA 2, so it can read process B's data, if it happened to be in
L1, so isolation inside the guest is broken.
There's nothing the hypervisor can do about this. This mitigation has to
be done in the guest itself.
[ tglx: Massaged changelog ]
[ dwmw2: backported to 4.9 ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f22b4cd45 upstream
With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting
side effects allow to read the memory the PTE is pointing too, if its
values are still in the L1 cache.
For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
them.
To protect against L1TF it must be ensured that the swap entry is not
pointing to valid memory, which requires setting higher bits (between bit
36 and bit 45) that are inside the CPUs physical address space, but outside
any real memory.
To do this invert the offset to make sure the higher bits are always set,
as long as the swap file is not too big.
Note there is no workaround for 32bit !PAE, or on systems which have more
than MAX_PA/2 worth of memory. The later case is very unlikely to happen on
real systems.
[AK: updated description and minor tweaks by. Split out from the original
patch ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcd11afa7a upstream
If pages are swapped out, the swap entry is stored in the corresponding
PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
on PTE entries which have the present bit set and would treat the swap
entry as phsyical address (PFN). To mitigate that the upper bits of the PTE
must be set so the PTE points to non existent memory.
The swap entry stores the type and the offset of a swapped out page in the
PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware
ignores the bits beyond the phsyical address space limit, so to make the
mitigation effective its required to start 'offset' at the lowest possible
bit so that even large swap offsets do not reach into the physical address
space limit bits.
Move offset to bit 9-58 and type to bit 59-63 which are the bits that
hardware generally doesn't care about.
That, in turn, means that if you on desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, there needs to be
30 bits of offset actually *in use* until bit 39 ends up being set, which
means when inverted it will again point into existing memory.
So that's 4 terabyte of swap space (because the offset is counted in pages,
so 30 bits of offset is 42 bits of actual coverage). With bigger physical
addressing, that obviously grows further, until the limit of the offset is
hit (at 50 bits of offset - 62 bits of actual swap file coverage).
This is a preparatory change for the actual swap entry inversion to protect
against L1TF.
[ AK: Updated description and minor tweaks. Split into two parts ]
[ tglx: Massaged changelog ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50896e180c upstream
L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
speculates on PTE entries which do not have the PRESENT bit set, if the
content of the resulting physical address is available in the L1D cache.
The OS side mitigation makes sure that a !PRESENT PTE entry points to a
physical address outside the actually existing and cachable memory
space. This is achieved by inverting the upper bits of the PTE. Due to the
address space limitations this only works for 64bit and 32bit PAE kernels,
but not for 32bit non PAE.
This mitigation applies to both host and guest kernels, but in case of a
64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
the PAE address space (44bit) is not enough if the host has more than 43
bits of populated memory address space, because the speculation treats the
PTE content as a physical host address bypassing EPT.
The host (hypervisor) protects itself against the guest by flushing L1D as
needed, but pages inside the guest are not protected against attacks from
other processes inside the same guest.
For the guest the inverted PTE mask has to match the host to provide the
full protection for all pages the host could possibly map into the
guest. The hosts populated address space is not known to the guest, so the
mask must cover the possible maximal host address space, i.e. 52 bit.
On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
the mask to be below what the host could possible use for physical pages.
The L1TF PROT_NONE protection code uses the PTE masks to determine which
bits to invert to make sure the higher bits are set for unmapped entries to
prevent L1TF speculation attacks against EPT inside guests.
In order to invert all bits that could be used by the host, increase
__PHYSICAL_PAGE_SHIFT to 52 to match 64bit.
The real limit for a 32bit PAE kernel is still 44 bits because all Linux
PTEs are created from unsigned long PFNs, so they cannot be higher than 44
bits on a 32bit kernel. So these extra PFN bits should be never set. The
only users of this macro are using it to look at PTEs, so it's safe.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5800dc5c19 upstream.
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.
The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.
Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.
Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.
PRE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
POST:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
NATIVE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bcfe05640 upstream.
Use the correct IRQ line for the MSI controller in the PCIe host
controller. Apparently a different IRQ line is used compared to other
i.MX6 variants. Without this change MSI IRQs aren't properly propagated
to the upstream interrupt controller.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Fixes: b1d17f68e5 ("ARM: dts: imx: add initial imx6sx device tree source")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 062d0f22a3 upstream.
In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is
written at index 'count-1' where 'count' is the size of the write, so
potentially 0.
This patch filters odd values for the write size/position to avoid this
type of problem.
Signed-off-by: Michael Mera <dev@michaelmera.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8f9cc328c upstream.
To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.
As a result, any attempt to register a read-only MR over read-only
virtual memory failed.
To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).
However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.
If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.
Cc: <stable@vger.kernel.org>
Fixes: 9376932d0c ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2fd1d2c4ce upstream.
Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
> BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc]
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
> CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
> Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
> Workqueue: events proc_cleanup_work
> Call Trace:
> dump_stack+0x63/0x86
> __warn+0xcb/0xf0
> warn_slowpath_null+0x1d/0x20
> umount_check+0x6e/0x80
> d_walk+0xc6/0x270
> ? dentry_free+0x80/0x80
> do_one_tree+0x26/0x40
> shrink_dcache_for_umount+0x2d/0x90
> generic_shutdown_super+0x1f/0xf0
> kill_anon_super+0x12/0x20
> proc_kill_sb+0x40/0x50
> deactivate_locked_super+0x43/0x70
> deactivate_super+0x5a/0x60
> cleanup_mnt+0x3f/0x90
> mntput_no_expire+0x13b/0x190
> kern_unmount+0x3e/0x50
> pid_ns_release_proc+0x15/0x20
> proc_cleanup_work+0x15/0x20
> process_one_work+0x197/0x450
> worker_thread+0x4e/0x4a0
> kthread+0x109/0x140
> ? process_one_work+0x450/0x450
> ? kthread_park+0x90/0x90
> ret_from_fork+0x2c/0x40
> ---[ end trace e1c109611e5d0b41 ]---
> VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: _raw_spin_lock+0xc/0x30
> PGD 0
Fix this by taking a reference to the super block in proc_sys_prune_dcache.
The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used. This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list. Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks. The fact that
head->unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.
Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to. The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.
Cc: stable@vger.kernel.org
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@openvz.org>
Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ace0c791e6 upstream.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882] (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.
Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after. Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there. Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.
I agree with Al Viro's analsysis of the situtation.
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Tested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6cffbbe9a upstream.
Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.
For example, command:
# for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
creates a millions of stale denties around sysctls of loopback interface:
# sysctl fs.dentry-state
fs.dentry-state = 25812579 24724135 45 0 0 0
All of them have matching names thus lookup have to scan though whole
hash chain and call d_compare (proc_sys_compare) which checks them
under system-wide spinlock (sysctl_lock).
# time sysctl -a > /dev/null
real 1m12.806s
user 0m0.016s
sys 1m12.400s
Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.
This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 10.02.2017 10:47, Al Viro wrote:
>> how about >> the matching stats *after* that patch?
>
> dcache size doesn't grow endlessly, so stats are fine
>
> # sysctl fs.dentry-state
> fs.dentry-state = 92712 58376 45 0 0 0
>
> # time sysctl -a &>/dev/null
>
> real 0m0.013s
> user 0m0.004s
> sys 0m0.008s
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 119e1ef80e upstream.
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ea0a46ca2 upstream.
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c0d7cd5c8 upstream.
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90bad5e05b upstream.
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.
It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5b1404d08 upstream.
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1214fd7b49 upstream.
Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.
This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:
INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdcb613d49 upstream.
The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
of private registers at offset 0x800, the current lpss_device_desc for
them already sets the LPSS_SAVE_CTX flag to have these saved/restored
over device-suspend, but the current lpss_device_desc was not setting
the prv_offset field, leading to the regular device registers getting
saved/restored instead.
This is causing the PWM controller to no longer work, resulting in a black
screen, after a suspend/resume on systems where the firmware clears the
APB clock and reset bits at offset 0x804.
This commit fixes this by properly setting prv_offset to 0x800 for
the PWM devices.
Cc: stable@vger.kernel.org
Fixes: e1c7481797 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
Fixes: 1bfbd8eb8a ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Rafael J . Wysocki <rjw@rjwysocki.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c53776e29 upstream.
Way back in 4.9, we committed 4cd13c21b2 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it. For
example, we've had:
1ff688209e ("watchdog: core: make sure the watchdog_worker is not deferred")
8d5755b3f7 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")
all of which worked around some of the effects of that commit.
The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic. This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.
Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:
"I'm facing kernel panic issue while running raid 5 on sata disks
connected to Macchiatobin (Marvell community board with Armada-8040
SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
uses a tasklet to clean the done descriptors from the queue"
The latency problem causes a panic:
mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc). This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.
Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fedb8da963 upstream.
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.
This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66509a276c upstream.
Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ab2011ea3 upstream.
There is a race condition in tpm_common_write function allowing
two threads on the same /dev/tpm<N>, or two different applications
on the same /dev/tpmrm<N> to overwrite each other commands/responses.
Fixed this by taking the priv->buffer_mutex early in the function.
Also converted the priv->data_pending from atomic to a regular size_t
type. There is no need for it to be atomic since it is only touched
under the protection of the priv->buffer_mutex.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5012284700 upstream.
Commit 8844618d8a: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:
mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc
Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.
This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.
Fixes: 8844618d8a ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b697d7d8c7 upstream.
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.
Do not use the ERR_PTR mechanism for this function.
Update all call sites to handle the return value correctly.
Clean up error paths to reflect return value.
Fixes: 45842abbb2 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org> # 4.9.x+
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e01e80634e upstream.
One of the classes of kernel stack content leaks[1] is exposing the
contents of prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.
Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54
and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.
before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47
after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Srivatsa: Backported to 4.9.y ]
Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
Reviewed-by: Srinidhi Rao <srinidhir@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58152ecbbc upstream.
In case skb in out_or_order_queue is the result of
multiple skbs coalescing, we would like to get a proper gso_segs
counter tracking, so that future tcp_drop() can report an accurate
number.
I chose to not implement this tracking for skbs in receive queue,
since they are not dropped, unless socket is disconnected.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd3599a0e1 upstream.
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the
issue:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure
In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).
CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73c8d89455 upstream.
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0040c0145 upstream.
Hyper-V instances support PCI pass-through which is implemented through PV
pci-hyperv driver. When a device is passed through, a new root PCI bus is
created in the guest. The bus sits on top of VMBus and has no associated
information in ACPI. acpi_pci_add_bus() in this case proceeds all the way
to acpi_evaluate_dsm(), which reports
ACPI: \: failed to evaluate _DSM (0x1001)
While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected
against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not
and gives us the error. It seems the correct fix is to not do anything in
acpi_pci_add_bus() in such cases.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44de022c43 upstream.
Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.
For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Gilbert <bgilbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 61f4b23769 ]
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.
Fixes: 7acf9d4237 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1f0301b33 upstream.
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.
Fixes: 2a1d3ab898 ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b08abbd9f5 upstream.
During unload process, the chip can encounter problem where a FW dump would
be captured. For this case, the full reset sequence will be skip to bring
the chip back to full operational state.
Fixes: e315cd28b9 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46d8c4b286 upstream.
This was detected by the self-test thanks to Ard's chunking patch.
I finally got around to testing this out on my ancient Via box. It
turns out that the workaround got the assembly wrong and we end up
doing count + initial cycles of the loop instead of just count.
This obviously causes corruption, either by overwriting the source
that is yet to be processed, or writing over the end of the buffer.
On CPUs that don't require the workaround only ECB is affected.
On Nano CPUs both ECB and CBC are affected.
This patch fixes it by doing the subtraction prior to the assembly.
Fixes: a76c1c23d0 ("crypto: padlock-aes - work around Nano CPU...")
Cc: <stable@vger.kernel.org>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63aff65573 upstream.
VPID for the nested vcpu is allocated at vmx_create_vcpu whenever nested
vmx is turned on with the module parameter.
However, it's only freed if the L1 guest has executed VMXON which is not
a given.
As a result, on a system with nested==on every creation+deletion of an
L1 vcpu without running an L2 guest results in leaking one vpid. Since
the total number of vpids is limited to 64k, they can eventually get
exhausted, preventing L2 from starting.
Delay allocation of the L2 vpid until VMXON emulation, thus matching its
freeing.
Fixes: 5c614b3583
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8e8cd579b upstream.
'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.
Found with the help of Smatch:
net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72c05f32f4 upstream.
ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there
is no its deallocation in ems_usb_disconnect().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71755ee535 upstream.
The squashfs fragment reading code doesn't actually verify that the
fragment is inside the fragment table. The end result _is_ verified to
be inside the image when actually reading the fragment data, but before
that is done, we may end up taking a page fault because the fragment
table itself might not even exist.
Another report from Anatoly and his endless squashfs image fuzzing.
Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>,
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d512584780 upstream.
Anatoly reports another squashfs fuzzing issue, where the decompression
parameters themselves are in a compressed block.
This causes squashfs_read_data() to be called in order to read the
decompression options before the decompression stream having been set
up, making squashfs go sideways.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Acked-by: Phillip Lougher <phillip.lougher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bc5b6c0b62 ]
'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.
This addresses the following accesses detected with the help of smatch:
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_keys' [w]
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_key_strings' [w]
* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
issue 'nl_table' [w] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a94c689e6c ]
If a DSA slave network device was previously disabled, there is no need
to suspend or resume it.
Fixes: 2446254915 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4672694bd4 ]
ip_frag_queue() might call pskb_pull() on one skb that
is already in the fragment queue.
We need to take care of possible truesize change, or we
might have an imbalance of the netns frags memory usage.
IPv6 is immune to this bug, because RFC5722, Section 4,
amended by Errata ID 3089 states :
When reassembling an IPv6 datagram, if
one or more its constituent fragments is determined to be an
overlapping fragment, the entire datagram (and any constituent
fragments) MUST be silently discarded.
Fixes: 158f323b98 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 56e2c94f05 ]
We currently check current frags memory usage only when
a new frag queue is created. This allows attackers to first
consume the memory budget (default : 4 MB) creating thousands
of frag queues, then sending tiny skbs to exceed high_thresh
limit by 2 to 3 order of magnitude.
Note that before commit 648700f76b ("inet: frags: use rhashtables
for reassembly units"), work queue could be starved under DOS,
getting no cpu cycles.
After commit 648700f76b, only the per frag queue timer can eventually
remove an incomplete frag queue and its skbs.
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Peter Oskolkov <posk@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35a2897c2a upstream.
Steven Rostedt reported a potential race in RCU core because of
swake_up():
CPU0 CPU1
---- ----
__call_rcu_core() {
spin_lock(rnp_root)
need_wake = __rcu_start_gp() {
rcu_start_gp_advanced() {
gp_flags = FLAG_INIT
}
}
rcu_gp_kthread() {
swait_event_interruptible(wq,
gp_flags & FLAG_INIT) {
spin_lock(q->lock)
*fetch wq->task_list here! *
list_add(wq->task_list, q->task_list)
spin_unlock(q->lock);
*fetch old value of gp_flags here *
spin_unlock(rnp_root)
rcu_gp_kthread_wake() {
swake_up(wq) {
swait_active(wq) {
list_empty(wq->task_list)
} * return false *
if (condition) * false *
schedule();
In this case, a wakeup is missed, which could cause the rcu_gp_kthread
waits for a long time.
The reason of this is that we do a lockless swait_active() check in
swake_up(). To fix this, we can either 1) add a smp_mb() in swake_up()
before swait_active() to provide the proper order or 2) simply remove
the swait_active() in swake_up().
The solution 2 not only fixes this problem but also keeps the swait and
wait API as close as possible, as wake_up() doesn't provide a full
barrier and doesn't do a lockless check of the wait queue either.
Moreover, there are users already using swait_active() to do their quick
checks for the wait queues, so it make less sense that swake_up() and
swake_up_all() do this on their own.
This patch then removes the lockless swait_active() check in swake_up()
and swake_up_all().
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170615041828.zk3a3sfyudm5p6nl@tardis
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Chen <david.chen@nutanix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15ecbe94a4 ]
Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
about our recent patch removing ~16 quick acks after ECN events.
tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.
Fixes: 522040ea5f ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 522040ea5f ]
ECN signals currently forces TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.
We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.
This should reduce the extra load noticed in DCTCP environments,
after congestion events.
This is part 2 of our effort to reduce pure ACK packets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9a9c9b51e5 ]
We want to add finer control of the number of ACK packets sent after
ECN events.
This patch is not changing current behavior, it only enables following
change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3893637e1 ]
As explained in commit 9f9843a751 ("tcp: properly handle stretch
acks in slow start"), TCP stacks have to consider how many packets
are acknowledged in one single ACK, because of GRO, but also
because of ACK compression or losses.
We plan to add SACK compression in the following patch, we
must therefore not call tcp_enter_quickack_mode()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 822fb18a82 ]
When loading module manually, after call xenbus_switch_state to initializes
the state of the netfront device, the driver state did not change so fast
that may lead no dev created in latest kernel. This patch adds wait to make
sure xenbus knows the driver is not in closed/unknown state.
Current state:
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
With the patch installed.
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
Signed-off-by: Xiao Liang <xiliang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 383d470936 ]
For some very small BDPs (with just a few packets) there was a
quantization effect where the target number of packets in flight
during the super-unity-gain (1.25x) phase of gain cycling was
implicitly truncated to a number of packets no larger than the normal
unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
scenarios some flows could get stuck with a lower bandwidth, because
they did not push enough packets inflight to discover that there was
more bandwidth available. This was really only an issue in multi-flow
LAN scenarios, where RTTs and BDPs are low enough for this to be an
issue.
This fix ensures that gain cycling can raise inflight for small BDPs
by ensuring that in PROBE_BW mode target inflight values with a
super-unity gain are always greater than inflight values with a gain
<= 1. Importantly, this applies whether the inflight value is
calculated for use as a cwnd value, or as a target inflight value for
the end of the super-unity phase in bbr_is_next_cycle_phase() (both
need to be bigger to ensure we can probe with more packets in flight
reliably).
This is a candidate fix for stable releases.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9939a46d90 ]
As for today STMMAC_ALIGN macro (which is used to align DMA stuff)
relies on L1 line length (L1_CACHE_BYTES).
This isn't correct in case of system with several cache levels
which might have L1 cache line length smaller than L2 line. This
can lead to sharing one cache line between DMA buffer and other
data, so we can lose this data while invalidate DMA buffer before
DMA transaction.
Fix that by using SMP_CACHE_BYTES instead of L1_CACHE_BYTES for
aligning.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b0753408aa ]
mdio_mux_iproc_probe() uses platform_set_drvdata() to store md pointer
in device, whereas mdio_mux_iproc_remove() restores md pointer by
dev_get_platdata(&pdev->dev). This leads to wrong resources release.
The patch replaces getter to platform_get_drvdata.
Fixes: 98bc865a1e ("net: mdio-mux: Add MDIO mux driver for iProc SoCs")
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 136f55f660 ]
As long the bh tasklet isn't scheduled once, no packet from the rx path
will be handled. Since the tx path also schedule the same tasklet
this situation only persits until the first packet transmission.
So fix this issue by scheduling the tasklet after link reset.
Link: https://github.com/raspberrypi/linux/issues/2617
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Suggested-by: Floris Bos <bos@je-eigen-domein.nl>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7f3fc7ddf7 ]
If we enable or disable xgbe flow-control by ethtool ,
it does't work.Because the parameter is not properly
assigned,so we need to adjust the assignment order
of the parameters.
Fixes: c1ce2f7736 ("amd-xgbe: Fix flow control setting logic")
Signed-off-by: tangpengpeng <tangpengpeng@higon.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 101f0cd4f2 ]
UBSAN triggers the following undefined behaviour warnings:
[...]
[ 13.236124] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:468:22
[ 13.240043] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]
[ 13.744769] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:373:4
[ 13.748694] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]
When splitting the address to high and low, GENMASK_ULL is used to generate
a bitmask with dma_addr_bits field from io_sq (in ena_com_prepare_tx and
ena_com_add_single_rx_desc).
The problem is that dma_addr_bits is not initialized with a proper value
(besides being cleared in ena_com_create_io_queue).
Assign dma_addr_bits the correct value that is stored in ena_dev when
initializing the SQ.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9fc12023d6 ]
Remove BUG_ON() from fib_compute_spec_dst routine and check
in_dev pointer during flowi4 data structure initialization.
fib_compute_spec_dst routine can be run concurrently with device removal
where ip_ptr net_device pointer is set to NULL. This can happen
if userspace enables pkt info on UDP rx socket and the device
is removed while traffic is flowing
Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eee1fe6476 upstream.
When a port is brought up/down do not enable/disable only the TXMAC
but the RXMAC as well. This is essential for the CPU port to work.
Fixes: 6b93fb4648 ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79a4ed4f0f upstream.
By default autonegotiation is enabled to configure MAC on all ports.
For the CPU port autonegotiation can not be used so we need to set
some sensible defaults manually.
This patch forces the default setting of the CPU port to 1000Mbps/full
duplex which is the chip maximum capability.
Also correct size of the bit field used to configure link speed.
Fixes: 6b93fb4648 ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 940efcc888 upstream.
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable@vger.kernel.org> # 3.11
Fixes: 436f2ad05a ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d5a803c6a upstream.
With commit 044e6e3d74: "ext4: don't update checksum of new
initialized bitmaps" the buffer valid bit will get set without
actually setting up the checksum for the allocation bitmap, since the
checksum will get calculated once we actually allocate an inode or
block.
If we are doing this, then we need to (re-)check the verified bit
after we take the block group lock. Otherwise, we could race with
another process reading and verifying the bitmap, which would then
complain about the checksum being invalid.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 362eca70b5 upstream.
The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified. Fix both of these problems.
https://bugzilla.kernel.org/show_bug.cgi?id=200443
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01cfb7937a upstream.
Anatoly Trosinenko reports that a corrupted squashfs image can cause a
kernel oops. It turns out that squashfs can end up being confused about
negative fragment lengths.
The regular squashfs_read_data() does check for negative lengths, but
squashfs_read_metadata() did not, and the fragment size code just
blindly trusted the on-disk value. Fix both the fragment parsing and
the metadata reading code.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81e69df38e upstream.
Fedora has integrated the jitter entropy daemon to work around slow
boot problems, especially on VM's that don't support virtio-rng:
https://bugzilla.redhat.com/show_bug.cgi?id=1572944
It's understandable why they did this, but the Jitter entropy daemon
works fundamentally on the principle: "the CPU microarchitecture is
**so** complicated and we can't figure it out, so it *must* be
random". Yes, it uses statistical tests to "prove" it is secure, but
AES_ENCRYPT(NSA_KEY, COUNTER++) will also pass statistical tests with
flying colors.
So if RDRAND is available, mix it into entropy submitted from
userspace. It can't hurt, and if you believe the NSA has backdoored
RDRAND, then they probably have enough details about the Intel
microarchitecture that they can reverse engineer how the Jitter
entropy daemon affects the microarchitecture, and attack its output
stream. And if RDRAND is in fact an honest DRNG, it will immeasurably
improve on what the Jitter entropy daemon might produce.
This also provides some protection against someone who is able to read
or set the entropy seed file.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3239b8cd28 ]
Hardware could time out Fastpath IOs one second earlier than the timeout
provided by the host.
For non-RAID devices, driver provides timeout value based on OS provided
timeout value. Under certain scenarios, if the OS provides a timeout
value of 1 second, due to above behavior hardware will timeout
immediately.
Increase timeout value for non-RAID fastpath IOs by 1 second.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b7e1e6859f ]
The OMAP3 ISP driver manages its MMU mappings through the IOMMU-aware
ARM DMA backend. The current code creates a dma_iommu_mapping and
attaches this to the ISP device, but never detaches the mapping in
either the probe failure paths or the driver remove path resulting
in an unbalanced mapping refcount and a memory leak. Fix this properly.
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Suman Anna <s-anna@ti.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 31545df391 ]
In crypto_authenc_esn_setkey we save pointers to the authenc keys
in a local variable of type struct crypto_authenc_keys and we don't
zeroize it after use. Fix this and don't leak pointers to the
authenc keys.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5d111f5190 ]
wait_for_connected() wait till a port change status to
USB_PORT_STAT_CONNECTION, but this is not possible if
the port is unpowered. The loop will only exit at timeout.
Such case take place if an over-current incident happen
while system is in S3. Then during resume wait_for_connected()
will wait 2s, which may be noticeable by the user.
Signed-off-by: Dominik Bozek <dominikx.bozek@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ece97f3a5f ]
simpleImage generation was broken for some time. This patch is fixing
steps how simpleImage.*.ub file is generated. Steps are objdump of
vmlinux and create .ub.
Also make sure that there is striped elf version with .strip suffix.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c1c734cb1f ]
As part of bringup I ended up wanting to call an earlycon driver by a
name that was exactly 16-bytes big, specifically "qcom_geni_serial".
Unfortunately, when I tried this I found that things compiled just
fine. They just didn't work.
Specifically the compiler felt perfectly justified in initting the
".name" field of "struct earlycon_id" with the full 16-bytes and just
skipping the '\0'. Needless to say, that behavior didn't seem ideal,
but I guess someone must have allowed it for a reason.
One way to fix this is to shorten the name field to 15 bytes and then
add an extra byte after that nobody touches. This should always be
initted to 0 and we're golden.
There are, of course, other ways to fix this too. We could audit all
the users of the "name" field and make them stop at both null
termination or at 16 bytes. We could also just make the name field
much bigger so that we're not likely to run into this. ...but both
seem like we'll just hit the bug again.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 300eec7c0a ]
ic_nameservers, which stores the list of name servers discovered by
ipconfig, is initialised (i.e. has all of its elements set to NONE, or
0xffffffff) by ic_nameservers_predef() in the following scenarios:
- before the "ip=" and "nfsaddrs=" kernel command line parameters are
parsed (in ip_auto_config_setup());
- before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in
order to clear any values that may have been set after parsing "ip="
or "nfsaddrs=" and are no longer needed.
This means that ic_nameservers_predef() is not called when neither "ip="
nor "nfsaddrs=" is specified on the kernel command line. In this
scenario, every element in ic_nameservers remains set to 0x00000000,
which is indistinguishable from ANY and causes pnp_seq_show() to write
the following (bogus) information to /proc/net/pnp:
#MANUAL
nameserver 0.0.0.0
nameserver 0.0.0.0
nameserver 0.0.0.0
This is potentially problematic for systems that blindly link
/etc/resolv.conf to /proc/net/pnp.
Ensure that ic_nameservers is also initialised when neither "ip=" nor
"nfsaddrs=" are specified by calling ic_nameservers_predef() in
ip_auto_config(), but only when ip_auto_config_setup() was not called
earlier. This causes the following to be written to /proc/net/pnp, and
is consistent with what gets written when ipconfig is configured
manually but no name servers are specified on the kernel command line:
#MANUAL
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7c8b77f815 ]
Heiko Stübner justified pretty well the change in commit e330eb86ba
("ARM: multi_v7_defconfig: enable Rockchip io-domain driver"). This
change is also needed for arm64 rockchip boards, so, do the same for arm64.
The io-domain driver is necessary to notify the soc about voltages
changes happening on supplying regulators. Probably the most important
user right now is the mmc tuning code, where the soc needs to get
notified when the voltage is dropped to the 1.8V point.
As this option is necessary to successfully tune UHS cards etc, it
should get built in. Otherwise, tuning will fail with,
dwmmc_rockchip fe320000.dwmmc: All phases bad!
mmc0: tuning execution failed: -5
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1c74d5c0de ]
Currently we are enabling handling of interrupts specific to Tegra124+
which happen to overlap with previous generations. Let's specify
interrupts mask per SoC generation for consistency and in a preparation
of squashing of Tegra20 driver into the common one that will enable
handling of GART faults which may be undesirable by newer generations.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf3fbdfbec ]
The ISR reads interrupts-enable mask, but doesn't utilize it. Apply the
mask to the interrupt status and don't handle interrupts that MC driver
haven't asked for. Kernel would disable spurious MC IRQ and report the
error. This would happen only in a case of a very severe bug.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e5d05e18b ]
We need to introduce a new compatible name for the Meson-AXG SoC
in order to support the RMII 100M ethernet PHY, since the PRG_ETH0
register of the dwmac glue layer is changed from previous old SoC.
Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 03d9fbc397 ]
The Meson8m2 SoC is a variant of Meson8 with some updates from Meson8b
(such as the Gigabit capable DesignWare MAC).
It is mostly pin compatible with Meson8, only 10 (existing) CBUS pins
get an additional function (four of these are Ethernet RXD2, RXD3, TXD2
and TXD3 which are required when the board uses an RGMII PHY).
The AOBUS pins seem to be identical on Meson8 and Meson8m2.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 486e666136 ]
The use of stack Variable Length Arrays needs to be avoided, as they
can be a vector for stack exhaustion, which can be both a runtime bug
(kernel Oops) or a security flaw (overwriting memory beyond the
stack). Also, in general, as code evolves it is easy to lose track of
how big a VLA can get. Thus, we can end up having runtime failures
that are hard to debug. As part of the directive[1] to remove all VLAs
from the kernel, and build with -Wvla.
Currently driver is using a VLA declared using the number of descriptors. This
array is used to store integer values and is later used as an argument to
`gpiod_set_array_value_cansleep()` This can be avoided by using
`kmalloc_array()` to allocate memory for the array of integer values. Memory is
free'd before return from function.
>From the code it appears that it is safe to sleep so we can use GFP_KERNEL
(based _cansleep() suffix of function `gpiod_set_array_value_cansleep()`.
It can be expected that this patch will result in a small increase in overhead
due to the use of `kmalloc_array()`
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8e142e9e62 ]
DECLARE_TLV_DB_SCALE (alias of SNDRV_CTL_TLVD_DECLARE_DB_SCALE) is used but
tlv.h is not included. This causes build failure when local macro is
defined by comment-out.
This commit fixes the bug. At the same time, the alias macro is replaced
with a destination macro added at a commit 46e860f768 ("ALSA: rename
TLV-related macros so that they're friendly to user applications")
Reported-by: Connor McAdams <conmanx360@gmail.com>
Fixes: 44f0c9782c ('ALSA: hda/ca0132: Add tuning controls')
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9899e4d352 ]
In tw_chrdev_ioctl(), the length of the data buffer is firstly copied
from the userspace pointer 'argp' and saved to the kernel object
'data_buffer_length'. Then a security check is performed on it to make
sure that the length is not more than 'TW_MAX_IOCTL_SECTORS *
512'. Otherwise, an error code -EINVAL is returned. If the security
check is passed, the entire ioctl command is copied again from the
'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various
operations are performed on 'tw_ioctl' according to the 'cmd'. Given
that the 'argp' pointer resides in userspace, a malicious userspace
process can race to change the buffer length between the two
copies. This way, the user can bypass the security check and inject
invalid data buffer length. This can cause potential security issues in
the following execution.
This patch checks for capable(CAP_SYS_ADMIN) in tw_chrdev_open() to
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c9318a3e02 ]
In twa_chrdev_ioctl(), the ioctl driver command is firstly copied from
the userspace pointer 'argp' and saved to the kernel object
'driver_command'. Then a security check is performed on the data buffer
size indicated by 'driver_command', which is
'driver_command.buffer_length'. If the security check is passed, the
entire ioctl command is copied again from the 'argp' pointer and saved
to the kernel object 'tw_ioctl'. Then, various operations are performed
on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer
resides in userspace, a malicious userspace process can race to change
the buffer size between the two copies. This way, the user can bypass
the security check and inject invalid data buffer size. This can cause
potential security issues in the following execution.
This patch checks for capable(CAP_SYS_ADMIN) in twa_chrdev_open()t o
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dac0490718 ]
Only non-NPAR PFs need to actively check and manage unsupported link
speeds. NPAR functions and VFs do not control the link speed and
should skip the unsupported speed detection logic, to avoid warning
messages from firmware rejecting the unsupported firmware calls.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c0a83b14e ]
The s390 CPU measurement facility sampling mode supports basic entries
and diagnostic entries. Each entry has a valid bit to indicate the
status of the entry as valid or invalid.
This bit is bit 31 in the diagnostic entry, but the bit mask definition
refers to bit 30.
Fix this by making the reserved field one bit larger.
Fixes: 7e75fc3ff4 ("s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling function")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 77715b7ddb ]
The CPU Measurement sampling facility creates a trailer entry for each
Sample-Data-Block of stored samples. The trailer entry contains the sizes
(in bytes) of the stored sampling types:
- basic-sampling data entry size
- diagnostic-sampling data entry size
Both sizes are 2 bytes long.
This patch changes the trailer entry definition to reflect this.
Fixes: fcc77f5073 ("s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocks")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c4a121e82 ]
Add support for the BCM43364 chipset via an SDIO interface, as used in
e.g. the Murata 1FX module.
The BCM43364 uses the same firmware as the BCM43430 (which is already
included), the only difference is the omission of Bluetooth.
However, the SDIO_ID for the BCM43364 is 02D0:A9A4, giving it a MODALIAS
of sdio:c00v02D0dA9A4, which doesn't get recognised and hence doesn't
load the brcmfmac module. Adding the 'A9A4' ID in the appropriate place
triggers the brcmfmac driver to load, and then correctly use the
firmware file 'brcmfmac43430-sdio.bin'.
Signed-off-by: Sean Lanigan <sean@lano.id.au>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a75bbe71a2 ]
Per ONFI specification (Rev. 4.0), if the CRC of the first parameter page
read is not valid, the host should read redundant parameter page copies.
Fix FSL NAND driver to read the two redundant copies which are mandatory
in the specification.
Signed-off-by: Jane Wan <Jane.Wan@nokia.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0cc4655cb5 ]
This issue was reported by a user who downloaded a corrupt saa7164
firmware, then went looking for a valid xc5000 firmware to fix the
error displayed...but the device in question has no xc5000, thus after
much effort, the wild goose chase eventually led to a support call.
The xc5000 has nothing to do with saa7164 (as far as I can tell),
so replace the string with saa7164 as well as give a meaningful
hint on the firmware mismatch.
Signed-off-by: Brad Love <brad@nextdimension.cc>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 804689ad2d ]
For failed commands with valid sense data (e.g. NCQ commands),
scsi_check_sense() is used in ata_analyze_tf() to determine if the
command can be retried. In such case, rely on this decision and ignore
the command error mask based decision done in ata_worth_retry().
This fixes useless retries of commands such as unaligned writes on zoned
disks (TYPE_ZAC).
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a22a3e1e7 ]
Inclusion of include/dma-iommu.h when CONFIG_IOMMU_DMA is not selected
results in the following splat:
In file included from drivers/irqchip/irq-gic-v3-mbi.c:20:0:
./include/linux/dma-iommu.h:95:69: error: unknown type name ‘dma_addr_t’
static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
^~~~~~~~~~
./include/linux/dma-iommu.h:108:74: warning: ‘struct list_head’ declared inside parameter list will not be visible outside of this definition or declaration
static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
^~~~~~~~~
scripts/Makefile.build:312: recipe for target 'drivers/irqchip/irq-gic-v3-mbi.o' failed
Fix it by including linux/types.h.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-5-marc.zyngier@arm.com
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 50808bfcc1 ]
Function nvmem_reg_read can return a non zero value indicating an error.
This returned value must be read and error propagated to
nvmem_cell_prepare_write_buffer. Silence the following gcc warning (W=1):
drivers/nvmem/core.c:1093:9: warning: variable 'rc' set but
not used [-Wunused-but-set-variable]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 57a66497e1 ]
The PMU node references two interrupts, but lacks the interrupt-affinity
property, which is required in that case:
hw perfevents: no interrupt-affinity property for /pmu, guessing.
Add the missing property to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7207b94754 ]
The PMU node references two interrupts, but lacks the interrupt-affinity
property, which is required in that case:
hw perfevents: no interrupt-affinity property for /pmu, guessing.
Add the missing property to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b3a81b6c4f ]
On many Chromebooks touch devices are multi-sourced; the components are
electrically compatible and one can be freely swapped for another without
changing the OS image or firmware.
To avoid bunch of scary messages when device is not actually present in the
system let's try testing basic communication with it and if there is no
response terminate probe early with -ENXIO.
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9dcb3df428 ]
The interrupt controller inside the Wii's Hollywood chip is connected to
two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with
their own interrupt status and mask registers.
When booting the Wii with mini[1], interrupts from the SD card
controller (IRQ 7) are handled by the ARM, because mini provides SD
access over IPC. Linux however can't currently use or disable this IPC
service, so both sides try to handle IRQ 7 without coordination.
Let's instead make sure that all interrupts that are unmasked on the PPC
side are masked on the ARM side; this will also make sure that Linux can
properly talk to the SD card controller (and potentially other devices).
If access to a device through IPC is desired in the future, interrupts
from that device should not be handled by Linux directly.
[1]: https://github.com/lewurm/mini
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7a47f20eb1 ]
The method struct drm_connector_helper_funcs::mode_valid is defined
as returning an 'enum drm_mode_status' but the driver implementation
for this method uses an 'int' for it.
Fix this by using 'enum drm_mode_status' in the driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1cfc63b5ae ]
When waiting for a cacheline to change state in cmpwait, we may immediately
wake-up the first time around the outer loop if the event register was
already set (for example, because of the event stream).
Avoid these spurious wakeups by explicitly clearing the event register
before loading the cacheline and setting the exclusive monitor.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c6213eb1ae ]
This fixes klockworks warnings: Pointer 'dev' returned from call to
function 'bus_find_device' at line 179 may be NULL and will be dereferenced
at line 181.
cpsw-phy-sel.c:179: 'dev' is assigned the return value from function 'bus_find_device'.
bus.c:342: 'bus_find_device' explicitly returns a NULL value.
cpsw-phy-sel.c:181: 'dev' is dereferenced by passing argument 1 to function 'dev_get_drvdata'.
device.h:1024: 'dev' is passed to function 'dev_get_drvdata'.
device.h:1026: 'dev' is explicitly dereferenced.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
[nsekhar@ti.com: add an error message, fix return path]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7a2148dfda ]
The current code decrements the timeout counter i and the end of
each loop i is incremented, so the check for timeout will always
be false and hence the timeout mechanism is just a dead code path.
Potentially, if the RD_READY bit is not set, we could end up in
an infinite loop.
Fix this so the timeout starts from 1000 and decrements to zero,
if at the end of the loop i is zero we have a timeout condition.
Detected by CoverityScan, CID#1324008 ("Logically dead code")
Fixes: ccfc97bdb5 ("[media] smiapp: Add driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b01fd3d40 ]
If is_enabled() is not defined, regulator core will assume
this regulator is already enabled, then it can NOT be really
enabled after disabled.
Based on Li Jun's patch from the NXP kernel tree.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e3611e954 ]
The device can set the exception event bit in one of the response UPIU,
for example to notify the need for urgent BKOPs operation. In such a
case, the host driver calls ufshcd_exception_event_handler to handle
this notification. When trying to check the exception event status (for
finding the cause for the exception event), the device may be busy with
additional SCSI commands handling and may not respond within the 100ms
timeout.
To prevent that, we need to block SCSI commands during handling of
exception events and allow retransmissions of the query requests, in
case of timeout.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 36dd26e0c8 ]
Improve fscrypt read performance by switching the decryption workqueue
from bound to unbound. With the bound workqueue, when multiple bios
completed on the same CPU, they were decrypted on that same CPU. But
with the unbound queue, they are now decrypted in parallel on any CPU.
Although fscrypt read performance can be tough to measure due to the
many sources of variation, this change is most beneficial when
decryption is slow, e.g. on CPUs without AES instructions. For example,
I timed tarring up encrypted directories on f2fs. On x86 with AES-NI
instructions disabled, the unbound workqueue improved performance by
about 25-35%, using 1 to NUM_CPUs jobs with 4 or 8 CPUs available. But
with AES-NI enabled, performance was unchanged to within ~2%.
I also did the same test on a quad-core ARM CPU using xts-speck128-neon
encryption. There performance was usually about 10% better with the
unbound workqueue, bringing it closer to the unencrypted speed.
The unbound workqueue may be worse in some cases due to worse locality,
but I think it's still the better default. dm-crypt uses an unbound
workqueue by default too, so this change makes fscrypt match.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1898eb61fb ]
The ARM CCN PMU driver uses dev_warn() to complain about parameters in
the user-provided perf_event_attr. This means that under normal
operation (e.g. a single invocation of the perf tool), a number of
messages warnings may be logged to dmesg.
Tools may issue multiple syscalls to probe for feature support, and
multiple applications (from multiple users) can attempt to open events
simultaneously, so this is not very helpful, even if a user happens to
have access to dmesg. Worse, this can push important information out of
the dmesg ring buffer, and can significantly slow down syscall fuzzers,
vastly increasing the time it takes to find critical bugs.
Demote the dev_warn() instances to dev_dbg(), as is the case for all
other PMU drivers under drivers/perf/. Users who wish to debug PMU event
initialisation can enable dynamic debug to receive these messages.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 408fec36a1 ]
Currently we request control of native PCIe hotplug unconditionally.
Native PCIe hotplug events are handled by the pciehp driver, and if it is
not enabled those events will be lost.
Request control of native PCIe hotplug only if the pciehp driver is
enabled, so we will actually handle native PCIe hotplug events.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4ea69b2fd6 ]
For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.
Since we come to know of the callee's address only before the
extra pass, the number of instructions required to load this
address may vary from what was previously generated. This can
make the JITed image grow or shrink.
To avoid this, we should generate a constant five-instruction
when loading function addresses by padding the optimized load
sequence with NOPs.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2181636471 ]
The device node iterators perform an of_node_get on each iteration, so a
jump out of the loop requires an of_node_put.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
expression root,e;
local idexpression child;
iterator name for_each_child_of_node;
@@
for_each_child_of_node(root, child) {
... when != of_node_put(child)
when != e = child
+ of_node_put(child);
? break;
...
}
... when != child
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a4b475cf8 ]
Since the value of x is never intended to be read, declare it with gcc
attribute as unused. Fix warning treated as error with W=1:
arch/powerpc/platforms/powermac/bootx_init.c:471:21: error: variable ‘x’ set but not used [-Werror=unused-but-set-variable]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f72cf3f1d4 ]
Add a missing prototype for function `note_bootable_part` to silence a
warning treated as error with W=1:
arch/powerpc/platforms/powermac/setup.c:361:12: error: no previous prototype for ‘note_bootable_part’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b87a358b4a ]
Add a missing include <platforms/chrp/chrp.h>.
These functions can all be static, make it so. Fix warnings treated as
errors with W=1:
arch/powerpc/platforms/chrp/time.c:41:13: error: no previous prototype for ‘chrp_time_init’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:66:5: error: no previous prototype for ‘chrp_cmos_clock_read’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:74:6: error: no previous prototype for ‘chrp_cmos_clock_write’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:86:5: error: no previous prototype for ‘chrp_set_rtc_time’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:130:6: error: no previous prototype for ‘chrp_get_rtc_time’ [-Werror=missing-prototypes]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c89ca59322 ]
The header file <linux/syscalls.h> was missing from the includes. Fix the
following warning, treated as error with W=1:
arch/powerpc/kernel/pci_32.c:286:6: error: no previous prototype for ‘sys_pciconfig_iobase’ [-Werror=missing-prototypes]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 699e2302c2 ]
The country code is used by the ath to detect the ISO 3166-1 alpha-2 name
and to select the correct conformance test limits (CTL) for a country. If
the country isn't available and it is still programmed in the EEPROM then
it will cause an error and stop the initialization with:
Invalid EEPROM contents
The current CTL mappings for this country are:
* 2.4GHz: ETSI
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c790f2d23 ]
The country code is used by the ath to detect the ISO 3166-1 alpha-2 name
and to select the correct conformance test limits (CTL) for a country. If
the country isn't available and it is still programmed in the EEPROM then
it will cause an error and stop the initialization with:
Invalid EEPROM contents
The current CTL mappings for this country are:
* 2.4GHz: FCC
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2a3169a54b ]
The country code is used by the ath to detect the ISO 3166-1 alpha-2 name
and to select the correct conformance test limits (CTL) for a country. If
the country isn't available and it is still programmed in the EEPROM then
it will cause an error and stop the initialization with:
Invalid EEPROM contents
The current CTL mappings for this country are:
* 2.4GHz: ETSI
* 5GHz: ETSI
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 667ddac574 ]
The country code is used by the ath to detect the ISO 3166-1 alpha-2 name
and to select the correct conformance test limits (CTL) for a country. If
the country isn't available and it is still programmed in the EEPROM then
it will cause an error and stop the initialization with:
Invalid EEPROM contents
The current CTL mappings for this country are:
* 2.4GHz: ETSI
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ea3986ad2 ]
The country code is used by the ath to detect the ISO 3166-1 alpha-2 name
and to select the correct conformance test limits (CTL) for a country. If
the country isn't available and it is still programmed in the EEPROM then
it will cause an error and stop the initialization with:
Invalid EEPROM contents
The current CTL mappings for this country are:
* 2.4GHz: ETSI
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f183687e3 ]
The regdomain code is used to select the correct the correct conformance
test limits (CTL) for a country. If the regdomain code isn't available and
it is still programmed in the EEPROM then it will cause an error and stop
the initialization with:
Invalid EEPROM contents
The current CTL mappings for this regdomain code are:
* 2.4GHz: FCC
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ba8df0c52 ]
The regdomain code is used to select the correct the correct conformance
test limits (CTL) for a country. If the regdomain code isn't available and
it is still programmed in the EEPROM then it will cause an error and stop
the initialization with:
Invalid EEPROM contents
The current CTL mappings for this regdomain code are:
* 2.4GHz: ETSI
* 5GHz: ETSI
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 45faf6e096 ]
The regdomain code is used to select the correct the correct conformance
test limits (CTL) for a country. If the regdomain code isn't available and
it is still programmed in the EEPROM then it will cause an error and stop
the initialization with:
Invalid EEPROM contents
The current CTL mappings for this regdomain code are:
* 2.4GHz: ETSI
* 5GHz: ETSI
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 01fb2994a9 ]
The regdomain code is used to select the correct the correct conformance
test limits (CTL) for a country. If the regdomain code isn't available and
it is still programmed in the EEPROM then it will cause an error and stop
the initialization with:
Invalid EEPROM contents
The current CTL mappings for this regdomain code are:
* 2.4GHz: ETSI
* 5GHz: FCC
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff3d27a048 ]
Under the following case, qgroup rescan can double account cowed tree
blocks:
In this case, extent tree only has one tree block.
-
| transid=5 last committed=4
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| | transid = 5
| |- qgroup_rescan_leaf()
| |- btrfs_search_slot_for_read() on extent tree
| Get the only extent tree block from commit root (transid = 4).
| Scan it, set qgroup_rescan_progress to the last
| EXTENT/META_ITEM + 1
| now qgroup_rescan_progress = A + 1.
|
| fs tree get CoWed, new tree block is at A + 16K
| transid 5 get committed
-
| transid=6 last committed=5
| btrfs_qgroup_rescan_worker()
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| | transid = 5
| |- qgroup_rescan_leaf()
| |- btrfs_search_slot_for_read() on extent tree
| Get the only extent tree block from commit root (transid = 5).
| scan it using qgroup_rescan_progress (A + 1).
| found new tree block beyong A, and it's fs tree block,
| account it to increase qgroup numbers.
-
In above case, tree block A, and tree block A + 16K get accounted twice,
while qgroup rescan should stop when it already reach the last leaf,
other than continue using its qgroup_rescan_progress.
Such case could happen by just looping btrfs/017 and with some
possibility it can hit such double qgroup accounting problem.
Fix it by checking the path to determine if we should finish qgroup
rescan, other than relying on next loop to exit.
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3d3a2e610e ]
Currently the code assumes that there's an implied barrier by the
sequence of code preceding the wakeup, namely the mutex unlock.
As Nikolay pointed out:
I think this is wrong (not your code) but the original assumption that
the RELEASE semantics provided by mutex_unlock is sufficient.
According to memory-barriers.txt:
Section 'LOCK ACQUISITION FUNCTIONS' states:
(2) RELEASE operation implication:
Memory operations issued before the RELEASE will be completed before the
RELEASE operation has completed.
Memory operations issued after the RELEASE *may* be completed before the
RELEASE operation has completed.
(I've bolded the may portion)
The example given there:
As an example, consider the following:
*A = a;
*B = b;
ACQUIRE
*C = c;
*D = d;
RELEASE
*E = e;
*F = f;
The following sequence of events is acceptable:
ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
So if we assume that *C is modifying the flag which the waitqueue is checking,
and *E is the actual wakeup, then those accesses can be re-ordered...
IMHO this code should be considered broken...
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 90b2da89a0 ]
When a buffer is queued or requeued in vb2_buffer_done, then don't
call the finish memop. In this case the buffer is only returned to vb2,
not to userspace.
Calling 'finish' here will cause an unbalance when the queue is
canceled, since the core will call the same memop again.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e91d48371 ]
the wl pointer can be null In case only wlcore_sdio is probed while
no WiLink module is successfully probed, as in the case of mounting a
wl12xx module while using a device tree file configured with wl18xx
related settings.
In this case the system was crashing in wl1271_suspend() as platform
device data is not set.
Make sure wl the pointer is valid before using it.
Signed-off-by: Eyal Reizer <eyalr@ti.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b817047ae7 ]
Race condition is observed during rmmod of mwifiex_usb:
1. The rmmod thread will call mwifiex_usb_disconnect(), download
SHUTDOWN command and do wait_event_interruptible_timeout(),
waiting for response.
2. The main thread will handle the response and will do a
wake_up_interruptible(), unblocking rmmod thread.
3. On getting unblocked, rmmod thread will make rx_cmd.urb = NULL in
mwifiex_usb_free().
4. The main thread will try to resubmit rx_cmd.urb in
mwifiex_usb_submit_rx_urb(), which is NULL.
To fix, wait for main thread to complete before calling
mwifiex_usb_free().
Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c0ac70ad2 ]
In case, one BE is used by two FE1/FE2
FE1--->BE-->
|
FE2----]
when FE1/FE2 call dpcm_be_dai_hw_free() together
the BE users will be 2 (> 1), hence cannot be hw_free
the be state will leave at, ex. SND_SOC_DPCM_STATE_STOP
later FE1/FE2 call dpcm_be_dai_shutdown(),
will be skip due to wrong state.
leaving the BE not being hw_free and shutdown.
The BE dai will be hw_free later when calling
dpcm_be_dai_shutdown() if still in invalid state.
Signed-off-by: KaiChieh Chuang <kaichieh.chuang@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9960521c44 ]
This patch fixes the following warning during boot:
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<(ptrval)>] qca_setup+0x194/0x750 [hci_uart]
WARNING: CPU: 2 PID: 1878 at kernel/sched/core.c:6135
__might_sleep+0x7c/0x88
In qca_set_baudrate(), the current task state is set to
TASK_UNINTERRUPTIBLE before going to sleep for 300ms. It was then
restored to TASK_INTERRUPTIBLE. This patch sets the current task state
back to TASK_RUNNING instead.
Signed-off-by: Thierry Escande <thierry.escande@linaro.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0f22e40053 ]
Make sure the rx_allocator worker is canceled before running the
rx_init routine. rx_init frees and re-allocates all rxb's pages. The
rx_allocator worker also allocates pages for the used rxb's. Running
rx_init and rx_allocator simultaniously causes a kernel panic. Fix
that by canceling the work in rx_init.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e9d33f149f ]
A few changes improve the overall usability of the test:
* fix a hard-coded maximum frequency (3300),
* don't adjust the CPU frequency if only evaluating results,
* fix a comparison for multiple frequencies.
A symptom of that last issue looked like this:
./run.sh: line 107: [: too many arguments
./run.sh: line 110: 3099
3099
3100-3100: syntax error in expression (error token is \"3099
3100-3100\")
Because a check will count how many differente frequencies
there are among the CPUs of the system, and after they are
tallied another read is performed, which might produce
different results.
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e5e5732d81 ]
After revoking atomic write, related LBA can be reused by others, so we
need to wait page writeback before reusing the LBA, in order to avoid
interference between old atomic written in-flight IO and new IO.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4071e67cff ]
The following patch disables loading of f2fs module on architectures
which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on
such architectures , log messages are:
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/vdiskb1, missing codepage or helper program, or other error.
/dev/vdiskb1: F2FS filesystem,
UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 1th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 2th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
which was introduced by git commit 5c9b469295
tested on git kernel 4.17.0-rc6-00309-gec30dcf7f425
with patch applied:
modprobe: ERROR: could not insert 'f2fs': Invalid argument
May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096
Signed-off-by: Anatoly Pugachev <matorola@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae55e59da0 ]
If the server recalls the layout that was just handed out, we risk hitting
a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we
release the sequence slot after processing the LAYOUTGET operation that
was sent as part of the OPEN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 926bc2f100 ]
The stores to update the SLB shadow area must be made as they appear
in the C code, so that the hypervisor does not see an entry with
mismatched vsid and esid. Use WRITE_ONCE for this.
GCC has been observed to elide the first store to esid in the update,
which means that if the hypervisor interrupts the guest after storing
to vsid, it could see an entry with old esid and new vsid, which may
possibly result in memory corruption.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 447808bf50 ]
time_init() will set up tb_ticks_per_usec based on reality.
time_init() is called *after* udbg_init_opal_common() during boot.
from arch/powerpc/kernel/time.c:
unsigned long tb_ticks_per_usec = 100; /* sane default */
Currently, all powernv systems have a timebase frequency of 512mhz
(512000000/1000000 == 0x200) - although there's nothing written
down anywhere that I can find saying that we couldn't make that
different based on the requirements in the ISA.
So, we've been (accidentally) thwacking the (currently) correct
(for powernv at least) value for tb_ticks_per_usec earlier than
we otherwise would have.
The "sane default" seems to be adequate for our purposes between
udbg_init_opal_common() and time_init() being called, and if it isn't,
then we should probably be setting it somewhere that isn't hvc_opal.c!
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46d4be41b9 ]
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.
In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb2595c139 ]
ucma_process_join() will free the new allocated "mc" struct,
if there is any error after that, especially the copy_to_user().
But in parallel, ucma_leave_multicast() could find this "mc"
through idr_find() before ucma_process_join() frees it, since it
is already published.
So "mc" could be used in ucma_leave_multicast() after it is been
allocated and freed in ucma_process_join(), since we don't refcnt
it.
Fix this by separating "publish" from ID allocation, so that we
can get an ID first and publish it later after copy_to_user().
Fixes: c8f6a362bf ("RDMA/cma: Add multicast communication support")
Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd975e6914 ]
When listing sets with timeout support, there's a probability that
just timing out entries with "0" timeout value is listed/saved.
However when restoring the saved list, the zero timeout value means
permanent elelements.
The new behaviour is that timing out entries are listed with "timeout 1"
instead of zero.
Fixes netfilter bugzilla #1258.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit abfdff44bc ]
When using RTC_ALM_SET or RTC_WKALM_SET with rtc_wkalrm.enabled not set,
rtc_timer_enqueue() is not called and rtc_set_alarm() may succeed but the
subsequent RTC_AIE_ON ioctl will fail. RTC_ALM_READ would also fail in that
case.
Ensure rtc_set_alarm() fails when alarms are not supported to avoid letting
programs think the alarms are working for a particular RTC when they are
not.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3171822fdc ]
When running a fuzz tester against a KASAN-enabled kernel, the following
splat periodically occurs.
The problem occurs when the test sends a GETDEVICEINFO request with a
malformed xdr array (size but no data) for gdia_notify_types and the
array size is > 0x3fffffff, which results in an overflow in the value of
nbytes which is passed to read_buf().
If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
the overflow occurs, the value of nbytes 0, and when that happens the
pointer returned by read_buf() points to the end of the xdr data (i.e.
argp->end) when really it should be returning NULL.
Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
value is arbitrary, but it's the same threshold used by
nfsd4_decode_bitmap()... in could really be any value >= 1 since it's
expected to get at most a single bitmap in gdia_notify_types).
[ 119.256854] ==================================================================
[ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
[ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 119.261202] Call Trace:
[ 119.262265] dump_stack+0x71/0xab
[ 119.263371] print_address_description+0x6a/0x270
[ 119.264609] kasan_report+0x258/0x380
[ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd]
[ 119.274918] svc_process+0x161c/0x31a0 [sunrpc]
[ 119.276172] ? svc_printk+0x190/0x190 [sunrpc]
[ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc]
[ 119.278622] nfsd+0x2b9/0x430 [nfsd]
[ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[ 119.281157] kthread+0x2db/0x390
[ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 119.283756] ret_from_fork+0x35/0x40
[ 119.286041] Allocated by task 436:
[ 119.287525] kasan_kmalloc+0xa0/0xd0
[ 119.288685] kmem_cache_alloc+0xe9/0x1f0
[ 119.289900] get_empty_filp+0x7b/0x410
[ 119.291037] path_openat+0xca/0x4220
[ 119.292242] do_filp_open+0x182/0x280
[ 119.293411] do_sys_open+0x216/0x360
[ 119.294555] do_syscall_64+0xa0/0x2f0
[ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.298068] Freed by task 436:
[ 119.299271] __kasan_slab_free+0x130/0x180
[ 119.300557] kmem_cache_free+0x78/0x210
[ 119.301823] rcu_process_callbacks+0x35b/0xbd0
[ 119.303162] __do_softirq+0x192/0x5ea
[ 119.305443] The buggy address belongs to the object at ffff880113ada000
which belongs to the cache filp of size 256
[ 119.308556] The buggy address is located 0 bytes inside of
256-byte region [ffff880113ada000, ffff880113ada100)
[ 119.311376] The buggy address belongs to the page:
[ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[ 119.314428] flags: 0x17ffe000000100(slab)
[ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[ 119.319050] page dumped because: kasan: bad access detected
[ 119.321652] Memory state around the buggy address:
[ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.327547] ^
[ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 119.331740] ==================================================================
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 57f230ab04 ]
The max number of slots used in xennet_get_responses() is set to
MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
difference is resulting in frequent messages "too many slots" and a
reduced network throughput for some workloads (factor 10 below that of
a kernel-xen based guest).
Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
the max number of slots to use solves that problem (tests showed no
more messages "too many slots" and throughput was as high as with the
kernel-xen based guest system).
Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
netfront_tx_slot_available() for making it clearer what is really being
tested without actually modifying the tested value.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c9484b986e ]
Patch series "kcov: fix unexpected faults".
These patches fix a few issues where KCOV code could trigger recursive
faults, discovered while debugging a patch enabling KCOV for arch/arm:
* On CONFIG_PREEMPT kernels, there's a small race window where
__sanitizer_cov_trace_pc() can see a bogus kcov_area.
* Lazy faulting of the vmalloc area can cause mutual recursion between
fault handling code and __sanitizer_cov_trace_pc().
* During the context switch, switching the mm can cause the kcov_area to
be transiently unmapped.
These are prerequisites for enabling KCOV on arm, but the issues
themsevles are generic -- we just happen to avoid them by chance rather
than design on x86-64 and arm64.
This patch (of 3):
For kernels built with CONFIG_PREEMPT, some C code may execute before or
after the interrupt handler, while the hardirq count is zero. In these
cases, in_task() can return true.
A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
resets the task's kcov data via kcov_task_init(). Instrumented code
executed during this period will call __sanitizer_cov_trace_pc(), and as
in_task() returns true, will inspect t->kcov_mode before trying to write
to t->kcov_area.
In kcov_init_task() we update t->kcov_{mode,area,size} with plain stores,
which may be re-ordered, torn, etc. Thus __sanitizer_cov_trace_pc() may
see bogus values for any of these fields, and may attempt to write to
memory which is not mapped.
Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
barrier() to ensure this is ordered before we clear t->kov_{area,size}.
This ensures that any code execute while kcov_init_task() is preempted
will either see valid values for t->kcov_{area,size}, or will see that
t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
t->kcov_area.
Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56406e017a upstream.
The commit 3bc04e28a0 ("usb: dwc2: host: Get aligned DMA in a more
supported way") introduced a common way to align DMA allocations.
The code in the commit aligns the struct dma_aligned_buffer but the
actual DMA address pointed by data[0] gets aligned to an offset from
the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
pointers.
This is against the recommendation in Documentation/DMA-API.txt which
states:
Therefore, it is recommended that driver writers who don't take
special care to determine the cache line size at run time only map
virtual regions that begin and end on page boundaries (which are
guaranteed also to be cache line boundaries).
The effect of this is that architectures with non-coherent DMA caches
may run into memory corruption or kernel crashes with Unhandled
kernel unaligned accesses exceptions.
Fix the alignment by positioning the DMA area in front of the allocation
and use memory at the end of the area for storing the orginal
transfer_buffer pointer. This may have the added benefit of increased
performance as the DMA area is now fully aligned on all architectures.
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a0 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Cc: <stable@vger.kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
[ Antti: backported to 4.9: edited difference in whitespace ]
Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b0eb6b41a upstream.
Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:
/git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
/git/arm-soc/include/linux/compiler.h:357:38: error: call to
'__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)
The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <=
BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
#else
#define LAST_CPUPID_WIDTH 0
#endif
#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
#endif
which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the
VMEMMAP page array according to address space and struct page size.
However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.
Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.
Says Arnd:
Further experiments show that the build error already existed before,
but was only triggered with larger values of CONFIG_NR_CPU and/or
CONFIG_NODES_SHIFT that might be used in actual configurations but
not in randconfig builds.
With longer CPU and node masks, I could recreate the problem with
kernels as old as linux-4.7 when arm64 NUMA support got added.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2519c1bbe3 upstream.
Commit 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:
"warning: 'link' may be used uninitialized in this function"
It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.
Cc: stable@vger.kernel.org
Fixes: 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e536e222f upstream.
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15cc78644d upstream.
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.
Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.
Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 7862ad1846 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1863c38725 upstream.
Running the following:
# cd /sys/kernel/debug/tracing
# echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
# echo snapshot > events/sched/sched_switch/trigger
Triggers the following bug:
------------[ cut here ]------------
kernel BUG at mm/slub.c:296!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:kfree+0x16c/0x180
Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
Call Trace:
event_trigger_callback+0xee/0x1d0
event_trigger_write+0xfc/0x1a0
__vfs_write+0x33/0x190
? handle_mm_fault+0x115/0x230
? _cond_resched+0x16/0x40
vfs_write+0xb0/0x190
ksys_write+0x52/0xc0
do_syscall_64+0x5a/0x160
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f363e16ab50
Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
---[ end trace d301afa879ddfa25 ]---
The cause is because the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. Then on return to the
function that called register_snapshot_trigger(), as it sees it
failed to register, it frees the trigger_data again and causes
a double free.
By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.
Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
Cc: stable@vger.kerne.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 384cf4285b upstream.
The Lenovo LaVie Z laptop requires i8042 to be reset in order to
consistently detect its Elantech touchpad. The nomux and kbdreset
quirks are not sufficient.
It's possible the other LaVie Z models from NEC require this as well.
Cc: stable@vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3756f6401c upstream.
gcc-8 warns about using strncpy() with the source size as the limit:
fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.
This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.
There are only 23 callsites which I all reviewed to ensure this is
currently the case. We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.
Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with gcc-8.1, we get a warning about all system call definitions,
which use an alias between functions with incompatible prototypes, e.g.:
In file included from ../mm/process_vm_access.c:19:
../include/linux/syscalls.h:211:18: warning: 'sys_process_vm_readv' alias between functions of incompatible types 'long int(pid_t, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)' {aka 'long int(int, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias]
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
^~~
../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^~~~~~~~~~~~~~~~~
../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
^~~~~~~~~~~~~~~
../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
^~~~~~~~~~~~~~~
../include/linux/syscalls.h:215:18: note: aliased declaration here
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
^~~
../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^~~~~~~~~~~~~~~~~
../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
^~~~~~~~~~~~~~~
../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
This is really noisy and does not indicate a real problem. In the latest
mainline kernel, this was addressed by commit bee2003177 ("disable
-Wattribute-alias warning for SYSCALL_DEFINEx()"), which seems too invasive
to backport.
This takes a much simpler approach and just disables the warning across the
kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8399799725 upstream.
RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt()
processes it. This means that an RX overflow interrupt will only be
processed when another interrupt gets asserted (e.g. for RX/TX).
Fix that by enabling the RXOFLW interrupt.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f4f0f338c upstream.
xcan_interrupt() clears ERROR|RXOFLV|BSOFF|ARBLST interrupts if any of
them is asserted. This does not take into account that some of them
could have been asserted between interrupt status read and interrupt
clear, therefore clearing them without handling them.
Fix the code to only clear those interrupts that it knows are asserted
and therefore going to be processed in xcan_err_interrupt().
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 620050d9c2 upstream.
The xilinx_can driver assumes that the TXOK interrupt only clears after
it has been acknowledged as many times as there have been successfully
sent frames.
However, the documentation does not mention such behavior, instead
saying just that the interrupt is cleared when the clear bit is set.
Similarly, testing seems to also suggest that it is immediately cleared
regardless of the amount of frames having been sent. Performing some
heavy TX load and then going back to idle has the tx_head drifting
further away from tx_tail over time, steadily reducing the amount of
frames the driver keeps in the TX FIFO (but not to zero, as the TXOK
interrupt always frees up space for 1 frame from the driver's
perspective, so frames continue to be sent) and delaying the local echo
frames.
The TX FIFO tracking is also otherwise buggy as it does not account for
TX FIFO being cleared after software resets, causing
BUG!, TX FIFO full when queue awake!
messages to be output.
There does not seem to be any way to accurately track the state of the
TX FIFO for local echo support while using the full TX FIFO.
The Zynq version of the HW (but not the soft-AXI version) has watermark
programming support and with it an additional TX-FIFO-empty interrupt
bit.
Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI
and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used
to detect whether 1 or 2 frames have been sent at interrupt processing
time.
Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode
was also tested.
An alternative way to solve this would be to drop local echo support but
keep using the full TX FIFO.
v2: Add FIFO space check before TX queue wake with locking to
synchronize with queue stop. This avoids waking the queue when xmit()
had just filled it.
v3: Keep local echo support and reduce the amount of frames in FIFO
instead as suggested by Marc Kleine-Budde.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2574fe5451 upstream.
The xilinx_can driver performs a software reset when an RX overrun is
detected. This causes the device to enter Configuration mode where no
messages are received or transmitted.
The documentation does not mention any need to perform a reset on an RX
overrun, and testing by inducing an RX overflow also indicated that the
device continues to work just fine without a reset.
Remove the software reset.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 877e0b7594 upstream.
The xilinx_can driver contains no mechanism for propagating recovery
from CAN_STATE_ERROR_WARNING and CAN_STATE_ERROR_PASSIVE.
Add such a mechanism by factoring the handling of
XCAN_STATE_ERROR_PASSIVE and XCAN_STATE_ERROR_WARNING out of
xcan_err_interrupt and checking for recovery after RX and TX if the
interface is in one of those states.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ebd83bdb0 upstream.
There are several issues with the suspend/resume handling code of the
driver:
- The device is attached and detached in the runtime_suspend() and
runtime_resume() callbacks if the interface is running. However,
during xcan_chip_start() the interface is considered running,
causing the resume handler to incorrectly call netif_start_queue()
at the beginning of xcan_chip_start(), and on xcan_chip_start() error
return the suspend handler detaches the device leaving the user
unable to bring-up the device anymore.
- The device is not brought properly up on system resume. A reset is
done and the code tries to determine the bus state after that.
However, after reset the device is always in Configuration mode
(down), so the state checking code does not make sense and
communication will also not work.
- The suspend callback tries to set the device to sleep mode (low-power
mode which monitors the bus and brings the device back to normal mode
on activity), but then immediately disables the clocks (possibly
before the device reaches the sleep mode), which does not make sense
to me. If a clean shutdown is wanted before disabling clocks, we can
just bring it down completely instead of only sleep mode.
Reorganize the PM code so that only the clock logic remains in the
runtime PM callbacks and the system PM callbacks contain the device
bring-up/down logic. This makes calling the runtime PM callbacks during
e.g. xcan_chip_start() safe.
The system PM callbacks now simply call common code to start/stop the
HW if the interface was running, replacing the broken code from before.
xcan_chip_stop() is updated to use the common reset code so that it will
wait for the reset to complete. Reset also disables all interrupts so do
not do that separately.
Also, the device_may_wakeup() checks are removed as the driver does not
have wakeup support.
Tested on Zynq-7000 integrated CAN.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32852c561b upstream.
If the device gets into a state where RXNEMP (RX FIFO not empty)
interrupt is asserted without RXOK (new frame received successfully)
interrupt being asserted, xcan_rx_poll() will continue to try to clear
RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is
not empty, the interrupt will not be cleared and napi_schedule() will
just be called again.
This situation can occur when:
(a) xcan_rx() returns without reading RX FIFO due to an error condition.
The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear
due to a frame still being in the FIFO. The frame will never be read
from the FIFO as RXOK is no longer set.
(b) A frame is received between xcan_rx_poll() reading interrupt status
and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain
set as the new message is still in the FIFO.
I'm able to trigger case (b) by flooding the bus with frames under load.
There does not seem to be any benefit in using both RXNEMP and RXOK in
the way the driver does, and the polling example in the reference manual
(UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either
RXOK or RXNEMP can be used for detecting incoming messages.
Fix the issue and simplify the RX processing by only using RXNEMP
without RXOK.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 722e5f2b1e upstream.
Commit 52cdbdd498 (driver core: correct device's shutdown order)
introduced a regression by breaking device shutdown on some systems.
Namely, the devices_kset_move_last() call in really_probe() added by
that commit is a mistake as it may cause parents to follow children
in the devices_kset list which then causes shutdown to fail. For
example, if a device has children before really_probe() is called
for it (which is not uncommon), that call will cause it to be
reordered after the children in the devices_kset list and the
ordering of that list will not reflect the correct device shutdown
order any more.
Also it causes the devices_kset list to be constantly reordered
until all drivers have been probed which is totally pointless
overhead in the majority of cases and it only covered an issue
with system shutdown, while system-wide suspend/resume potentially
had the same issue on the affected platforms (which was not covered).
Moreover, the shutdown issue originally addressed by the change in
really_probe() made by commit 52cdbdd498 is not present in 4.18-rc
any more, since dra7 started to use the sdhci-omap driver which
doesn't disable any regulators during shutdown, so the really_probe()
part of commit 52cdbdd498 can be safely reverted. [The original
issue was related to the omap_hsmmc driver used by dra7 previously.]
For the above reasons, revert the really_probe() modifications made
by commit 52cdbdd498.
The other code changes made by commit 52cdbdd498 are useful and
they need not be reverted.
Fixes: 52cdbdd498 (driver core: correct device's shutdown order)
Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d644abf25 upstream.
Commit 1b9ba000 ("Allow function drivers to pause control
transfers") states that USB_GADGET_DELAYED_STATUS is only
supported if data phase is 0 bytes.
It seems that when the length is not 0 bytes, there is no
need to explicitly delay the data stage since the transfer
is not completed until the user responds. However, when the
length is 0, there is no data stage and the transfer is
finished once setup() returns, hence there is a need to
explicitly delay completion.
This manifests as the following bugs:
Prior to 946ef68ad4 ('Let setup() return
USB_GADGET_DELAYED_STATUS'), when setup is 0 bytes, ffs
would require user to queue a 0 byte request in order to
clear setup state. However, that 0 byte request was actually
not needed and would hang and cause errors in other setup
requests.
After the above commit, 0 byte setups work since the gadget
now accepts empty queues to ep0 to clear the delay, but all
other setups hang.
Fixes: 946ef68ad4 ("Let setup() return USB_GADGET_DELAYED_STATUS")
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 249a32b7ee upstream.
Based on USB2.0 Spec Section 11.12.5,
"If a hub has per-port power switching and per-port current limiting,
an over-current on one port may still cause the power on another port
to fall below specific minimums. In this case, the affected port is
placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the
port, but PORT_OVER_CURRENT is not set."
so let's check C_PORT_OVER_CURRENT too for over current condition.
Fixes: 08d1dec6f4 ("usb:hub set hub->change_bits when over-current happens")
Cc: <stable@vger.kernel.org>
Tested-by: Alessandro Antenucci <antenucci@korg.it>
Signed-off-by: Bin Liu <b-liu@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8541b21e78 ]
In order to be able to give better diagnostics and detect
malicious traffic, we need to have better sk->sk_drops tracking.
Fixes: 9f5afeae51 ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3d4bf93ac1 ]
In case an attacker feeds tiny packets completely out of order,
tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
expensive copies, but not changing socket memory usage at all.
1) Do not attempt to collapse tiny skbs.
2) Add logic to exit early when too many tiny skbs are detected.
We prefer not doing aggressive collapsing (which copies packets)
for pathological flows, and revert to tcp_prune_ofo_queue() which
will be less expensive.
In the future, we might add the possibility of terminating flows
that are proven to be malicious.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f4a3313d8e ]
Right after a TCP flow is created, receiving tiny out of order
packets allways hit the condition :
if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
tcp_clamp_window(sk);
tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
(guarded by tcp_rmem[2])
Calling tcp_collapse_ofo_queue() in this case is not useful,
and offers a O(N^2) surface attack to malicious peers.
Better not attempt anything before full queue capacity is reached,
forcing attacker to spend lots of resource and allow us to more
easily detect the abuse.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 72cd43ba64 ]
Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. out_of_order_queue rb-tree can contain
thousands of nodes, iterating over all of them is not nice.
Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.
Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, can not revert to the old behavior, without great pain.
Strategy taken in this patch is to purge ~12.5 % of the queue capacity.
Fixes: 36a6503fed ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0496ef2c2 ]
Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
has to be sent immediately so the sender can respond quickly:
""" When receiving packets, the CE codepoint MUST be processed as follows:
1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
true and send an immediate ACK.
2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
to false and send an immediate ACK.
"""
Previously DCTCP implementation may continue to delay the ACK. This
patch fixes that to implement the RFC by forcing an immediate ACK.
Tested with this packetdrill script provided by Larry Brakmo
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
+0.005 < [ce] . 2001:3001(1000) ack 2 win 257
+0.000 > [ect01] . 2:2(0) ack 2001
// Previously the ACK below would be delayed by 40ms
+0.000 > [ect01] E. 2:2(0) ack 3001
+0.500 < F. 9501:9501(0) ack 4 win 257
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 27cde44a25 ]
Currently when a DCTCP receiver delays an ACK and receive a
data packet with a different CE mark from the previous one's, it
sends two immediate ACKs acking previous and latest sequences
respectly (for ECN accounting).
Previously sending the first ACK may mark off the delayed ACK timer
(tcp_event_ack_sent). This may subsequently prevent sending the
second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
The culprit is that tcp_send_ack() assumes it always acknowleges
the latest sequence, which is not true for the first special ACK.
The fix is to not make the assumption in tcp_send_ack and check the
actual ack sequence before cancelling the delayed ACK. Further it's
safer to pass the ack sequence number as a local variable into
tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
future bugs like this.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b0c05d0e99 ]
Previously, when a data segment was sent an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state ca->delayed_ack_reserved. Previously it may accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries a older ACK sequence. This inconsistency would
lead to DCTCP receiver never acknowledging the latest data until the
sender times out and retry in some cases.
Packetdrill script (provided by Larry Brakmo)
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001
0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001
0.210 < [ce] P. 4001:4501(500) ack 3 win 257
+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501
+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack
+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data
+0 > [ect01] . 4:4(0) ack 6501 // now acks everything
+0.500 < F. 9501:9501(0) ack 4 win 257
Reported-by: Larry Brakmo <brakmo@fb.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5025f7f7d5 ]
rtnl_configure_link sets dev->rtnl_link_state to
RTNL_LINK_INITIALIZED and unconditionally calls
__dev_notify_flags to notify user-space of dev flags.
current call sequence for rtnl_configure_link
rtnetlink_newlink
rtnl_link_ops->newlink
rtnl_configure_link (unconditionally notifies userspace of
default and new dev flags)
If a newlink handler wants to call rtnl_configure_link
early, we will end up with duplicate notifications to
user-space.
This patch fixes rtnl_configure_link to check rtnl_link_state
and call __dev_notify_flags with gchanges = 0 if already
RTNL_LINK_INITIALIZED.
Later in the series, this patch will help the following sequence
where a driver implementing newlink can call rtnl_configure_link
to initialize the link early.
makes the following call sequence work:
rtnetlink_newlink
rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
link and notifies
user-space of default
dev flags)
rtnl_configure_link (updates dev flags if requested by user ifm
and notifies user-space of new dev flags)
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 215d08a85b ]
The situation described in the comment can occur also with
PHY_IGNORE_INTERRUPT, therefore change the condition to include it.
Fixes: f555f34fdc ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two scenarios that we will restore deleted records. The first is
when device down and up(or unmap/remap). In this scenario the new filter
mode is same with previous one. Because we get it from in_dev->mc_list and
we do not touch it during device down and up.
The other scenario is when a new socket join a group which was just delete
and not finish sending status reports. In this scenario, we should use the
current filter mode instead of restore old one. Here are 4 cases in total.
old_socket new_socket before_fix after_fix
IN(A) IN(A) ALLOW(A) ALLOW(A)
IN(A) EX( ) TO_IN( ) TO_EX( )
EX( ) IN(A) TO_EX( ) ALLOW(A)
EX( ) EX( ) TO_EX( ) TO_EX( )
Fixes: 24803f38a5 (igmp: do not remove igmp souce list info when set link down)
Fixes: 1666d49e1d (mld: do not remove mld souce list info when set link down)
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 33180bee86 ]
When driver converts HW timestamp to wall clock time it subtracts
the last saved cycle counter from the HW timestamp and converts the
difference to nanoseconds.
The conversion is done by multiplying the cycles difference with the
clock multiplier value as a first step and therefore the cycles
difference should be small enough so that the multiplication product
doesn't exceed 64bit.
The overflow handling routine is in charge of updating the last saved
cycle counter in driver and it is called periodically using kernel
delayed workqueue.
The delay period for this work is calculated using the max HW cycle
counter value (a 41 bit mask) as a base which doesn't take the 64bit
limit into account so the delay period may be incorrect and too
long to prevent a large difference between the HW counter and the last
saved counter in SW.
This change adjusts the work period for the HW clock overflow work by
taking the minimum between the previous value and the quotient of max
u64 value and the clock multiplier value.
Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 958c696f5a ]
Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
context, rather than the one passed in the input modifier.
However, the qp number in the qp context is not defined as a
required parameter by the FW. Therefore, drivers may choose to not
specify the qp number in the qp context for the reset-to-init transition.
Thus, we must save the qp number passed in the command input modifier --
which is always present. (This saved qp number is used as the input
modifier for command 2RST_QP when a slave's qp's are destroyed).
Fixes: c82e9aa0a8 ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2efd4fca70 ]
Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:
BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
copy_to_user include/linux/uaccess.h:184 [inline]
put_cmsg+0x5ef/0x860 net/core/scm.c:242
ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
[..]
This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.
With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.
Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.
Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3dd1c9a127 ]
The skb hash for locally generated ip[v6] fragments belonging
to the same datagram can vary in several circumstances:
* for connected UDP[v6] sockets, the first fragment get its hash
via set_owner_w()/skb_set_hash_from_sk()
* for unconnected IPv6 UDPv6 sockets, the first fragment can get
its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
auto_flowlabel is enabled
For the following frags the hash is usually computed via
skb_get_hash().
The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
scenario the egress tx queue can be selected on a per packet basis
via the skb hash.
It may also fool flow-oriented schedulers to place fragments belonging
to the same datagram in different flows.
Fix the issue by copying the skb hash from the head frag into
the others at fragmentation time.
Before this commit:
perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
perf script
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
After this commit:
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
Fixes: b73c3d0e4f ("net: Save TX flow hash in sock and set in skbuf on xmit")
Fixes: 67800f9b1f ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38c0a74fe0 upstream.
The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
commit 4c2924b725 ("MIPS: PCI: Use pci_resource_to_user to map pci
memory space properly") incorrectly sets *end to the address of the
byte after the resource, rather than the last byte of the resource.
This results in userland seeing resources as a byte larger than they
actually are, for example a 32 byte BAR will be reported by a tool such
as lspci as being 33 bytes in size:
Region 2: I/O ports at 1000 [disabled] [size=33]
Correct this by subtracting one from the calculated end address,
reporting the correct address to userland.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Rui Wang <rui.wang@windriver.com>
Fixes: 4c2924b725 ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v3.12+
Patchwork: https://patchwork.linux-mips.org/patch/19829/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dc3039bc8 upstream.
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8f688ec43 upstream.
The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the
transport never calls xprt_write_space() when more pages become
available. -ENOBUFS will trigger the correct "delay briefly and call
again" logic.
Fixes: 7a89f9c626 ("xprtrdma: Honor ->send_request API contract")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 229bc19fd7 upstream.
Don't rely on event interrupt (EINT) bit alone to detect pending port
change in resume. If no change event is detected the host may be suspended
again, oterwise roothubs are resumed.
There is a lag in xHC setting EINT. If we don't notice the pending change
in resume, and the controller is runtime suspeded again, it causes the
event handler to assume host is dead as it will fail to read xHC registers
once PCI puts the controller to D3 state.
[ 268.520969] xhci_hcd: xhci_resume: starting port polling.
[ 268.520985] xhci_hcd: xhci_hub_status_data: stopping port polling.
[ 268.521030] xhci_hcd: xhci_suspend: stopping port polling.
[ 268.521040] xhci_hcd: // Setting command ring address to 0x349bd001
[ 268.521139] xhci_hcd: Port Status Change Event for port 3
[ 268.521149] xhci_hcd: resume root hub
[ 268.521163] xhci_hcd: port resume event for port 3
[ 268.521168] xhci_hcd: xHC is not running.
[ 268.521174] xhci_hcd: handle_port_status: starting port polling.
[ 268.596322] xhci_hcd: xhci_hc_died: xHCI host controller not responding, assume dead
The EINT lag is described in a additional note in xhci specs 4.19.2:
"Due to internal xHC scheduling and system delays, there will be a lag
between a change bit being set and the Port Status Change Event that it
generated being written to the Event Ring. If SW reads the PORTSC and
sees a change bit set, there is no guarantee that the corresponding Port
Status Change Event has already been written into the Event Ring."
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e78bfb0751 ]
Commit 8b7008620b ("net: Don't copy pfmemalloc flag in
__copy_skb_header()") introduced a different handling for the
pfmemalloc flag in copy and clone paths.
In __skb_clone(), now, the flag is set only if it was set in the
original skb, but not cleared if it wasn't. This is wrong and
might lead to socket buffers being flagged with pfmemalloc even
if the skb data wasn't allocated from pfmemalloc reserves. Copy
the flag instead of ORing it.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 8b7008620b ("net: Don't copy pfmemalloc flag in __copy_skb_header()")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8b7008620b ]
The pfmemalloc flag indicates that the skb was allocated from
the PFMEMALLOC reserves, and the flag is currently copied on skb
copy and clone.
However, an skb copied from an skb flagged with pfmemalloc
wasn't necessarily allocated from PFMEMALLOC reserves, and on
the other hand an skb allocated that way might be copied from an
skb that wasn't.
So we should not copy the flag on skb copy, and rather decide
whether to allow an skb to be associated with sockets unrelated
to page reclaim depending only on how it was allocated.
Move the pfmemalloc flag before headers_start[0] using an
existing 1-bit hole, so that __copy_skb_header() doesn't copy
it.
When cloning, we'll now take care of this flag explicitly,
contravening to the warning comment of __skb_clone().
While at it, restore the newline usage introduced by commit
b193722731 ("net: reorganize sk_buff for faster
__copy_skb_header()") to visually separate bytes used in
bitfields after headers_start[0], that was gone after commit
a9e419dc7b ("netfilter: merge ctinfo into nfct pointer storage
area"), and describe the pfmemalloc flag in the kernel-doc
structure comment.
This doesn't change the size of sk_buff or cacheline boundaries,
but consolidates the 15 bits hole before tc_index into a 2 bytes
hole before csum, that could now be filled more easily.
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Fixes: c93bdd0e03 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5c968f4802 ]
mii_nway_restart is not pm aware which results in a rtnl deadlock.
Implement mii_nway_restart manual by setting BMCR_ANRESTART if
BMCR_ANENABLE is set.
To reproduce:
* plug an asix based usb network interface
* wait until the device enters PM (~5 sec)
* `ip link set eth1 up` will never return
Fixes: d9fe64e511 ("net: asix: Add in_pm parameter")
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 38cd58ed9c ]
This adds the USB id of LTE modem Quectel EG91. It requires the
same quirk as other Quectel modems to make it work.
Signed-off-by: Matevz Vucnik <vucnikm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ba8376ce1 ]
It seems that a *break* is missing in order to avoid falling through
to the default case. Otherwise, checking *chan* makes no sense.
Fixes: 72df7a7244 ("ptp: Allow reassigning calibration pin function")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df8ed346d4 ]
Currently also the pause flags are removed from phydev->supported because
they're not included in PHY_DEFAULT_FEATURES. I don't think this is
intended, especially when considering that this function can be called
via phy_set_max_speed() anywhere in a driver. Change the masking to mask
out only the values we're going to change. In addition remove the
misleading comment, job of this small function is just to adjust the
supported and advertised speeds.
Fixes: f3a6bd393c ("phylib: Add phy_set_max_speed helper")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e7372197e1 ]
Xin reported that icmp replies may not use the address on the device the
echo request is received if the destination address is broadcast. Instead
a route lookup is done without considering VRF context. Fix by setting
oif in flow struct to the master device if it is enslaved. That directs
the lookup to the VRF table. If the device is not enslaved, oif is still
0 so no affect.
Fixes: cd2fbe1b6b ("net: Use VRF device index for lookups on RX")
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acc2cf4e37 ]
When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first
frees it by calling inet_csk_reqsk_queue_drop_and_and_put in
tcp_abort, and then frees it again by calling sock_gen_put.
Since tcp_abort only has one caller, and all the other codepaths
in tcp_abort don't free the socket, just remove the free in that
function.
Cc: David Ahern <dsa@cumulusnetworks.com>
Tested: passes Android sock_diag_test.py, which exercises this codepath
Fixes: d7226c7a4d ("net: diag: Fix refcnt leak in error path destroying socket")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 107d01f5ba ]
rhashtable_init() currently does not take into account the user-passed
min_size parameter unless param->nelem_hint is set as well. As such,
the default size (number of buckets) will always be HASH_DEFAULT_SIZE
even if the smallest allowed size is larger than that. Remediate this
by unconditionally calling into rounded_hashtable_size() and handling
things accordingly.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 169dc027fb ]
The rol32 call is currently rotating hash but the rol'd value is
being discarded. I believe the current code is incorrect and hash
should be assigned the rotated value returned from rol32.
Thanks to David Lebrun for spotting this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70ba5b6db9 ]
The low and high values of the net.ipv4.ping_group_range sysctl were
being silently forced to the default disabled state when a write to the
sysctl contained GIDs that didn't map to the associated user namespace.
Confusingly, the sysctl's write operation would return success and then
a subsequent read of the sysctl would indicate that the low and high
values are the overflowgid.
This patch changes the behavior by clearly returning an error when the
sysctl write operation receives a GID range that doesn't map to the
associated user namespace. In such a situation, the previous value of
the sysctl is preserved and that range will be returned in a subsequent
read of the sysctl.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d5a672ac9f ]
The gen_stats facility will add a header for the toplevel nlattr of type
TCA_STATS2 that contains all stats added by qdisc callbacks. A reference
to this header is stored in the gnet_dump struct, and when all the
per-qdisc callbacks have finished adding their stats, the length of the
containing header will be adjusted to the right value.
However, on architectures that need padding (i.e., that don't set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added
before the stats, which means that the stored pointer will point to the
padding, and so when the header is fixed up, the result is just a very
big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS
struct, this problem has been mostly invisible, but we exposed it with
the netlink attribute-based statistics in CAKE.
Fix the issue by fixing up the stored pointer if it points to a padding
nlattr.
Tested-by: Pete Heist <pete@heistp.net>
Tested-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e714d2778 upstream.
info.index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl()
warn: potential spectre issue 'vdev->region'
Fix this by sanitizing info.index before indirectly using it to index
vdev->region
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f15bde671 upstream.
It was reported that a kernel crash happened in mem_cgroup_iter(), which
can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
......
Call trace:
mem_cgroup_iter+0x2e0/0x6d4
shrink_zone+0x8c/0x324
balance_pgdat+0x450/0x640
kswapd+0x130/0x4b8
kthread+0xe8/0xfc
ret_from_fork+0x10/0x20
mem_cgroup_iter():
......
if (css_tryget(css)) <-- crash here
break;
......
The crashing reason is that mem_cgroup_iter() uses the memcg object whose
pointer is stored in iter->position, which has been freed before and
filled with POISON_FREE(0x6b).
And the root cause of the use-after-free issue is that
invalidate_reclaim_iterators() fails to reset the value of iter->position
to NULL when the css of the memcg is released in non- hierarchical mode.
Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com
Fixes: 6df38689e0 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
Signed-off-by: Jing Xia <jing.xia.mail@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <chunyan.zhang@unisoc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64234961c1 upstream.
We used to have pre-set CONFIG_INITRAMFS_SOURCE with local path
to intramfs in ARC defconfigs. This was quite convenient for
in-house development but not that convenient for newcomers
who obviusly don't have folders like "arc_initramfs" next to
the Linux source tree. Which leads to quite surprising failure
of defconfig building:
------------------------------->8-----------------------------
../scripts/gen_initramfs_list.sh: Cannot open '../../arc_initramfs_hs/'
../usr/Makefile:57: recipe for target 'usr/initramfs_data.cpio.gz' failed
make[2]: *** [usr/initramfs_data.cpio.gz] Error 1
------------------------------->8-----------------------------
So now when more and more people start to deal with our defconfigs
let's make their life easier with removal of CONFIG_INITRAMFS_SOURCE.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93312b6da4 upstream.
mprotect(EXEC) was failing for stack mappings as default vm flags was
missing MAYEXEC.
This was triggered by glibc test suite nptl/tst-execstack testcase
What is surprising is that despite running LTP for years on, we didn't
catch this issue as it lacks a directed test case.
gcc dejagnu tests with nested functions also requiring exec stack work
fine though because they rely on the GNU_STACK segment spit out by
compiler and handled in kernel elf loader.
This glibc case is different as the stack is non exec to begin with and
a dlopen of shared lib with GNU_STACK segment triggers the exec stack
proceedings using a mprotect(PROT_EXEC) which was broken.
CC: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e3761145a upstream.
swap was broken on ARC due to silly copy-paste issue.
We encode offset from swapcache page in __swp_entry() as (off << 13) but
were not decoding back in __swp_offset() as (off >> 13) - it was still
(off << 13).
This finally fixes swap usage on ARC.
| # mkswap /dev/sda2
|
| # swapon -a -e /dev/sda2
| Adding 500728k swap on /dev/sda2. Priority:-2 extents:1 across:500728k
|
| # free
| total used free shared buffers cached
| Mem: 765104 13456 751648 4736 8 4736
| -/+ buffers/cache: 8712 756392
| Swap: 500728 0 500728
Cc: stable@vger.kernel.org
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5020a8e6b upstream.
Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
for one specific eventfd. When the assign path hasn't finished but irqfd
has been added to kvm->irqfds.items list, another thead may deassign the
eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
has been added to kvm->irqfds.items list, and call synchronize_srcu()
in irq_shutdown() to make sure that irqfd has been fully initialized in
the assign path.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Tianyu Lan <tianyu.lan@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08a77676f9 upstream.
e7fd37ba12 ("cgroup: avoid copying strings longer than the buffers")
converted possibly unsafe strncpy() usages in cgroup to strscpy().
However, although the callsites are completely fine with truncated
copied, because strscpy() is marked __must_check, it led to the
following warnings.
kernel/cgroup/cgroup.c: In function ‘cgroup_file_name’:
kernel/cgroup/cgroup.c:1400:10: warning: ignoring return value of ‘strscpy’, declared with attribute warn_unused_result [-Wunused-result]
strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
^
To avoid the warnings, 50034ed496 ("cgroup: use strlcpy() instead of
strscpy() to avoid spurious warning") switched them to strlcpy().
strlcpy() is worse than strlcpy() because it unconditionally runs
strlen() on the source string, and the only reason we switched to
strlcpy() here was because it was lacking __must_check, which doesn't
reflect any material differences between the two function. It's just
that someone added __must_check to strscpy() and not to strlcpy().
These basic string copy operations are used in variety of ways, and
one of not-so-uncommon use cases is safely handling truncated copies,
where the caller naturally doesn't care about the return value. The
__must_check doesn't match the actual use cases and forces users to
opt for inferior variants which lack __must_check by happenstance or
spread ugly (void) casts.
Remove __must_check from strscpy() and restore strscpy() usages in
cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
[backport only the string.h portion to remove build warnings starting to show up - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4f18c063a upstream.
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3,
add a small(-ish) sequence to handle it at EL2. Special care must
be taken to track the state of the guest itself by updating the
workaround flags. We also rely on patching to enable calls into
the firmware.
Note that since we need to execute branches, this always executes
after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55e3748e89 upstream.
In order to offer ARCH_WORKAROUND_2 support to guests, we need
a bit of infrastructure.
Let's add a flag indicating whether or not the guest uses
SSBD mitigation. Depending on the state of this flag, allow
KVM to disable ARCH_WORKAROUND_2 before entering the guest,
and enable it when exiting it.
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cdc0108ba upstream.
If running on a system that performs dynamic SSBD mitigation, allow
userspace to request the mitigation for itself. This is implemented
as a prctl call, allowing the mitigation to be enabled or disabled at
will for this particular thread.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dd9614f54 upstream.
In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later
patch.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 647d0519b5 upstream.
On a system where firmware can dynamically change the state of the
mitigation, the CPU will always come up with the mitigation enabled,
including when coming back from suspend.
If the user has requested "no mitigation" via a command line option,
let's enforce it by calling into the firmware again to disable it.
Similarily, for a resume from hibernate, the mitigation could have
been disabled by the boot kernel. Let's ensure that it is set
back on in that case.
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 986372c436 upstream.
In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a43ae4dfe5 upstream.
On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e2906245f upstream.
In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eff0e9e107 upstream.
We've so far used the PSCI return codes for SMCCC because they
were extremely similar. But with the new ARM DEN 0070A specification,
"NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".
Let's bite the bullet and add SMCCC specific return codes. Users
can be repainted as and when required.
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4464e210de upstream.
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch. We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.
This has the benefit of being able to retrieve the host context even
when our stack is corrupted, and it has a potential performance benefit
because we trade a store plus a load for an mrs and a load on a round
trip to the guest.
This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address. We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().
The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 44a497abd6 upstream.
kvm_vgic_global_state is part of the read-only section, and is
usually accessed using a PC-relative address generation (adrp + add).
It is thus useless to use kern_hyp_va() on it, and actively problematic
if kern_hyp_va() becomes non-idempotent. On the other hand, there is
no way that the compiler is going to guarantee that such access is
always PC relative.
So let's bite the bullet and provide our own accessor.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit dea5e2a4c5 upstream.
We've so far relied on a patching infrastructure that only gave us
a single alternative, without any way to provide a range of potential
replacement instructions. For a single feature, this is an all or
nothing thing.
It would be interesting to have a more flexible grained way of patching
the kernel though, where we could dynamically tune the code that gets
injected.
In order to achive this, let's introduce a new form of dynamic patching,
assiciating a callback to a patching site. This callback gets source and
target locations of the patching request, as well as the number of
instructions to be patched.
Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb
directives:
asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback)
: "r" (v));
or
alternative_cb callback
mov x0, #0
alternative_cb_end
where callback is the C function computing the alternative.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1f742679c3 upstream.
Now that a VHE host uses tpidr_el2 for the cpu offset we no longer
need KVM to save/restore tpidr_el1. Move this from the 'common' code
into the non-vhe code. While we're at it, on VHE we don't need to
save the ELR or SPSR as kernel_entry in entry.S will have pushed these
onto the kernel stack, and will restore them from there. Move these
to the non-vhe code as we need them to get back to the host.
Finally remove the always-copy-tpidr we hid in the stage2 setup
code, cpufeature's enable callback will do this for VHE, we only
need KVM to do it for non-vhe. Add the copy into kvm-init instead.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6d99b68933 upstream.
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
on VHE hosts, and allows future code to blindly access per-cpu variables
without triggering world-switch.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c97e166e54 upstream.
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
on VHE they can be the same register.
KVM calls hyp_panic() when anything unexpected happens. This may occur
while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
tpidr_el2, which it uses to find the host context in order to restore
the host EL1 registers before parachuting into the host's panic().
The host context is a struct kvm_cpu_context allocated in the per-cpu
area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
easy to find. Change hyp_panic() to take a pointer to the
struct kvm_cpu_context. Wrap these calls with an asm function that
retrieves the struct kvm_cpu_context from the host's per-cpu area.
Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
kvm init. (Later patches will make this unnecessary for VHE hosts)
We print out the vcpu pointer as part of the panic message. Add a back
reference to the 'running vcpu' in the host cpu context to preserve this.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 36989e7fd3 upstream.
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
used to store the host EL1 registers when KVM switches to a guest.
Make it easier for ASM to generate pointers into this per-cpu memory
by making it a static allocation.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 32b03d1059 upstream.
KVM uses tpidr_el2 as its private vcpu register, which makes sense for
non-vhe world switch as only KVM can access this register. This means
vhe Linux has to use tpidr_el1, which KVM has to save/restore as part
of the host context.
If the SDEI handler code runs behind KVMs back, it mustn't access any
per-cpu variables. To allow this on systems with vhe we need to make
the host use tpidr_el2, saving KVM from save/restoring it.
__guest_enter() stores the host_ctxt on the stack, do the same with
the vcpu.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1b7e2296a8 upstream.
Shortly we will want to load a percpu variable in the return from
userspace path. We can save an instruction by folding the addition of
the percpu offset into the load instruction, and this patch adds a new
helper to do so.
At the same time, we clean up this_cpu_ptr for consistency. As with
{adr,ldr,str}_l, we change the template to take the destination register
first, and name this dst. Secondly, we rename the macro to adr_this_cpu,
following the scheme of adr_l, and matching the newly added
ldr_this_cpu.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84379c9afe upstream.
Eric Dumazet reports:
Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
[..]
./b78305423 enable_conntrack
Then :
sleep 60
dmesg | tail -10
[ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2
Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook.
skb gets queued until frag timer expiry -- 1 minute.
Normally nf_conntrack_reasm gets called during prerouting, so skb has
no dst yet which might explain why this wasn't spotted earlier.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c604cb7670 upstream.
My recent fix for dns_resolver_preparse() printing very long strings was
incomplete, as shown by syzbot which still managed to hit the
WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large
WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error
when multiple options ("#"-separated strings) are given in the key
payload. Specifically, when separating an option string into name and
value, if there is no value then the name is incorrectly considered to
end at the end of the key payload, rather than the end of the current
option. This bypasses validation of the option length, and also means
that specifying multiple options is broken -- which presumably has gone
unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values, as the kstrtoul() when
parsing the "dnserror" option will read past the end of the current
option and into the next option.
Fix these bugs by correctly computing the length of the option name and
by copying the option value, null-terminated, into a temporary buffer.
Reproducer for the WARN_ONCE() that syzbot hit:
perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s
Reproducer for "dnserror" option being parsed incorrectly (expected
behavior is to fail when seeing the unknown option "foo", actual
behavior was to read the dnserror value as "1#foo" and fail there):
perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 4a2d789267 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11ff7288be upstream.
the ebtables evaluation loop expects targets to return
positive values (jumps), or negative values (absolute verdicts).
This is completely different from what xtables does.
In xtables, targets are expected to return the standard netfilter
verdicts, i.e. NF_DROP, NF_ACCEPT, etc.
ebtables will consider these as jumps.
Therefore reject any target found due to unspec fallback.
v2: also reject watchers. ebtables ignores their return value, so
a target that assumes skb ownership (and returns NF_STOLEN) causes
use-after-free.
The only watchers in the 'ebtables' front-end are log and nflog;
both have AF_BRIDGE specific wrappers on kernel side.
Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a98302de1 upstream.
Without this patch, firmware will not run properly on rtl8821ae, and it
causes bad user experience. For example, bad connection performance with
low rate, higher power consumption, and so on.
rtl8821ae uses two kinds of firmwares for normal and WoWlan cases, and
each firmware has firmware data buffer and size individually. Original
code always overwrite size of normal firmware rtlpriv->rtlhal.fwsize, and
this mismatch causes firmware checksum error, then firmware can't start.
In this situation, driver gives message "Firmware is not ready to run!".
Fixes: fe89707f0a ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware")
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Cc: Stable <stable@vger.kernel.org> # 4.0+
Reviewed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676bcfece1 upstream.
t.qset_idx can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl()
warn: potential spectre issue 'adapter->msix_info'
Fix this by sanitizing t.qset_idx before using it to index
adapter->msix_info
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d412c31dae ]
The command interface can work in two modes: Events and Polling.
In the general case, each time we invoke a command, a work is
queued to handle it.
When working in events, the interrupt handler completes the
command execution. On the other hand, when working in polling
mode, the work itself completes it.
Due to a bug in the work handler, a command could have been
completed by the interrupt handler, while the work handler
hasn't finished yet, causing the it to complete once again
if the command interface mode was changed from Events to
polling after the interrupt handler was called.
mlx5_unload_one()
mlx5_stop_eqs()
// Destroy the EQ before cmd EQ
...cmd_work_handler()
write_doorbell()
--> EVENT_TYPE_CMD
mlx5_cmd_comp_handler() // First free
free_ent(cmd, ent->idx)
complete(&ent->done)
<-- mlx5_stop_eqs //cmd was complete
// move to polling before destroying the last cmd EQ
mlx5_cmd_use_polling()
cmd->mode = POLL;
--> cmd_work_handler (continues)
if (cmd->mode == POLL)
mlx5_cmd_comp_handler() // Double free
The solution is to store the cmd->mode before writing the doorbell.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b8f1f65882 ]
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when
we meet errors during ubuf allocation, the code does not check for
NULL before calling sockfd_put(), this will lead NULL
dereferencing. Fixing by checking sock pointer before.
Fixes: bab632d69e ("vhost: vhost TX zero-copy support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1236f22fba ]
If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.
We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.
(FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
to better indicate its purpose but to keep this change minimal, it
will be done in another patch).
Besides burstiness and congestion control violations, this problem
can result in RTO loop: When the loss recovery is prematurely
undoed, only new data will be transmitted (if available) and
the next retransmission can occur only after a new RTO which in case
of multiple losses (that are not for consecutive packets) requires
one RTO per loss to recover.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c860e997e9 ]
Fast Open key could be stored in different endian based on the CPU.
Previously hosts in different endianness in a server farm using
the same key config (sysctl value) would produce different cookies.
This patch fixes it by always storing it as little endian to keep
same API for LE hosts.
Reported-by: Daniele Iamartino <danielei@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ee1f47349 ]
When unplugging an r8152 adapter while the interface is UP, the NIC
becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes
napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes
netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to
napi_disable, but the napi is already deleted by disconnect above. So
the first while loop in napi_disable never finishes. This results in
complete deadlock of the network layer as there is rtnl_mutex held by
unregister_netdev.
So avoid the call to napi_disable in rtl8152_close when the device is
already gone.
The other calls to usb_kill_urb, cancel_delayed_work_sync,
netif_stop_queue etc. seem to be fine. The urb and netdev is not
destroyed yet.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e7e197edd0 ]
This module exposes two USB configurations: a QMI+AT capable setup on
USB config #1 and a MBIM capable setup on USB config #2.
By default the kernel will choose the MBIM capable configuration as
long as the cdc_mbim driver is available. This patch adds support for
the QMI port in the secondary configuration.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bb7858ba11 ]
Memory size is limited in the kdump kernel environment. Allocation of more
msix-vectors (or queues) consumes few tens of MBs of memory, which might
lead to the kdump kernel failure.
This patch adds changes to limit the number of MSI-X vectors in kdump
kernel to minimum required value (i.e., 2 per engine).
Fixes: fe56b9e6a ("qed: Add module with basic common support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 12b03558ce ]
After commit 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
are friends"), sungem owners reported the infamous "eth0: hw csum failure"
message.
CHECKSUM_COMPLETE has in fact never worked for this driver, but this
was masked by the fact that upper stacks had to strip the FCS, and
therefore skb->ip_summed was set back to CHECKSUM_NONE before
my recent change.
Driver configures a number of bytes to skip when the chip computes
the checksum, and for some reason only half of the Ethernet header
was skipped.
Then a second problem is that we should strip the FCS by default,
unless the driver is updated to eventually support NETIF_F_RXFCS in
the future.
Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
or not, so that the admin can turn off rx checksum if wanted.
Many thanks to Andreas Schwab and Mathieu Malaterre for their
help in debugging this issue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e85dc8cb3 ]
When blackhole is used on top of classful qdisc like hfsc it breaks
qlen and backlog counters because packets are disappear without notice.
In HFSC non-zero qlen while all classes are inactive triggers warning:
WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
and schedules watchdog work endlessly.
This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
this flag tells upper layer: this packet is gone and isn't queued.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 603b7bcff8 ]
The NULL character was not set correctly for the string containing
the command length, this caused failures reading the output of the
command due to a random length. The fix is to initialize the output
length string.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ce4e70ff0 ]
To compute delays, better not use time of the day which can
be changed by admins or malicious programs.
Also change ccid3_first_li() to use s64 type for delta variable
to avoid potential overflows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 30877961b1 ]
Commit 296d485680 ("ipvlan: inherit MTU from master device") adjusted
the mtu from the master device when creating a ipvlan device, but it
would also override the mtu value set in rtnl_create_link. It causes
IFLA_MTU param not to take effect.
So this patch is to not adjust the mtu if IFLA_MTU param is set when
creating a ipvlan device.
Fixes: 296d485680 ("ipvlan: inherit MTU from master device")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ced9e19150 ]
pool can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
'zatm_dev->pool_info' (local cap)
Fix this by sanitizing pool before using it to index
zatm_dev->pool_info
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d59ad6eea upstream.
If one of the later memory allocations in rypto4xx_build_pdr()
fails: dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed.
crypto4xx_build_sdr() has the same issue with dev->sdr.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6213c1f2a upstream.
The DMA controller regs actually point to DMA channel 0, so the write to
ENETDMA_CFG_REG will actually modify a random DMA channel.
Since DMA controller registers do not exist on BCM6345, guard the write
with the usual check for dma_has_sram.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c86b846ce upstream.
Check the return code of prepare_enable and change one last instance of
enable only to prepare_enable. Also properly disable and release the
clock in error paths and on remove for enetsw.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e276de6a3 upstream.
Consider a message size limit when calculating the maximum amount
of data that can be read.
The message size limit has been introduced with 4.9, so cc it
to stable.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e4c56d41e upstream.
ip_alloc_sem should be taken in ocfs2_get_block() when reading file in
DIRECT mode to prevent concurrent access to extent tree with
ocfs2_dio_end_io_write(), which may cause BUGON in the following
situation:
read file 'A' end_io of writing file 'A'
vfs_read
__vfs_read
ocfs2_file_read_iter
generic_file_read_iter
ocfs2_direct_IO
__blockdev_direct_IO
do_blockdev_direct_IO
do_direct_IO
get_more_blocks
ocfs2_get_block
ocfs2_extent_map_get_blocks
ocfs2_get_clusters
ocfs2_get_clusters_nocache()
ocfs2_search_extent_list
return the index of record which
contains the v_cluster, that is
v_cluster > rec[i]->e_cpos.
ocfs2_dio_end_io
ocfs2_dio_end_io_write
down_write(&oi->ip_alloc_sem);
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
...
--> modify the rec[i]->e_cpos, resulting
in v_cluster < rec[i]->e_cpos.
BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
[alex.chen@huawei.com: v3]
Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Fixes: c15471f795 ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Gang He <ghe@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d53cefb18 upstream.
Commit abb2ea7dfd ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.
Maybe this makes clang happier.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abb2ea7dfd upstream.
GCC explicitly does not warn for unused static inline functions for
-Wunused-function. The manual states:
Warn whenever a static function is declared but not defined or
a non-inline static function is unused.
Clang does warn for static inline functions that are unused.
It turns out that suppressing the warnings avoids potentially complex
#ifdef directives, which also reduces LOC.
Suppress the warning for clang.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63e132b64 upstream.
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled,
causing the CPU that's attempting to generate the backtrace output
to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for
example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - ie. if it's hung - and we print a warning to
the console in this case.
I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b768 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5c ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Fixes: 856839b768 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5c ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
[ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ]
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a8690ed6f upstream.
In commit 357d23c811a7 ("Remove the obsolete libibcm library")
in rdma-core [1], we removed obsolete library which used the
/dev/infiniband/ucmX interface.
Following multiple syzkaller reports about non-sanitized
user input in the UCMA module, the short audit reveals the same
issues in UCM module too.
It is better to disable this interface in the kernel,
before syzkaller team invests time and energy to harden
this unused interface.
[1] https://github.com/linux-rdma/rdma-core/pull/279
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c568503ef0 upstream.
syzbot reports following splat:
BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450
net/bridge/netfilter/ebt_stp.c:162
ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162
xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506
ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline]
ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline]
The uninitialised access is
xt_mtchk_param->nft_compat
... which should be set to 0.
Fix it by zeroing the struct beforehand, same for tgchk.
ip(6)tables targetinfo uses c99-style initialiser, so no change
needed there.
Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com
Fixes: 55917a21d0 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 815c6704bf upstream.
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Fixes: f63572dff1 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black@intel.com>
Cc: Jon Derrick <jonathan.derrick@intel.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b72717a20 upstream.
The code was mistakenly using the length of the page array memory instead
of the depth of the page array.
This would cause MR creation to fail in some cases.
Fixes: 8376b86de7 ("iw_cxgb4: Support the new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 54836e2d03 upstream.
On Tegra30 Cardhu the PCA9546 I2C mux is not ACK'ing I2C commands on
resume from suspend (which is caused by the reset signal for the I2C
mux not being configured correctl). However, this NACK is causing the
Tegra30 to hang on resuming from suspend which is not expected as we
detect NACKs and handle them. The hang observed appears to occur when
resetting the I2C controller to recover from the NACK.
Commit 77821b4678 ("i2c: tegra: proper handling of error cases") added
additional error handling for some error cases including NACK, however,
it appears that this change conflicts with an early fix by commit
f70893d083 ("i2c: tegra: Add delay before resetting the controller
after NACK"). After commit 77821b4678 was made we now disable 'packet
mode' before the delay from commit f70893d083 happens. Testing shows
that moving the delay to before disabling 'packet mode' fixes the hang
observed on Tegra30. The delay was added to give the I2C controller
chance to send a stop condition and so it makes sense to move this to
before we disable packet mode. Please note that packet mode is always
enabled for Tegra.
Fixes: 77821b4678 ("i2c: tegra: proper handling of error cases")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9feeb638cd upstream.
In 2016 GNU Make made a backwards incompatible change to the way '#'
characters were handled in Makefiles when used inside functions or
macros:
http://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57
Due to this change, when attempting to run `make prepare' I get a
spurious make syntax error:
/home/earnest/linux/tools/objtool/.fixdep.o.cmd:1: *** missing separator. Stop.
When inspecting `.fixdep.o.cmd' it includes two lines which use
unescaped comment characters at the top:
\# cannot find fixdep (/home/earnest/linux/tools/objtool//fixdep)
\# using basic dep data
This is because `tools/build/Build.include' prints these '\#'
characters:
printf '\# cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \
printf '\# using basic dep data\n\n' >> $(dot-target).cmd; \
This completes commit 9564a8cf42 ("Kbuild: fix # escaping in .cmd files
for future Make").
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaa23f8600 upstream.
Obtaining the runtime pm wakeref can fail, especially in a hotplug
scenario where i915.ko has been unloaded. If we do not catch the
failure, we end up with an unbalanced pm.
v2 additions by tiwai:
hdmi_present_sense() checks the return value and handle only a
negative error case and bails out only if it's really still suspended.
Also, snd_hda_power_down() is called at the error path so that the
refcount is balanced.
Along with it, the spec->pcm_lock is taken outside
hdmi_present_sense() in the caller side, so that it won't cause
deadlock at reentrace via runtime resume.
v3 fix by tiwai:
Missing linux/pm_runtime.h is included.
References: 222bde0388 ("ALSA: hda - Fix mutex deadlock at HDMI/DP hotplug")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fa3ecd878 upstream.
sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid. This is historically used for
group-shared directories.
But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).
Reported-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9547837bdc upstream.
The (1292:4745) Innomedia INNEX GENESIS/ATARI adapter needs
HID_QUIRK_MULTI_INPUT to split the device up into two controllers
instead of inputs from both being merged into one.
Signed-off-by: Tomasz Kramkowski <tk@the-tk.com>
Acked-By: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 313db3d648 upstream.
The > should be >= here so that we don't read one element beyond the end
of the ep->stream_info->stream_rings[] array.
Fixes: e9df17eb14 ("USB: xhci: Correct assumptions about number of rings per endpoint.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bba57eddad upstream.
Corsair Strafe appears to suffer from the same issues
as the Corsair Strafe RGB.
Apply the same quirks (control message delay and init delay)
that the RGB version has to 1b1c:1b15.
With these quirks in place the keyboard works correctly upon
booting the system, and no longer requires reattaching the device.
Signed-off-by: Nico Sneck <snecknico@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1e255d60a upstream.
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.
Fix it by using simple_read_from_buffer() instead of custom logic.
Fixes: 6bc235a2e2 ("USB: add driver for Meywa-Denki & Kayac YUREX")
Signed-off-by: Jann Horn <jannh@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01b3cdfca2 upstream.
Fix broken modem-status error handling which could lead to bits of slab
data leaking to user space.
Fixes: 3b36a8fd67 ("usb: fix uninitialized variable warning in keyspan_pda")
Cc: stable <stable@vger.kernel.org> # 2.6.27
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e33eab9ded upstream.
The "r" variable is an int and "bufsize" is an unsigned int so the
comparison is type promoted to unsigned. If usb_control_msg() returns a
negative that is treated as a high positive value and the error handling
doesn't work.
Fixes: 2d5a9c72d0 ("USB: serial: ch341: fix control-message error handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 240630e618 upstream.
There have been several reports of LPM related hard freezes about once
a day on multiple Lenovo 50 series models. Strange enough these reports
where not disk model specific as LPM issues usually are and some users
with the exact same disk + laptop where seeing them while other users
where not seeing these issues.
It turns out that enabling LPM triggers a firmware bug somewhere, which
has been fixed in later BIOS versions.
This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM
for dealing with this.
The ahci_broken_lpm() function contains DMI match info for the 4 models
which are known to be affected by this and the DMI BIOS date field for
known good BIOS versions. If the BIOS date is older then the one in the
table LPM will be disabled and a warning will be printed.
Note the BIOS dates are for known good versions, some older versions may
work too, but we don't know for sure, the table is using dates from BIOS
versions for which users have confirmed that upgrading to that version
makes the problem go away.
Unfortunately I've been unable to get hold of the reporter who reported
that BIOS version 2.35 fixed the problems on the W541 for him. I've been
able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older
dmidecode, but I don't know the exact BIOS date as reported in the DMI.
Lenovo keeps a changelog with dates in their release notes, but the
dates there are the release dates not the build dates which are in DMI.
So I've chosen to set the date to which we compare to one day past the
release date of the 2.34 BIOS. I plan to fix this with a follow up
commit once I've the necessary info.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90d72ce079 upstream.
Embarrassingly, the recent fix introduced worse problem than it solved,
causing the balloon not to inflate. The VM informed the hypervisor that
the pages for lock/unlock are sitting in the wrong address, as it used
the page that is used the uninitialized page variable.
Fixes: b23220fe05 ("vmw_balloon: fixing double free when batching mode is off")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b320a0a9f2 upstream.
The block (LBA) specified must not exceed the last addressable LBA,
which is dev->nr_sectors - 1. So fix the correct check is
"if (block >= dev->n_sectors)" and not "if (block > dev->n_sectords)".
Additionally, the asc/ascq to return for an LBA that is not a zone start
LBA should be ILLEGAL REQUEST, regardless if the bad LBA is out of
range.
Reported-by: David Butterfield <david.butterfield@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0341fc198 upstream.
This read handler had a lot of custom logic and wrote outside the bounds of
the provided buffer. This could lead to kernel and userspace memory
corruption. Just use simple_read_from_buffer() with a stack buffer.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a6b9f4d60 upstream.
Card write threshold control is supposed to be set since controller
version 2.80a for data write in HS400 mode and data read in
HS200/HS400/SDR104 mode. However the current code returns without
configuring it in the case of data writing in HS400 mode.
Meanwhile the patch fixes that the current code goes to
'disable' when doing data reading in HS400 mode.
Fixes: 7e4bf1bc95 ("mmc: dw_mmc: add the card write threshold for HS400 mode")
Signed-off-by: Qing Xia <xiaqing17@hisilicon.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 523402fa91 upstream.
We currently attempt to check whether a physical address range provided
to __ioremap() may be in use by the page allocator by examining the
value of PageReserved for each page in the region - lowmem pages not
marked reserved are presumed to be in use by the page allocator, and
requests to ioremap them fail.
The way we check this has been broken since commit 92923ca3aa ("mm:
meminit: only set page reserved in the memblock region"), because
memblock will typically not have any knowledge of non-RAM pages and
therefore those pages will not have the PageReserved flag set. Thus when
we attempt to ioremap a region outside of RAM we incorrectly fail
believing that the region is RAM that may be in use.
In most cases ioremap() on MIPS will take a fast-path to use the
unmapped kseg1 or xkphys virtual address spaces and never hit this path,
so the only way to hit it is for a MIPS32 system to attempt to ioremap()
an address range in lowmem with flags other than _CACHE_UNCACHED.
Perhaps the most straightforward way to do this is using
ioremap_uncached_accelerated(), which is how the problem was discovered.
Fix this by making use of walk_system_ram_range() to test the address
range provided to __ioremap() against only RAM pages, rather than all
lowmem pages. This means that if we have a lowmem I/O region, which is
very common for MIPS systems, we're free to ioremap() address ranges
within it. A nice bonus is that the test is no longer limited to lowmem.
The approach here matches the way x86 performed the same test after
commit c81c8a1eee ("x86, ioremap: Speed up check for RAM pages") until
x86 moved towards a slightly more complicated check using walk_mem_res()
for unrelated reasons with commit 0e4c12b45a ("x86/mm, resource: Use
PAGE_KERNEL protection for ioremap of memory pages").
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Fixes: 92923ca3aa ("mm: meminit: only set page reserved in the memblock region")
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.2+
Patchwork: https://patchwork.linux-mips.org/patch/19786/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a267832c2 upstream.
The generic nmi_cpu_backtrace() function calls show_regs() when a struct
pt_regs is available, and dump_stack() otherwise. If we were to make use
of the generic nmi_cpu_backtrace() with MIPS' current implementation of
show_regs() this would mean that we see only register data with no
accompanying stack information, in contrast with our current
implementation which calls dump_stack() regardless of whether register
state is available.
In preparation for making use of the generic nmi_cpu_backtrace() to
implement arch_trigger_cpumask_backtrace(), have our implementation of
show_regs() call dump_stack() and drop the explicit dump_stack() call in
arch_dump_stack() which is invoked by arch_trigger_cpumask_backtrace().
This will allow the output we produce to remain the same after a later
patch switches to using nmi_cpu_backtrace(). It may mean that we produce
extra stack output in other uses of show_regs(), but this:
1) Seems harmless.
2) Is good for consistency between arch_trigger_cpumask_backtrace()
and other users of show_regs().
3) Matches the behaviour of the ARM & PowerPC architectures.
Marked for stable back to v4.9 as a prerequisite of the following patch
"MIPS: Call dump_stack() from show_regs()".
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19596/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce00bf07cc upstream.
The old code would indefinitely block other users of nf_log_mutex if
a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
region. Fix it by moving proc_dostring() out of the locked region.
This is a followup to commit 266d07cb1c ("netfilter: nf_log: fix
sleeping function called from invalid context"), which changed this code
from using rcu_read_lock() to taking nf_log_mutex.
Fixes: 266d07cb1c ("netfilter: nf_log: fix sleeping function calle[...]")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d12067f428 upstream.
dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f77f244d8 upstream.
The v21 version of the NAND flash controller contains a Spare Area Size
Register (SPAS) at offset 0x10. Its setting defaults to the maximum
spare area size of 218 bytes. The size that is set in this register is
used by the controller when it calculates the ECC bytes internally in
hardware.
Usually, this register is updated from settings in the IIM fuses when
the system is booting from NAND flash. For other boot media, however,
the SPAS register remains at the default setting, which may not work for
the particular flash chip on the board. The same goes for flash chips
whose configuration cannot be set in the IIM fuses (e.g. chips with 2k
sector size and 128 bytes spare area size can't be configured in the IIM
fuses on imx25 systems).
Set the SPAS register explicitly during the preset operation. Derive the
register value from mtd->oobsize that was detected during probe by
decoding the flash chip's ID bytes.
While at it, rename the define for the spare area register's offset to
NFC_V21_RSLTSPARE_AREA. The register at offset 0x10 on v1 controllers is
different from the register on v21 controllers.
Fixes: d484018 ("mtd: mxc_nand: set NFC registers after reset")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41c73a49df upstream.
If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).
Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ea61cac0b upstream.
We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:
mutex_lock+0x4c/0x68
dm_bufio_shrink_count+0x38/0x78
shrink_slab.part.54.constprop.65+0x100/0x464
shrink_zone+0xa8/0x198
In the two cases analyzed, we see one task that looks like this:
Workqueue: kverityd verity_prefetch_io
__switch_to+0x9c/0xa8
__schedule+0x440/0x6d8
schedule+0x94/0xb4
schedule_timeout+0x204/0x27c
schedule_timeout_uninterruptible+0x44/0x50
wait_iff_congested+0x9c/0x1f0
shrink_inactive_list+0x3a0/0x4cc
shrink_lruvec+0x418/0x5cc
shrink_zone+0x88/0x198
try_to_free_pages+0x51c/0x588
__alloc_pages_nodemask+0x648/0xa88
__get_free_pages+0x34/0x7c
alloc_buffer+0xa4/0x144
__bufio_new+0x84/0x278
dm_bufio_prefetch+0x9c/0x154
verity_prefetch_io+0xe8/0x10c
process_one_work+0x240/0x424
worker_thread+0x2fc/0x424
kthread+0x10c/0x114
...and that looks to be the one holding the mutex.
The problem has been reproduced on fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from
http://crbug.com/468342
...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
nice ./launchBalloons.sh 4 900 100000
...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Login to the Chrome web browser and restore many tabs
With that, I've seen printouts like:
DOUG: long bufio 90758 ms
...and stack trace always show's we're in dm_bufio_prefetch().
The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock. Instead we should be using
GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.
The current behavior explained by David Rientjes:
It will still try reclaim initially because __GFP_WAIT (or
__GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of
contention on dm_bufio_lock() that the thread holds. You want to
pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
mutex that can be contended by a concurrent slab shrinker (if
count_objects didn't use a trylock, this pattern would trivially
deadlock).
This change significantly increases responsiveness of the system while
in this state. It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:
kswapd0 D ffffffc000204fd8 0 72 2 0x00000000
Call trace:
[<ffffffc000204fd8>] __switch_to+0x9c/0xa8
[<ffffffc00090b794>] __schedule+0x440/0x6d8
[<ffffffc00090bac0>] schedule+0x94/0xb4
[<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
[<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
[<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
[<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
[<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
[<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
[<ffffffc00030e578>] balance_pgdat+0x328/0x508
[<ffffffc00030eb7c>] kswapd+0x424/0x51c
[<ffffffc00023f06c>] kthread+0x10c/0x114
[<ffffffc000203dd0>] ret_from_fork+0x10/0x40
By unblocking kswapd memory pressure should be reduced.
Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7810e6781e upstream.
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE
allocations that want to allocate on a different node, e.g. because the
allocating thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page
is put on wrong node's list, then further list manipulations may corrupt
the list because page_to_nid() is used to determine which node's
list_lock should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a0588 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There
is actually no reason to reset it, because memory policies and cpusets
don't affect the zonelist choice in the first place. This was different
when commit 183f6371aa ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's
anything currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that don't
yet have 511e3a0588 ("mm/slab: make cache_grow() handle the page
allocated on arbitrary node") it is. That's at least 4.4 LTS. Older
ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 183f6371aa ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ee9bc1234 upstream.
The cx25840 driver currently configures 885, 887, and 888 using
default divisors for each chip. This check to see if the cx23885
driver has passed the cx25840 a non-default clock rate for a
specific chip. If a cx23885 board has left clk_freq at 0, the
clock default values will be used to configure the PLLs.
This patch only has effect on 888 boards who set clk_freq to 25M.
Signed-off-by: Brad Love <brad@nextdimension.cc>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9564a8cf42 upstream.
I tried building using a freshly built Make (4.2.1-69-g8a731d1), but
already the objtool build broke with
orc_dump.c: In function ‘orc_dump’:
orc_dump.c:106:2: error: ‘elf_getshnum’ is deprecated [-Werror=deprecated-declarations]
if (elf_getshdrnum(elf, &nr_sections)) {
Turns out that with that new Make, the backslash was not removed, so cpp
didn't see a #include directive, grep found nothing, and
-DLIBELF_USE_DEPRECATED was wrongly put in CFLAGS.
Now, that new Make behaviour is documented in their NEWS file:
* WARNING: Backward-incompatibility!
Number signs (#) appearing inside a macro reference or function invocation
no longer introduce comments and should not be escaped with backslashes:
thus a call such as:
foo := $(shell echo '#')
is legal. Previously the number sign needed to be escaped, for example:
foo := $(shell echo '\#')
Now this latter will resolve to "\#". If you want to write makefiles
portable to both versions, assign the number sign to a variable:
C := \#
foo := $(shell echo '$C')
This was claimed to be fixed in 3.81, but wasn't, for some reason.
To detect this change search for 'nocomment' in the .FEATURES variable.
This also fixes up the two make-cmd instances to replace # with $(pound)
rather than with \#. There might very well be other places that need
similar fixup in preparation for whatever future Make release contains
the above change, but at least this builds an x86_64 defconfig with the
new make.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5c2a97b3a upstream.
This commit fixes a rare but possible case when the clk rate is updated
without update of the regulator voltage.
At boot up, CPUfreq checks if the system is running at the right freq. This
is a sanity check in case a bootloader set clk rate that is outside of freq
table present with cpufreq core. In such cases system can be unstable so
better to change it to a freq that is preset in freq-table.
The CPUfreq takes next freq that is >= policy->cur and this is our
target_freq that needs to be set now.
dev_pm_opp_set_rate(dev, target_freq) checks the target_freq and the
old_freq (a current rate). If these are equal it returns early. If not,
it searches for OPP (old_opp) that fits best to old_freq (not listed in
the table) and updates old_freq (!).
Here, we can end up with old_freq = old_opp.rate = target_freq, which
is not handled in _generic_set_opp_regulator(). It's supposed to update
voltage only when freq > old_freq || freq > old_freq.
if (freq > old_freq) {
ret = _set_opp_voltage(dev, reg, new_supply);
[...]
if (freq < old_freq) {
ret = _set_opp_voltage(dev, reg, new_supply);
if (ret)
It results in, no voltage update while clk rate is updated.
Example:
freq-table = {
1000MHz 1.15V
666MHZ 1.10V
333MHz 1.05V
}
boot-up-freq = 800MHz # not listed in freq-table
freq = target_freq = 1GHz
old_freq = 800Mhz
old_opp = _find_freq_ceil(opp_table, &old_freq); #(old_freq is modified!)
old_freq = 1GHz
Fixes: 6a0712f6f1 ("PM / OPP: Add dev_pm_opp_set_rate()")
Cc: 4.6+ <stable@vger.kernel.org> # v4.6+
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f65245f2d upstream.
uref->field_index, uref->usage_index, finfo.field_index and cinfo.index can be
indirectly controlled by user-space, hence leading to a potential exploitation
of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/hid/usbhid/hiddev.c:473 hiddev_ioctl_usage() warn: potential spectre issue 'report->field' (local cap)
drivers/hid/usbhid/hiddev.c:477 hiddev_ioctl_usage() warn: potential spectre issue 'field->usage' (local cap)
drivers/hid/usbhid/hiddev.c:757 hiddev_ioctl() warn: potential spectre issue 'report->field' (local cap)
drivers/hid/usbhid/hiddev.c:801 hiddev_ioctl() warn: potential spectre issue 'hid->collection' (local cap)
Fix this by sanitizing such structure fields before using them to index
report->field, field->usage and hid->collection
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef6eaf2727 upstream.
Commit ac75a04104 ("HID: i2c-hid: fix size check and type usage") started
writing messages when the ret_size is <= 2 from i2c_master_recv. However, my
device i2c-DLL07D1 returns 2 for a short period of time (~0.5s) after I stop
moving the pointing stick or touchpad. It varies, but you get ~50 messages
each time which spams the log hard.
[ 95.925055] i2c_hid i2c-DLL07D1:01: i2c_hid_get_input: incomplete report (83/2)
This has also been observed with a i2c-ALP0017.
[ 1781.266353] i2c_hid i2c-ALP0017:00: i2c_hid_get_input: incomplete report (30/2)
Only print the message when ret_size is totally invalid and less than 2 to cut
down on the log spam.
Fixes: ac75a04104 ("HID: i2c-hid: fix size check and type usage")
Reported-by: John Smith <john-s-84@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Slaby noticed that the backport of upstream commit 25cc72a338
("mlxsw: spectrum: Forbid linking to devices that have uppers") to
kernel 4.9.y introduced the same check twice in the same function
instead of in two different places.
Fix this by relocating one of the checks to its intended place, thus
preventing unsupported configurations as described in the original
commit.
Fixes: 73ee5a73e7 ("mlxsw: spectrum: Forbid linking to devices that have uppers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a17712c8e4 upstream.
This patch attempts to close a hole leading to a BUG seen with hot
removals during writes [1].
A block device (NVME namespace in this test case) is formatted to EXT4
without partitions. It's mounted and write I/O is run to a file, then
the device is hot removed from the slot. The superblock attempts to be
written to the drive which is no longer present.
The typical chain of events leading to the BUG:
ext4_commit_super()
__sync_dirty_buffer()
submit_bh()
submit_bh_wbc()
BUG_ON(!buffer_mapped(bh));
This fix checks for the superblock's buffer head being mapped prior to
syncing.
[1] https://www.spinics.net/lists/linux-ext4/msg56527.html
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfe0a5f47a upstream.
The kernel's ext4 mount-time checks were more permissive than
e2fsprogs's libext2fs checks when opening a file system. The
superblock is considered too insane for debugfs or e2fsck to operate
on it, the kernel has no business trying to mount it.
This will make file system fuzzing tools work harder, but the failure
cases that they find will be more useful and be easier to evaluate.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c37e9e0134 upstream.
If there is a directory entry pointing to a system inode (such as a
journal inode), complain and declare the file system to be corrupted.
Also, if the superblock's first inode number field is too small,
refuse to mount the file system.
This addresses CVE-2018-10882.
https://bugzilla.kernel.org/show_bug.cgi?id=200069
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e8ab72a81 upstream.
When converting from an inode from storing the data in-line to a data
block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
copy of the i_blocks[] array. It was not clearing copy of the
i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
used by ext4_map_blocks().
This didn't matter much if we are using extents, since the extents
header would be invalid and thus the extents could would re-initialize
the extents tree. But if we are using indirect blocks, the previous
contents of the i_blocks array will be treated as block numbers, with
potentially catastrophic results to the file system integrity and/or
user data.
This gets worse if the file system is using a 1k block size and
s_first_data is zero, but even without this, the file system can get
quite badly corrupted.
This addresses CVE-2018-10881.
https://bugzilla.kernel.org/show_bug.cgi?id=200015
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8844618d8a upstream.
The bg_flags field in the block group descripts is only valid if the
uninit_bg or metadata_csum feature is enabled. We were not
consistently looking at this field; fix this.
Also block group #0 must never have uninitialized allocation bitmaps,
or need to be zeroed, since that's where the root inode, and other
special inodes are set up. Check for these conditions and mark the
file system as corrupted if they are detected.
This addresses CVE-2018-10876.
https://bugzilla.kernel.org/show_bug.cgi?id=199403
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e09463f220 upstream.
Do not set the b_modified flag in block's journal head should not
until after we're sure that jbd2_journal_dirty_metadat() will not
abort with an error due to there not being enough space reserved in
the jbd2 handle.
Otherwise, future attempts to modify the buffer may lead a large
number of spurious errors and warnings.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99ec9e7751 upstream.
The displaylink hardware has such a peculiarity that it doesn't render a
command until next command is received. This produces occasional
corruption, such as when setting 22x11 font on the console, only the first
line of the cursor will be blinking if the cursor is located at some
specific columns.
When we end up with a repeating pixel, the driver has a bug that it leaves
one uninitialized byte after the command (and this byte is enough to flush
the command and render it - thus it fixes the screen corruption), however
whe we end up with a non-repeating pixel, there is no byte appended and
this results in temporary screen corruption.
This patch fixes the screen corruption by always appending a byte 0xAF at
the end of URB. It also removes the uninitialized byte.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ffbe65578 upstream.
For every request we send, whether it is SMB1 or SMB2+, we attempt to
reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying
out the request.
So, while server->tcpStatus != CifsNeedReconnect, we wait for the
reconnection to succeed on wait_event_interruptible_timeout(). If it
returns, that means that either the condition was evaluated to true, or
timeout elapsed, or it was interrupted by a signal.
Since we're not handling the case where the process woke up due to a
received signal (-ERESTARTSYS), the next call to
wait_event_interruptible_timeout() will _always_ fail and we end up
looping forever inside either cifs_reconnect_tcon() or smb2_reconnect().
Here's an example of how to trigger that:
$ mount.cifs //foo/share /mnt/test -o
username=foo,password=foo,vers=1.0,hard
(break connection to server before executing bellow cmd)
$ stat -f /mnt/test & sleep 140
[1] 2511
$ ps -aux -q 2511
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2511 0.0 0.0 12892 1008 pts/0 S 12:24 0:00 stat -f
/mnt/test
$ kill -9 2511
(wait for a while; process is stuck in the kernel)
$ ps -aux -q 2511
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2511 83.2 0.0 12892 1008 pts/0 R 12:24 30:01 stat -f
/mnt/test
By using 'hard' mount point means that cifs.ko will keep retrying
indefinitely, however we must allow the process to be killed otherwise
it would hang the system.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64dafbc953 upstream.
We have
struct drbd_requests { ... struct bio *private_bio; ... }
to hold a bio clone for local submission.
On local IO completion, we put that bio, and in case we want to use the
result later, we overload that member to hold the ERR_PTR() of the
completion result,
Which, before v4.3, used to be the passed in "int error",
so we could first bio_put(), then assign.
v4.3-rc1~100^2~21 4246a0b63b block: add a bi_error field to struct bio
changed that:
bio_put(req->private_bio);
- req->private_bio = ERR_PTR(error);
+ req->private_bio = ERR_PTR(bio->bi_error);
Which introduces an access after free,
because it was non obvious that req->private_bio == bio.
Impact of that was mostly unnoticable, because we only use that value
in a multiple-failure case, and even then map any "unexpected" error
code to EIO, so worst case we could potentially mask a more specific
error with EIO in a multiple failure case.
Unless the pointed to memory region was unmapped, as is the case with
CONFIG_DEBUG_PAGEALLOC, in which case this results in
BUG: unable to handle kernel paging request
v4.13-rc1~70^2~75 4e4cbee93d block: switch bios to blk_status_t
changes it further to
bio_put(req->private_bio);
req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));
And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected
values, which catches this "sometimes", if the memory has been reused
quickly enough for other things.
Should also go into stable since 4.3, with the trivial change around 4.13.
Cc: stable@vger.kernel.org
Fixes: 4246a0b63b block: add a bi_error field to struct bio
Reported-by: Sarah Newman <srn@prgmr.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 891f6a726c upstream.
In the critical section cleanup we must not mess with r1. For march=z9
or older, larl + ex (instead of exrl) are used with r1 as a temporary
register. This can clobber r1 in several interrupt handlers. Fix this by
using r11 as a temp register. r11 is being saved by all callers of
cleanup_critical.
Fixes: 6dd85fbb87 ("s390: move expoline assembler macros to a header")
Cc: stable@vger.kernel.org #v4.16
Reported-by: Oliver Kurz <okurz@suse.com>
Reported-by: Petr Tesařík <ptesarik@suse.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26b5b874af upstream.
As Al Viro noted in commit 128394eff3 ("sg_write()/bsg_write() is not fit
to be called under KERNEL_DS"), sg improperly accesses userspace memory
outside the provided buffer, permitting kernel memory corruption via
splice(). But it doesn't just do it on ->write(), also on ->read().
As a band-aid, make sure that the ->read() and ->write() handlers can not
be called in weird contexts (kernel context or credentials different from
file opener), like for ib_safe_file_access().
If someone needs to use these interfaces from different security contexts,
a new interface should be written that goes through the ->ioctl() handler.
I've mostly copypasted ib_safe_file_access() over as sg_safe_file_access()
because I couldn't find a good common header - please tell me if you know a
better way.
[mkp: s/_safe_/_check_/]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bc0299d97 upstream.
The following code fails to allocate a buffer for the
tail address that the hardware DMAs into when the user
context DMA_RTAIL is set.
if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
&dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
gfp_flags);
if (!rcd->rcvhdrtail_kvaddr)
goto bail_free;
rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
}
So the rcvhdrtail_kvaddr would then be NULL.
The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.
The fix is to test for both user and kernel DMA_TAIL options
during the allocation as well as testing for a NULL
rcvhdrtail_kvaddr during the mmap processing.
Additionally, all downstream testing of the capmask for DMA_RTAIL
have been eliminated in favor of testing rcvhdrtail_kvaddr.
Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df07101e1c upstream.
According to the reference manual the shp_2_mcu / mcu_2_shp
scripts must be used for devices connected through the SPBA.
This fixes an issue we saw with DMA transfers.
Sometimes the SPI controller RX FIFO was not empty after a DMA
transfer and the driver got stuck in the next PIO transfer when
it read one word more than expected.
commit dd4b487b32 ("ARM: dts: imx6: Use correct SDMA script
for SPI cores") is fixing the same issue but only for SPI1 - 4.
Fixes: 677940258d ("ARM: dts: imx6q: enable dma for ecspi5")
Signed-off-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfe7244257 upstream.
Now the command:
ethtool --phy-statistics eth0
will cause system crash with meassage "Unable to handle kernel NULL pointer
dereference at virtual address 00000010" from:
(kszphy_get_stats) from [<c069f1d8>] (ethtool_get_phy_stats+0xd8/0x210)
(ethtool_get_phy_stats) from [<c06a0738>] (dev_ethtool+0x5b8/0x228c)
(dev_ethtool) from [<c06b5484>] (dev_ioctl+0x3fc/0x964)
(dev_ioctl) from [<c0679f7c>] (sock_ioctl+0x170/0x2c0)
(sock_ioctl) from [<c02419d4>] (do_vfs_ioctl+0xa8/0x95c)
(do_vfs_ioctl) from [<c02422c4>] (SyS_ioctl+0x3c/0x64)
(SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x44)
The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
defined. As result, struct phy_device *phydev->priv pointer will not be
initializes (null).
This issue will affect also following phys:
KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
Fix it by:
- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
settings.
- removing statistic callbacks from other phys (KSZ8795, KSZ886X,
KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding
statistic counters.
Fixes: 2b2427d064 ("phy: micrel: Add ethtool statistics counters")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit adc972c5b8 upstream.
When depth of chain is bigger than NFT_JUMP_STACK_SIZE, the nft_do_chain
crashes. But there is no need to crash hard here.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The validation code modified by commit 5b5e7a0de2 ("net: metrics:
add proper netlink validation") is organised differently in older
kernel versions. The fib_convert_metrics() function that is modified
in the backports to 4.4 and 4.9 needs to returns an error code, not a
success flag.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae481cc139 upstream.
Resume failed because of uninitialized registers. Instead of adding a
resume callback, we simply initialize registers before every transfer.
This lightweight change is more robust and will keep us safe if we ever
need support for power domains or dynamic frequency changes.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebec3f8f52 upstream.
syzbot is reporting stalls at __process_echoes() [1]. This is because
since ldata->echo_commit < ldata->echo_tail becomes true for some reason,
the discard loop is serving as almost infinite loop. This patch tries to
avoid falling into ldata->echo_commit < ldata->echo_tail situation by
making access to echo_* variables more carefully.
Since reset_buffer_flags() is called without output_lock held, it should
not touch echo_* variables. And omit a call to reset_buffer_flags() from
n_tty_open() by using vzalloc().
Since add_echo_byte() is called without output_lock held, it needs memory
barrier between storing into echo_buf[] and incrementing echo_head counter.
echo_buf() needs corresponding memory barrier before reading echo_buf[].
Lack of handling the possibility of not-yet-stored multi-byte operation
might be the reason of falling into ldata->echo_commit < ldata->echo_tail
situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to
echo_buf(ldata, tail + 1), the WARN_ON() fires.
Also, explicitly masking with buffer for the former "while" loop, and
use ldata->echo_commit > tail for the latter "while" loop.
[1] https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+108696293d7a21ab688f@syzkaller.appspotmail.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8760675932 upstream.
The dwc2_get_ls_map() use ttport to reference into the
bitmap if we're on a multi_tt hub. But the bitmaps index
from 0 to (hub->maxchild - 1), while the ttport index from
1 to hub->maxchild. This will cause invalid memory access
when the number of ttport is hub->maxchild.
Without this patch, I can easily meet a Kernel panic issue
if connect a low-speed USB mouse with the max port of FE2.1
multi-tt hub (1a40:0201) on rk3288 platform.
Fixes: 9f9f09b048 ("usb: dwc2: host: Totally redo the microframe scheduler")
Cc: <stable@vger.kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Minas Harutyunyan hminas@synopsys.com>
Signed-off-by: William Wu <william.wu@rock-chips.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f83982338 upstream.
Silicon Labs defines alternative VID/PID pairs for some chips that when
used will automatically install drivers for Windows users without manual
intervention. Unfortunately, these IDs are not recognized by the Linux
module, so using these IDs improves user experience on one platform but
degrades it on Linux. This patch addresses this problem.
Signed-off-by: Karoly Pados <pados@pados.hu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a685557fbb upstream.
Discards issued to a DM thin device can complete to userspace (via
fstrim) _before_ the metadata changes associated with the discards is
reflected in the thinp superblock (e.g. free blocks). As such, if a
user constructs a test that loops repeatedly over these steps, block
allocation can fail due to discards not having completed yet:
1) fill thin device via filesystem file
2) remove file
3) fstrim
From initial report, here:
https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html
"The root cause of this issue is that dm-thin will first remove
mapping and increase corresponding blocks' reference count to prevent
them from being reused before DISCARD bios get processed by the
underlying layers. However. increasing blocks' reference count could
also increase the nr_allocated_this_transaction in struct sm_disk
which makes smd->old_ll.nr_allocated +
smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
In this case, alloc_data_block() will never commit metadata to reset
the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
always return an underflow value."
While there is room for improvement to the space-map accounting that
thinp is making use of: the reality is this test is inherently racey and
will result in the previous iteration's fstrim's discard(s) completing
vs concurrent block allocation, via dd, in the next iteration of the
loop.
No amount of space map accounting improvements will be able to allow
user's to use a block before a discard of that block has completed.
So the best we can really do is allow DM thinp to gracefully handle such
aggressive use of all the pool's data by degrading the pool into
out-of-data-space (OODS) mode. We _should_ get that behaviour already
(if space map accounting didn't falsely cause alloc_data_block() to
believe free space was available).. but short of that we handle the
current reality that dm_pool_alloc_data_block() can return -ENOSPC.
Reported-by: Dennis Yang <dennisyang@qnap.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5a6cabf02 upstream.
Some Lenovo laptops, e.g. Lenovo P50, showed the pop noise at resume
or runtime resume. It turned out to be reduced by applying
alc_no_shutup() just like TPT440 quirk does.
Since there are many Lenovo models showing the same behavior, put this
workaround in ALC269_FIXUP_THINKPAD_ACPI entry so that it's applied
commonly to all such Lenovo machines.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Benjamin Berg <bberg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50fc7b6195 upstream.
Commit 40f7090bb1 ("Input: elan_i2c_smbus - fix corrupted stack")
fixed most of the functions using i2c_smbus_read_block_data() to
allocate a buffer with the maximum block size. However three
functions were left unchanged:
* In elan_smbus_initialize(), increase the buffer size in the same
way.
* In elan_smbus_calibrate_result(), the buffer is provided by the
caller (calibrate_store()), so introduce a bounce buffer. Also
name the result buffer size.
* In elan_smbus_get_report(), the buffer is provided by the caller
but happens to be the right length. Add a compile-time assertion
to ensure this remains the case.
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa65653e57 upstream.
Detect when a directory entry is (possibly partially) beyond directory
size and return EIO in that case since it means the filesystem is
corrupted. Otherwise directory operations can further corrupt the
directory and possibly also oops the kernel.
CC: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
CC: stable@vger.kernel.org
Reported-and-tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eef04c7b37 upstream.
Commit 910f8befdf ("xen/pirq: fix error path cleanup when binding
MSIs") fixed a couple of errors in error cleanup path of
xen_bind_pirq_msi_to_irq(). This cleanup allowed a call to
__unbind_from_irq() with an unbound irq, which would result in
triggering the BUG_ON there.
Since there is really no reason for the BUG_ON (xen_free_irq() can
operate on unbound irqs) we can remove it.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bdce74412 upstream.
Hussam reports:
I was poking around and for no real reason, I did cat /dev/mem and
strings /dev/mem. Then I saw the following warning in dmesg. I saved it
and rebooted immediately.
memremap attempted on mixed range 0x000000000009c000 size: 0x1000
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11810 at kernel/memremap.c:98 memremap+0x104/0x170
[..]
Call Trace:
xlate_dev_mem_ptr+0x25/0x40
read_mem+0x89/0x1a0
__vfs_read+0x36/0x170
The memremap() implementation checks for attempts to remap System RAM
with MEMREMAP_WB and instead redirects those mapping attempts to the
linear map. However, that only works if the physical address range
being remapped is page aligned. In low memory we have situations like
the following:
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : Reserved
...where System RAM intersects Reserved ranges on a sub-page page
granularity.
Given that devmem_is_allowed() special cases any attempt to map System
RAM in the first 1MB of memory, replace page_is_ram() with the more
precise region_intersects() to trap attempts to map disallowed ranges.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199999
Link: http://lkml.kernel.org/r/152856436164.18127.2847888121707136898.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 92281dee82 ("arch: introduce memremap()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Hussam Al-Tayeb <me@hussam.eu.org>
Tested-by: Hussam Al-Tayeb <me@hussam.eu.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23edca8649 upstream.
There is a problem if we are going to unmap a rbd device and the
watch_dwork is going to queue delayed work for watch:
unmap Thread watch Thread timer
do_rbd_remove
cancel_tasks_sync(rbd_dev)
queue_delayed_work for watch
destroy_workqueue(rbd_dev->task_wq)
drain_workqueue(wq)
destroy other resources in wq
call_timer_fn
__queue_work()
Then the delayed work escape the cancel_tasks_sync() and
destroy_workqueue() and we will get an user-after-free call trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
Modules linked in:
CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__queue_work+0x6a/0x3b0
RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
Call Trace:
<IRQ>
? __queue_work+0x3b0/0x3b0
call_timer_fn+0x2d/0x130
run_timer_softirq+0x16e/0x430
? tick_sched_timer+0x37/0x70
__do_softirq+0xd2/0x280
irq_exit+0xd5/0xe0
smp_apic_timer_interrupt+0x6c/0x130
apic_timer_interrupt+0xf/0x20
[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
either bails out early because the watch is UNREGISTERED at that point
or just gets cancelled. ]
Cc: stable@vger.kernel.org
Fixes: 99d1694310 ("rbd: retry watch re-registration periodically")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d375b58c1 upstream.
On some devices the contents of the ctrl register get lost over a
suspend/resume and the PWM comes back up disabled after the resume.
This is seen on some Bay Trail devices with the PWM in ACPI enumerated
mode, so it shows up as a platform device instead of a PCI device.
If we still think it is enabled and then try to change the duty-cycle
after this, we end up with a "PWM_SW_UPDATE was not cleared" error and
the PWM is stuck in that state from then on.
This commit adds suspend and resume pm callbacks to the pwm-lpss-platform
code, which save/restore the ctrl register over a suspend/resume, fixing
this.
Note that:
1) There is no need to do this over a runtime suspend, since we
only runtime suspend when disabled and then we properly set the enable
bit and reprogram the timings when we re-enable the PWM.
2) This may be happening on more systems then we realize, but has been
covered up sofar by a bug in the acpi-lpss.c code which was save/restoring
the regular device registers instead of the lpss private registers due to
lpss_device_desc.prv_offset not being set. This is fixed by a later patch
in this series.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d68894800e upstream.
In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d'
that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which
is a stack char array variable of length NFS_UINT_MAXLEN == 11.
If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion
overflows into a negative value, for example:
crash> p (unsigned) (0x80000000)
$1 = 2147483648
crash> p (signed) (0x80000000)
$2 = -2147483648
The '-' sign is written to the buffer and this causes a 1 byte overflow
when the NULL byte is written, which corrupts kernel stack memory. If
CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic:
[11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c
[11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1
[11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac
[11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8
[11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c
[11558053.650107] Call Trace:
[11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b
[11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2
[11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140
[11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30
[11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc]
[11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0
[11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0
[11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0
[11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Fix this by calling the internally defined nfs_map_numeric_to_string()
function which properly uses '%u' to convert this __u32. For consistency,
also replace the one other place where snprintf is called.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Stephen Johnston <sjohnsto@redhat.com>
Fixes: cf4ab538f1 ("NFSv4: Fix the string length returned by the idmapper")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c2ece6ef6 upstream.
nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when
estimating the size of the readdir reply, but nfsd_encode_readdir
restricts it to INT_MAX when encoding the reply. This can result in log
messages like "kernel: RPC request reserved 32896 but used 1049444".
Restrict rd_dircount similarly (no reason it should be larger than
svc_max_payload).
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76d81243a4 upstream.
As warned by smatch:
drivers/media/dvb-core/dvb_frontend.c:314 dvb_frontend_get_event() warn: inconsistent returns 'sem:&fepriv->sem'.
Locked on: line 288
line 295
line 306
line 314
Unlocked on: line 303
The lock implementation for get event is wrong, as, if an
interrupt occurs, down_interruptible() will fail, and the
routine will call up() twice when userspace calls the ioctl
again.
The bad code is there since when Linux migrated to git, in
2005.
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea72fbf588 upstream.
As warned by smatch:
drivers/media/v4l2-core/v4l2-compat-ioctl32.c:879 put_v4l2_ext_controls32() warn: check for integer overflow 'count'
The access_ok() logic should check for too big arrays too.
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d28b625208 upstream.
According to documentation REMAP register has to be programmed in
either DMA or PIO mode of the slice.
Move the DMA capability check below to let REMAP register be programmed
in PIO mode.
Cc: stable@vger.kernel.org # 4.3+
Fixes: 4b45efe852 ("mfd: Add support for Intel Sunrisepoint LPSS devices")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b12dfa124 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
This would only cause trouble if the child node is missing while there
is an unrelated node named "backlight" elsewhere in the tree.
Cc: stable <stable@vger.kernel.org> # 3.7
Fixes: eebfdc17cc ("backlight: Add TPS65217 WLED driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1cc0ec3da upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent mfd node was also prematurely freed,
while the child backlight node was leaked.
Cc: stable <stable@vger.kernel.org> # 3.9
Fixes: 47ec340cb8 ("mfd: max8925: Support dt for backlight")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a9c8bb2ac upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent mfd node was also prematurely freed.
Cc: stable <stable@vger.kernel.org> # 3.10
Fixes: 59eb2b5e57 ("drivers/video/backlight/as3711_bl.c: add OF support")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 353748a359 upstream.
There is potential for the size and len fields in ubifs_data_node to be
too large causing either a negative value for the length fields or an
integer overflow leading to an incorrect memory allocation. Likewise,
when the len field is small, an integer underflow may occur.
Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 781932375f upstream.
Fastmap cannot track the LEB unmap operation, therefore it can
happen that after an interrupted erasure the mapping still looks
good from Fastmap's point of view, while reading from the PEB will
cause an ECC error and confuses the upper layer.
Instead of teaching users of UBI how to deal with that, we read back
the VID header and check for errors. If the PEB is empty or shows ECC
errors we fixup the mapping and schedule the PEB for erasure.
Fixes: dbb7d2a88d ("UBI: Add fastmap core")
Cc: <stable@vger.kernel.org>
Reported-by: martin bayern <Martinbayern@outlook.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e7d801610 upstream.
Ben Hutchings pointed out that 29b7a6fa1e ("ubi: fastmap: Don't flush
fastmap work on detach") does not really fix the problem, it just
reduces the risk to hit the race window where fastmap work races against
free()'ing ubi->volumes[].
The correct approach is making sure that no more fastmap work is in
progress before we free ubi data structures.
So we cancel fastmap work right after the ubi background thread is
stopped.
By setting ubi->thread_enabled to zero we make sure that no further work
tries to wake the thread.
Fixes: 29b7a6fa1e ("ubi: fastmap: Don't flush fastmap work on detach")
Fixes: 74cdaf2400 ("UBI: Fastmap: Fix memory leaks while closing the WL sub-system")
Cc: stable@vger.kernel.org
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Martin Townsend <mtownsend1973@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a2e84c6ed upstream.
All the managed resources would be freed by the time release function
is invoked. Handling such memory in qcom_smd_edge_release() would do
bad things.
Found this issue while testing Audio usecase where the dsp is started up
and shutdown in a loop.
This patch fixes this issue by using simple kzalloc for allocating
channel->name and channel which is then freed in qcom_smd_edge_release().
Without this patch restarting a remoteproc would crash the system.
Fixes: 53e2822e56 ("rpmsg: Introduce Qualcomm SMD backend")
Cc: <stable@vger.kernel.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 011abdc9df upstream.
If "re-add" is written to the "state" file for a device
which is faulty, this has an effect similar to removing
and re-adding the device. It should take up the
same slot in the array that it previously had, and
an accelerated (e.g. bitmap-based) rebuild should happen.
The slot that "it previously had" is determined by
rdev->saved_raid_disk.
However this is not set when a device fails (only when a device
is added), and it is cleared when resync completes.
This means that "re-add" will normally work once, but may not work a
second time.
This patch includes two fixes.
1/ when a device fails, record the ->raid_disk value in
->saved_raid_disk before clearing ->raid_disk
2/ when "re-add" is written to a device for which
->saved_raid_disk is not set, fail.
I think this is suitable for stable as it can
cause re-adding a device to be forced to do a full
resync which takes a lot longer and so puts data at
more risk.
Cc: <stable@vger.kernel.org> (v4.1)
Fixes: 97f6cd39da ("md-cluster: re-add capabilities")
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a982e45dc1 upstream.
When a USB device is connected to the USB host port on the SAM9N12 then
you get "-62" error which seems to indicate USB replies from the device
are timing out. Based on a logic sniffer, I saw the USB bus was running
at half speed.
The PLL code uses cached MUL and DIV values which get set in set_rate()
and applied in prepare(), but the recalc_rate() function instead
queries the hardware instead of using these cached values. Therefore,
if recalc_rate() is called between a set_rate() and prepare(), the
wrong frequency is calculated and later the USB clock divider for the
SAM9N12 SOC will be configured for an incorrect clock.
In my case, the PLL hardware was set to 96 Mhz before the OHCI
driver loads, and therefore the usb clock divider was being set
to /2 even though the OHCI driver set the PLL to 48 Mhz.
As an alternative explanation, I noticed this was fixed in the past by
87e2ed338f ("clk: at91: fix recalc_rate implementation of PLL
driver") but the bug was later re-introduced by 1bdf02326b ("clk:
at91: make use of syscon/regmap internally").
Fixes: 1bdf02326b ("clk: at91: make use of syscon/regmap internally)
Cc: <stable@vger.kernel.org>
Signed-off-by: Marcin Ziemianowicz <marcin@ziemianowicz.com>
Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 254a4cd50b upstream.
The pmem driver does not honor a forced read-only setting for very long:
$ blockdev --setro /dev/pmem0
$ blockdev --getro /dev/pmem0
1
followed by various commands like these:
$ blockdev --rereadpt /dev/pmem0
or
$ mkfs.ext4 /dev/pmem0
results in this in the kernel serial log:
nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write
with the read-only setting lost:
$ blockdev --getro /dev/pmem0
0
That's from bus.c nvdimm_revalidate_disk(), which always applies the
setting from nd_region (which is initially based on the ACPI NFIT
NVDIMM state flags not_armed bit).
In contrast, commit 20bd1d026a ("scsi: sd: Keep disk read-only when
re-reading partition") fixed this issue for SCSI devices to preserve
the previous setting if it was set to read-only.
This patch modifies bus.c to preserve any previous read-only setting.
It also eliminates the kernel serial log print except for cases where
read-write is changed to read-only, so it doesn't print read-only to
read-only non-changes.
Cc: <stable@vger.kernel.org>
Fixes: 5813882094 ("libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only")
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c3d20aada upstream.
That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.
Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP
trigger functions finally run through zfcp_erp_action_enqueue(). So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d70aab5592 upstream.
For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.
Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:
loose remote port
t workqueue
[s] zfcp_q_<dev> IRQ zfcperp<dev>
=== ================== =================== ============================
0 recv RSCN
q p.test_link_work
block rport
start fast_io_fail_tmo
send ADISC ELS
4 recv ADISC fail
block zfcp_port
port forced reopen
send open port
12 recv open port fail
q p.gid_pn_work
zfcp_erp_wakeup
(zfcp_erp_wait would return)
GID_PN fail
Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.
port.status |= ERP_FAILED
If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.
workqueue
fc_dl_<host>
==================
27 fc_timeout_fail_rport_io
fc_terminate_rport_io
zfcp_scsi_terminate_rport_io
zfcp_erp_port_forced_reopen
_zfcp_erp_port_forced_reopen
if (port.status & ERP_FAILED)
return;
Therefore, write a trace before above early return.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : sctrpi1 SCSI terminate rport I/O
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x<wwpn>
D_ID : 0x<n_port_id>
Adapter status : 0x...
Port status : 0x...
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96d9270499 upstream.
get_device() and its internally used kobject_get() only return NULL if they
get passed NULL as argument. zfcp_get_port_by_wwpn() loops over
adapter->port_list so the iteration variable port is always non-NULL.
Struct device is embedded in struct zfcp_port so &port->dev is always
non-NULL. This is the argument to get_device(). However, if we get an
fc_rport in terminate_rport_io() for which we cannot find a match within
zfcp_get_port_by_wwpn(), the latter can return NULL. v2.6.30 commit
70932935b6 ("[SCSI] zfcp: Fix oops when port disappears") introduced an
early return without adding a trace record for this case. Even if we don't
need recovery in this case, for debugging we should still see that our
callback was invoked originally by scsi_transport_fc.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : sctrpin SCSI terminate rport I/O, no zfcp port
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x<wwpn> WWPN
D_ID : 0x<n_port_id> N_Port-ID
Adapter status : 0x...
Port status : 0xffffffff unknown (-1)
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 70932935b6 ("[SCSI] zfcp: Fix oops when port disappears")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 512857a795 upstream.
If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore since scsi_device_get returns !=0 by
design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the
half-gone SCSI device. Unfortunately, it causes the following confusing
trace record which states that zfcp will do a LUN recovery as "ERP need" is
ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".
Old example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING
but not ZFCP_STATUS_COMMON_UNBLOCKED as it
was closed on close part of adapter reopen
ERP want : 0x01
ERP need : 0x01 misleading
However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.
We always do want the recovery trigger trace record even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP
need" as 'not needed' but still keep the information which erp_action type,
that zfcp_erp_required_act() had decided upon, is needed. 0xc_ is chosen to
be visibly different from 0x0_ in "ERP want".
New example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000
ERP want : 0x01
ERP need : 0xc1 would need LUN ERP, but no action set up
^
Before v2.6.38 commit ae0904f60f ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action as
argument and field from the trace.
This patch here is for tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.
See also commit fdbd1c5e27 ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: ae0904f60f ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81979ae63e upstream.
We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of
our eh callback and an actual send/recv of an abort / TMF request. In order
to see the temporal sequence including any abort / TMF send retries, add a
trace before the above two blocking functions. This supports problem
determination with scsi_eh and parallel zfcp ERP.
No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to
an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we
now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes
almost the beginning of the callback.
No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case and
that is sufficient.
Example trace records formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : abrt_wt abort, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x<scsi_result_of_cmd_to_be_aborted>
SCSI retries : 0x<retries_of_cmd_to_be_aborted>
SCSI allowed : 0x<allowed_retries_of_cmd_to_be_aborted>
SCSI scribble : 0x<req_id_of_cmd_to_be_aborted>
SCSI opcode : <CDB_of_cmd_to_be_aborted>
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_wait LUN reset, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x... unrelated
SCSI retries : 0x.. unrelated
SCSI allowed : 0x.. unrelated
SCSI scribble : 0x... unrelated
SCSI opcode : ... unrelated
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 63caf367e1 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d91 ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df30781699 upstream.
For problem determination we need to see whether and why we were successful
or not. This allows deduction of scsi_eh escalation.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : schrh_r SCSI host reset handler result
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0xffffffff none (invalid)
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0x00002002 field re-used for midlayer value: SUCCESS
or in other cases: 0x2009 == FAST_IO_FAIL
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000
v2.6.35 commit a1dbfddd02 ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") introduced the first return with something
other than the previously hardcoded single SUCCESS return path.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: a1dbfddd02 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 413c2f3348 upstream.
This patch prevents driver from setting lower default speed of 1 GB/sec,
if the switch does not support Get Port Speed Capabilities (GPSC)
command. Setting this default speed results into much lower write
performance for large sequential WRITE. This patch modifies driver to
check for gpsc_supported flags and prevents driver from issuing
MBC_SET_PORT_PARAM (001Ah) to set default speed of 1 GB/sec. If driver
does not send this mailbox command, firmware assumes maximum supported
link speed and will operate at the max speed.
Cc: stable@vger.kernel.org
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reported-by: Eda Zhou <ezhou@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c043ec1ca5 upstream.
Currently, we use int for buffer length and bytes_per_datum. However,
kfifo uses unsigned int for length and size_t for element size. We need
to make sure these matches or we will have bugs related to overflow (in
the range between INT_MAX and UINT_MAX for length, for example).
In addition, set_bytes_per_datum uses size_t while bytes_per_datum is an
int, which would cause bugs for large values of bytes_per_datum.
Change buffer length to use unsigned int and bytes_per_datum to use
size_t.
Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[bwh: Backported to 4.9:
- Drop change to iio_dma_buffer_set_length()
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5811375325 upstream.
Fstests generic/475 provides a way to fail metadata reads while
checking if checksum exists for the inode inside run_delalloc_nocow(),
and csum_exist_in_range() interprets error (-EIO) as inode having
checksum and makes its caller enter the cow path.
In case of free space inode, this ends up with a warning in
cow_file_range().
The same problem applies to btrfs_cross_ref_exist() since it may also
read metadata in between.
With this, run_delalloc_nocow() bails out when errors occur at the two
places.
cc: <stable@vger.kernel.org> v2.6.28+
Fixes: 17d217fe97 ("Btrfs: fix nodatasum handling in balancing code")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5b4a50b74 upstream.
If we failed during a rename exchange operation after starting/joining a
transaction, we would end up replacing the return value, stored in the
local 'ret' variable, with the return value from btrfs_end_transaction().
So this could end up returning 0 (success) to user space despite the
operation having failed and aborted the transaction, because if there are
multiple tasks having a reference on the transaction at the time
btrfs_end_transaction() is called by the rename exchange, that function
returns 0 (otherwise it returns -EIO and not the original error value).
So fix this by not overwriting the return value on error after getting
a transaction handle.
Fixes: cdd1fedf82 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b65c32ec5a upstream.
The signatureValue field of a X.509 certificate is encoded as a BIT STRING.
For RSA signatures this BIT STRING is of so-called primitive subtype, which
contains a u8 prefix indicating a count of unused bits in the encoding.
We have to strip this prefix from signature data, just as we already do for
key data in x509_extract_key_data() function.
This wasn't noticed earlier because this prefix byte is zero for RSA key
sizes divisible by 8. Since BIT STRING is a big-endian encoding adding zero
prefixes has no bearing on its value.
The signature length, however was incorrect, which is a problem for RSA
implementations that need it to be exactly correct (like AMD CCP).
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: c26fd69fa0 ("X.509: Add a crypto key parser for binary (DER) X.509 certificates")
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abcbcb80cd upstream.
For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of
1000, jiffies_to_msecs() never returns zero when passed a non-zero time
period.
However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or
1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero
for small non-zero time periods. This may break code that relies on
receiving back a non-zero value.
jiffies_to_usecs() does not need such a fix: one jiffy can only be less
than one µs if HZ > 1000000, and such large values of HZ are already
rejected at build time, twice:
- include/linux/jiffies.h does #error if HZ >= 12288,
- kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC).
Broken since forever.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13c65840fe upstream.
After a suspend/resume cycle the Presence Detect or Data Link Layer Status
Changed bits might be set. If we don't clear them those events will not
fire anymore and nothing happens for instance when a device is now
hot-unplugged.
Fix this by clearing those bits in a newly introduced function
pcie_reenable_notification(). This should be fine because immediately
after, we check if the adapter is still present by reading directly from
the status register.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a027b47db upstream.
The erratum and workaround are described by BCM5300X-ES300-RDS.pdf as
below.
R10: PCIe Transactions Periodically Fail
Description: The BCM5300X PCIe does not maintain transaction ordering.
This may cause PCIe transaction failure.
Fix Comment: Add a dummy PCIe configuration read after a PCIe
configuration write to ensure PCIe configuration access
ordering. Set ES bit of CP0 configu7 register to enable
sync function so that the sync instruction is functional.
Resolution: hndpci.c: extpci_write_config()
hndmips.c: si_mips_init()
mipsinc.h CONF7_ES
This is fixed by the CFE MIPS bcmsi chipset driver also for BCM47XX.
Also the dummy PCIe configuration read is already implemented in the
Linux BCMA driver.
Enable ExternalSync in Config7 when CONFIG_BCMA_DRIVER_PCI_HOSTMODE=y
too so that the sync instruction is externalised.
Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: Rafał Miłecki <zajec5@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19461/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cd8116f17 upstream.
The "sector is in requested range" test used to determine whether
sectors should be re-locked or not is done on a variable that is reset
everytime we cross a chip boundary, which can lead to some blocks being
re-locked while the caller expect them to be unlocked.
Fix the check to make sure this cannot happen.
Fixes: 1648eaaa15 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking")
Cc: stable@vger.kernel.org
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fdfc3dbad upstream.
cfi_ppb_unlock() tries to relock all sectors that were locked before
unlocking the whole chip.
This locking used the chip start address + the FULL offset from the
first flash chip, thereby forming an illegal address. Fix that by using
the chip offset(adr).
Fixes: 1648eaaa15 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking")
Cc: stable@vger.kernel.org
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b74a83cf5 upstream.
On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.
For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.
This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.
Without the fix the next deadlock might appears:
schedule_timeout+0x274/0x350
wait_for_common+0xec/0x240
mcast_remove_one+0xd0/0x120 [ib_core]
ib_unregister_device+0x12c/0x230 [ib_core]
mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
mlx5_detach_device+0x184/0x1a0 [mlx5_core]
mlx5_unload_one+0x308/0x340 [mlx5_core]
mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]
Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b9cf7905f upstream.
For strings, account for trailing \0 in property length field:
This is consistent with how dtc builds string properties.
Function __of_prop_dup() would misbehave on such properties as it duplicates
properties based on the property length field creating new string values
without trailing \0s.
Signed-off-by: Stefan M Schaeckeler <sschaeck@cisco.com>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Tested-by: Frank Rowand <frank.rowand@sony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71c8fc0c96 upstream.
When rewriting swapper using nG mappings, we must performance cache
maintenance around each page table access in order to avoid coherency
problems with the host's cacheable alias under KVM. To ensure correct
ordering of the maintenance with respect to Device memory accesses made
with the Stage-1 MMU disabled, DMBs need to be added between the
maintenance and the corresponding memory access.
This patch adds a missing DMB between writing a new page table entry and
performing a clean+invalidate on the same line.
Fixes: f992b4dfd5 ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings")
Cc: <stable@vger.kernel.org> # 4.16.x-
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5b7dd647f upstream.
We inspect __kpti_forced early on as part of the cpufeature enable
callback which remaps the swapper page table using non-global entries.
Ensure that __kpti_forced has been updated to reflect the kpti=
command-line option before we start using it.
Fixes: ea1e3de85e ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0")
Cc: <stable@vger.kernel.org> # 4.16.x-
Reported-by: Wei Xu <xuwei5@hisilicon.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76ed0b803a upstream.
NUMREGBYTES (which is used as the size for gdb_regs[]) is incorrectly
based on DBG_MAX_REG_NUM instead of GDB_MAX_REGS. DBG_MAX_REG_NUM
is the number of total registers, while GDB_MAX_REGS is the number
of 'unsigned longs' it takes to serialize those registers. Since
FP registers require 3 'unsigned longs' each, DBG_MAX_REG_NUM is
smaller than GDB_MAX_REGS.
This causes GDB 8.0 give the following error on connect:
"Truncated register 19 in remote 'g' packet"
This also causes the register serialization/deserialization logic
to overflow gdb_regs[], overwriting whatever follows.
Fixes: 834b2964b7 ("kgdb,arm: fix register dump")
Cc: <stable@vger.kernel.org> # 2.6.37+
Signed-off-by: David Rivshin <drivshin@allworx.com>
Acked-by: Rabin Vincent <rabin@rab.in>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 722cde76d6 upstream.
Unregister fadump on kexec down path otherwise the fadump registration
in new kexec-ed kernel complains that fadump is already registered.
This makes new kernel to continue using fadump registered by previous
kernel which may lead to invalid vmcore generation. Hence this patch
fixes this issue by un-registering fadump in fadump_cleanup() which is
called during kexec path so that new kernel can register fadump with
new valid values.
Fixes: b500afff11 ("fadump: Invalidate registration and release reserved memory for general use.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98fd72fe82 upstream.
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free
set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages().
Since iommu_tce_table_put() calls it_ops::free when the last reference
to the table is released, explicit call to pnv_pci_ioda2_table_free_pages()
is not needed so let's remove it.
This should fix double free in the case of PCI hotuplug as
pnv_pci_ioda2_table_free_pages() does not reset neither
iommu_table::it_base nor ::it_size.
This was not exposed by SRIOV as it uses different code path via
pnv_pcibios_sriov_disable().
IODA1 does not inialize it_ops::free so it does not have this issue.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd6ef7eebf upstream.
Back when we first introduced the DAWR, in commit 4ae7ebe952
("powerpc: Change hardware breakpoint to allow longer ranges"), we
screwed up the constraint making it a 1024 byte boundary rather than a
512. This makes the check overly permissive. Fortunately GDB is the
only real user and it always did they right thing, so we never
noticed.
This fixes the constraint to 512 bytes.
Fixes: 4ae7ebe952 ("powerpc: Change hardware breakpoint to allow longer ranges")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f7c06e26e upstream.
In commit e2a800beac ("powerpc/hw_brk: Fix off by one error when
validating DAWR region end") we fixed setting the DAWR end point to
its max value via PPC_PTRACE_SETHWDEBUG. Unfortunately we broke
PTRACE_SET_DEBUGREG when setting a 512 byte aligned breakpoint.
PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to
zero (memset() in hw_breakpoint_init()). This worked with
arch_validate_hwbkpt_settings() before the above patch was applied but
is now broken if the breakpoint is 512byte aligned.
This sets the length of the breakpoint to 8 bytes when using
PTRACE_SET_DEBUGREG.
Fixes: e2a800beac ("powerpc/hw_brk: Fix off by one error when validating DAWR region end")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91d0697188 upstream.
Currently we do not have an isync, or any other context synchronizing
instruction prior to the slbie/slbmte in _switch() that updates the
SLB entry for the kernel stack.
However that is not correct as outlined in the ISA.
From Power ISA Version 3.0B, Book III, Chapter 11, page 1133:
"Changing the contents of ... the contents of SLB entries ... can
have the side effect of altering the context in which data
addresses and instruction addresses are interpreted, and in which
instructions are executed and data accesses are performed.
...
These side effects need not occur in program order, and therefore
may require explicit synchronization by software.
...
The synchronizing instruction before the context-altering
instruction ensures that all instructions up to and including that
synchronizing instruction are fetched and executed in the context
that existed before the alteration."
And page 1136:
"For data accesses, the context synchronizing instruction before the
slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures
that all preceding instructions that access data storage have
completed to a point at which they have reported all exceptions
they will cause."
We're not aware of any bugs caused by this, but it should be fixed
regardless.
Add the missing isync when updating kernel stack SLB entry.
Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Flesh out change log with more ISA text & explanation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df0e91d488 upstream.
Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the
O_TRUNC flag in the OPEN request to truncate the file atomically with the
open.
In this mode there's no need to send a SETATTR request to userspace after
the open, so fuse_do_setattr() checks this mode and returns. But this
misses the important step of truncating the pagecache.
Add the missing parts of truncation to the ATTR_OPEN branch.
Reported-by: Chad Austin <chadaustin@fb.com>
Fixes: 6ff958edbf ("fuse: add atomic open+truncate support")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe50a7d039 upstream.
There was one place where the timeout value for an operation was
not being set, if a capabilities request was done from idle. Move
the timeout value setting to before where that change might be
requested.
IMHO the cause here is the invisible returns in the macros. Maybe
that's a job for later, though.
Reported-by: Nordmark Claes <Claes.Nordmark@tieto.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2026d35741 upstream.
The function __builtin_expect returns long type (see the gcc
documentation), and so do macros likely and unlikely. Unfortunatelly, when
CONFIG_PROFILE_ANNOTATED_BRANCHES is selected, the macros likely and
unlikely expand to __branch_check__ and __branch_check__ truncates the
long type to int. This unintended truncation may cause bugs in various
kernel code (we found a bug in dm-writecache because of it), so it's
better to fix __branch_check__ to return long.
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1805300818140.24812@file01.intranet.prod.int.rdu2.redhat.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 1f0d69a9fc ("tracing: profile likely and unlikely annotations")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d302ed3cc upstream.
According to "EP93xx User’s Guide", I2STXLinCtrlData and I2SRXLinCtrlData
registers actually have different format. The only currently used bit
(Left_Right_Justify) has different position. Fix this and simplify the
whole setup taking into account the fact that both registers have zero
default value.
The practical effect of the above is repaired SND_SOC_DAIFMT_RIGHT_J
support (currently unused).
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d534113be upstream.
The bit responsible for LRCLK polarity is i2s_tlrs (0), not i2s_trel (2)
(refer to "EP93xx User's Guide").
Previously card drivers which specified SND_SOC_DAIFMT_NB_IF actually got
SND_SOC_DAIFMT_NB_NF, an adaptation is necessary to retain the old
behavior.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff2faf1289 upstream.
dapm_kcontrol_data is freed as part of dapm_kcontrol_free(), leaving the
paths pointer dangling in the list.
This leads to system crash when we try to unload and reload sound card.
I hit this bug during ADSP crash/reboot test case on Dragon board DB410c.
Without this patch, on SLAB Poisoning enabled build, kernel crashes with
"BUG kmalloc-128 (Tainted: G W ): Poison overwritten"
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e01827ed9 upstream.
Some low-speed and full-speed devices (for example, bluetooth)
do not have time to initialize. For them, ETIMEDOUT is a valid error.
We need to give them another try. Otherwise, they will
never be initialized correctly and in dmesg will be messages
"Bluetooth: hci0 command 0x1002 tx timeout" or similars.
Fixes: 264904ccc3 ("usb: retry reset if a device times out")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Maxim Moseychuk <franchesko.salias.hudro.pedros@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8afb1d2c12 upstream.
Commit 40f70c03e3 ("serial: sh-sci: add locking to console write
function to avoid SMP lockup") copied the strategy to avoid locking
problems in conjuncture with the console from the UART8250
driver. Instead using directly spin_{try}lock_irqsave(),
local_irq_save() followed by spin_{try}lock() was used. While this is
correct on mainline, for -rt it is a problem. spin_{try}lock() will
check if it is running in a valid context. Since the local_irq_save()
has already been executed, the context has changed and
spin_{try}lock() will complain. The reason why spin_{try}lock()
complains is that on -rt the spin locks are turned into mutexes and
therefore can sleep. Sleeping with interrupts disabled is not valid.
BUG: sleeping function called from invalid context at /home/wagi/work/rt/v4.4-cip-rt/kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 128, pid: 778, name: irq/76-eth0
CPU: 0 PID: 778 Comm: irq/76-eth0 Not tainted 4.4.126-test-cip22-rt14-00403-gcd03665c8318 #12
Hardware name: Generic RZ/G1 (Flattened Device Tree)
Backtrace:
[<c00140a0>] (dump_backtrace) from [<c001424c>] (show_stack+0x18/0x1c)
r7:c06b01f0 r6:60010193 r5:00000000 r4:c06b01f0
[<c0014234>] (show_stack) from [<c01d3c94>] (dump_stack+0x78/0x94)
[<c01d3c1c>] (dump_stack) from [<c004c134>] (___might_sleep+0x134/0x194)
r7:60010113 r6:c06d3559 r5:00000000 r4:ffffe000
[<c004c000>] (___might_sleep) from [<c04ded60>] (rt_spin_lock+0x20/0x74)
r5:c06f4d60 r4:c06f4d60
[<c04ded40>] (rt_spin_lock) from [<c02577e4>] (serial_console_write+0x100/0x118)
r5:c06f4d60 r4:c06f4d60
[<c02576e4>] (serial_console_write) from [<c0061060>] (call_console_drivers.constprop.15+0x10c/0x124)
r10:c06d2894 r9:c04e18b0 r8:00000028 r7:00000000 r6:c06d3559 r5:c06d2798
r4:c06b9914 r3:c02576e4
[<c0060f54>] (call_console_drivers.constprop.15) from [<c0062984>] (console_unlock+0x32c/0x430)
r10:c06d30d8 r9:00000028 r8:c06dd518 r7:00000005 r6:00000000 r5:c06d2798
r4:c06d2798 r3:00000028
[<c0062658>] (console_unlock) from [<c0062e1c>] (vprintk_emit+0x394/0x4f0)
r10:c06d2798 r9:c06d30ee r8:00000006 r7:00000005 r6:c06a78fc r5:00000027
r4:00000003
[<c0062a88>] (vprintk_emit) from [<c0062fa0>] (vprintk+0x28/0x30)
r10:c060bd46 r9:00001000 r8:c06b9a90 r7:c06b9a90 r6:c06b994c r5:c06b9a3c
r4:c0062fa8
[<c0062f78>] (vprintk) from [<c0062fb8>] (vprintk_default+0x10/0x14)
[<c0062fa8>] (vprintk_default) from [<c009cd30>] (printk+0x78/0x84)
[<c009ccbc>] (printk) from [<c025afdc>] (credit_entropy_bits+0x17c/0x2cc)
r3:00000001 r2:decade60 r1:c061a5ee r0:c061a523
r4:00000006
[<c025ae60>] (credit_entropy_bits) from [<c025bf74>] (add_interrupt_randomness+0x160/0x178)
r10:466e7196 r9:1f536000 r8:fffeef74 r7:00000000 r6:c06b9a60 r5:c06b9a3c
r4:dfbcf680
[<c025be14>] (add_interrupt_randomness) from [<c006536c>] (irq_thread+0x1e8/0x248)
r10:c006537c r9:c06cdf21 r8:c0064fcc r7:df791c24 r6:df791c00 r5:ffffe000
r4:df525180
[<c0065184>] (irq_thread) from [<c003fba4>] (kthread+0x108/0x11c)
r10:00000000 r9:00000000 r8:c0065184 r7:df791c00 r6:00000000 r5:df791d00
r4:decac000
[<c003fa9c>] (kthread) from [<c00101b8>] (ret_from_fork+0x14/0x3c)
r8:00000000 r7:00000000 r6:00000000 r5:c003fa9c r4:df791d00
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Wagner <daniel.wagner@siemens.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f90f9ef2d upstream.
If 020/030 support is enabled, get_io_area() leaves an IO_SIZE gap
between mappings which is added to the vm_struct representing the
mapping. __ioremap() uses the actual requested size (after alignment),
while __iounmap() is passed the size from the vm_struct.
On 020/030, early termination descriptors are used to set up mappings of
extent 'size', which are validated on unmapping. The unmapped gap of
size IO_SIZE defeats the sanity check of the pmd tables, causing
__iounmap() to loop forever on 030.
On 040/060, unmapping of page table entries does not check for a valid
mapping, so the umapping loop always completes there.
Adjust size to be unmapped by the gap that had been added in the
vm_struct prior.
This fixes the hang in atari_platform_init() reported a long time ago,
and a similar one reported by Finn recently (addressed by removing
ioremap() use from the SWIM driver.
Tested on my Falcon in 030 mode - untested but should work the same on
040/060 (the extra page tables cleared there would never have been set
up anyway).
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
[geert: Minor commit description improvements]
[geert: This was fixed in 2.4.23, but not in 2.5.x]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f74c8a647 upstream.
mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.
However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.
Which leads to MCE records with an empty status value:
mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}
In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.
Tony: read the rest of the MCA bank to populate the struct mce fully.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40c36e2741 upstream.
Some injection testing resulted in the following console log:
mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
Kernel panic - not syncing: Machine check from unknown source
This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".
The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.
It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.
We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually every happens).
[ bp: Remove unneeded severity assignment. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7d606f560 upstream.
Since we added support to add recovery from some errors inside the kernel in:
commit b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")
we have done a less than stellar job at reporting the cause of recoverable
machine checks that occur in other parts of the kernel. The user just gets
the unhelpful message:
mce: [Hardware Error]: Machine check: Action required: unknown MCACOD
doubly unhelpful when they check the manual for the reported IA32_MSR_STATUS.MCACOD
and see that it is listed as one of the standard recoverable values.
Add an extra rule to the MCE severity table to catch this case and report it
as:
mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
Fixes: b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/4cc7c465150a9a48b8b9f45d0b840278e77eb9b5.1527283897.git.tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eab6870fee upstream.
Mark Rutland noticed that GCC optimization passes have the potential to elide
necessary invocations of the array_index_mask_nospec() instruction sequence,
so mark the asm() volatile.
Mark explains:
"The volatile will inhibit *some* cases where the compiler could lift the
array_index_nospec() call out of a branch, e.g. where there are multiple
invocations of array_index_nospec() with the same arguments:
if (idx < foo) {
idx1 = array_idx_nospec(idx, foo)
do_something(idx1);
}
< some other code >
if (idx < foo) {
idx2 = array_idx_nospec(idx, foo);
do_something_else(idx2);
}
... since the compiler can determine that the two invocations yield the same
result, and reuse the first result (likely the same register as idx was in
originally) for the second branch, effectively re-writing the above as:
if (idx < foo) {
idx = array_idx_nospec(idx, foo);
do_something(idx);
}
< some other code >
if (idx < foo) {
do_something_else(idx);
}
... if we don't take the first branch, then speculatively take the second, we
lose the nospec protection.
There's more info on volatile asm in the GCC docs:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
"
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: babdde2698 ("x86: Implement array_index_mask_nospec")
Link: https://lkml.kernel.org/lkml/152838798950.14521.4893346294059739135.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cc41e0995 upstream.
WHen registering a new binfmt_misc handler, it is possible to overflow
the offset to get a negative value, which might crash the system, or
possibly leak kernel data.
Here is a crash log when 2500000000 was used as an offset:
BUG: unable to handle kernel paging request at ffff989cfd6edca0
IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
PGD 1ef3e067 P4D 1ef3e067 PUD 0
Oops: 0000 [#1] SMP NOPTI
Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
Call Trace:
search_binary_handler+0x97/0x1d0
do_execveat_common.isra.34+0x667/0x810
SyS_execve+0x31/0x40
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Use kstrtoint instead of simple_strtoul. It will work as the code
already set the delimiter byte to '\0' and we only do it when the field
is not empty.
Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst
and other registrations from packages on Ubuntu.
Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 670ae9caac upstream.
struct vhost_msg within struct vhost_msg_node is copied to userspace.
Unfortunately it turns out on 64 bit systems vhost_msg has padding after
type which gcc doesn't initialize, leaking 4 uninitialized bytes to
userspace.
This padding also unfortunately means 32 bit users of this interface are
broken on a 64 bit kernel which will need to be fixed separately.
Fixes: CVE-2018-1118
Cc: stable@vger.kernel.org
Reported-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebeaa36754 upstream.
Current ISH driver only registers suspend/resume PM callbacks which don't
support hibernation (suspend to disk). Basically after hiberation, the ISH
can't resume properly and user may not see sensor events (for example: screen
rotation may not work).
User will not see a crash or panic or anything except the following message
in log:
hid-sensor-hub 001F:8086:22D8.0001: timeout waiting for response from ISHTP device
So this patch adds support for S4/hiberbation to ISH by using the
SIMPLE_DEV_PM_OPS() MACRO instead of struct dev_pm_ops directly. The suspend
and resume functions will now be used for both suspend to RAM and hibernation.
If power management is disabled, SIMPLE_DEV_PM_OPS will do nothing, the suspend
and resume related functions won't be used, so mark them as __maybe_unused to
clarify that this is the intended behavior, and remove #ifdefs for power
management.
Cc: stable@vger.kernel.org
Signed-off-by: Even Xu <even.xu@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6a4b4c9d0 upstream.
As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change. The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr. That does not work in this case.
The bug manifested itself as follows. When the command sequence
touch foo; ln -s foo bar; ls -l bar
is run, the output was
lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo
However, after a re-mount, ls -l bar produces
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
After this commit, even before a re-mount, the output is
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 71680c18c8 ("orangefs: Cache getattr results.")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 955bc61328 upstream.
According to the API, you may only call clk_get_rate() after actually
enabling it.
Found by Linux Driver Verification project (linuxtesting.org).
Fixes: a5fd9139f7 ("w1: add 1-wire master driver for i.MX27 / i.MX31")
Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cfce3a86b upstream.
Commit 184add2ca2 ("libata: Apply NOLPM quirk for SanDisk
SD7UB3Q*G1001 SSDs") disabled LPM for SanDisk SD7UB3Q*G1001 SSDs.
This has lead to several reports of users of that SSD where LPM
was working fine and who know have a significantly increased idle
power consumption on their laptops.
Likely there is another problem on the T450s from the original
reporter which gets exposed by the uncore reaching deeper sleep
states (higher PC-states) due to LPM being enabled. The problem as
reported, a hardfreeze about once a day, already did not sound like
it would be caused by LPM and the reports of the SSD working fine
confirm this. The original reporter is ok with dropping the quirk.
A X250 user has reported the same hard freeze problem and for him
the problem went away after unrelated updates, I suspect some GPU
driver stack changes fixed things.
TL;DR: The original reporters problem were triggered by LPM but not
an LPM issue, so drop the quirk for the SSD in question.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1583207
Cc: stable@vger.kernel.org
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Reported-by: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 795ef78814 upstream.
Don't populate the arrays cdb on the stack, instead make them static.
Makes the object code smaller by 230 bytes:
Before:
text data bss dec hex filename
3797 240 0 4037 fc5 drivers/ata/libata-zpodd.o
After:
text data bss dec hex filename
3407 400 0 3807 edf drivers/ata/libata-zpodd.o
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7d1f119c4 upstream.
If the policy limits are updated via cpufreq_update_policy() and
subsequently via sysfs, the limits stored in user_policy may be
set incorrectly.
For example, if both min and max are set via sysfs to the maximum
available frequency, user_policy.min and user_policy.max will also
be the maximum. If a policy notifier triggered by
cpufreq_update_policy() lowers both the min and the max at this
point, that change is not reflected by the user_policy limits, so
if the max is updated again via sysfs to the same lower value,
then user_policy.max will be lower than user_policy.min which
shouldn't happen. In particular, if one of the policy CPUs is
then taken offline and back online, cpufreq_set_policy() will
fail for it due to a failing limits check.
To prevent that from happening, initialize the min and max fields
of the new_policy object to the ones stored in user_policy that
were previously set via sysfs.
Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2adf22fdf upstream.
The server detects reconnect by the (non-zero) value in PreviousSessionId
of SMB2/SMB3 SessionSetup request, but this behavior regressed due
to commit 166cea4dc3
("SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup")
CC: Stable <stable@vger.kernel.org>
CC: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3aa60d511 upstream.
When 'kzalloc()' fails in 'snd_hda_attach_pcm_stream()', a new pcm instance is
created without setting its operators via 'snd_pcm_set_ops()'. Following
operations on the new pcm instance can trigger kernel null pointer dereferences
and cause kernel oops.
This bug was found with my work on building a gray-box fault-injection tool for
linux-kernel-module binaries. A kernel null pointer dereference was confirmed
from line 'substream->ops->open()' in function 'snd_pcm_open_substream()' in
file 'sound/core/pcm_native.c'.
This patch fixes the bug by calling 'snd_device_free()' in the error handling
path of 'kzalloc()', which removes the new pcm instance from the snd card before
returns with an error code.
Signed-off-by: Bo Chen <chenbo@pdx.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac0b4145d6 upstream.
[BUG]
Btrfs can create compressed extent without checksum (even though it
shouldn't), and if we then try to replace device containing such extent,
the result device will contain all the uncompressed data instead of the
compressed one.
Test case already submitted to fstests:
https://patchwork.kernel.org/patch/10442353/
[CAUSE]
When handling compressed extent without checksum, device replace will
goe into copy_nocow_pages() function.
In that function, btrfs will get all inodes referring to this data
extents and then use find_or_create_page() to get pages direct from that
inode.
The problem here is, pages directly from inode are always uncompressed.
And for compressed data extent, they mismatch with on-disk data.
Thus this leads to corrupted compressed data extent written to replace
device.
[FIX]
In this attempt, we could just remove the "optimization" branch, and let
unified scrub_pages() to handle it.
Although scrub_pages() won't bother reusing page cache, it will be a
little slower, but it does the correct csum checking and won't cause
such data corruption caused by "optimization".
Note about the fix: this is the minimal fix that can be backported to
older stable trees without conflicts. The whole callchain from
copy_nocow_pages() can be deleted, and will be in followup patches.
Fixes: ff023aac31 ("Btrfs: add code to scrub to copy read data to another disk")
CC: stable@vger.kernel.org # 4.4+
Reported-by: James Harvey <jamespharvey20@gmail.com>
Reviewed-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ remove code removal, add note why ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd4e994bd1 upstream.
If we have invalid flags set, when we error out we must drop our writer
counter and free the buffer we allocated for the arguments. This bug is
trivially reproduced with the following program on 4.7+:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>
int main(int argc, char **argv)
{
struct btrfs_ioctl_vol_args_v2 vol_args = {
.flags = UINT64_MAX,
};
int ret;
int fd;
if (argc != 2) {
fprintf(stderr, "usage: %s PATH\n", argv[0]);
return EXIT_FAILURE;
}
fd = open(argv[1], O_WRONLY);
if (fd == -1) {
perror("open");
return EXIT_FAILURE;
}
ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
if (ret == -1)
perror("ioctl");
close(fd);
return EXIT_SUCCESS;
}
When unmounting the filesystem, we'll hit the
WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the
filesystem to be remounted read-only as the writer count will stay
lifted.
Fixes: 6b526ed70c ("btrfs: introduce device delete by devid")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5c40d598f upstream.
In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a party
checksummed file.
The race window is only a few instructions in size, between the if and
the locks which is:
3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835 return -EISDIR;
where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0). The clone will block on the inode lock, segflags
takes the inode lock, changes flags, releases log and clone continues.
Not impossible but still needs a lot of bad luck to hit unintentionally.
Fixes: 0e7b824c4e ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f2f76f751 upstream.
ext4_resize_fs() has an off-by-one bug when checking whether growing of
a filesystem will not overflow inode count. As a result it allows a
filesystem with 8192 inodes per group to grow to 64TB which overflows
inode count to 0 and makes filesystem unusable. Fix it.
Cc: stable@vger.kernel.org
Fixes: 3f8a6411fb
Reported-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eee597ac93 upstream.
Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.
Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Joe Habermann <joe.habermann@quantum.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ee3ee06a8 upstream.
When ext4_ind_map_blocks() computes a length of a hole, it doesn't count
with the fact that mapped offset may be somewhere in the middle of the
completely empty subtree. In such case it will return too large length
of the hole which then results in lseek(SEEK_DATA) to end up returning
an incorrect offset beyond the end of the hole.
Fix the problem by correctly taking offset within a subtree into account
when computing a length of a hole.
Fixes: facab4d971
CC: stable@vger.kernel.org
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4fd44a98ff ]
commit 079096f103 ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.
But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly. These lower-level processing functions do not do any checksum
verification.
Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8d499533e0 ]
use nla_strlcpy() to avoid copying data beyond the length of TCA_DEF_DATA
netlink attribute, in case it is less than SIMP_MAX_DATA and it does not
end with '\0' character.
v2: fix errors in the commit message, thanks Hangbin Liu
Fixes: fa1b1cff3d ("net_cls_act: Make act_simple use of netlink policy.")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0975764684 ]
IPVS setups with local client and remote tunnel server need
to create exception for the local virtual IP. What we do is to
change PMTU from 64KB (on "lo") to 1460 in the common case.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 45e4fd2668 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Fixes: 7343ff31eb ("ipv6: Don't create clones of host routes.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eb55bbf865 ]
There is a timing issue under active-standy mode, when bond_enslave() is
called, bond->params.primary might not be initialized yet.
Any time the primary slave string changes, bond->force_primary should be
set to true to make sure the primary becomes the active slave.
Signed-off-by: Xiangning Yu <yuxiangning@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ebc3dd688c ]
It has been observed that writing 0xF2 to the power register while it
reads as 0xF4 results in the register having the value 0xF0, i.e. clearing
RESUME and setting SUSPENDM in one go does not work. It might also violate
the USB spec to transition directly from resume to suspend, especially
when not taking T_DRSMDN into account. But this is what happens when a
remote wakeup occurs between SetPortFeature USB_PORT_FEAT_SUSPEND on the
root hub and musb_bus_suspend being called.
This commit returns -EBUSY when musb_bus_suspend is called while remote
wakeup is signalled and thus avoids to reset the RESUME bit. Ignoring
this error when musb_port_suspend is called from musb_hub_control is ok.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8810f7517a ]
There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,
- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,
however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriouly, if the loss happens on some important internal btree
roots, it could refuse to mount.
This extends btrfs to do more retries and each retry fails only one
stripe. Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.
The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02db55718d upstream.
While rcvbuf is properly clamped by tcp_rmem[2], rcvwin
is left to a potentially too big value.
It has no serious effect, since :
1) tcp_grow_window() has very strict checks.
2) window_clamp can be mangled by user space to any value anyway.
tcp_init_buffer_space() and companions use tcp_full_space(),
we use tcp_win_from_space() to avoid reloading sk->sk_rcvbuf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab1068d686 ]
When there are 16 or more logical CPUs, we request for
`IWL_MAX_RX_HW_QUEUES` (16) IRQs only as we limit to that number of
IRQs, but later on we compare the number of IRQs returned to
nr_online_cpus+2 instead of max_irqs, the latter being what we
actually asked for. This ends up setting num_rx_queues to 17 which
causes lots of out-of-bounds array accesses later on.
Compare to max_irqs instead, and also add an assertion in case
num_rx_queues > IWM_MAX_RX_HW_QUEUES.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199551
Fixes: 2e5d4a8f61 ("iwlwifi: pcie: Add new configuration to enable MSIX")
Signed-off-by: Hao Wei Tee <angelsl@in04.sg>
Tested-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the recent sync with objtool from 4.14.y, the objtool .gitignore
file was forgotten. Fix that up now to properly handle the change in
where the autogenerated files live.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1411b5218a upstream.
In the vmx AES init routines we do a printk(KERN_INFO ...) to report
the fallback implementation we're using.
However with a slow console this can significantly affect the speed of
crypto operations. Using 'cryptsetup benchmark' the removal of the
printk() leads to a ~5x speedup for aes-cbc decryption.
So remove them.
Fixes: 8676590a15 ("crypto: vmx - Adding AES routines for VMX module")
Fixes: 8c755ace35 ("crypto: vmx - Adding CBC routines for VMX module")
Fixes: 4f7f60d312 ("crypto: vmx - Adding CTR routines for VMX module")
Fixes: cc333cd68d ("crypto: vmx - Adding GHASH routines for VMX module")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c9fa24ca7 upstream.
The functions that were used in the emulation of fxrstor, fxsave, sgdt and
sidt were originally meant for task switching, and as such they did not
check privilege levels. This is very bad when the same functions are used
in the emulation of unprivileged instructions. This is CVE-2018-10853.
The obvious fix is to add a new argument to ops->read_std and ops->write_std,
which decides whether the access is a "system" access or should use the
processor's CPL.
Fixes: 129a72a0d3 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a7e625ce5 upstream.
Commit 9b96fbacda ("serial: PL011: clear pending interrupts")
clears the RX and receive timeout interrupts on pl011 startup, to
avoid a screaming-interrupt scenario that can occur when the
firmware or bootloader leaves these interrupts asserted.
This has been noted as an issue when running Linux on qemu [1].
Unfortunately, the above fix seems to lead to potential
misbehaviour if the RX FIFO interrupt is asserted _non_ spuriously
on driver startup, if the RX FIFO is also already full to the
trigger level.
Clearing the RX FIFO interrupt does not change the FIFO fill level.
In this scenario, because the interrupt is now clear and because
the FIFO is already full to the trigger level, no new assertion of
the RX FIFO interrupt can occur unless the FIFO is drained back
below the trigger level. This never occurs because the pl011
driver is waiting for an RX FIFO interrupt to tell it that there is
something to read, and does not read the FIFO at all until that
interrupt occurs.
Thus, simply clearing "spurious" interrupts on startup may be
misguided, since there is no way to be sure that the interrupts are
truly spurious, and things can go wrong if they are not.
This patch instead clears the interrupt condition by draining the
RX FIFO during UART startup, after clearing any potentially
spurious interrupt. This should ensure that an interrupt will
definitely be asserted if the RX FIFO subsequently becomes
sufficiently full.
The drain is done at the point of enabling interrupts only. This
means that it will occur any time the UART is newly opened through
the tty layer. It will not apply to polled-mode use of the UART by
kgdboc: since that scenario cannot use interrupts by design, this
should not matter. kgdboc will interact badly with "normal" use of
the UART in any case: this patch makes no attempt to paper over
such issues.
This patch does not attempt to address the case where the RX FIFO
fills faster than it can be drained: that is a pathological
hardware design problem that is beyond the scope of the driver to
work around. As a failsafe, the number of poll iterations for
draining the FIFO is limited to twice the FIFO size. This will
ensure that the kernel at least boots even if it is impossible to
drain the FIFO for some reason.
[1] [Qemu-devel] [Qemu-arm] [PATCH] pl011: do not put into fifo
before enabled the interruption
https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg06446.html
Reported-by: Wei Xu <xuwei5@hisilicon.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Fixes: 9b96fbacda ("serial: PL011: clear pending interrupts")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: stable <stable@vger.kernel.org>
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13dc04d0e5 upstream.
I noticed that unused UARTs won't necessarily idle properly always
unless at least one byte tx transfer is done first.
After some debugging I narrowed down the problem to the scr register
dma configuration bits that need to be set before softreset for the
clocks to idle. Unless we do this, the module clkctrl idlest bits
may be set to 1 instead of 3 meaning the clock will never idle and
is blocking deeper idle states for the whole domain.
This might be related to the configuration done by the bootloader
or kexec booting where certain configurations cause the 8250 or
the clkctrl clock to jam in a way where setting of the scr bits
and reset is needed to clear it. I've tried diffing the 8250
registers for the various modes, but did not see anything specific.
So far I've only seen this on omap4 but I'm suspecting this might
also happen on the other clkctrl using SoCs considering they
already have a quirk enabled for UART_ERRATA_CLOCK_DISABLE.
Let's fix the issue by configuring scr before reset for basic dma
even if we don't use it. The scr register will be reset when we do
softreset few lines after, and we restore scr on resume. We should
do this for all the SoCs with UART_ERRATA_CLOCK_DISABLE quirk flag
set since the ones with UART_ERRATA_CLOCK_DISABLE are all based
using clkctrl similar to omap4.
Looks like both OMAP_UART_SCR_DMAMODE_1 | OMAP_UART_SCR_DMAMODE_CTL
bits are needed for the clkctrl to idle after a softreset.
And we need to add omap4 to also use the UART_ERRATA_CLOCK_DISABLE
for the related workaround to be enabled. This same compatible
value will also be used for omap5.
Fixes: cdb929e445 ("serial: 8250_omap: workaround errata around idling UART after using DMA")
Cc: Keerthy <j-keerthy@ti.com>
Cc: Matthijs van Duin <matthijsvanduin@gmail.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa2f80e752 upstream.
The best granularity of residue that DMA engine can report is in the BURST
units, so the serial driver must use MAXBURST = 1 and DMA_SLAVE_BUSWIDTH_1_BYTE
if it relies on exact number of bytes transferred by DMA engine.
Fixes: 62c37eedb7 ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9594b5be7e upstream.
I was puzzled while looking at /proc/interrupts and random things showed
up between reboots. This occurred more often but I realised it later. The
"correct" output should be:
|38: 11861 atmel-aic5 2 Level ttyS0
but I saw sometimes
|38: 6426 atmel-aic5 2 Level tty1
and accounted it wrongly as correct. This is use after free and the
former example randomly got the "old" pointer which pointed to the same
content. With SLAB_FREELIST_RANDOM and HARDENED I even got
|38: 7067 atmel-aic5 2 Level E=Started User Manager for UID 0
or other nonsense.
As it turns out the tty, pointer that is accessed in atmel_startup(), is
freed() before atmel_shutdown(). It seems to happen quite often that the
tty for ttyS0 is allocated and freed while ->shutdown is not invoked. I
don't do anything special - just a systemd boot :)
Use dev_name(&pdev->dev) as the IRQ name for request_irq(). This exists
as long as the driver is loaded so no use-after-free here.
Cc: stable@vger.kernel.org
Fixes: 761ed4a945 ("tty: serial_core: convert uart_close to use tty_port_close")
Acked-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c4e97ddfe upstream.
The ALWAYS_SYNC flag is currently honored by the usb-storage driver but not UAS
and is required to work around devices that become unstable upon being
queried for cache. This code is taken straight from:
drivers/usb/storage/scsiglue.c:284
Signed-off-by: Alexander Kappner <agk@godking.net>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0d6ec8809 upstream.
pdev_nr and rhport can be controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential spectre issue 'vhcis'
drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential spectre issue 'vhcis'
drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_ss->vdev'
drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_hs->vdev'
Fix this by sanitizing pdev_nr and rhport before using them to index
vhcis and vhci->vhci_hcd_ss->vdev respectively.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45ad559a29 upstream.
Syzbot reported yet another warning with Ion:
WARNING: CPU: 0 PID: 1467 at drivers/staging/android/ion/ion.c:122
ion_buffer_destroy+0xd4/0x190 drivers/staging/android/ion/ion.c:122
Kernel panic - not syncing: panic_on_warn set ...
This is catching that a buffer was freed with an existing kernel mapping
still present. This can be easily be triggered from userspace by calling
DMA_BUF_SYNC_START without calling DMA_BUF_SYNC_END. Switch to a single
pr_warn_once to indicate the error without being disruptive.
Reported-by: syzbot+cd8bcd40cb049efa2770@syzkaller.appspotmail.com
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce14e868a5 upstream.
Int the next patch the emulator's .read_std and .write_std callbacks will
grow another argument, which is not needed in kvm_read_guest_virt and
kvm_write_guest_virt_system's callers. Since we have to make separate
functions, let's give the currently existing names a nicer interface, too.
Fixes: 129a72a0d3 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79367a6574 upstream.
Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so
as to have a smaller patch when the functions grow another argument.
Fixes: 129a72a0d3 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d18f0a14a upstream.
Sometimes a GPIO is fetched with NULL as parent device, and
that is just fine. So under these circumstances, avoid using
dev_name() to provide a name for the GPIO line.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42819eb7a0 upstream.
The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.
Fixes: 42de82a8b5 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42de82a8b5 upstream.
The NVME standard mandates that the SN, MN, and FR fields of the Identify
Controller Data Structure be "ASCII strings". That means that they may
not contain 0-bytes, not even string terminators.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[hch: fixed for the move of the serial field, updated description]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e7f5d2af2 upstream.
The NVMe specification defines the serial number as:
"Serial Number (SN): Contains the serial number for the NVM subsystem
that is assigned by the vendor as an ASCII string. Refer to section
7.10 for unique identifier requirements. Refer to section 1.5 for ASCII
string requirements"
So move it from the controller to the subsystem, where it belongs.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b66af2d63 upstream.
Key extensions (struct sadb_key) include a user-specified number of key
bits. The kernel uses that number to determine how much key data to copy
out of the message in pfkey_msg2xfrm_state().
The length of the sadb_key message must be verified to be long enough,
even in the case of SADB_X_AALG_NULL. Furthermore, the sadb_key_len value
must be long enough to include both the key data and the struct sadb_key
itself.
Introduce a helper function verify_key_len(), and call it from
parse_exthdrs() where other exthdr types are similarly checked for
correctness.
Signed-off-by: Kevin Easton <kevin@guarana.org>
Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 161b8be2bd upstream.
A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.
The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad729bc9ac upstream.
The patch c4adfc822b ("bonding: make speed, duplex setting consistent
with link state") puts the link state to down if
bond_update_speed_duplex() cannot retrieve speed and duplex settings.
Assumably the patch was written with 802.3ad mode in mind which relies
on link speed/duplex settings. For other modes like active-backup these
settings are not required. Thus, only for these other modes, this patch
reintroduces support for slaves that do not support reporting speed or
duplex such as wireless devices. This fixes the regression reported in
bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).
Fixes: c4adfc822b ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: Andreas Born <futur.andy@googlemail.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Nate Clark <nate@neworld.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f3c278c94 upstream.
Earlier patch c4adfc822b ("bonding: make speed, duplex setting
consistent with link state") made an attempt to keep slave state
consistent with speed and duplex settings. Unfortunately link-state
transition is used to change the active link especially when used
in conjunction with mii-mon. The above mentioned patch broke that
logic. Also when speed and duplex settings for a link are updated
during a link-event, the link-status should not be changed to
invoke correct transition logic.
This patch fixes this issue by moving the link-state update outside
of the bond_update_speed_duplex() fn and to the places where this fn
is called and update link-state selectively.
Fixes: c4adfc822b ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Nate Clark <nate@neworld.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5bf0f5b16 upstream.
bond_miimon_commit() marks the link UP after attempting to get the speed
and duplex settings for the link. There is a possibility that
bond_update_speed_duplex() could fail. This is another place where it
could result into an inconsistent bonding link state.
With this patch the link will be marked UP only if the speed and duplex
values retrieved have sane values and processed further.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Nate Clark <nate@neworld.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.