Commit a3c0f84765 upstream.
Spectre variant 1 attacks are about this sequence of pseudo-code:
index = load(user-manipulated pointer);
access(base + index * stride);
In order for the cache side-channel to work, the access() must me made
to memory which userspace can detect whether cache lines have been
loaded. On 32-bit ARM, this must be either user accessible memory, or
a kernel mapping of that same user accessible memory.
The problem occurs when the load() speculatively loads privileged data,
and the subsequent access() is made to user accessible memory.
Any load() which makes use of a user-maniplated pointer is a potential
problem if the data it has loaded is used in a subsequent access. This
also applies for the access() if the data loaded by that access is used
by a subsequent access.
Harden the get_user() accessors against Spectre attacks by forcing out
of bounds addresses to a NULL pointer. This prevents get_user() being
used as the load() step above. As a side effect, put_user() will also
be affected even though it isn't implicated.
Also harden copy_from_user() by redoing the bounds check within the
arm_copy_from_user() code, and NULLing the pointer if out of bounds.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b1cd0a1480 upstream.
Fixing __get_user() for spectre variant 1 is not sane: we would have to
add address space bounds checking in order to validate that the location
should be accessed, and then zero the address if found to be invalid.
Since __get_user() is supposed to avoid the bounds check, and this is
exactly what get_user() does, there's no point having two different
implementations that are doing the same thing. So, when the Spectre
workarounds are required, make __get_user() an alias of get_user().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d09fbb327d upstream.
Borrow the x86 implementation of __inttype() to use in get_user() to
select an integer type suitable to temporarily hold the result value.
This is necessary to avoid propagating the volatile nature of the
result argument, which can cause the following warning:
lib/iov_iter.c:413:5: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8c8484a1c1 upstream.
__get_user_error() is used as a fast accessor to make copying structure
members as efficient as possible. However, with software PAN and the
recent Spectre variant 1, the efficiency is reduced as these are no
longer fast accessors.
In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.
Rather than using __get_user_error() to copy each semops element member,
copy each semops element in full using __copy_from_user().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 42019fc50d upstream.
__get_user_error() is used as a fast accessor to make copying structure
members in the signal handling path as efficient as possible. However,
with software PAN and the recent Spectre variant 1, the efficiency is
reduced as these are no longer fast accessors.
In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.
Use __copy_from_user() rather than __get_user_err() for individual
members when restoring VFP state.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c32cd419d6 upstream.
__get_user_error() is used as a fast accessor to make copying structure
members in the signal handling path as efficient as possible. However,
with software PAN and the recent Spectre variant 1, the efficiency is
reduced as these are no longer fast accessors.
In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.
It becomes much more efficient to use __copy_from_user() instead, so
let's use this for the ARM integer registers.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b800acfc70 upstream.
We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible.
So let's intercept it as early as we can by testing for the
function call number as soon as we've identified a HVC call
coming from the guest.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0c47ac8cd1 upstream.
In order to avoid aliasing attacks against the branch predictor
on Cortex-A15, let's invalidate the BTB on guest exit, which can
only be done by invalidating the icache (with ACTLR[0] being set).
We use the same hack as for A12/A17 to perform the vector decoding.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3f7e8e2e1e upstream.
In order to avoid aliasing attacks against the branch predictor,
let's invalidate the BTB on guest exit. This is made complicated
by the fact that we cannot take a branch before invalidating the
BTB.
We only apply this to A12 and A17, which are the only two ARM
cores on which this useful.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f5fe12b1ea upstream.
In order to prevent aliasing attacks on the branch predictor,
invalidate the BTB or instruction cache on CPUs that are known to be
affected when taking an abort on a address that is outside of a user
task limit:
Cortex A8, A9, A12, A17, A73, A75: flush BTB.
Cortex A15, Brahma B15: invalidate icache.
If the IBE bit is not set, then there is little point to enabling the
workaround.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e388b80288 upstream.
When the branch predictor hardening is enabled, firmware must have set
the IBE bit in the auxiliary control register. If this bit has not
been set, the Spectre workarounds will not be functional.
Add validation that this bit is set, and print a warning at alert level
if this is not the case.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 06c23f5ffe upstream.
Required manual merge of arch/arm/mm/proc-v7.S.
Harden the branch predictor against Spectre v2 attacks on context
switches for ARMv7 and later CPUs. We do this by:
Cortex A9, A12, A17, A73, A75: invalidating the BTB.
Cortex A15, Brahma B15: invalidating the instruction cache.
Cortex A57 and Cortex A72 are not addressed in this patch.
Cortex R7 and Cortex R8 are also not addressed as we do not enforce
memory protection on these cores.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 9d3a04925d upstream.
Add support for per-processor bug checking - each processor function
descriptor gains a function pointer for this check, which must not be
an __init function. If non-NULL, this will be called whenever a CPU
enters the kernel via which ever path (boot CPU, secondary CPU startup,
CPU resuming, etc.)
This allows processor specific bug checks to validate that workaround
bits are properly enabled by firmware via all entry paths to the kernel.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 26602161b5 upstream.
Check for CPU bugs when secondary processors are being brought online,
and also when CPUs are resuming from a low power mode. This gives an
opportunity to check that processor specific bug workarounds are
correctly enabled for all paths that a CPU re-enters the kernel.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d135b8b506 upstream.
Clang tries to warn when there's a mismatch between an operand's size,
and the size of the register it is held in, as this may indicate a bug.
Specifically, clang warns when the operand's type is less than 64 bits
wide, and the register is used unqualified (i.e. %N rather than %xN or
%wN).
Unfortunately clang can generate these warnings for unreachable code.
For example, for code like:
do { \
typeof(*(ptr)) __v = (v); \
switch(sizeof(*(ptr))) { \
case 1: \
// assume __v is 1 byte wide \
asm ("{op}b %w0" : : "r" (v)); \
break; \
case 8: \
// assume __v is 8 bytes wide \
asm ("{op} %0" : : "r" (v)); \
break; \
}
while (0)
... if op() were passed a char value and pointer to char, clang may
produce a warning for the unreachable case where sizeof(*(ptr)) is 8.
For the same reasons, clang produces warnings when __put_user_err() is
used for types that are less than 64 bits wide.
We could avoid this with a cast to a fixed-width type in each of the
cases. However, GCC will then warn that pointer types are being cast to
mismatched integer sizes (in unreachable paths).
Another option would be to use the same union trickery as we do for
__smp_store_release() and __smp_load_acquire(), but this is fairly
invasive.
Instead, this patch suppresses the clang warning by using an x modifier
in the assembly for the 8 byte case of __put_user_err(). No additional
work is necessary as the value has been cast to typeof(*(ptr)), so the
compiler will have performed any necessary extension for the reachable
case.
For consistency, __get_user_err() is also updated to use the x modifier
for its 8 byte case.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c97023cf0 upstream.
Commit 971a69db7d ("Xen: don't warn about 2-byte wchar_t in efi")
added the --no-wchar-size-warning to the Makefile to avoid this
harmless warning:
arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
Changing kbuild to use thin archives instead of recursive linking
unfortunately brings the same warning back during the final link.
The kernel does not use wchar_t string literals at this point, and
xen does not use wchar_t at all (only efi_char16_t), so the flag
has no effect, but as pointed out by Jan Beulich, adding a wchar_t
string literal would be bad here.
Since wchar_t is always defined as u16, independent of the toolchain
default, always passing -fshort-wchar is correct and lets us
remove the Xen specific hack along with fixing the warning.
Link: https://patchwork.kernel.org/patch/9275217/
Fixes: 971a69db7d ("Xen: don't warn about 2-byte wchar_t in efi")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f91869766 upstream.
Commit:
d77698df39 ("x86/build: Specify stack alignment for clang")
intended to use the same stack alignment for clang as with gcc.
The two compilers use different options to configure the stack alignment
(gcc: -mpreferred-stack-boundary=n, clang: -mstack-alignment=n).
The above commit assumes that the clang option uses the same parameter
type as gcc, i.e. that the alignment is specified as 2^n. However clang
interprets the value of this option literally to use an alignment of n,
in consequence the stack remains misaligned.
Change the values used with -mstack-alignment to be the actual alignment
instead of a power of two.
cc-option isn't used here with the typical pattern of KBUILD_CFLAGS +=
$(call cc-option ...). The reason is that older gcc versions don't
support the -mpreferred-stack-boundary option, since cc-option doesn't
verify whether the alternative option is valid it would incorrectly
select the clang option -mstack-alignment..
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernhard.Rosenkranzer@linaro.org
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Davidson <md@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hines <srhines@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dianders@chromium.org
Link: http://lkml.kernel.org/r/20170817004740.170588-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91ee5b21ee upstream.
Clang may emit absolute symbol references when building in non-PIC mode,
even when using the default 'small' code model, which is already mostly
position independent to begin with, due to its use of adrp/add pairs
that have a relative range of +/- 4 GB. The remedy is to pass the -fpie
flag, which can be done safely now that the code has been updated to avoid
GOT indirections (which may be emitted due to the compiler assuming that
the PIC/PIE code may end up in a shared library that is subject to ELF
symbol preemption)
Passing -fpie when building code that needs to execute at an a priori
unknown offset is arguably an improvement in any case, and given that
the recent visibility changes allow the PIC build to pass with GCC as
well, let's add -fpie for all arm64 builds rather than only for Clang.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170818194947.19347-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 696204faa6 upstream.
The build commands for the ARM and arm64 EFI stubs strip the .debug
sections and other sections that may legally contain absolute relocations,
in order to inspect the remaining sections for the presence of such
relocations.
This leaves us without debugging symbols in the stub for no good reason,
considering that these sections are omitted from the kernel binary anyway,
and that these relocations are thus only consumed by users of the ELF
binary, such as debuggers.
So move to 'strip' for performing the relocation check, and if it succeeds,
invoke objcopy as before, but leaving the .debug sections in place. Note
that these sections may refer to ksymtab/kcrctab contents, so leave those
in place as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-11-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f4857f4c2e upstream.
Replace the inline asm which exports struct offsets as ELF symbols
with proper const variables exposing the same values. This works
around an issue with Clang which does not interpret the "i" (or "I")
constraints in the same way as GCC.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d77698df39 upstream.
For gcc stack alignment is configured with -mpreferred-stack-boundary=N,
clang has the option -mstack-alignment=N for that purpose. Use the same
alignment as with gcc.
If the alignment is not specified clang assumes an alignment of
16 bytes, as required by the standard ABI. However as mentioned in
d9b0cde91c ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if
supported") the standard kernel entry on x86-64 leaves the stack
on an 8-byte boundary, as a consequence clang will keep the stack
misaligned.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 032a2c4f65 upstream.
cc-option is used to enable compiler options for the boot code if they
are available. The macro uses KBUILD_CFLAGS and KBUILD_CPPFLAGS for the
check, however these flags aren't used to build the boot code, in
consequence cc-option can yield wrong results. For example
-mpreferred-stack-boundary=2 is never set with a 64-bit compiler,
since the setting is only valid for 16 and 32-bit binaries. This
is also the case for 32-bit kernel builds, because the option -m32 is
added to KBUILD_CFLAGS after the assignment of REALMODE_CFLAGS.
Use __cc-option instead of cc-option for the boot mode options.
The macro receives the compiler options as parameter instead of using
KBUILD_C*FLAGS, for the boot code we pass REALMODE_CFLAGS.
Also use separate statements for the __cc-option checks instead
of performing them in the initial assignment of REALMODE_CFLAGS since
the variable is an input of the macro.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f3f1fd299 upstream.
cc-option uses KBUILD_CFLAGS and KBUILD_CPPFLAGS when it determines
whether an option is supported or not. This is fine for options used to
build the kernel itself, however some components like the x86 boot code
use a different set of flags.
Add the new macro __cc-option which is a more generic version of
cc-option with additional parameters. One parameter is the compiler
with which the check should be performed, the other the compiler options
to be used instead KBUILD_C*FLAGS.
Refactor cc-option and hostcc-option to use __cc-option and move
hostcc-option to scripts/Kbuild.include.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdb2726f4e upstream.
aes_ctrby8_avx-x86_64.S uses the C preprocessor for token pasting
of character sequences that are not valid preprocessor tokens.
While this is allowed when preprocessing assembler files it exposes
an incompatibilty between the clang and gcc preprocessors where
clang does not strip leading white space from macro parameters,
leading to the CONCAT(%xmm, i) macro expansion on line 96 resulting
in a token with a space character embedded in it.
While this could be resolved by deleting the offending space character,
the assembler is perfectly capable of doing the token pasting correctly
for itself so we can just get rid of the preprocessor macros.
Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f318a8baf upstream.
clang warns about unused inline functions by default:
arch/arm/crypto/aes-cipher-glue.c:68:1: warning: unused function '__inittest' [-Wunused-function]
arch/arm/crypto/aes-cipher-glue.c:69:1: warning: unused function '__exittest' [-Wunused-function]
As these appear in every single module, let's just disable the warnings by marking the
two functions as __maybe_unused.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3f0d0bc5b upstream.
Clang will warn about unknown warnings but will not return false
unless -Werror is set. GCC will return false if an unknown
warning is passed.
Adding -Werror make both compiler behave the same.
[arnd: it turns out we need the same patch for testing whether -ffunction-sections
works right with gcc. I've build tested extensively with this patch
applied, so let's just merge this one now.]
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0ae981eba upstream.
Since commit c3f0d0bc5b ("kbuild, LLVMLinux: Add -Werror to
cc-option to support clang"), cc-option and friends work nicely
for clang.
However, -Wno-unknown-warning-option makes clang happy with any
unknown warning options even if -Werror is specified.
Once -Wno-unknown-warning-option is added, any succeeding call of
cc-disable-warning is evaluated positive, then unknown warning
options are accepted. This should be dropped.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf0c3e68aa upstream.
KBuild abuses the asm statement to write to a file and
clang chokes about these invalid asm statements. Hack it
even more by fooling this is actual valid asm code.
[masahiro:
Import Jeroen's work for U-Boot:
http://patchwork.ozlabs.org/patch/375026/
Tweak sed script a little to avoid garbage '#' for GCC case, like
#define NR_PAGEFLAGS 23 /* __NR_PAGEFLAGS # */ ]
Signed-off-by: Jeroen Hofstee <jeroen@myspectrum.nl>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7dd47b95b0 upstream.
This part ended up in redundant code after touched by multiple
people.
[1] Commit 3234282f33 ("x86, asm: Fix CFI macro invocations to
deal with shortcomings in gas") added parentheses for defined
expressions to support old gas for x86.
[2] Commit a22dcdb003 ("x86, asm: Fix ancient-GAS workaround")
split the pattern into two to avoid parentheses for non-numeric
expressions.
[3] Commit 95a2f6f72d ("Partially revert patch that encloses
asm-offset.h numbers in brackets") removed parentheses from numeric
expressions as well because parentheses in MN10300 assembly have a
special meaning (pointer access).
Apparently, there is a conflict between [1] and [3]. After all,
[3] took precedence, and a long time has passed since then.
Now, merge the two patterns again because the first one is covered
by the other.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebf003f0cf upstream.
Largely redundant code is used in different places to generate C headers
from offset information extracted from assembly language output.
Consolidate the code in Makefile.lib and use this instead.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a37c45cd82 upstream.
The Linux Kernel relies on GCC's acceptance of inline assembly as an
opaque object which will not have any validation performed on the content.
The current behaviour in LLVM is to perform validation of the contents by
means of parsing the input if the MC layer can handle it.
Disable clangs integrated assembler and use the GNU assembler instead.
Wording-mostly-from: Saleem Abdulrasool <compnerd@compnerd.org>
Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 785f11aa59 upstream.
Add cross target to CC if using clang. Also add custom gcc toolchain
path for fallback gcc tools.
Clang will fallback to using things like ld, as, and libgcc if
(respectively) one of the llvm linkers isn't available, the integrated
assembler is turned off, or an appropriately cross-compiled version of
compiler-rt isn't available. To this end, you can specify the path to
this fallback gcc toolchain with GCC_TOOLCHAIN.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7ddacfa564 ]
Preethi reported that PMTU discovery for UDP/raw applications is not
working in the presence of VRF when the socket is not bound to a device.
The problem is that ip6_sk_update_pmtu does not consider the L3 domain
of the skb device if the socket is not bound. Update the function to
set oif to the L3 master device if relevant.
Fixes: ca254490c8 ("net: Add VRF support to IPv6 stack")
Reported-by: Preethi Ramachandra <preethir@juniper.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d5b9311ba ]
Multiple cpus might attempt to insert a new fragment in rhashtable,
if for example RPS is buggy, as reported by 배석진 in
https://patchwork.ozlabs.org/patch/994601/
We use rhashtable_lookup_get_insert_key() instead of
rhashtable_insert_fast() to let cpus losing the race
free their own inet_frag_queue and use the one that
was inserted by another cpu.
Fixes: 648700f76b ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b900ead6c ]
We need to make sure, that the carrier check polling is disabled
while suspending. Otherwise we can end up with usbnet_read_cmd()
being issued when only usbnet_read_cmd_nopm() is allowed. If this
happens, read operations lock up.
Fixes: d69d169493 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation")
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reviewed-by: Raghuram Chary J <RaghuramChary.Jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59663e4219 ]
This patch has the fix to avoid PHY lockup with 5717/5719/5720 in change
ring and flow control paths. This patch solves the RX hang while doing
continuous ring or flow control parameters with heavy traffic from peer.
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc3ccf26f0 ]
As rfc7496#section4.5 says about SCTP_PR_SUPPORTED:
This socket option allows the enabling or disabling of the
negotiation of PR-SCTP support for future associations. For existing
associations, it allows one to query whether or not PR-SCTP support
was negotiated on a particular association.
It means only sctp sock's prsctp_enable can be set.
Note that for the limitation of SCTP_{CURRENT|ALL}_ASSOC, we will
add it when introducing SCTP_{FUTURE|CURRENT|ALL}_ASSOC for linux
sctp in another patchset.
v1->v2:
- drop the params.assoc_id check as Neil suggested.
Fixes: 28aa4c26fc ("sctp: add SCTP_PR_SUPPORTED on sctp sockopt")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 33d9a2c72f ]
eth_type_trans() assumes initial value for skb->pkt_type
is PACKET_HOST.
This is indeed the value right after a fresh skb allocation.
However, it is possible that GRO merged a packet with a different
value (like PACKET_OTHERHOST in case macvlan is used), so
we need to make sure napi->skb will have pkt_type set back to
PACKET_HOST.
Otherwise, valid packets might be dropped by the stack because
their pkt_type is not PACKET_HOST.
napi_reuse_skb() was added in commit 96e93eab20 ("gro: Add
internal interfaces for VLAN"), but this bug always has
been there.
Fixes: 96e93eab20 ("gro: Add internal interfaces for VLAN")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16f7eb2b77 ]
The various types of tunnels running over IPv4 can ask to set the DF
bit to do PMTU discovery. However, PMTU discovery is subject to the
threshold set by the net.ipv4.route.min_pmtu sysctl, and is also
disabled on routes with "mtu lock". In those cases, we shouldn't set
the DF bit.
This patch makes setting the DF bit conditional on the route's MTU
locking state.
This issue seems to be older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 62230715fd ]
Only first fragment has the sport/dport information,
not the following ones.
If we want consistent hash for all fragments, we need to
ignore ports even for first fragment.
This bug is visible for IPv6 traffic, if incoming fragments
do not have a flow label, since skb_get_hash() will give
different results for first fragment and following ones.
It is also visible if any routing rule wants dissection
and sport or dport.
See commit 5e5d6fed37 ("ipv6: route: dissect flow
in input path if fib rules need it") for details.
[edumazet] rewrote the changelog completely.
Fixes: 06635a35d1 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da5a3ce66b upstream.
At boot time, KVM stashes the host MDCR_EL2 value, but only does this
when the kernel is not running in hyp mode (i.e. is non-VHE). In these
cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
lead to CONSTRAINED UNPREDICTABLE behaviour.
Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest have been run, the performance counters
do not behave as expected. This has been observed to result in accesses
via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
counters, resulting in events not being counted. In these cases, only
the fixed-purpose cycle counter appears to work as expected.
Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad0b ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f3ef5dedb upstream.
Leaving the DRM driver enabled on reboot or kexec has the annoying
effect of leaving the display generating transactions whilst the
IOMMU has been shut down.
In turn, the IOMMU driver (which shares its interrupt line with
the VOP) starts warning either on shutdown or when entering the
secondary kernel in the kexec case (nothing is expected on that
front).
A cheap way of ensuring that things are nicely shut down is to
register a shutdown callback in the platform driver.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20180805124807.18169-1-marc.zyngier@arm.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 017b1660df upstream.
The page migration code employs try_to_unmap() to try and unmap the source
page. This is accomplished by using rmap_walk to find all vmas where the
page is mapped. This search stops when page mapcount is zero. For shared
PMD huge pages, the page map count is always 1 no matter the number of
mappings. Shared mappings are tracked via the reference count of the PMD
page. Therefore, try_to_unmap stops prematurely and does not completely
unmap all mappings of the source page.
This problem can result is data corruption as writes to the original
source page can happen after contents of the page are copied to the target
page. Hence, data is lost.
This problem was originally seen as DB corruption of shared global areas
after a huge page was soft offlined due to ECC memory errors. DB
developers noticed they could reproduce the issue by (hotplug) offlining
memory used to back huge pages. A simple testcase can reproduce the
problem by creating a shared PMD mapping (note that this must be at least
PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
migrate_pages() to migrate process pages between nodes while continually
writing to the huge pages being migrated.
To fix, have the try_to_unmap_one routine check for huge PMD sharing by
calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared
mapping it will be 'unshared' which removes the page table entry and drops
the reference on the PMD page. After this, flush caches and TLB.
mmu notifiers are called before locking page tables, but we can not be
sure of PMD sharing until page tables are locked. Therefore, check for
the possibility of PMD sharing before locking so that notifiers can
prepare for the worst possible case.
Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: make _range_in_vma() a static inline]
Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
Fixes: 39dde65c99 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e41540c8a upstream.
This bug has been experienced several times by the Oracle DB team. The
BUG is in remove_inode_hugepages() as follows:
/*
* If page is mapped, it was faulted in after being
* unmapped in caller. Unmap (again) now after taking
* the fault mutex. The mutex will prevent faults
* until we finish removing the page.
*
* This race can only happen in the hole punch case.
* Getting here in a truncate operation is a bug.
*/
if (unlikely(page_mapped(page))) {
BUG_ON(truncate_op);
In this case, the elevated map count is not the result of a race.
Rather it was incorrectly incremented as the result of a bug in the huge
pmd sharing code. Consider the following:
- Process A maps a hugetlbfs file of sufficient size and alignment
(PUD_SIZE) that a pmd page could be shared.
- Process B maps the same hugetlbfs file with the same size and
alignment such that a pmd page is shared.
- Process B then calls mprotect() to change protections for the mapping
with the shared pmd. As a result, the pmd is 'unshared'.
- Process B then calls mprotect() again to chage protections for the
mapping back to their original value. pmd remains unshared.
- Process B then forks and process C is created. During the fork
process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
tables. Copying page tables for hugetlb mappings is done in the
routine copy_hugetlb_page_range.
In copy_hugetlb_page_range(), the destination pte is obtained by:
dst_pte = huge_pte_alloc(dst, addr, sz);
If pmd sharing is possible, the returned pointer will be to a pte in an
existing page table. In the situation above, process C could share with
either process A or process B. Since process A is first in the list,
the returned pte is a pointer to a pte in process A's page table.
However, the check for pmd sharing in copy_hugetlb_page_range is:
/* If the pagetables are shared don't copy or take references */
if (dst_pte == src_pte)
continue;
Since process C is sharing with process A instead of process B, the
above test fails. The code in copy_hugetlb_page_range which follows
assumes dst_pte points to a huge_pte_none pte. It copies the pte entry
from src_pte to dst_pte and increments this map count of the associated
page. This is how we end up with an elevated map count.
To solve, check the dst_pte entry for huge_pte_none. If !none, this
implies PMD sharing so do not copy.
Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
Fixes: c5c99429fa ("fix hugepages leak due to pagetable page sharing")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1823342a1f upstream.
gcc 8.1.0 complains:
fs/configfs/symlink.c:67:3: warning:
'strncpy' output truncated before terminating nul copying as many
bytes from a string as its length
fs/configfs/symlink.c: In function 'configfs_get_link':
fs/configfs/symlink.c:63:13: note: length computed here
Using strncpy() is indeed less than perfect since the length of data to
be copied has already been determined with strlen(). Replace strncpy()
with memcpy() to address the warning and optimize the code a little.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu@cybertrust.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fabaf3034 upstream.
fuse_request_send_notify_reply() may fail if the connection was reset for
some reason (e.g. fs was unmounted). Don't leak request reference in this
case. Besides leaking memory, this resulted in fc->num_waiting not being
decremented and hence fuse_wait_aborted() left in a hanging and unkillable
state.
Fixes: 2d45ba381a ("fuse: add retrieve request")
Fixes: b8f95e5d13 ("fuse: umount should wait for all requests")
Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> #v2.6.36
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebacb81273 upstream.
In async IO blocking case the additional reference to the io is taken for
it to survive fuse_aio_complete(). In non blocking case this additional
reference is not needed, however we still reference io to figure out
whether to wait for completion or not. This is wrong and will lead to
use-after-free. Fix it by storing blocking information in separate
variable.
This was spotted by KASAN when running generic/208 fstest.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 744742d692 ("fuse: Add reference counting for fuse_io_priv")
Cc: <stable@vger.kernel.org> # v4.6
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce9a992ff upstream.
Fix an issue with the 32-bit range error path in `rtc_hctosys' where no
error code is set and consequently the successful preceding call result
from `rtc_read_time' is propagated to `rtc_hctosys_ret'. This in turn
makes any subsequent call to `hctosys_show' incorrectly report in sysfs
that the system time has been set from this RTC while it has not.
Set the error to ERANGE then if we can't express the result due to an
overflow.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: b3a5ac42ab ("rtc: hctosys: Ensure system time doesn't overflow time_t")
Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d7a5bcb67 upstream.
When truncating the encode buffer, the page_ptr is getting
advanced, causing the next page to be skipped while encoding.
The page is still included in the response, so the response
contains a page of bogus data.
We need to adjust the page_ptr backwards to ensure we encode
the next page into the correct place.
We saw this triggered when concurrent directory modifications caused
nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
call to xdr_truncate_encode() corrupted the READDIR reply.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c8e0a1b68 upstream.
Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote:
> As per mount_namespaces(7) unprivileged users should not be able to look under mount points:
>
> Mounts that come as a single unit from more privileged mount are locked
> together and may not be separated in a less privileged mount namespace.
>
> However they can:
>
> 1. Create a mount namespace.
> 2. In the mount namespace open a file descriptor to the parent of a mount point.
> 3. Destroy the mount namespace.
> 4. Use the file descriptor to look under the mount point.
>
> I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8.
>
> The setup:
>
> $ sudo sysctl kernel.unprivileged_userns_clone=1
> kernel.unprivileged_userns_clone = 1
> $ mkdir -p A/B/Secret
> $ sudo mount -t tmpfs hide A/B
>
>
> "Secret" is indeed hidden as expected:
>
> $ ls -lR A
> A:
> total 0
> drwxrwxrwt 2 root root 40 Feb 12 21:08 B
>
> A/B:
> total 0
>
>
> The attack revealing "Secret":
>
> $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A"
> /proc/self/fd/4/:
> total 0
> drwxr-xr-x 3 root root 60 Feb 12 21:08 B
>
> /proc/self/fd/4/B:
> total 0
> drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret
>
> /proc/self/fd/4/B/Secret:
> total 0
I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and
disconnecting all of the mounts in a mount namespace. Fix this by
factoring drop_mounts out of drop_collected_mounts and passing
0 instead of UMOUNT_SYNC.
There are two possible behavior differences that result from this.
- No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on
the vfsmounts being unmounted. This effects the lazy rcu walk by
kicking the walk out of rcu mode and forcing it to be a non-lazy
walk.
- No longer disconnecting locked mounts will keep some mounts around
longer as they stay because the are locked to other mounts.
There are only two users of drop_collected mounts: audit_tree.c and
put_mnt_ns.
In audit_tree.c the mounts are private and there are no rcu lazy walks
only calls to iterate_mounts. So the changes should have no effect
except for a small timing effect as the connected mounts are disconnected.
In put_mnt_ns there may be references from process outside the mount
namespace to the mounts. So the mounts remaining connected will
be the bug fix that is needed. That rcu walks are allowed to continue
appears not to be a problem especially as the rcu walk change was about
an implementation detail not about semantics.
Cc: stable@vger.kernel.org
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Reported-by: Timothy Baldwin <timbaldwin@fastmail.co.uk>
Tested-by: Timothy Baldwin <timbaldwin@fastmail.co.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df7342b240 upstream.
Jonathan Calmels from NVIDIA reported that he's able to bypass the
mount visibility security check in place in the Linux kernel by using
a combination of the unbindable property along with the private mount
propagation option to allow a unprivileged user to see a path which
was purposefully hidden by the root user.
Reproducer:
# Hide a path to all users using a tmpfs
root@castiana:~# mount -t tmpfs tmpfs /sys/devices/
root@castiana:~#
# As an unprivileged user, unshare user namespace and mount namespace
stgraber@castiana:~$ unshare -U -m -r
# Confirm the path is still not accessible
root@castiana:~# ls /sys/devices/
# Make /sys recursively unbindable and private
root@castiana:~# mount --make-runbindable /sys
root@castiana:~# mount --make-private /sys
# Recursively bind-mount the rest of /sys over to /mnnt
root@castiana:~# mount --rbind /sys/ /mnt
# Access our hidden /sys/device as an unprivileged user
root@castiana:~# ls /mnt/devices/
breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe
LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system
tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual
Solve this by teaching copy_tree to fail if a mount turns out to be
both unbindable and locked.
Cc: stable@vger.kernel.org
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Reported-by: Jonathan Calmels <jcalmels@nvidia.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25d202ed82 upstream.
It was recently pointed out that the one instance of testing MNT_LOCKED
outside of the namespace_sem is in ksys_umount.
Fix that by adding a test inside of do_umount with namespace_sem and
the mount_lock held. As it helps to fail fails the existing test is
maintained with an additional comment pointing out that it may be racy
because the locks are not held.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 5ff9d8a65c ("vfs: Lock in place mounts from more privileged users")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e4028935c upstream.
Currently bh is set to NULL only during first iteration of for cycle,
then this pointer is not cleared after end of using.
Therefore rollback after errors can lead to extra brelse(bh) call,
decrements bh counter and later trigger an unexpected warning in __brelse()
Patch moves brelse() calls in body of cycle to exclude requirement of
brelse() call in rollback.
Fixes: 33afdcc540 ("ext4: add a function which sets up group blocks ...")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 3.3+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac765f83f1 upstream.
We currently allow cloning a range from a file which includes the last
block of the file even if the file's size is not aligned to the block
size. This is fine and useful when the destination file has the same size,
but when it does not and the range ends somewhere in the middle of the
destination file, it leads to corruption because the bytes between the EOF
and the end of the block have undefined data (when there is support for
discard/trimming they have a value of 0x00).
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ export foo_size=$((256 * 1024 + 100))
$ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo
$ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar
$ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar
$ od -A d -t x1 /mnt/bar
0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
*
0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c
*
0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00
0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
*
1048576
The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527
(512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead
of 0xb5.
This is similar to the problem we had for deduplication that got recently
fixed by commit de02b9f6bb ("Btrfs: fix data corruption when
deduplicating between different files").
Fix this by not allowing such operations to be performed and return the
errno -EINVAL to user space. This is what XFS is doing as well at the VFS
level. This change however now makes us return -EINVAL instead of
-EOPNOTSUPP for cases where the source range maps to an inline extent and
the destination range's end is smaller then the destination file's size,
since the detection of inline extents is done during the actual process of
dropping file extent items (at __btrfs_drop_extents()). Returning the
-EINVAL error is done early on and solely based on the input parameters
(offsets and length) and destination file's size. This makes us consistent
with XFS and anyone else supporting cloning since this case is now checked
at a higher level in the VFS and is where the -EINVAL will be returned
from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1
by commit 07d19dc9fb ("vfs: avoid problematic remapping requests into
partial EOF block"). So this change is more geared towards stable kernels,
as it's unlikely the new VFS checks get removed intentionally.
A test case for fstests follows soon, as well as an update to filter
existing tests that expect -EOPNOTSUPP to accept -EINVAL as well.
CC: <stable@vger.kernel.org> # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0ffb805b7 upstream.
Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflags
using arbitrary flags. Because BOTHER is not defined, the general
Linux code doesn't allow setting arbitrary baud rates, and because
CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in
drivers/tty/tty_baudrate.c if (c_cflags & CBAUD) == 037.
Resolve both problems by #defining BOTHER to 037 on Alpha.
However, userspace still needs to know if setting BOTHER is actually
safe given legacy kernels (does anyone actually care about that on
Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even
though they use the same structure. Define struct termios2 just for
compatibility; it is the exact same structure as struct termios. In a
future patchset, this will be cleaned up so the uapi headers are
usable from libc.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: <linux-alpha@vger.kernel.org>
Cc: <linux-serial@vger.kernel.org>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89c38422e0 upstream.
Currently the NUMA distance map parsing does not validate the distance
table for the distance-matrix rules 1-2 in [1].
However the arch NUMA code may enforce some of these rules, but not all.
Such is the case for the arm64 port, which does not enforce the rule that
the distance between separates nodes cannot equal LOCAL_DISTANCE.
The patch adds the following rules validation:
- distance of node to self equals LOCAL_DISTANCE
- distance of separate nodes > LOCAL_DISTANCE
This change avoids a yet-unresolved crash reported in [2].
A note on dealing with symmetrical distances between nodes:
Validating symmetrical distances between nodes is difficult. If it were
mandated in the bindings that every distance must be recorded in the
table, then it would be easy. However, it isn't.
In addition to this, it is also possible to record [b, a] distance only
(and not [a, b]). So, when processing the table for [b, a], we cannot
assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
distance may not be present in the table and current distance would be
default at REMOTE_DISTANCE.
As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
for b > a. This policy is different to kernel ACPI SLIT validation, which
allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
the distance debug message is dropped as it may be misleading (for a distance
which is later overwritten).
Some final notes on semantics:
- It is implied that it is the responsibility of the arch NUMA code to
reset the NUMA distance map for an error in distance map parsing.
- It is the responsibility of the FW NUMA topology parsing (whether OF or
ACPI) to enforce NUMA distance rules, and not arch NUMA code.
[1] Documents/devicetree/bindings/numa.txt
[2] https://www.spinics.net/lists/arm-kernel/msg683304.html
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be2e1c9dcf upstream.
I noticed during the creation of another bugfix that the BCH_CONST_PARAMS
option that is set by DOCG3 breaks setting variable parameters for any
other users of the BCH library code.
The only other user we have today is the MTD_NAND software BCH
implementation (most flash controllers use hardware BCH these days
and are not affected). I considered removing BCH_CONST_PARAMS entirely
because of the inherent conflict, but according to the description in
lib/bch.c there is a significant performance benefit in keeping it.
To avoid the immediate problem of the conflict between MTD_NAND_BCH
and DOCG3, this only sets the constant parameters if MTD_NAND_BCH
is disabled, which should fix the problem for all cases that
are affected. This should also work for all stable kernels.
Note that there is only one machine that actually seems to use the
DOCG3 driver (arch/arm/mach-pxa/mioa701.c), so most users should have
the driver disabled, but it almost certainly shows up if we wanted
to test random kernels on machines that use software BCH in MTD.
Fixes: d13d19ece3 ("mtd: docg3: add ECC correction code")
Cc: stable@vger.kernel.org
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f393808dc6 upstream.
If there's no entry to drop in bucket that corresponds to the hash,
early_drop() should look for it in other buckets. But since it increments
hash instead of bucket number, it actually looks in the same bucket 8
times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
most cases.
Fix it by increasing bucket number instead of hash and rename _hash
to bucket to avoid future confusion.
Fixes: 3e86638e9a ("netfilter: conntrack: consider ct netns in early_drop logic")
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac5b2c1891 upstream.
THP allocation might be really disruptive when allocated on NUMA system
with the local node full or hard to reclaim. Stefan has posted an
allocation stall report on 4.12 based SLES kernel which suggests the
same issue:
kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
kvm cpuset=/ mems_allowed=0-1
CPU: 10 PID: 84752 Comm: kvm Tainted: G W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased)
Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017
Call Trace:
dump_stack+0x5c/0x84
warn_alloc+0xe0/0x180
__alloc_pages_slowpath+0x820/0xc90
__alloc_pages_nodemask+0x1cc/0x210
alloc_pages_vma+0x1e5/0x280
do_huge_pmd_wp_page+0x83f/0xf00
__handle_mm_fault+0x93d/0x1060
handle_mm_fault+0xc6/0x1b0
__do_page_fault+0x230/0x430
do_page_fault+0x2a/0x70
page_fault+0x7b/0x80
[...]
Mem-Info:
active_anon:126315487 inactive_anon:1612476 isolated_anon:5
active_file:60183 inactive_file:245285 isolated_file:0
unevictable:15657 dirty:286 writeback:1 unstable:0
slab_reclaimable:75543 slab_unreclaimable:2509111
mapped:81814 shmem:31764 pagetables:370616 bounce:0
free:32294031 free_pcp:6233 free_cma:0
Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
The defrag mode is "madvise" and from the above report it is clear that
the THP has been allocated for MADV_HUGEPAGA vma.
Andrea has identified that the main source of the problem is
__GFP_THISNODE usage:
: The problem is that direct compaction combined with the NUMA
: __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
: hard the local node, instead of failing the allocation if there's no
: THP available in the local node.
:
: Such logic was ok until __GFP_THISNODE was added to the THP allocation
: path even with MPOL_DEFAULT.
:
: The idea behind the __GFP_THISNODE addition, is that it is better to
: provide local memory in PAGE_SIZE units than to use remote NUMA THP
: backed memory. That largely depends on the remote latency though, on
: threadrippers for example the overhead is relatively low in my
: experience.
:
: The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
: extremely slow qemu startup with vfio, if the VM is larger than the
: size of one host NUMA node. This is because it will try very hard to
: unsuccessfully swapout get_user_pages pinned pages as result of the
: __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
: allocations and instead of trying to allocate THP on other nodes (it
: would be even worse without vfio type1 GUP pins of course, except it'd
: be swapping heavily instead).
Fix this by removing __GFP_THISNODE for THP requests which are
requesting the direct reclaim. This effectivelly reverts 5265047ac3
on the grounds that the zone/node reclaim was known to be disruptive due
to premature reclaim when there was memory free. While it made sense at
the time for HPC workloads without NUMA awareness on rare machines, it
was ultimately harmful in the majority of cases. The existing behaviour
is similar, if not as widespare as it applies to a corner case but
crucially, it cannot be tuned around like zone_reclaim_mode can. The
default behaviour should always be to cause the least harm for the
common case.
If there are specialised use cases out there that want zone_reclaim_mode
in specific cases, then it can be built on top. Longterm we should
consider a memory policy which allows for the node reclaim like behavior
for the specific memory ranges which would allow a
[1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com
Mel said:
: Both patches look correct to me but I'm responding to this one because
: it's the fix. The change makes sense and moves further away from the
: severe stalling behaviour we used to see with both THP and zone reclaim
: mode.
:
: I put together a basic experiment with usemem configured to reference a
: buffer multiple times that is 80% the size of main memory on a 2-socket
: box with symmetric node sizes and defrag set to "always". The defrag
: setting is not the default but it would be functionally similar to
: accessing a buffer with madvise(MADV_HUGEPAGE). Usemem is configured to
: reference the buffer multiple times and while it's not an interesting
: workload, it would be expected to complete reasonably quickly as it fits
: within memory. The results were;
:
: usemem
: vanilla noreclaim-v1
: Amean Elapsd-1 42.78 ( 0.00%) 26.87 ( 37.18%)
: Amean Elapsd-3 27.55 ( 0.00%) 7.44 ( 73.00%)
: Amean Elapsd-4 5.72 ( 0.00%) 5.69 ( 0.45%)
:
: This shows the elapsed time in seconds for 1 thread, 3 threads and 4
: threads referencing buffers 80% the size of memory. With the patches
: applied, it's 37.18% faster for the single thread and 73% faster with two
: threads. Note that 4 threads showing little difference does not indicate
: the problem is related to thread counts. It's simply the case that 4
: threads gets spread so their workload mostly fits in one node.
:
: The overall view from /proc/vmstats is more startling
:
: 4.19.0-rc1 4.19.0-rc1
: vanillanoreclaim-v1r1
: Minor Faults 35593425 708164
: Major Faults 484088 36
: Swap Ins 3772837 0
: Swap Outs 3932295 0
:
: Massive amounts of swap in/out without the patch
:
: Direct pages scanned 6013214 0
: Kswapd pages scanned 0 0
: Kswapd pages reclaimed 0 0
: Direct pages reclaimed 4033009 0
:
: Lots of reclaim activity without the patch
:
: Kswapd efficiency 100% 100%
: Kswapd velocity 0.000 0.000
: Direct efficiency 67% 100%
: Direct velocity 11191.956 0.000
:
: Mostly from direct reclaim context as you'd expect without the patch.
:
: Page writes by reclaim 3932314.000 0.000
: Page writes file 19 0
: Page writes anon 3932295 0
: Page reclaim immediate 42336 0
:
: Writes from reclaim context is never good but the patch eliminates it.
:
: We should never have default behaviour to thrash the system for such a
: basic workload. If zone reclaim mode behaviour is ever desired but on a
: single task instead of a global basis then the sensible option is to build
: a mempolicy that enforces that behaviour.
This was a severe regression compared to previous kernels that made
important workloads unusable and it starts when __GFP_THISNODE was
added to THP allocations under MADV_HUGEPAGE. It is not a significant
risk to go to the previous behavior before __GFP_THISNODE was added, it
worked like that for years.
This was simply an optimization to some lucky workloads that can fit in
a single node, but it ended up breaking the VM for others that can't
possibly fit in a single node, so going back is safe.
[mhocko@suse.com: rewrote the changelog based on the one from Andrea]
Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org
Fixes: 5265047ac3 ("mm, thp: really limit transparent hugepage allocation to local node")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Debugged-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4542d623c7 upstream.
Commands with protection information included were not truncating the
protection iov_iter to the number of protection bytes in the command.
This resulted in vhost_scsi mis-calculating the size of the protection
SGL in vhost_scsi_calc_sgls(), and including both the protection and
data SG entries in the protection SGL.
Fixes: 09b13fa8c1 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 09b13fa8c1
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9a2310fb6 upstream.
There is a potential execution path in which function
platform_get_resource() returns NULL. If this happens,
we will end up having a NULL pointer dereference.
Fix this by replacing devm_ioremap with devm_ioremap_resource,
which has the NULL check and the memory region request.
This code was detected with the help of Coccinelle.
Cc: stable@vger.kernel.org
Fixes: 97b7129cd2 ("reset: hisilicon: change the definition of hisi_reset_init")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c09bcc91bb upstream.
Reading the registers without waiting for engine idle returns
unpredictable values. These unpredictable values result in display
corruption - if atyfb_imageblit reads the content of DP_PIX_WIDTH with the
bit DP_HOST_TRIPLE_EN set (from previous invocation), the driver would
never ever clear the bit, resulting in display corruption.
We don't want to wait for idle because it would degrade performance, so
this patch modifies the driver so that it never reads accelerator
registers.
HOST_CNTL doesn't have to be read, we can just write it with
HOST_BYTE_ALIGN because no other part of the driver cares if
HOST_BYTE_ALIGN is set.
DP_PIX_WIDTH is written in the functions atyfb_copyarea and atyfb_fillrect
with the default value and in atyfb_imageblit with the value set according
to the source image data.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Ville Syrjälä <syrjala@sci.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c6c6a7878 upstream.
The code for manual bit triple is not endian-clean. It builds the variable
"hostdword" using byte accesses, therefore we must read the variable with
"le32_to_cpu".
The patch also enables (hardware or software) bit triple only if the image
is monochrome (image->depth). If we want to blit full-color image, we
shouldn't use the triple code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Ville Syrjälä <syrjala@sci.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efe328230d upstream.
This reverts commit 8b8f53af1e.
splice_dentry() is used by three places. For two places, req->r_dentry
is passed to splice_dentry(). In the case of error, req->r_dentry does
not get updated. So splice_dentry() should not drop reference.
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94e6992bb5 upstream.
If the read is large enough, we end up spinning in the messenger:
libceph: osd0 192.168.122.1:6801 io error
libceph: osd0 192.168.122.1:6801 io error
libceph: osd0 192.168.122.1:6801 io error
This is a receive side limit, so only reads were affected.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 665636b294 upstream.
Fixes the signedness bug returning '(-22)' on the return type by removing the
sanity checker in rockchip_ddrclk_get_parent(). The function should return
and unsigned value only and it's safe to remove the sanity checker as the
core functions that call get_parent like clk_core_get_parent_by_index already
ensures the validity of the clk index returned (index >= core->num_parents).
Fixes: a4f182bf81 ("clk: rockchip: add new clock-type for the ddrclk")
Cc: stable@vger.kernel.org
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f5cb0e622 upstream.
Commit a982e45dc1 ("clk: at91: PLL recalc_rate() now using cached MUL
and DIV values") removed a check that prevents a division by zero. This
now causes a stacktrace when booting the kernel on a at91 platform if
the PLL DIV register contains zero. This commit reintroduces this check.
Fixes: a982e45dc1 ("clk: at91: PLL recalc_rate() now using cached...")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ronald Wahl <rwahl@gmx.de>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8985167ecf upstream.
When driver is built as module and DT node contains clocks compatible
(e.g. "samsung,s2mps11-clk"), the module will not be autoloaded because
module aliases won't match.
The modalias from uevent: of:NclocksT<NULL>Csamsung,s2mps11-clk
The modalias from driver: platform:s2mps11-clk
The devices are instantiated by parent's MFD. However both Device Tree
bindings and parent define the compatible for clocks devices. In case
of module matching this DT compatible will be used.
The issue will not happen if this is a built-in (no need for module
matching) or when clocks DT node does not contain compatible (not
correct from bindings perspective but working for driver).
Note when backporting to stable kernels: adjust the list of device ID
entries.
Cc: <stable@vger.kernel.org>
Fixes: 53c31b3437 ("mfd: sec-core: Add of_compatible strings for clock MFD cells")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40dc948f23 upstream.
The bootloader may pass physical address of the boot parameters structure
to the MMUv3 kernel in the register a2. Code in the _SetupMMU block in
the arch/xtensa/kernel/head.S is supposed to map that physical address to
the virtual address in the configured virtual memory layout.
This code haven't been updated when additional 256+256 and 512+512
memory layouts were introduced and it may produce wrong addresses when
used with these layouts.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0773495b1f upstream.
Xtensa ABI requires stack alignment to be at least 16. In noMMU
configuration ARCH_SLAB_MINALIGN is used to align stack. Make it at
least 16.
This fixes the following runtime error in noMMU configuration, caused by
interaction between insufficiently aligned stack and alloca function,
that results in corruption of on-stack variable in the libc function
glob:
Caught unhandled exception in 'sh' (pid = 47, pc = 0x02d05d65)
- should not happen
EXCCAUSE is 15
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4119ba211b upstream.
This section collects all source .note.* sections together in the
vmlinux image. Without it .note.Linux section may be placed at address
0, while the rest of the kernel is at its normal address, resulting in a
huge vmlinux.bin image that may not be linked into the xtensa Image.elf.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99a3ae51d5 ]
In the C-code we need to put the physical address of the hpmc handler in
the interrupt vector table (IVA) in order to get HPMCs working. Since
on parisc64 function pointers are indirect (in fact they are function
descriptors) we instead export the address as variable and not as
function.
This reverts a small part of commit f39cce654f ("parisc: Add
cfi_startproc and cfi_endproc to assembly code").
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5654e156b ]
Make sure that the HPMC (High Priority Machine Check) handler is 16-byte
aligned and that it's length in the IVT is a multiple of 16 bytes.
Otherwise PDC may decide not to call the HPMC crash handler.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ed9d3de5f ]
The os_hpmc_size variable sometimes wasn't aligned at word boundary and thus
triggered the unaligned fault handler at startup.
Fix it by aligning it properly.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4dc69c1c1f ]
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.
This was found with the future CONFIG_FORTIFY_SOURCE feature.
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44c445c3d1 ]
This patch fixes a race condition that can result into the interface being
up and carrier on, but with transmits disabled in the hardware.
The bug may show up by repeatedly IFF_DOWN+IFF_UP the interface, which
allows e1000_watchdog() interleave with e1000_down().
CPU x CPU y
--------------------------------------------------------------------
e1000_down():
netif_carrier_off()
e1000_watchdog():
if (carrier == off) {
netif_carrier_on();
enable_hw_transmit();
}
disable_hw_transmit();
e1000_watchdog():
/* carrier on, do nothing */
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5983587c8c ]
Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.
Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit bb177a732c upstream.
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
kernel BUG at mm/gup.c:1242!
invalid opcode: 0000 [#1] SMP
CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
RIP: 0010:__mm_populate+0x1e2/0x1f0
Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
Call Trace:
vm_brk_flags+0xc3/0x100
vm_brk+0x1f/0x30
load_elf_library+0x281/0x2e0
__ia32_sys_uselib+0x170/0x1e0
do_fast_syscall_32+0xca/0x420
entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is
expanded as it should. All we need is to tell mm_populate about it.
Besides that there is absolutely no reason to to bug_on in the first
place. The worst thing that could happen is that the last page wouldn't
get populated and that is far from putting system into an inconsistent
state.
Fix the issue by moving the length sanitization code from do_brk_flags
up to vm_brk_flags. The only other caller of do_brk_flags is brk
syscall entry and it makes sure to provide the proper length so t here
is no need for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.9:
- There is no do_brk_flags() function; update do_brk()
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 908a572b80 upstream.
Using waitqueue_active() is racy. Make sure we issue a wake_up()
unconditionally after storing into fc->blocked. After that it's okay to
optimize with waitqueue_active() since the first wake up provides the
necessary barrier for all waiters, not the just the woken one.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 3c18ef8117 ("fuse: optimize wake_up")
Cc: <stable@vger.kernel.org> # v3.10
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4f3aa2e1e upstream.
There is another cast from unsigned long to int which causes
a bounds check to fail with specially crafted input. The value is
then used as an index in the slot array in cdrom_slot_status().
This issue is similar to CVE-2018-16658 and CVE-2018-10940.
Signed-off-by: Young_X <YangX92@hotmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b4dc44b3ca ]
the 9p client code overwrites our glock.client_id pointing to a static
buffer by an allocated string holding the network provided value which
we do not care about; free and reset the value as appropriate.
This is almost identical to the leak in v9fs_file_getlock() fixed by
Al Viro in commit ce85dd58ad ("9p: we are leaking glock.client_id
in v9fs_file_getlock()"), which was returned as an error by a coverity
false positive -- while we are here attempt to make the code slightly
more robust to future change of the net/9p/client code and hopefully
more clear to coverity that there is no problem.
Link: http://lkml.kernel.org/r/1536339057-21974-5-git-send-email-asmadeus@codewreck.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 693b31b2fc ]
Test tm-tmspr might exit before all threads stop executing, because it just
waits for the very last thread to join before proceeding/exiting.
This patch makes sure that all threads that were created will join before
proceeding/exiting.
This patch also guarantees that the amount of threads being created is equal
to thread_num.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd24db0410 ]
The driver ignored the width alignment which exists due to the UYVY
colorspace format. Fix the width alignment and make use of the the
provided v4l2 helper function to set the width, height and all
alignments in one.
Fixes: 963ddc63e2 ("[media] media: tvp5150: Add cropping support")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8344498721 ]
The SC16IS752 is a dual-channel device. The two channels are largely
independent, but the IRQ signals are wired together as an open-drain,
active low signal which will be driven low while either of the
channels requires attention, which can be for significant periods of
time until operations complete and the interrupt can be acknowledged.
In that respect it is should be treated as a true level-sensitive IRQ.
The kernel, however, needs to be able to exit interrupt context in
order to use I2C or SPI to access the device registers (which may
involve sleeping). Therefore the interrupt needs to be masked out or
paused in some way.
The usual way to manage sleeping from within an interrupt handler
is to use a threaded interrupt handler - a regular interrupt routine
does the minimum amount of work needed to triage the interrupt before
waking the interrupt service thread. If the threaded IRQ is marked as
IRQF_ONESHOT the kernel will automatically mask out the interrupt
until the thread runs to completion. The sc16is7xx driver used to
use a threaded IRQ, but a patch switched to using a kthread_worker
in order to set realtime priorities on the handler thread and for
other optimisations. The end result is non-threaded IRQ that
schedules some work then returns IRQ_HANDLED, making the kernel
think that all IRQ processing has completed.
The work-around to prevent a constant stream of interrupts is to
mark the interrupt as edge-sensitive rather than level-sensitive,
but interpreting an active-low source as a falling-edge source
requires care to prevent a total cessation of interrupts. Whereas
an edge-triggering source will generate a new edge for every interrupt
condition a level-triggering source will keep the signal at the
interrupting level until it no longer requires attention; in other
words, the host won't see another edge until all interrupt conditions
are cleared. It is therefore vital that the interrupt handler does not
exit with an outstanding interrupt condition, otherwise the kernel
will not receive another interrupt unless some other operation causes
the interrupt state on the device to be cleared.
The existing sc16is7xx driver has a very simple interrupt "thread"
(kthread_work job) that processes interrupts on each channel in turn
until there are no more. If both channels are active and the first
channel starts interrupting while the handler for the second channel
is running then it will not be detected and an IRQ stall ensues. This
could be handled easily if there was a shared IRQ status register, or
a convenient way to determine if the IRQ had been deasserted for any
length of time, but both appear to be lacking.
Avoid this problem (or at least make it much less likely to happen)
by reducing the granularity of per-channel interrupt processing
to one condition per iteration, only exiting the overall loop when
both channels are no longer interrupting.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee9d21b3b3 ]
When building with clang crt0's _zimage_start is not marked weak, which
breaks the build when linking the kernel image:
$ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
0000000000000058 g .text 0000000000000000 _zimage_start
ld: arch/powerpc/boot/wrapper.a(crt0.o): in function '_zimage_start':
(.text+0x58): multiple definition of '_zimage_start';
arch/powerpc/boot/pseries-head.o:(.text+0x0): first defined here
Clang requires the .weak directive to appear after the symbol is
declared. The binutils manual says:
This directive sets the weak attribute on the comma separated list of
symbol names. If the symbols do not already exist, they will be
created.
So it appears this is different with clang. The only reference I could
see for this was an OpenBSD mailing list post[1].
Changing it to be after the declaration fixes building with Clang, and
still works with GCC.
$ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
0000000000000058 w .text 0000000000000000 _zimage_start
Reported to clang as https://bugs.llvm.org/show_bug.cgi?id=38921
[1] https://groups.google.com/forum/#!topic/fa.openbsd.tech/PAgKKen2YCY
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5d59528e2 ]
altera_hw_filt_init() which calls append_internal() assumes
that the node was successfully linked in while in fact it can
silently fail. So the call-site needs to set return to -ENOMEM
on append_internal() returning NULL and exit through the err path.
Fixes: 349bcf02e3 ("[media] Altera FPGA based CI driver module")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 538f66ba20 ]
A DMM timeout "timed out waiting for done" has been observed on DRA7
devices. The timeout happens rarely, and only when the system is under
heavy load.
Debugging showed that the timeout can be made to happen much more
frequently by optimizing the DMM driver, so that there's almost no code
between writing the last DMM descriptors to RAM, and writing to DMM
register which starts the DMM transaction.
The current theory is that a wmb() does not properly ensure that the
data written to RAM is observable by all the components in the system.
This DMM timeout has caused interesting (and rare) bugs as the error
handling was not functioning properly (the error handling has been fixed
in previous commits):
* If a DMM timeout happened when a GEM buffer was being pinned for
display on the screen, a timeout error would be shown, but the driver
would continue programming DSS HW with broken buffer, leading to
SYNCLOST floods and possible crashes.
* If a DMM timeout happened when other user (say, video decoder) was
pinning a GEM buffer, a timeout would be shown but if the user
handled the error properly, no other issues followed.
* If a DMM timeout happened when a GEM buffer was being released, the
driver does not even notice the error, leading to crashes or hang
later.
This patch adds wmb() and readl() calls after the last bit is written to
RAM, which should ensure that the execution proceeds only after the data
is actually in RAM, and thus observable by DMM.
The read-back should not be needed. Further study is required to understand
if DMM is somehow special case and read-back is ok, or if DRA7's memory
barriers do not work correctly.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f5e284803a ]
When enumerating page size definitions to check hardware support,
we construct a constant which is (1U << (def->shift - 10)).
However, the array of page size definitions is only initalised for
various MMU_PAGE_* constants, so it contains a number of 0-initialised
elements with def->shift == 0. This means we end up shifting by a
very large number, which gives the following UBSan splat:
================================================================================
UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
Call Trace:
[c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
[c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
[c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
[c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
[c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
[c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
================================================================================
Fix this by first checking if the element exists (shift != 0) before
constructing the constant.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35d3cbe845 ]
Andreas Müller reports:
"Fixes:
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[220]: Failed to apply ACL on /dev/v4l-subdev0: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[224]: Failed to apply ACL on /dev/v4l-subdev1: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[215]: Failed to apply ACL on /dev/v4l-subdev10: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[228]: Failed to apply ACL on /dev/v4l-subdev2: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[232]: Failed to apply ACL on /dev/v4l-subdev5: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[217]: Failed to apply ACL on /dev/v4l-subdev11: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[214]: Failed to apply ACL on /dev/dri/card1: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[216]: Failed to apply ACL on /dev/v4l-subdev8: Operation not supported
| Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[226]: Failed to apply ACL on /dev/v4l-subdev9: Operation not supported
and nasty follow-ups: Starting weston from sddm as unpriviledged user fails
with some hints on missing access rights."
Select the CONFIG_TMPFS_POSIX_ACL option to fix these issues.
Reported-by: Andreas Müller <schnitzeltony@gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Acked-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f9bc28aedf ]
If an error occurs during an unplug operation, it's possible for
eeh_dump_dev_log() to be called when edev->pdn is null, which
currently leads to dereferencing a null pointer.
Handle this by skipping the error log for those devices.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad22cf6ea4 upstream.
We can't use entry->bytes if our entry is a bitmap entry, we need to use
entry->max_extent_size in that case. Fix up all the logic to make this
consistent.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3527a018c0 upstream.
At inode.c:compress_file_range(), under the "free_pages_out" label, we can
end up dereferencing the "pages" pointer when it has a NULL value. This
case happens when "start" has a value of 0 and we fail to allocate memory
for the "pages" pointer. When that happens we jump to the "cont" label and
then enter the "if (start == 0)" branch where we immediately call the
cow_file_range_inline() function. If that function returns 0 (success
creating an inline extent) or an error (like -ENOMEM for example) we jump
to the "free_pages_out" label and then access "pages[i]" leading to a NULL
pointer dereference, since "nr_pages" has a value greater than zero at
that point.
Fix this by setting "nr_pages" to 0 when we fail to allocate memory for
the "pages" pointer.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201119
Fixes: 771ed689d2 ("Btrfs: Optimize compressed writeback and reads")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c7b0c2e8d upstream.
[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:
$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt
$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt
$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 16.00KiB 16.00KiB none none --- ---
0/257 1016.00KiB 16.00KiB none none 1/0 ---
0/258 1016.00KiB 16.00KiB none none --- ---
1/0 1016.00KiB 16.00KiB none none --- 0/257
So far so good, but:
$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgoupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 16.00KiB 16.00KiB none none --- ---
0/257 1016.00KiB 16.00KiB none none --- ---
0/258 1016.00KiB 16.00KiB none none --- ---
1/0 1016.00KiB 16.00KiB none none --- ---
^^^^^^^^^^ ^^^^^^^^ not cleared
[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.
However we don't mark all qgroups dirty, but rely on rescan to do so.
If we have any high level qgroup without children, it won't be marked
dirty during rescan, since we cannot reach that qgroup.
This will cause QGROUP_INFO items of childless qgroups never get updated
in the quota tree, thus their numbers will stay the same in "btrfs
qgroup show" output.
[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
we have childless qgroups, their QGROUP_INFO items will still get
updated during rescan.
Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Tested-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f375eed92 upstream.
In a scenario like the following:
mkdir /mnt/A # inode 258
mkdir /mnt/B # inode 259
touch /mnt/B/bar # inode 260
sync
mv /mnt/B/bar /mnt/A/bar
mv -T /mnt/A /mnt/B
fsync /mnt/B/bar
<power fail>
After replaying the log we end up with file bar having 2 hard links, both
with the name 'bar' and one in the directory with inode number 258 and the
other in the directory with inode number 259. Also, we end up with the
directory inode 259 still existing and with the directory inode 258 still
named as 'A', instead of 'B'. In this scenario, file 'bar' should only
have one hard link, located at directory inode 258, the directory inode
259 should not exist anymore and the name for directory inode 258 should
be 'B'.
This incorrect behaviour happens because when attempting to log the old
parents of an inode, we skip any parents that no longer exist. Fix this
by forcing a full commit if an old parent no longer exists.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545e3366db upstream.
Allocating new chunks modifies both the extent and chunk tree, which can
trigger new chunk allocations. So instead of doing list_for_each_safe,
just do while (!list_empty()) so we make sure we don't exit with other
pending bg's still on our list.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3aa7c7a31c upstream.
While testing my backport I noticed there was a panic if I ran
generic/416 generic/417 generic/418 all in a row. This just happened to
uncover a race where we had outstanding IO after we destroy all of our
workqueues, and then we'd go to queue the endio work on those free'd
workqueues.
This is because we aren't waiting for the caching threads to be done
before freeing everything up, so to fix this make sure we wait on any
outstanding caching that's being done before we free up the block group,
so we're sure to be done with all IO by the time we get to
btrfs_stop_all_workers(). This fixes the panic I was seeing
consistently in testing.
------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:6112!
SMP PTI
Modules linked in:
CPU: 1 PID: 27165 Comm: kworker/u4:7 Not tainted 4.16.0-02155-g3553e54a578d-dirty #875
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: btrfs-cache btrfs_cache_helper
RIP: 0010:btrfs_map_bio+0x346/0x370
RSP: 0000:ffffc900061e79d0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880071542e00 RCX: 0000000000533000
RDX: ffff88006bb74380 RSI: 0000000000000008 RDI: ffff880078160000
RBP: 0000000000000001 R08: ffff8800781cd200 R09: 0000000000503000
R10: ffff88006cd21200 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8800781cd200 R15: ffff880071542e00
FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000817ffc4 CR3: 0000000078314000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btree_submit_bio_hook+0x8a/0xd0
submit_one_bio+0x5d/0x80
read_extent_buffer_pages+0x18a/0x320
btree_read_extent_buffer_pages+0xbc/0x200
? alloc_extent_buffer+0x359/0x3e0
read_tree_block+0x3d/0x60
read_block_for_search.isra.30+0x1a5/0x360
btrfs_search_slot+0x41b/0xa10
btrfs_next_old_leaf+0x212/0x470
caching_thread+0x323/0x490
normal_work_helper+0xc5/0x310
process_one_work+0x141/0x340
worker_thread+0x44/0x3c0
kthread+0xf8/0x130
? process_one_work+0x340/0x340
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
RIP: btrfs_map_bio+0x346/0x370 RSP: ffffc900061e79d0
---[ end trace 827eb13e50846033 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0be88e367f upstream.
We check whether any device the file system is using supports discard in
the ioctl call, but then we attempt to trim free extents on every device
regardless of whether discard is supported. Due to the way we mask off
EOPNOTSUPP, we can end up issuing the trim operations on each free range
on devices that don't support it, just wasting time.
Fixes: 499f377f49 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4e329de5e upstream.
btrfs_trim_fs iterates over the fs_devices->alloc_list while holding the
device_list_mutex. The problem is that ->alloc_list is protected by the
chunk mutex. We don't want to hold the chunk mutex over the trim of the
entire file system. Fortunately, the ->dev_list list is protected by
the dev_list mutex and while it will give us all devices, including
read-only devices, we already just skip the read-only devices. Then we
can continue to take and release the chunk mutex while scanning each
device.
Fixes: 499f377f49 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b72c3aba09 upstream.
[BUG]
For certain crafted image, whose csum root leaf has missing backref, if
we try to trigger write with data csum, it could cause deadlock with the
following kernel WARN_ON():
WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
Workqueue: btrfs-endio-write btrfs_endio_write_helper
RIP: 0010:btrfs_tree_lock+0x3e2/0x400
Call Trace:
btrfs_alloc_tree_block+0x39f/0x770
__btrfs_cow_block+0x285/0x9e0
btrfs_cow_block+0x191/0x2e0
btrfs_search_slot+0x492/0x1160
btrfs_lookup_csum+0xec/0x280
btrfs_csum_file_blocks+0x2be/0xa60
add_pending_csums+0xaf/0xf0
btrfs_finish_ordered_io+0x74b/0xc90
finish_ordered_fn+0x15/0x20
normal_work_helper+0xf6/0x500
btrfs_endio_write_helper+0x12/0x20
process_one_work+0x302/0x770
worker_thread+0x81/0x6d0
kthread+0x180/0x1d0
ret_from_fork+0x35/0x40
[CAUSE]
That crafted image has missing backref for csum tree root leaf. And
when we try to allocate new tree block, since there is no
EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
and use it.
The extent tree of the image looks like:
Normal image | This fuzzed image
----------------------------------+--------------------------------
BG 29360128 | BG 29360128
One empty slot | One empty slot
29364224: backref to UUID tree | 29364224: backref to UUID tree
Two empty slots | Two empty slots
29376512: backref to CSUM tree | One empty slot (bad type) <<<
29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
... | ...
Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs try to
alloc tree block, it's an valid slot for btrfs.
And for finish_ordered_write, when we need to insert csum, we try to CoW
csum tree root.
By accident, empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K,
BG_OFFSET + 12K is already used by tree block COW for other trees, the
next empty slot is BG_OFFSET + 16K, which should be the backref for CSUM
tree.
But due to the bad type, btrfs can recognize it and still consider it as
an empty slot, and will try to use it for csum tree CoW.
Then in the following call trace, we will try to lock the new tree
block, which turns out to be the old csum tree root which is already
locked:
btrfs_search_slot() called on csum tree root, which is at 29376512
|- btrfs_cow_block()
|- btrfs_set_lock_block()
| |- Now locks tree block 29376512 (old csum tree root)
|- __btrfs_cow_block()
|- btrfs_alloc_tree_block()
|- btrfs_reserve_extent()
| Now it returns tree block 29376512, which extent tree
| shows its empty slot, but it's already hold by csum tree
|- btrfs_init_new_buffer()
|- btrfs_tree_lock()
| Triggers WARN_ON(eb->lock_owner == current->pid)
|- wait_event()
Wait lock owner to release the lock, but it's
locked by ourself, so it will deadlock
[FIX]
This patch will do the lock_owner and current->pid check at
btrfs_init_new_buffer().
So above deadlock can be avoided.
Since such problem can only happen in crafted image, we will still
trigger kernel warning for later aborted transaction, but with a little
more meaningful warning message.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65c6e82bec upstream.
[BUG]
When mounting certain crafted image, btrfs will trigger kernel BUG_ON()
when trying to recover balance:
kernel BUG at fs/btrfs/extent-tree.c:8956!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
Call Trace:
walk_up_tree+0x172/0x1f0 [btrfs]
btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
merge_reloc_roots+0xe1/0x1d0 [btrfs]
btrfs_recover_relocation+0x3ea/0x420 [btrfs]
open_ctree+0x1af3/0x1dd0 [btrfs]
btrfs_mount_root+0x66b/0x740 [btrfs]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.9+0x54/0x140
btrfs_mount+0x16d/0x890 [btrfs]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.9+0x54/0x140
do_mount+0x1fd/0xda0
ksys_mount+0xba/0xd0
__x64_sys_mount+0x21/0x30
do_syscall_64+0x60/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
Extent tree corruption. In this particular case, reloc tree root's
owner is DATA_RELOC_TREE (should be TREE_RELOC), thus its backref is
corrupted and we failed the owner check in walk_up_tree().
[FIX]
It's pretty hard to take care of every extent tree corruption, but at
least we can remove such BUG_ON() and exit more gracefully.
And since in this particular image, DATA_RELOC_TREE and TREE_RELOC share
the same root (which is obviously invalid), we needs to make
__del_reloc_root() more robust to detect such invalid sharing to avoid
possible NULL dereference as root->node can be NULL in this case.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dc6bd5e39 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent pmc node could end up being prematurely
freed as of_find_node_by_name() drops a reference to its first argument.
Fixes: 3568df3d31 ("soc: tegra: Add thermal reset (thermtrip) support to PMC")
Cc: stable <stable@vger.kernel.org> # 4.0
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 940c620d6a upstream.
Currently a failed allocation of channel->name leads to an
immediate return without freeing channel. Fix this by setting
ret to -ENOMEM and jumping to an exit path that kfree's channel.
Detected by CoverityScan, CID#1473692 ("Resource Leak")
Fixes: 53e2822e56 ("rpmsg: Introduce Qualcomm SMD backend")
Cc: stable@vger.kernel.org
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a6c7c367d upstream.
x0 is not callee-saved in the PCS. So there is no need to specify
-fcall-used-x0.
Clang doesn't currently support -fcall-used flags. This patch will help
building the kernel with clang.
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Tri Vo <trong@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afeaade90d upstream.
The v4l2-compliance tool complains if a video doesn't start
with a zero sequence number.
While this shouldn't cause any real problem for apps, let's
make it happier, in order to better check the v4l2-compliance
differences before and after patchsets.
This is actually an old issue. It is there since at least its
videobuf2 conversion, e. g. changeset 3829fadc461 ("[media]
em28xx: convert to videobuf2"), if VB1 wouldn't suffer from
the same issue.
Cc: stable@vger.kernel.org
Fixes: d3829fadc4 ("[media] em28xx: convert to videobuf2")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3132b3860 upstream.
Commit a856531951 ("xen: make xen_qlock_wait() nestable")
introduced a regression for Xen guests running fully virtualized
(HVM or PVH mode). The Xen hypervisor wouldn't return from the poll
hypercall with interrupts disabled in case of an interrupt (for PV
guests it does).
So instead of disabling interrupts in xen_qlock_wait() use a nesting
counter to avoid calling xen_clear_irq_pending() in case
xen_qlock_wait() is nested.
Fixes: a856531951 ("xen: make xen_qlock_wait() nestable")
Cc: stable@vger.kernel.org
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bd54d851f upstream.
kgdboc_option_setup does not check input argument before passing it
to strlen. The argument would be a NULL pointer if "ekgdboc", without
its value, is set in command line and thus cause the following panic.
PANIC: early exception 0xe3 IP 10:ffffffff8fbbb620 error 0 cr2 0x0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #1
[ 0.000000] RIP: 0010:strlen+0x0/0x20
...
[ 0.000000] Call Trace
[ 0.000000] ? kgdboc_option_setup+0x9/0xa0
[ 0.000000] ? kgdboc_early_init+0x6/0x1b
[ 0.000000] ? do_early_param+0x4d/0x82
[ 0.000000] ? parse_args+0x212/0x330
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_options+0x20/0x23
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_param+0x2d/0x39
[ 0.000000] ? setup_arch+0x2f7/0xbf4
[ 0.000000] ? start_kernel+0x5e/0x4c2
[ 0.000000] ? load_ucode_bsp+0x113/0x12f
[ 0.000000] ? secondary_startup_64+0xa5/0xb0
This patch adds a check to prevent the panic.
Cc: stable@vger.kernel.org
Cc: jason.wessel@windriver.com
Cc: gregkh@linuxfoundation.org
Cc: jslaby@suse.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 250854eed5 upstream.
When the OSD is on (i.e. vivid displays text on top of the test pattern), and
you enable hflip, then the driver crashes.
The cause turned out to be a division of a negative number by an unsigned value.
You expect that -8 / 2U would be -4, but in reality it is 2147483644 :-(
Fixes: 3e14e7a82c ("vivid-tpg: add hor/vert downsampling support to tpg_gen_text")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reported-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: <stable@vger.kernel.org> # for v4.1 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f2aa244ee upstream.
Fix a TURBOchannel support regression with commit 205e1b7f51
("dma-mapping: warn when there is no coherent_dma_mask") that caused
coherent DMA allocations to produce a warning such as:
defxx: v1.11 2014/07/01 Lawrence V. Stefani and others
tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2
Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650
0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff
0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8
98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff
0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0
0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000
9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940
0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940
...
Call Trace:
[<ffffffff8004dc78>] show_stack+0xa0/0x130
[<ffffffff80067940>] __warn+0x128/0x170
---[ end trace b1d1e094f67f3bb2 ]---
This is because the TURBOchannel bus driver fails to set the coherent
DMA mask for devices enumerated.
Set the regular and coherent DMA masks for TURBOchannel devices then,
observing that the bus protocol supports a 34-bit (16GiB) DMA address
space, by interpreting the value presented in the address cycle across
the 32 `ad' lines as a 32-bit word rather than byte address[1]. The
architectural size of the TURBOchannel DMA address space exceeds the
maximum amount of RAM any actual TURBOchannel system in existence may
have, hence both masks are the same.
This removes the warning shown above.
References:
[1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital
Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20835/
Fixes: 205e1b7f51 ("dma-mapping: warn when there is no coherent_dma_mask")
Cc: stable@vger.kernel.org # 4.16+
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 800a7340ab upstream.
In copy_params(), the struct 'dm_ioctl' is first copied from the user
space buffer 'user' to 'param_kernel' and the field 'data_size' is
checked against 'minimum_data_size' (size of 'struct dm_ioctl' payload
up to its 'data' member). If the check fails, an error code EINVAL will be
returned. Otherwise, param_kernel->data_size is used to do a second copy,
which copies from the same user-space buffer to 'dmi'. After the second
copy, only 'dmi->data_size' is checked against 'param_kernel->data_size'.
Given that the buffer 'user' resides in the user space, a malicious
user-space process can race to change the content in the buffer between
the two copies. This way, the attacker can inject inconsistent data
into 'dmi' (versus previously validated 'param_kernel').
Fix redundant copying of 'minimum_data_size' from user-space buffer by
using the first copy stored in 'param_kernel'. Also remove the
'data_size' check after the second copy because it is now unnecessary.
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 943cff67b8 upstream.
The intention of nfs4_session_set_rwsize() was to cap the r/wsize to the
buffer sizes negotiated by the CREATE_SESSION. The initial code had a
bug whereby we would not check the values negotiated by nfs_probe_fsinfo()
(the assumption being that CREATE_SESSION will always negotiate buffer values
that are sane w.r.t. the server's preferred r/wsizes) but would only check
values set by the user in the 'mount' command.
The code was changed in 4.11 to _always_ set the r/wsize, meaning that we
now never use the server preferred r/wsizes. This is the regression that
this patch fixes.
Also rename the function to nfs4_session_limit_rwsize() in order to avoid
future confusion.
Fixes: 033853325f (NFSv4.1 respect server's max size in CREATE_SESSION")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 746a923b86 upstream.
Commit 1e77d0a1ed ("genirq: Sanitize spurious interrupt detection of
threaded irqs") made detection of spurious interrupts work for threaded
handlers by:
a) incrementing a counter every time the thread returns IRQ_HANDLED, and
b) checking whether that counter has increased every time the thread is
woken.
However for oneshot interrupts, the commit unmasks the interrupt before
incrementing the counter. If another interrupt occurs right after
unmasking but before the counter is incremented, that interrupt is
incorrectly considered spurious:
time
| irq_thread()
| irq_thread_fn()
| action->thread_fn()
| irq_finalize_oneshot()
| unmask_threaded_irq() /* interrupt is unmasked */
|
| /* interrupt fires, incorrectly deemed spurious */
|
| atomic_inc(&desc->threads_handled); /* counter is incremented */
v
This is observed with a hi3110 CAN controller receiving data at high volume
(from a separate machine sending with "cangen -g 0 -i -x"): The controller
signals a huge number of interrupts (hundreds of millions per day) and
every second there are about a dozen which are deemed spurious.
In theory with high CPU load and the presence of higher priority tasks, the
number of incorrectly detected spurious interrupts might increase beyond
the 99,900 threshold and cause disablement of the interrupt.
In practice it just increments the spurious interrupt count. But that can
cause people to waste time investigating it over and over.
Fix it by moving the accounting before the invocation of
irq_finalize_oneshot().
[ tglx: Folded change log update ]
Fixes: 1e77d0a1ed ("genirq: Sanitize spurious interrupt detection of threaded irqs")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: stable@vger.kernel.org # v3.16+
Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 926674de67 upstream.
Some servers (e.g. Azure) do not include a spnego blob in the SMB3
negotiate protocol response, so on kerberos mounts ("sec=krb5")
we can fail, as we expected the server to list its supported
auth types (OIDs in the spnego blob in the negprot response).
Change this so that on krb5 mounts we default to trying krb5 if the
server doesn't list its supported protocol mechanisms.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e77a8c204 upstream.
If backupuid mount option is sent, we can incorrectly retry
(on access denied on query info) with a cifs (FindFirst) operation
on an smb3 mount which causes the server to force the session close.
We set backup intent on open so no need for this fallback.
See kernel bugzilla 201435
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aea835f2dc upstream.
When channels are registered, the hardware channel number is not the
actual iio channel number.
This is because the driver is probed with a certain number of accessible
channels. Some pins are routed and some not, depending on the description of
the board in the DT.
Because of that, channels 0,1,2,3 can correspond to hardware channels
2,3,4,5 for example.
In the buffered triggered case, we need to do the translation accordingly.
Fixed the channel number to stop reading the wrong channel.
Fixes: 0e589d5fb ("ARM: AT91: IIO: Add AT91 ADC driver.")
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc1b453262 upstream.
When doing simple conversions, the driver did not acknowledge the DRDY irq.
If this irq status is not acked, it will be left pending, and as soon as a
trigger is enabled, the irq handler will be called, it doesn't know why
this status has occurred because no channel is pending, and then it will go
int a irq loop and board will hang.
To avoid this situation, read the LCDR after a raw conversion is done.
Fixes: 0e589d5fb ("ARM: AT91: IIO: Add AT91 ADC driver.")
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3fa21c73c upstream.
Leaving for_each_child_of_node loop we should release child device node,
if it is not stored for future use.
Found by Linux Driver Verification project (linuxtesting.org).
JC: I'm not sending this as a quick fix as it's been wrong for years,
but good to pick up for stable after the merge window.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: 6df2e98c3e ("iio: adc: Add imx25-gcq ADC driver")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8911a43bc1 upstream.
The correct way to handle errors returned by regualtor_get() and friends is
to propagate the error since that means that an regulator was specified,
but something went wrong when requesting it.
For handling optional regulators, e.g. when the device has an internal
vref, regulator_get_optional() should be used to avoid getting the dummy
regulator that the regulator core otherwise provides.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22146c3ce9 upstream.
Some test systems were experiencing negative huge page reserve counts and
incorrect file block counts. This was traced to /proc/sys/vm/drop_caches
removing clean pages from hugetlbfs file pagecaches. When non-hugetlbfs
explicit code removes the pages, the appropriate accounting is not
performed.
This can be recreated as follows:
fallocate -l 2M /dev/hugepages/foo
echo 1 > /proc/sys/vm/drop_caches
fallocate -l 2M /dev/hugepages/foo
grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2047
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 4194304 kB
ls -lsh /dev/hugepages/foo
4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
To address this issue, dirty pages as they are added to pagecache. This
can easily be reproduced with fallocate as shown above. Read faulted
pages will eventually end up being marked dirty. But there is a window
where they are clean and could be impacted by code such as drop_caches.
So, just dirty them all as they are added to the pagecache.
Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
Fixes: 6bda666a03 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Mihcla Hocko <mhocko@suse.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbe1a850b3 upstream.
When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).
Fixes: 64470f1b85 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode")
Cc: <stable@vger.kernel.org> # 2.6.20+
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ab93e9c99 upstream.
The genweq_add_file and genwqe_del_file by caching current without
using reference counting embed the assumption that a file descriptor
will never be passed from one process to another. It even embeds the
assumption that the the thread that opened the file will be in
existence when the process terminates. Neither of which are
guaranteed to be true.
Therefore replace caching the task_struct of the opener with
pid of the openers thread group id. All the knowledge of the
opener is used for is as the target of SIGKILL and a SIGKILL
will kill the entire process group.
Rename genwqe_force_sig to genwqe_terminate, remove it's unncessary
signal argument, update it's ownly caller, and use kill_pid
instead of force_sig.
The work force_sig does in changing signal handling state is not
relevant to SIGKILL sent as SEND_SIG_PRIV. The exact same processess
will be killed just with less work, and less confusion. The work done
by force_sig is really only needed for handling syncrhonous
exceptions.
It will still be possible to cause genwqe_device_remove to wait
8 seconds by passing a file descriptor to another process but
the possible user after free is fixed.
Fixes: eaf4722d46 ("GenWQE Character device and DDCB queue")
Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: Joerg-Stephan Vogt <jsvogt@de.ibm.com>
Cc: Michael Jung <mijung@gmx.net>
Cc: Michael Ruettger <michael@ibmra.de>
Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Eberhard S. Amann <esa@linux.vnet.ibm.com>
Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0c9606b31 upstream.
Add Device IDs to the Intel GPU "spurious interrupt" quirk table.
For these devices, unplugging the VGA cable and plugging it in again causes
spurious interrupts from the IGD. Linux eventually disables the interrupt,
but of course that disables any other devices sharing the interrupt.
The theory is that this is a VGA BIOS defect: it should have disabled the
IGD interrupt but failed to do so.
See f67fd55fa9 ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") and 7c82126a94 ("PCI: Add new ID for Intel GPU
"spurious interrupt" quirk") for some history.
[bhelgaas: See link below for discussion about how to fix this more
generically instead of adding device IDs for every new Intel GPU. I hope
this is the last patch to add device IDs.]
Link: https://lore.kernel.org/linux-pci/1537974841-29928-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f11274396a upstream.
uref->usage_index can be indirectly controlled by userspace, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This field is used as an array index by the hiddev_ioctl_usage() function,
when 'cmd' is either HIDIOCGCOLLECTIONINDEX, HIDIOCGUSAGES or
HIDIOCSUSAGES.
For cmd == HIDIOCGCOLLECTIONINDEX case, uref->usage_index is compared to
field->maxusage and then used as an index to dereference field->usage
array. The same thing happens to the cmd == HIDIOC{G,S}USAGES cases, where
uref->usage_index is checked against an array maximum value and then it is
used as an index in an array.
This is a summary of the HIDIOCGCOLLECTIONINDEX case, which matches the
traditional Spectre V1 first load:
copy_from_user(uref, user_arg, sizeof(*uref))
if (uref->usage_index >= field->maxusage)
goto inval;
i = field->usage[uref->usage_index].collection_index;
return i;
This patch fixes this by sanitizing field uref->usage_index before using it
to index field->usage (HIDIOCGCOLLECTIONINDEX) or field->value in
HIDIOC{G,S}USAGES arrays, thus, avoiding speculation in the first load.
Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
v2: Contemplate cmd == HIDIOC{G,S}USAGES case
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 182a79e0c1 upstream.
We return most failure of dquota_initialize() except
inode evict, this could make a bit sense, for example
we allow file removal even quota files are broken?
But it dosen't make sense to allow setting project
if quota files etc are broken.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 625ef8a3ac upstream.
Variable retries is not initialized in ext4_da_write_inline_data_begin()
which can lead to nondeterministic number of retries in case we hit
ENOSPC. Initialize retries to zero as we do everywhere else.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: bc0ca9df3b ("ext4: retry allocation when inline->extent conversion failed")
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccd3c4373e upstream.
The code cleaning transaction's lists of checkpoint buffers has a bug
where it increases bh refcount only after releasing
journal->j_list_lock. Thus the following race is possible:
CPU0 CPU1
jbd2_log_do_checkpoint()
jbd2_journal_try_to_free_buffers()
__journal_try_to_free_buffer(bh)
...
while (transaction->t_checkpoint_io_list)
...
if (buffer_locked(bh)) {
<-- IO completes now, buffer gets unlocked -->
spin_unlock(&journal->j_list_lock);
spin_lock(&journal->j_list_lock);
__jbd2_journal_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
try_to_free_buffers(page);
get_bh(bh) <-- accesses freed bh
Fix the problem by grabbing bh reference before unlocking
journal->j_list_lock.
Fixes: dc6e8d669c ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
Fixes: be1158cc61 ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6eae0f61d upstream.
Unlike asynchronous initialization in the core we have not yet associated
the device with the parent, and as such the device doesn't hold a reference
to the parent.
In order to resolve that we should be holding a reference on the parent
until the asynchronous initialization has completed.
Cc: <stable@vger.kernel.org>
Fixes: 4d88a97aa9 ("libnvdimm: ...base ... infrastructure")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 076ed3da0c upstream.
commit 40413955ee ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.
Signed-off-by: Stefan Nuernberger <snu@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Simon Veith <sveith@amazon.de>
Cc: stable@vger.kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d71c3f1f5 upstream.
The rs_rate_from_ucode_rate() function may return -EINVAL if the rate
is invalid, but none of the callsites check for the error, potentially
making us access arrays with index IWL_RATE_INVALID, which is larger
than the arrays, causing an out-of-bounds access. This will trigger
KASAN warnings, such as the one reported in the bugzilla issue
mentioned below.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=200659
Cc: stable@vger.kernel.org
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e28fd56ad5 upstream.
In rmmod path, usbip_vudc does platform_device_put() twice once from
platform_device_unregister() and then from put_vudc_device().
The second put results in:
BUG kmalloc-2048 (Not tainted): Poison overwritten error or
BUG: KASAN: use-after-free in kobject_put+0x1e/0x230 if KASAN is
enabled.
[ 169.042156] calling init+0x0/0x1000 [usbip_vudc] @ 1697
[ 169.042396] =============================================================================
[ 169.043678] probe of usbip-vudc.0 returned 1 after 350 usecs
[ 169.044508] BUG kmalloc-2048 (Not tainted): Poison overwritten
[ 169.044509] -----------------------------------------------------------------------------
...
[ 169.057849] INFO: Freed in device_release+0x2b/0x80 age=4223 cpu=3 pid=1693
[ 169.057852] kobject_put+0x86/0x1b0
[ 169.057853] 0xffffffffc0c30a96
[ 169.057855] __x64_sys_delete_module+0x157/0x240
Fix it to call platform_device_del() instead and let put_vudc_device() do
the platform_device_put().
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a856531951 upstream.
xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
of xen_qlock_kick() might be ignored in case a deeper nesting level
was active right before the call of xen_poll_irq():
CPU 1: CPU 2:
spin_lock(lock1)
spin_lock(lock1)
-> xen_qlock_wait()
-> xen_clear_irq_pending()
Interrupt happens
spin_unlock(lock1)
-> xen_qlock_kick(CPU 2)
spin_lock_irqsave(lock2)
spin_lock_irqsave(lock2)
-> xen_qlock_wait()
-> xen_clear_irq_pending()
clears kick for lock1
-> xen_poll_irq()
spin_unlock_irq_restore(lock2)
-> xen_qlock_kick(CPU 2)
wakes up
spin_unlock_irq_restore(lock2)
IRET
resumes in xen_qlock_wait()
-> xen_poll_irq()
never wakes up
The solution is to disable interrupts in xen_qlock_wait() and not to
poll for the irq in case xen_qlock_wait() is called in nmi context.
Cc: stable@vger.kernel.org
Cc: Waiman.Long@hp.com
Cc: peterz@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ac2a7d4d9 upstream.
In the following situation a vcpu waiting for a lock might not be
woken up from xen_poll_irq():
CPU 1: CPU 2: CPU 3:
takes a spinlock
tries to get lock
-> xen_qlock_wait()
frees the lock
-> xen_qlock_kick(cpu2)
-> xen_clear_irq_pending()
takes lock again
tries to get lock
-> *lock = _Q_SLOW_VAL
-> *lock == _Q_SLOW_VAL ?
-> xen_poll_irq()
frees the lock
-> xen_qlock_kick(cpu3)
And cpu 2 will sleep forever.
This can be avoided easily by modifying xen_qlock_wait() to call
xen_poll_irq() only if the related irq was not pending and to call
xen_clear_irq_pending() only if it was pending.
Cc: stable@vger.kernel.org
Cc: Waiman.Long@hp.com
Cc: peterz@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f92898e7f3 upstream.
If a block device is hot-added when we are out of grants,
gnttab_grant_foreign_access fails with -ENOSPC (log message "28
granting access to ring page") in this code path:
talk_to_blkback ->
setup_blkring ->
xenbus_grant_ring ->
gnttab_grant_foreign_access
and the failing path in talk_to_blkback sets the driver_data to NULL:
destroy_blkring:
blkif_free(info, 0);
mutex_lock(&blkfront_mutex);
free_info(info);
mutex_unlock(&blkfront_mutex);
dev_set_drvdata(&dev->dev, NULL);
This results in a NULL pointer BUG when blkfront_remove and blkif_free
try to access the failing device's NULL struct blkfront_info.
Cc: stable@vger.kernel.org # 4.5 and later
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e487a0f523 upstream.
Functionality of the xen-tpmfront driver was lost secondary to
the introduction of xenbus multi-page support in commit ccc9d90a9a
("xenbus_client: Extend interface to support multi-page ring").
In this commit pointer to location of where the shared page address
is stored was being passed to the xenbus_grant_ring() function rather
then the address of the shared page itself. This resulted in a situation
where the driver would attach to the vtpm-stubdom but any attempt
to send a command to the stub domain would timeout.
A diagnostic finding for this regression is the following error
message being generated when the xen-tpmfront driver probes for a
device:
<3>vtpm vtpm-0: tpm_transmit: tpm_send: error -62
<3>vtpm vtpm-0: A TPM error (-62) occurred attempting to determine
the timeouts
This fix is relevant to all kernels from 4.1 forward which is the
release in which multi-page xenbus support was introduced.
Daniel De Graaf formulated the fix by code inspection after the
regression point was located.
Fixes: ccc9d90a9a ("xenbus_client: Extend interface to support multi-page ring")
Signed-off-by: Dr. Greg Wettstein <greg@enjellic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[boris: Updated commit message, added Fixes tag]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org # v4.1+
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
commit 7250f422da upstream.
xen_swiotlb_{alloc,free}_coherent() allocate/free memory based on the
order of the pages and not size argument (bytes). This is inconsistent with
range_straddles_page_boundary and memset which use the 'size' value,
which may lead to not exchanging memory with Xen (range_straddles_page_boundary()
returned true). And then the call to xen_swiotlb_free_coherent() would
actually try to exchange the memory with Xen, leading to the kernel
hitting an BUG (as the hypercall returned an error).
This patch fixes it by making the 'size' variable be of the same size
as the amount of memory allocated.
CC: stable@vger.kernel.org
Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christoph Helwig <hch@lst.de>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 672f33198b upstream.
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.
Add such missing properties.
Fix other missing properties (clocks, OPP, clock latency) as well to
make it all work.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd6f55457e upstream.
The "cooling-min-level" and "cooling-max-level" properties are not
parsed by any part of the kernel currently and the max cooling state of
a CPU cooling device is found by referring to the cpufreq table instead.
Remove the unused properties from the CPU nodes.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78c9be61c3 ]
Introduce a new flag, uc_buffer, to indicate that the controller
requires the non-cached pages for stream buffers, either as a
chip-specific requirement or specified via snoop=0 option.
This improves the code-readability.
Also, this patch fixes the incorrect behavior for C-Media chip where
the stream buffers were never handled as non-cached due to the check
of driver_type even if you pass snoop=0 option.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 54f919a04c ]
The driver calls clk_get() with the clock name set to NULL, which means
that the driver could only work when probed from devicetree. From now
on, we explicitly require the driver to be probed from devicetree.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3597dfe01d ]
Instead of playing whack-a-mole and changing SEND_SIG_PRIV to
SEND_SIG_FORCED throughout the kernel to ensure a pid namespace init
gets signals sent by the kernel, stop allowing a pid namespace init to
ignore SIGKILL or SIGSTOP sent by the kernel. A pid namespace init is
only supposed to be able to ignore signals sent from itself and
children with SIG_DFL.
Fixes: 921cf9f630 ("signals: protect cinit from unblocked SIG_DFL signals")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ef01a2d95 ]
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae61cf5b99 ]
When both uio and the uio drivers are built in the kernel, it is possible
for a driver to register devices before the uio class is registered.
This may result in a NULL pointer dereference later on in
get_device_parent() when accessing the class glue_dirs spinlock.
The trace looks like that:
Unable to handle kernel NULL pointer dereference at virtual address 00000140
[...]
[<ffff0000089cc234>] _raw_spin_lock+0x14/0x48
[<ffff0000084f56bc>] device_add+0x154/0x6a0
[<ffff0000084f5e48>] device_create_groups_vargs+0x120/0x128
[<ffff0000084f5edc>] device_create+0x54/0x60
[<ffff0000086e72c0>] __uio_register_device+0x120/0x4a8
[<ffff000008528b7c>] jaguar2_pci_probe+0x2d4/0x558
[<ffff0000083fc18c>] local_pci_probe+0x3c/0xb8
[<ffff0000083fd81c>] pci_device_probe+0x11c/0x180
[<ffff0000084f88bc>] driver_probe_device+0x22c/0x2d8
[<ffff0000084f8a24>] __driver_attach+0xbc/0xc0
[<ffff0000084f69fc>] bus_for_each_dev+0x4c/0x98
[<ffff0000084f81b8>] driver_attach+0x20/0x28
[<ffff0000084f7d08>] bus_add_driver+0x1b8/0x228
[<ffff0000084f93c0>] driver_register+0x60/0xf8
[<ffff0000083fb918>] __pci_register_driver+0x40/0x48
Return EPROBE_DEFER in that case so the driver can register the device
later.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cfb03be6c7 ]
The following lockdep splat was observed:
[ 1222.241750] ======================================================
[ 1222.271301] WARNING: possible circular locking dependency detected
[ 1222.301060] 4.16.0-10.el8+5.x86_64+debug #1 Not tainted
[ 1222.326659] ------------------------------------------------------
[ 1222.356565] systemd-shutdow/1 is trying to acquire lock:
[ 1222.382660] ((&ioat_chan->timer)){+.-.}, at: [<00000000f71e1a28>] del_timer_sync+0x5/0xf0
[ 1222.422928]
[ 1222.422928] but task is already holding lock:
[ 1222.451743] (&(&ioat_chan->prep_lock)->rlock){+.-.}, at: [<000000008ea98b12>] ioat_shutdown+0x86/0x100 [ioatdma]
:
[ 1223.524987] Chain exists of:
[ 1223.524987] (&ioat_chan->timer) --> &(&ioat_chan->cleanup_lock)->rlock --> &(&ioat_chan->prep_lock)->rlock
[ 1223.524987]
[ 1223.594082] Possible unsafe locking scenario:
[ 1223.594082]
[ 1223.622630] CPU0 CPU1
[ 1223.645080] ---- ----
[ 1223.667404] lock(&(&ioat_chan->prep_lock)->rlock);
[ 1223.691535] lock(&(&ioat_chan->cleanup_lock)->rlock);
[ 1223.728657] lock(&(&ioat_chan->prep_lock)->rlock);
[ 1223.765122] lock((&ioat_chan->timer));
[ 1223.784095]
[ 1223.784095] *** DEADLOCK ***
[ 1223.784095]
[ 1223.813492] 4 locks held by systemd-shutdow/1:
[ 1223.834677] #0: (reboot_mutex){+.+.}, at: [<0000000056d33456>] SYSC_reboot+0x10f/0x300
[ 1223.873310] #1: (&dev->mutex){....}, at: [<00000000258dfdd7>] device_shutdown+0x1c8/0x660
[ 1223.913604] #2: (&dev->mutex){....}, at: [<0000000068331147>] device_shutdown+0x1d6/0x660
[ 1223.954000] #3: (&(&ioat_chan->prep_lock)->rlock){+.-.}, at: [<000000008ea98b12>] ioat_shutdown+0x86/0x100 [ioatdma]
In the ioat_shutdown() function:
spin_lock_bh(&ioat_chan->prep_lock);
set_bit(IOAT_CHAN_DOWN, &ioat_chan->state);
del_timer_sync(&ioat_chan->timer);
spin_unlock_bh(&ioat_chan->prep_lock);
According to the synchronization rule for the del_timer_sync() function,
the caller must not hold locks which would prevent completion of the
timer's handler.
The timer structure has its own lock that manages its synchronization.
Setting the IOAT_CHAN_DOWN bit should prevent other CPUs from
trying to use that device anyway, there is probably no need to call
del_timer_sync() while holding the prep_lock. So the del_timer_sync()
call is now moved outside of the prep_lock critical section to prevent
the circular lock dependency.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8b97d73c4d ]
The ChipIdea IRQ is disabled before scheduling the otg work and
re-enabled on otg work completion. However if the job is already
scheduled we have to undo the effect of disable_irq int order to
balance the IRQ disable-depth value.
Fixes: be6b0c1bd0 ("usb: chipidea: using one inline function to cover queue work operations")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c1ef72e9b ]
It is a serious driver defect to enable MSI or MSI-X more than once. Doing
so may panic the kernel as in the stack trace below:
Call Trace:
sysfs_add_one+0xa5/0xd0
create_dir+0x7c/0xe0
sysfs_create_subdir+0x1c/0x20
internal_create_group+0x6d/0x290
sysfs_create_groups+0x4a/0xa0
populate_msi_sysfs+0x1cd/0x210
pci_enable_msix+0x31c/0x3e0
igbuio_pci_open+0x72/0x300 [igb_uio]
uio_open+0xcc/0x120 [uio]
chrdev_open+0xa1/0x1e0
[...]
do_sys_open+0xf3/0x1f0
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
---[ end trace 11042e2848880209 ]---
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa056b4fa
We want to keep the WARN_ON() and stack trace so the driver can be fixed,
but we can avoid the kernel panic by returning an error. We may still get
warnings like this:
Call Trace:
pci_enable_msix+0x3c9/0x3e0
igbuio_pci_open+0x72/0x300 [igb_uio]
uio_open+0xcc/0x120 [uio]
chrdev_open+0xa1/0x1e0
[...]
do_sys_open+0xf3/0x1f0
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:526 sysfs_add_one+0xa5/0xd0()
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:01:00.1/msi_irqs'
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
[bhelgaas: changelog, fix patch whitespace, remove !!]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d595567dc4 ]
If we change the number of array's device after device is removed from array,
then add the device back to array, we can see that device is added as active
role instead of spare which we expected.
Please see the below link for details:
https://marc.info/?l=linux-raid&m=153736982015076&w=2
This is caused by that we prefer to use device's previous role which is
recorded by saved_raid_disk, but we should respect the new number of
conf->raid_disks since it could be changed after device is removed.
Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Tested-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f18b2b83a7 ]
If the starting block number of either the source or destination file
exceeds the EOF, EXT4_IOC_MOVE_EXT should return EINVAL.
Also fixed the helper function mext_check_coverage() so that if the
logical block is beyond EOF, make it return immediately, instead of
looping until the block number wraps all the away around. This takes
long enough that if there are multiple threads trying to do pound on
an the same inode doing non-sensical things, it can end up triggering
the kernel's soft lockup detector.
Reported-by: syzbot+c61979f6f2cba5cb3c06@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11924ba5e6 ]
When adding a VMCI resource, the check for an existing entry
would ignore that the new entry could be a wildcard. This could
result in multiple resource entries that would match a given
handle. One disastrous outcome of this is that the
refcounting used to ensure that delayed callbacks for VMCI
datagrams have run before the datagram is destroyed can be
wrong, since the refcount could be increased on the duplicate
entry. This in turn leads to a use after free bug. This issue
was discovered by Hangbin Liu using KASAN and syzkaller.
Fixes: bc63dedb7d ("VMCI: resource object implementation")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d6d0d62d9 ]
For TPM 1.2 chips the system setup utility allows to set the TPM device in
one of the following states:
* Active: Security chip is functional
* Inactive: Security chip is visible, but is not functional
* Disabled: Security chip is hidden and is not functional
When choosing the "Inactive" state, the TPM 1.2 device is enumerated and
registered, but sending TPM commands fail with either TPM_DEACTIVATED or
TPM_DISABLED depending if the firmware deactivated or disabled the TPM.
Since these TPM 1.2 error codes don't have special treatment, inactivating
the TPM leads to a very noisy kernel log buffer that shows messages like
the following:
tpm_tis 00:05: 1.2 TPM (device-id 0x0, rev-id 78)
tpm tpm0: A TPM error (6) occurred attempting to read a pcr value
tpm tpm0: TPM is disabled/deactivated (0x6)
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting to read a pcr value
ima: No TPM chip found, activating TPM-bypass! (rc=6)
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
Let's just suppress error log messages for the TPM_{DEACTIVATED,DISABLED}
return codes, since this is expected when the TPM 1.2 is set to Inactive.
In that case the kernel log is cleaner and less confusing for users, i.e:
tpm_tis 00:05: 1.2 TPM (device-id 0x0, rev-id 78)
tpm tpm0: TPM is disabled/deactivated (0x6)
ima: No TPM chip found, activating TPM-bypass! (rc=6)
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0f6ef65d1c ]
If the provider driver (such as rdma_rxe) doesn't support pma counters,
avoid exposing its directory similar to optional hw_counters directory.
If core fails to read the PMA counter, return an error so that user can
retry later if needed.
Fixes: 35c4cbb178 ("IB/core: Create get_perf_mad function in sysfs.c")
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 47db787313 ]
In megasas_mgmt_compat_ioctl_fw(), to handle the structure
compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket
'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle
the packet. Since the two data structures have different fields, the data
is copied from 'cioc' to 'ioc' field by field. In the copy process,
'sense_ptr' is prepared if the field 'sense_len' is not null, because it
will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the
user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and
saved to kernel-space variables 'local_sense_off' and 'user_sense_off'
respectively. Given that 'ioc->sense_off' is also copied from
'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the
same value. However, 'cioc' is in the user space and a malicious user can
race to change the value of 'cioc->sense_off' after it is copied to
'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing
so, the attacker can inject different values into 'local_sense_off' and
'user_sense_off'. This can cause undefined behavior in the following
execution, because the two variables are supposed to be same.
This patch enforces a check on the two kernel variables 'local_sense_off'
and 'user_sense_off' to make sure they are the same after the copy. In
case they are not, an error code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fd47d919d0 ]
If a target disconnects during a PIO data transfer the command may fail
when the target reconnects:
scsi host1: DMA length is zero!
scsi host1: cur adr[04380000] len[00000000]
The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.
The usual residual calculation relies on the Transfer Count registers.
That works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.
Fixes: 6fe07aaffb ("[SCSI] m68k: new mac_esp scsi driver")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a9911937e7 ]
When running in AP mode, ath10k sometimes suffers from TX credit
starvation. The issue is hard to reproduce and shows up once in a
few days, but has been repeatedly seen with QCA9882 and a large
range of firmwares, including 10.2.4.70.67.
Once the module is in this state, TX credits are never replenished,
which results in "SWBA overrun" errors, as no beacons can be sent.
Even worse, WMI commands run in a timeout while holding the conf
mutex for three seconds each, making any further operations slow
and the whole system unresponsive.
The firmware/driver never recovers from that state automatically,
and triggering TX flush or warm restarts won't work over WMI. So
issue a hardware restart if a WMI command times out due to missing
TX credits. This implies a connectivity outage of about 1.4s in AP
mode, but brings back the interface and the whole system to a usable
state. WMI command timeouts have not been seen in absent of this
specific issue, so taking such drastic actions seems legitimate.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7fb94bd58d ]
While VF2VF with RSS communication, RSS Type were wrongly recognized
and RSS hash was not calculated as it should be. Packets was
distributed on various queues by accident.
This commit fixes that behaviour and causes proper RSS Type recognition.
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b432414b99 ]
If you look at "pinconf-groups" in debugfs for ssbi-gpio you'll notice
it looks like nonsense.
The problem is fairly well described in commit 1cf86bc212 ("pinctrl:
qcom: spmi-gpio: Fix pmic_gpio_config_get() to be compliant") and
commit 05e0c82895 ("pinctrl: msm: Fix msm_config_group_get() to be
compliant"), but it was pointed out that ssbi-gpio has the same
problem. Let's fix it there too.
Fixes: b4c45fe974 ("pinctrl: qcom: ssbi: Family A gpio & mpp drivers")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d5b476f8f ]
If you look at "pinconf-groups" in debugfs for ssbi-mpp you'll notice
it looks like nonsense.
The problem is fairly well described in commit 1cf86bc212 ("pinctrl:
qcom: spmi-gpio: Fix pmic_gpio_config_get() to be compliant") and
commit 05e0c82895 ("pinctrl: msm: Fix msm_config_group_get() to be
compliant"), but it was pointed out that ssbi-mpp has the same
problem. Let's fix it there too.
NOTE: in case it's helpful to someone reading this, the way to tell
whether to do the -EINVAL or not is to look at the PCONFDUMP for a
given attribute. If the last element (has_arg) is false then you need
to do the -EINVAL trick.
ALSO NOTE: it seems unlikely that the values returned when we try to
get PIN_CONFIG_BIAS_PULL_UP will actually be printed since "has_arg"
is false for that one, but I guess it's still fine to return different
values so I kept doing that. It seems like another driver (ssbi-gpio)
uses a custom attribute (PM8XXX_QCOM_PULL_UP_STRENGTH) for something
similar so maybe a future change should do that here too.
Fixes: cfb24f6ebd ("pinctrl: Qualcomm SPMI PMIC MPP pin controller driver")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 89c68b102f ]
It looks like we parse the drive strength setting here, but never
actually write it into the hardware to update it. Parse the setting and
then write it at the end of the pinconf setting function so that it
actually sticks in the hardware.
Fixes: 0e948042c4 ("pinctrl: qcom: spmi-mpp: Implement support for sink mode")
Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 240714061c ]
Bay and Cherry Trail DSTDs represent a different set of devices depending
on which OS the device think it is booting. One set of decices for Windows
and another set of devices for Android which targets the Android-x86 Linux
kernel fork (which e.g. used to have its own display driver instead of
using the i915 driver).
Which set of devices we are actually going to get is out of our control,
this is controlled by the ACPI OSID variable, which gets either set through
an EFI setup option, or sometimes is autodetected. So we need to support
both.
This commit adds support for the 80862286 and 808622C0 ACPI HIDs which we
get for the first resp. second DMA controller on Cherry Trail devices when
OSID is set to Android.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c1442a9d0 ]
We currently align the end of the compressed image to a multiple of
16. However, the PE-COFF header included in the EFI stub says that
the file alignment is 32 bytes, and when adding an EFI signature to
the file it must first be padded to this alignment.
sbsigntool commands warn about this:
warning: file-aligned section .text extends beyond end of file
warning: checksum areas are greater than image size. Invalid section table?
Worse, pesign -at least when creating a detached signature- uses the
hash of the unpadded file, resulting in an invalid signature if
padding is required.
Avoid both these problems by increasing alignment to 32 bytes when
CONFIG_EFI_STUB is enabled.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 51c99dd2c0 ]
We can not call dev_pm_opp_of_cpumask_remove_table() freely anymore
since the latest OPP core updates as that uses reference counting to
free resources. There are cases where no static OPPs are added (using
DT) for a platform and trying to remove the OPP table may end up
decrementing refcount which is already zero and hence generating
warnings.
Lets track if we were able to add static OPPs or not and then only
remove the table based on that. Some reshuffling of code is also done to
do that.
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d92116b800 ]
On OLPC XO-1, the RTC is discovered via device tree from the arch
initcall. Don't let the PC platform register another one from its device
initcall, it's not going to work:
sysfs: cannot create duplicate filename '/devices/platform/rtc_cmos'
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #12
Hardware name: OLPC XO/XO, BIOS OLPC Ver 1.00.01 06/11/2014
Call Trace:
dump_stack+0x16/0x18
sysfs_warn_dup+0x46/0x58
sysfs_create_dir_ns+0x76/0x9b
kobject_add_internal+0xed/0x209
? __schedule+0x3fa/0x447
kobject_add+0x5b/0x66
device_add+0x298/0x535
? insert_resource_conflict+0x2a/0x3e
platform_device_add+0x14d/0x192
? io_delay_init+0x19/0x19
platform_device_register+0x1c/0x1f
add_rtc_cmos+0x16/0x31
do_one_initcall+0x78/0x14a
? do_early_param+0x75/0x75
kernel_init_freeable+0x152/0x1e0
? rest_init+0xa2/0xa2
kernel_init+0x8/0xd5
ret_from_fork+0x2e/0x38
kobject_add_internal failed for rtc_cmos with -EEXIST, don't try to
register things with the same name in the same directory.
platform rtc_cmos: registered platform RTC device (no PNP device found)
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181004160808.307738-1-lkundrak@v3.sk
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 868a1e863f ]
If all free RB queues are empty, the driver will never restock the
free RB queue. That's because the restocking happens in the Rx flow,
and if the free queue is empty there will be no Rx.
Although there's a background worker (a.k.a. allocator) allocating
memory for RBs so that the Rx handler can restock them, the worker may
run only after the free queue has become empty (and then it is too
late for restocking as explained above).
There is a solution for that called 'emergency': If the number of used
RB's reaches half the amount of all RB's, the Rx handler will not wait
for the allocator but immediately allocate memory for the used RB's
and restock the free queue.
But, since the used RB's is per queue, it may happen that the used
RB's are spread between the queues such that the emergency check will
fail for each of the queues
(and still run out of RBs, causing the above symptom).
To fix it, move to emergency mode if the sum of *all* used RBs (for
all Rx queues) reaches half the amount of all RB's
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e44224fb0 ]
For each system in a given pevent, read_event_files() reads in a
temporary 'sys' string. Be sure to free this string before moving onto
to the next system and/or leaving read_event_files().
Fixes the following coverity complaints:
Error: RESOURCE_LEAK (CWE-772):
tools/perf/util/trace-event-read.c:343: overwrite_var: Overwriting
"sys" in "sys = read_string()" leaks the storage that "sys" points to.
tools/perf/util/trace-event-read.c:353: leaked_storage: Variable "sys"
going out of scope leaks the storage it points to.
Signed-off-by: Sanskriti Sharma <sansharm@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Link: http://lkml.kernel.org/r/1538490554-8161-6-git-send-email-sansharm@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 95dcd64bc5 ]
Technically this is not required because disabling the PWM should be
enough. However, when support for atomic operations was implemented in
the PWM subsystem, only actual changes to the PWM channel are applied
during pwm_config(), which means that during after resume from suspend
the old settings won't be applied.
One possible solution is for the PWM driver to implement its own PM
operations such that settings from before suspend get applied on resume.
This has the disadvantage of completely ignoring any particular ordering
requirements that PWM user drivers might have, so it is best to leave it
up to the user drivers to apply the settings that they want at the
appropriate time.
Another way to solve this would be to read back the current state of the
PWM at the time of resume. That way, in case the configuration was lost
during suspend, applying the old settings in PWM user drivers would
actually get them applied because they differ from the current settings.
However, not all PWM drivers support reading the hardware state, and not
all hardware may support it.
The best workaround at this point seems to be to let PWM user drivers
tell the PWM subsystem that the PWM is turned off by, in addition to
disabling it, also setting the duty cycle to 0. This causes the resume
operation to apply a configuration that is different from the current
configuration, resulting in the proper state from before suspend getting
restored.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5130dc222 ]
When running as a level 3 guest with no host provided sthyi support
sclp_ocf_cpc_name_copy() will only return zeroes. Zeroes are not a
valid group name, so let's not indicate that the group name field is
valid.
Also the group name is not dependent on stsi, let's not return based
on stsi before setting it.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df52eab23d ]
Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.
This can be visualized with following ip-link(8) command sequences:
# ip link set dev tun0 group 100
# ip link set dev tun0 group 100 type tun
RTNETLINK answers: Invalid argument
with contrast to dummy and veth drivers:
# ip link set dev dummy0 group 100
# ip link set dev dummy0 type dummy
# ip link set dev veth0 group 100
# ip link set dev veth0 group 100 type veth
Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.
Fixes: f019a7a594 ("tun: Implement ip link del tunXXX")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1448a2a536 ]
If we fail to allocate the request queue for a disk, we still need to
free that disk, not just the previous ones. Additionally, we need to
cleanup the previous request queues.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 71327f547e ]
Move queue allocation next to disk allocation to fix a couple of issues:
- If add_disk() hasn't been called, we should clear disk->queue before
calling put_disk().
- If we fail to allocate a request queue, we still need to put all of
the disks, not just the ones that we allocated queues for.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9506a7425b ]
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.
Further analysis into the cause of the slowdown traced back to the
frequent call to debug_locks_off() from the __lock_acquired() function
probably due to some inconsistent lockdep states with debug_locks
off. The debug_locks_off() function did an unconditional atomic xchg
to write a 0 value into debug_locks which had already been set to 0.
This led to severe cacheline contention in the cacheline that held
debug_locks. As debug_locks is being referenced in quite a few different
places in the kernel, this greatly slow down the system performance.
To prevent that trashing of debug_locks cacheline, lock_acquired()
and lock_contended() now checks the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c3bf9b62b ]
Clang currently warns:
drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
~~~~~~~~~~~~~~ ^ ~~
1 warning generated.
The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.
Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statenent right above this). Only the maintainers can
confirm this since this statment has been here since the driver was
first added to the kernel.
Link: https://github.com/ClangBuiltLinux/linux/issues/127
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cfdc3170d2 ]
It is important to clear the hw->state value for non-stopped events
when they are added into the PMU. Otherwise when the event is
scheduled out, we won't read the counter because HES_UPTODATE is still
set. This breaks 'perf stat' and similar use cases, causing all the
events to show zero.
This worked for multi-pcr because we make explicit sparc_pmu_start()
calls in calculate_multiple_pcrs(). calculate_single_pcr() doesn't do
this because the idea there is to accumulate all of the counter
settings into the single pcr value. So we have to add explicit
hw->state handling there.
Like x86, we use the PERF_HES_ARCH bit to track truly stopped events
so that we don't accidently start them on a reload.
Related to all of this, sparc_pmu_start() is missing a userpage update
so add it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b9caa10b3 ]
This reverts commit ac0e2cd555.
Michael reported an issue with oversized terms values assignment
and I noticed there was actually a misunderstanding of the max
value check in the past.
The above commit's changelog says:
If bit 21 is set, there is parsing issues as below.
$ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
event syntax error: '..pi_0/event=0x200002,umask=0x8/'
\___ value too big for format, maximum is 511
But there's no issue there, because the event value is distributed
along the value defined by the format. Even if the format defines
separated bit, the value is treated as a continual number, which
should follow the format definition.
In above case it's 9-bit value with last bit separated:
$ cat uncore_qpi_0/format/event
config:0-7,21
Hence the value 0x200002 is correctly reported as format violation,
because it exceeds 9 bits. It should have been 0x102 instead, which
sets the 9th bit - the bit 21 of the format.
$ perf stat -vv -a -e uncore_qpi_0/event=0x102,umask=0x8/
Using CPUID GenuineIntel-6-2D
...
------------------------------------------------------------
perf_event_attr:
type 10
size 112
config 0x200802
sample_type IDENTIFIER
...
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: ac0e2cd555 ("perf tools: Fix PMU term format max value calculation")
Link: http://lkml.kernel.org/r/20181003072046.29276-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 706d51681d upstream.
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches
executed cannot be controlled by software that was executed in a less
privileged predictor mode or on another logical processor. As a
result, software operating on a processor with enhanced IBRS need not
use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
privileged predictor mode. Software can isolate predictor modes
effectively simply by setting the bit once. Software need not disable
enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre
variant 2) mitigation on Intel processors belonging to family 6
(enumerated by the CPUID instruction) that do not have support for
enhanced IBRS. On processors that support enhanced IBRS, it should be
used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccde460b9a upstream.
memory_corruption_check[{_period|_size}]()'s handlers do not check input
argument before passing it to kstrtoul() or simple_strtoull(). The argument
would be a NULL pointer if each of the kernel parameters, without its
value, is set in command line and thus cause the following panic.
PANIC: early exception 0xe3 IP 10:ffffffff73587c22 error 0 cr2 0x0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #2
[ 0.000000] RIP: 0010:kstrtoull+0x2/0x10
...
[ 0.000000] Call Trace
[ 0.000000] ? set_corruption_check+0x21/0x49
[ 0.000000] ? do_early_param+0x4d/0x82
[ 0.000000] ? parse_args+0x212/0x330
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_options+0x20/0x23
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_param+0x2d/0x39
[ 0.000000] ? setup_arch+0x2f7/0xbf4
[ 0.000000] ? start_kernel+0x5e/0x4c2
[ 0.000000] ? load_ucode_bsp+0x113/0x12f
[ 0.000000] ? secondary_startup_64+0xa5/0xb0
This patch adds checks to prevent the panic.
Signed-off-by: He Zhe <zhe.he@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: kstewart@linuxfoundation.org
Cc: pombredanne@nexb.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1534260823-87917-1-git-send-email-zhe.he@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53c613fe63 upstream.
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.
Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac237c28d5 upstream.
The Creative Audigy SE (SB0570) card currently exhibits an audible pop
whenever playback is stopped or resumed, or during silent periods of an
audio stream. Initialise the IZD bit to the 0 to eliminate these pops.
The Infinite Zero Detection (IZD) feature on the DAC causes the output
to be shunted to Vcap after 2048 samples of silence. This discharges the
AC coupling capacitor through the output and causes the aforementioned
pop/click noise.
The behaviour of the IZD bit is described on page 15 of the WM8768GEDS
datasheet: "With IZD=1, applying MUTE for 1024 consecutive input samples
will cause all outputs to be connected directly to VCAP. This also
happens if 2048 consecutive zero input samples are applied to all 6
channels, and IZD=0. It will be removed as soon as any channel receives
a non-zero input". I believe the second sentence might be referring to
IZD=1 instead of IZD=0 given the observed behaviour of the card.
This change should make the DAC initialisation consistent with
Creative's Windows driver, as this popping persists when initialising
the card in Linux and soft rebooting into Windows, but is not present on
a cold boot to Windows.
Signed-off-by: Alex Stanoev <alex@astanoev.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b7c5e1f4c upstream.
BIOS on ASUS G751 doesn't seem to map the headphone pin (NID 0x16)
correctly. Add a quirk to address it, as well as chaining to the
previous fix for the microphone.
Reported-by: Håvard <hovardslill@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c229b3f2d upstream.
Fix a long-existing small nasty bug in the map_pages() implementation which
leads to overwriting already written pte entries with zero, *if* map_pages() is
called a second time with an end address which isn't aligned on a pmd boundry.
This happens for example if we want to remap only the text segment read/write
in order to run alternative patching on the code. Exiting the loop when we
reach the end address fixes this.
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95691e3edd upstream.
Currently, "disable_clkrun" yenta_socket module parameter is only
implemented for TI CardBus bridges.
Add also an implementation for Ricoh bridges that have the necessary
setting documented in publicly available datasheets.
Tested on a RL5C476II with a Sunrich C-160 CardBus NIC that doesn't work
correctly unless the CLKRUN protocol is disabled.
Let's also make it clear in its description that the "disable_clkrun"
module parameter only works on these two previously mentioned brands of
CardBus bridges.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: stable@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92e2921f7e upstream.
When an invalid mount option is passed to jffs2, jffs2_parse_options()
will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will
be used (use-after-free) and freeed (double-free) in jffs2_kill_sb().
Fix it by removing the buggy invocation of kfree() when getting invalid
mount options.
Fixes: 92abc475d8 ("jffs2: implement mount option parsing and compression overriding")
Cc: stable@kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7c6a55606 upstream.
Devices with compatible="pmbus" field have zero initial page count,
and pmbus_clear_faults() being called before the page count auto-
detection does not actually clear faults because it depends on the
page count. Non-cleared faults in its turn may fail the subsequent
page count auto-detection.
This patch fixes this problem by calling pmbus_clear_fault_page()
for currently set page and calling pmbus_clear_faults() after the
page count was detected.
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Bazhenov <bazhenov.dn@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d6cb6edd2 upstream.
refill->end record the last key of writeback, for example, at the first
time, keys (1,128K) to (1,1024K) are flush to the backend device, but
the end key (1,1024K) is not included, since the bellow code:
if (bkey_cmp(k, refill->end) >= 0) {
ret = MAP_DONE;
goto out;
}
And in the next time when we refill writeback keybuf again, we searched
key start from (1,1024K), and got a key bigger than it, so the key
(1,1024K) missed.
This patch modify the above code, and let the end key to be included to
the writeback key buffer.
Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78c9c4dfbf ]
The posix timer overrun handling is broken because the forwarding functions
can return a huge number of overruns which does not fit in an int. As a
consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
random number generators.
The k_clock::timer_forward() callbacks return a 64 bit value now. Make
k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
accounting is correct. 3Remove the temporary (int) casts.
Add a helper function which clamps the overrun value returned to user space
via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
between 0 and INT_MAX. INT_MAX is an indicator for user space that the
overrun value has been clamped.
Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
[florian: Make patch apply to v4.9.135]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 53c13ba8ed upstream.
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:
arch/x86/kernel/time.c:29:42: warning: section does not match previous declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
__section__(".data..cacheline_aligned")))
^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;
^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro '__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) __aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro '__section'
#define __section(S) __attribute__((__section__(#S)))
^
1 warning generated.
The declaration was changed in commit 7c30f352c8 ("jiffies.h: declare
jiffies and jiffies_64 with ____cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.
Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181013005311.28617-1-natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit baa9be4ffb upstream.
With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
distribute_cfs_runtime may not empty the throttled_list before it runs
out of runtime to distribute. In that case, due to the change from
c06f04c704 to put throttled entries at the head of the list, later entries
on the list will starve. Essentially, the same X processes will get pulled
off the list, given CPU time and then, when expired, get put back on the
head of the list where distribute_cfs_runtime will give runtime to the same
set of processes leaving the rest.
Fix the issue by setting a bit in struct cfs_bandwidth when
distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
decide to put the throttled entry on the tail or the head of the list. The
bit is set/cleared by the callers of distribute_cfs_runtime while they hold
cfs_bandwidth->lock.
This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
the live system. In some cases you can simply look at the throttled list and
see the later entries are not changing:
crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3
1 ffff90b56cb2d200 -976050
2 ffff90b56cb2cc00 -484925
3 ffff90b56cb2bc00 -658814
4 ffff90b56cb2ba00 -275365
5 ffff90b166a45600 -135138
6 ffff90b56cb2da00 -282505
7 ffff90b56cb2e000 -148065
8 ffff90b56cb2fa00 -872591
9 ffff90b56cb2c000 -84687
10 ffff90b56cb2f000 -87237
11 ffff90b166a40a00 -164582
crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3
1 ffff90b56cb2d200 -994147
2 ffff90b56cb2cc00 -306051
3 ffff90b56cb2bc00 -961321
4 ffff90b56cb2ba00 -24490
5 ffff90b166a45600 -135138
6 ffff90b56cb2da00 -282505
7 ffff90b56cb2e000 -148065
8 ffff90b56cb2fa00 -872591
9 ffff90b56cb2c000 -84687
10 ffff90b56cb2f000 -87237
11 ffff90b166a40a00 -164582
Sometimes it is easier to see by finding a process getting starved and looking
at the sched_info:
crash> task ffff8eb765994500 sched_info
PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest"
sched_info = {
pcount = 8,
run_delay = 697094208,
last_arrival = 240260125039,
last_queued = 240260327513
},
crash> task ffff8eb765994500 sched_info
PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest"
sched_info = {
pcount = 8,
run_delay = 697094208,
last_arrival = 240260125039,
last_queued = 240260327513
},
Signed-off-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c06f04c704 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 665c365a77 upstream.
Commit 7a68d9fb85 ("USB: usbdevfs: sanitize flags more") checks the
transfer flags for URBs submitted from userspace via usbfs. However,
the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
allowed for a control transfer was added in the wrong place, before
the code has properly determined the direction of the control
transfer. (Control transfers are special because for them, the
direction is set by the bRequestType byte of the Setup packet rather
than direction bit of the endpoint address.)
This patch moves code which sets up the allow_short flag for control
transfers down after is_in has been set to the correct value.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+24a30223a4b609bb802e@syzkaller.appspotmail.com
Fixes: 7a68d9fb85 ("USB: usbdevfs: sanitize flags more")
CC: Oliver Neukum <oneukum@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ae24af366 upstream.
num can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/usb/gadget/function/f_mass_storage.c:3177 fsg_lun_make() warn:
potential spectre issue 'fsg_opts->common->luns' [r] (local cap)
Fix this by sanitizing num before using it to index
fsg_opts->common->luns
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Felipe Balbi <felipe.balbi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f976d0e574 upstream.
The usb standard ("Universal Serial Bus Class Definitions for Communication
Devices") distiguishes between "consistent signals" (DSR, DCD), and
"irregular signals" (break, ring, parity error, framing error, overrun).
The bits of "irregular signals" are set, if this error/event occurred on
the device side and are immeadeatly unset, if the serial state notification
was sent.
Like other drivers of real serial ports do, just the occurence of those
events should be counted in serial_icounter_struct (but no 1->0
transitions).
Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0295e39595 upstream.
hdr.cmd can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
spectre issue 'ucm_cmd_table' [r] (local cap)
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3671a4f97 upstream.
hdr.cmd can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
spectre issue 'ucma_cmd_table' [r] (local cap)
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efa61c8cf2 upstream.
pin_index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
'ops->pin_config' [r] (local cap)
Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 169b803397 upstream.
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.
The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.
Cc: stable@vger.kernel.org
Fixes: 9ae326a690 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a606ebdb85 ]
The truncate transaction does not ever modify the inode btree, but
includes an associated log reservation. Update
xfs_calc_itruncate_reservation() to remove the reservation
associated with inobt updates.
[Amir: This commit was merged for kernel v4.16 and a twin commit was
merged for xfsprogs v4.16. As a result, a small xfs filesystem
formatted with features -m rmapbt=1,reflink=1 using mkfs.xfs
version >= v4.16 cannot be mounted with kernel < v4.16.
For example, xfstests generic/17{1,2,3} format a small fs and
when trying to mount it, they fail with an assert on this very
demonic line:
XFS (vdc): Log size 3075 blocks too small, minimum size is 3717 blocks
XFS (vdc): AAIEEE! Log failed size checks. Abort!
XFS: Assertion failed: 0, file: src/linux/fs/xfs/xfs_log.c, line: 666
The simple solution for stable kernels is to apply this patch,
because mkfs.xfs v4.16 is already in the wild, so we have to
assume that xfs filesystems with a "too small" log exist.
Regardless, xfsprogs maintainers should also consider reverting
the twin patch to stop creating those filesystems for the sake
of users with unpatched kernels.]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J . Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 833eacc7b5 ]
The MXS driver was calling back into the GPIO API from
its irqchip. This is not very elegant, as we are a driver,
let's just shortcut back into the gpio_chip .get() function
instead.
This is a tricky case since the .get() callback is not in
this file, instead assigned by bgpio_init(). Calling the
function direcly in the gpio_chip is however the lesser
evil.
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Janusz Uzycki <j.uzycki@elproma.com.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d312fefea8 ]
ahci_pci_reset_controller() calls ahci_reset_controller(), which may
fail, but ignores the result code and always returns success. This
may result in failures like below
ahci 0000:02:00.0: version 3.0
ahci 0000:02:00.0: enabling device (0000 -> 0003)
ahci 0000:02:00.0: SSS flag set, parallel bus scan disabled
ahci 0000:02:00.0: controller reset failed (0xffffffff)
ahci 0000:02:00.0: failed to stop engine (-5)
... repeated many times ...
ahci 0000:02:00.0: failed to stop engine (-5)
Unable to handle kernel paging request at virtual address ffff0000093f9018
...
PC is at ahci_stop_engine+0x5c/0xd8 [libahci]
LR is at ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
...
[<ffff000000a17014>] ahci_stop_engine+0x5c/0xd8 [libahci]
[<ffff000000a196b4>] ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
[<ffff000000a197d8>] ahci_init_controller+0x80/0x168 [libahci]
[<ffff000000a260f8>] ahci_pci_init_controller+0x60/0x68 [ahci]
[<ffff000000a26f94>] ahci_init_one+0x75c/0xd88 [ahci]
[<ffff000008430324>] local_pci_probe+0x3c/0xb8
[<ffff000008431728>] pci_device_probe+0x138/0x170
[<ffff000008585e54>] driver_probe_device+0x2dc/0x458
[<ffff0000085860e4>] __driver_attach+0x114/0x118
[<ffff000008583ca8>] bus_for_each_dev+0x60/0xa0
[<ffff000008585638>] driver_attach+0x20/0x28
[<ffff0000085850b0>] bus_add_driver+0x1f0/0x2a8
[<ffff000008586ae0>] driver_register+0x60/0xf8
[<ffff00000842f9b4>] __pci_register_driver+0x3c/0x48
[<ffff000000a3001c>] ahci_pci_driver_init+0x1c/0x1000 [ahci]
[<ffff000008083918>] do_one_initcall+0x38/0x120
where an obvious hardware level failure results in an unnecessary 15 second
delay and a subsequent crash.
So record the result code of ahci_reset_controller() and relay it, rather
than ignoring it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9039f3ef44 ]
The SCTP program may sleep under a spinlock, and the function call path is:
sctp_generate_t3_rtx_event (acquire the spinlock)
sctp_do_sm
sctp_side_effects
sctp_cmd_interpreter
sctp_make_init_ack
sctp_pack_cookie
crypto_shash_setkey
shash_setkey_unaligned
kmalloc(GFP_KERNEL)
For the same reason, the orinoco driver may sleep in interrupt handler,
and the function call path is:
orinoco_rx_isr_tasklet
orinoco_rx
orinoco_mic
crypto_shash_setkey
shash_setkey_unaligned
kmalloc(GFP_KERNEL)
To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool and my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 3a8304b7ad, which was
upstream commit 05ab1d8a4b.
Ben Hutchings writes:
This backport is incorrect. The part that updated __startup_64() in
arch/x86/kernel/head64.c was dropped, presumably because that function
doesn't exist in 4.9. However that seems to be an essential of the
fix. In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would
need to be changed instead.
I also found that this introduces new boot-time warnings on some
systems if CONFIG_DEBUG_WX is enabled.
So, unless someone provides fixes for those issues, I think this should
be reverted for the 4.9 branch.
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4d576f5ab ]
Commit 058214a4d1 ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b52 ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:
.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -
Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.
Turn this into a more reasonable:
.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -
With this, and with 84dad55951 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.
Fixes: 058214a4d1 ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d55bef5059 ]
We've been getting checksum errors involving small UDP packets, usually
59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
has been complaining that HW is providing bad checksums. Turns out the
problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1
("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
The source of the problem is that when the bytes we are trimming start
at an odd address, as in the case of the 1 padding byte above,
skb_checksum() returns a byte-swapped value. We cannot just combine this
with skb->csum using csum_sub(). We need to use csum_block_sub() here
that takes into account the parity of the start address and handles the
swapping.
Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7de414a9dd ]
Most callers of pskb_trim_rcsum() simply drop the skb when
it fails, however, ip_check_defrag() still continues to pass
the skb up to stack. This is suspicious.
In ip_check_defrag(), after we learn the skb is an IP fragment,
passing the skb to callers makes no sense, because callers expect
fragments are defrag'ed on success. So, dropping the skb when we
can't defrag it is reasonable.
Note, prior to commit 88078d98d1, this is not a big problem as
checksum will be fixed up anyway. After it, the checksum is not
correct on failure.
Found this during code review.
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 414dd6fb9a ]
The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space having the
length of sizeof(bond->params.ad_actor_system) which is 8 byte. This
patch aligns the length to ETH_ALEN to have the same MAC address exposed
as using sysfs.
Fixes: f87fda00b6 ("bonding: prevent out of bound accesses")
Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 58f5bbe331 ]
In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
use-space buffer 'useraddr' and checked to see whether it is
ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
according to 'sub_cmd', a permission check is enforced through the function
ns_capable(). For example, the permission check is required if 'sub_cmd' is
ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
done by anyone". The following execution invokes different handlers
according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
object 'per_queue_opt' is copied again from the user-space buffer
'useraddr' and 'per_queue_opt.sub_command' is used to determine which
operation should be performed. Given that the buffer 'useraddr' is in the
user space, a malicious user can race to change the sub-command between the
two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
before ethtool_set_per_queue() is called, the attacker changes
ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
bypass the permission check and execute ETHTOOL_SCOALESCE.
This patch enforces a check in ethtool_set_per_queue() after the second
copy from 'useraddr'. If the sub-command is different from the one obtained
in the first copy in dev_ethtool(), an error code EINVAL will be returned.
Fixes: f38d138a7d ("net/ethtool: support set coalesce per queue")
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff002269a4 ]
The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.
Fixing this by sanitizing idx before using it to index d->vqs.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b336decab2 ]
syzbot reported an use-after-free involving sctp_id2asoc. Dmitry Vyukov
helped to root cause it and it is because of reading the asoc after it
was freed:
CPU 1 CPU 2
(working on socket 1) (working on socket 2)
sctp_association_destroy
sctp_id2asoc
spin lock
grab the asoc from idr
spin unlock
spin lock
remove asoc from idr
spin unlock
free(asoc)
if asoc->base.sk != sk ... [*]
This can only be hit if trying to fetch asocs from different sockets. As
we have a single IDR for all asocs, in all SCTP sockets, their id is
unique on the system. An application can try to send stuff on an id
that matches on another socket, and the if in [*] will protect from such
usage. But it didn't consider that as that asoc may belong to another
socket, it may be freed in parallel (read: under another socket lock).
We fix it by moving the checks in [*] into the protected region. This
fixes it because the asoc cannot be freed while the lock is held.
Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b839b6cf9 ]
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.
Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.
Fixes: da78dbff2e ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit db4f1be3ca ]
Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
incorrect for any packet that has an incorrect checksum value.
udp4/6_csum_init() will both make a call to
__skb_checksum_validate_complete() to initialize/validate the csum
field when receiving a CHECKSUM_COMPLETE packet. When this packet
fails validation, skb->csum will be overwritten with the pseudoheader
checksum so the packet can be fully validated by software, but the
skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
the stack can later warn the user about their hardware spewing bad
checksums. Unfortunately, leaving the SKB in this state can cause
problems later on in the checksum calculation.
Since the the packet is still marked as CHECKSUM_COMPLETE,
udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
from skb->csum instead of adding it, leaving us with a garbage value
in that field. Once we try to copy the packet to userspace in the
udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
to checksum the packet data and add it in the garbage skb->csum value
to perform our final validation check.
Since the value we're validating is not the proper checksum, it's possible
that the folded value could come out to 0, causing us not to drop the
packet. Instead, we believe that the packet was checksummed incorrectly
by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
to warn the user with netdev_rx_csum_fault(skb->dev);
Unfortunately, since this is the UDP path, skb->dev has been overwritten
by skb->dev_scratch and is no longer a valid pointer, so we end up
reading invalid memory.
This patch addresses this problem in two ways:
1) Do not use the dev pointer when calling netdev_rx_csum_fault()
from skb_copy_and_csum_datagram_msg(). Since this gets called
from the UDP path where skb->dev has been overwritten, we have
no way of knowing if the pointer is still valid. Also for the
sake of consistency with the other uses of
netdev_rx_csum_fault(), don't attempt to call it if the
packet was checksummed by software.
2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
If we receive a packet that's CHECKSUM_COMPLETE that fails
verification (i.e. skb->csum_valid == 0), check who performed
the calculation. It's possible that the checksum was done in
software by the network stack earlier (such as Netfilter's
CONNTRACK module), and if that says the checksum is bad,
we can drop the packet immediately instead of waiting until
we try and copy it to userspace. Otherwise, we need to
mark the SKB as CHECKSUM_NONE, since the skb->csum field
no longer contains the full packet checksum after the
call to __skb_checksum_validate_complete().
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Fixes: c84d949057 ("udp: copy skb->truesize in the first cache line")
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 30549aab14 ]
When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
The only exception is CONFIG_STMMAC_PCI.
When calling of_mdiobus_register(), it will call our ->reset()
callback, which is set to stmmac_mdio_reset().
Most of the code in stmmac_mdio_reset() is protected by a
"#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
to false when CONFIG_STMMAC_PLATFORM=m.
Because of this, the phy reset gpio will only be pulled when
stmmac is built as built-in, but not when built as modules.
Fix this by using "#if IS_ENABLED()" instead of "#if defined()".
Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6168562c8 ]
In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.
This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 38b4f18d56 ]
gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
and expects it will be able to interpret its contents as
struct tc_gred_sopt. Pass the correct gred attribute, instead of
TCA_OPTIONS.
This bug meant the table definition could never be changed after
Qdisc was initialized (unless whatever TCA_OPTIONS contained both
passed netlink validation and was a valid struct tc_gred_sopt...).
Old behaviour:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
gred setup vqs 4 default 0
RTNETLINK answers: Invalid argument
Now:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
gred setup vqs 4 default 0
Fixes: f62d6b936d ("[PKT_SCHED]: GRED: Use central VQ change procedure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4ba4c566ba ]
The loop wants to skip previously dumped addresses, so loops until
current index >= saved index. If the message fills it wants to save
the index for the next address to dump - ie., the one that did not
fit in the current message.
Currently, it is incrementing the index counter before comparing to the
saved index, and then the saved index is off by 1 - it assumes the
current address is going to fit in the message.
Change the index handling to increment only after a succesful dump.
Fixes: 502a2ffd73 ("ipv6: convert idev_list to list macros")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a8e7aea95 ]
WHen an llc sock is added into the sk_laddr_hash of an llc_sap,
it is not marked with SOCK_RCU_FREE.
This causes that the sock could be freed while it is still being
read by __llc_lookup_established() with RCU read lock. sock is
refcounted, but with RCU read lock, nothing prevents the readers
getting a zero refcnt.
Fix it by setting SOCK_RCU_FREE in llc_sap_add_socket().
Reported-by: syzbot+11e05f04c15e03be5254@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee1abcf689 ]
Commit a61bbcf28a ("[NET]: Store skb->timestamp as offset to a base
timestamp") introduces a neighbour control buffer and zeroes it out in
ndisc_rcv(), as ndisc_recv_ns() uses it.
Commit f2776ff047 ("[IPV6]: Fix address/interface handling in UDP and
DCCP, according to the scoping architecture.") introduces the usage of the
IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
present-day __udp6_lib_err()).
Now, with commit b94f1c0904 ("ipv6: Use icmpv6_notify() to propagate
redirect, instead of rt6_redirect()."), we call protocol error handlers
from ndisc_redirect_rcv(), after the control buffer is already stolen and
some parts are already zeroed out. This implies that inet6_iif() on this
path will always return zero.
This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
we might actually need to match sockets for a given interface.
Instead of always claiming the control buffer in ndisc_rcv(), do that only
when needed.
Fixes: b94f1c0904 ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fe5119e26 upstream.
Recently a check was added which prevents marking of routers with zero
source address, but for IPv6 that cannot happen as the relevant RFCs
actually forbid such packets:
RFC 2710 (MLDv1):
"To be valid, the Query message MUST
come from a link-local IPv6 Source Address, be at least 24 octets
long, and have a correct MLD checksum."
Same goes for RFC 3810.
And also it can be seen as a requirement in ipv6_mc_check_mld_query()
which is used by the bridge to validate the message before processing
it. Thus any queries with :: source address won't be processed anyway.
So just remove the check for zero IPv6 source address from the query
processing function.
Fixes: 5a2de63fd1 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a2de63fd1 upstream.
Based on RFC 4541, 2.1.1. IGMP Forwarding Rules
The switch supporting IGMP snooping must maintain a list of
multicast routers and the ports on which they are attached. This
list can be constructed in any combination of the following ways:
a) This list should be built by the snooping switch sending
Multicast Router Solicitation messages as described in IGMP
Multicast Router Discovery [MRDISC]. It may also snoop
Multicast Router Advertisement messages sent by and to other
nodes.
b) The arrival port for IGMP Queries (sent by multicast routers)
where the source address is not 0.0.0.0.
We should not add the port to router list when receives query with source
0.0.0.0.
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit da15fc2fa9 ]
The Yocto build system does a 'make clean' when rebuilding due to
changed dependencies, and that consistently fails for me (causing the
whole BSP build to fail) with errors such as
| find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
| find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
| find: find: '[...]/perf/1.0-r9/perf-1.0/libtraceevent.a''[...]/perf/1.0-r9/perf-1.0/libtraceevent.a': No such file or directory: No such file or directory
|
[...]
| find: cannot delete '/mnt/xfs/devel/pil/yocto/tmp-glibc/work/wandboard-oe-linux-gnueabi/perf/1.0-r9/perf-1.0/util/.pstack.o.cmd': No such file or directory
Apparently (despite the comment), 'make clean' ends up launching
multiple sub-makes that all want to remove the same things - perhaps
this only happens in combination with a O=... parameter. In any case, we
don't lose much by explicitly disabling the parallelism for the clean
target, and it makes automated builds much more reliable.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180705131527.19749-1-linux@rasmusvillemoes.dk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit ad8b1ffc3e.
From Florian Westphal <fw@strlen.de>:
It causes kernel crash for locally generated ipv6 fragments
when netfilter ipv6 defragmentation is used.
The faulty commit is not essential for -stable, it only
delays netns teardown for longer than needed when that netns
still has ipv6 frags queued. Much better than crash :-/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78a55d05de ]
napi poll functions should be initialized before running request_irq(),
to handle a rare condition where there is a pending interrupt, causing
the ISR to fire immediately while the poll function wasn't set yet,
causing a NULL dereference.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 298bc15b20 ]
Move the out-of-order and duplicate ACK packet check to before the call to
rxrpc_input_ackinfo() so that the receive window size and MTU size are only
checked in the latest ACK packet and don't regress.
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c479d5f2c2 ]
We should only call the function to end a call's Tx phase if we rotated the
marked-last packet out of the transmission buffer.
Make rxrpc_rotate_tx_window() return an indication of whether it just
rotated the packet marked as the last out of the transmit buffer, carrying
the information out of the locked section in that function.
We can then check the return value instead of examining RXRPC_CALL_TX_LAST.
Fixes: 70790dbe3f ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eea96566c1 ]
The maximum CPU frequency for the i.MX53 QSB is 1GHz, so disable the
1.2GHz OPP. This makes the board work again with configs that have
cpufreq enabled like imx_v6_v7_defconfig on which the board stopped
working with the addition of cpufreq-dt support.
Fixes: 791f416608 ("ARM: dts: imx53: add cpufreq-dt support")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa90f9f955 ]
Recently, the subtest numbering was changed to start from 1. While it
is fine for displaying results, this should not be the case when the
subtests are actually invoked.
Typically, the subtests are stored in zero-indexed arrays and invoked
based on the index passed to the main test function. Since the index
now starts from 1, the second subtest in the array (index 1) gets
invoked instead of the first (index 0). This applies to all of the
following subtests but for the last one, the subtest always fails
because it does not meet the boundary condition of the subtest index
being lesser than the number of subtests.
This can be observed on powerpc64 and x86_64 systems running Fedora 28
as shown below.
Before:
# perf test "builtin clang support"
55: builtin clang support :
55.1: builtin clang compile C source to IR : Ok
55.2: builtin clang compile C source to ELF object : FAILED!
# perf test "LLVM search and compile"
38: LLVM search and compile :
38.1: Basic BPF llvm compile : Ok
38.2: kbuild searching : Ok
38.3: Compile source for BPF prologue generation : Ok
38.4: Compile source for BPF relocation : FAILED!
# perf test "BPF filter"
40: BPF filter :
40.1: Basic BPF filtering : Ok
40.2: BPF pinning : Ok
40.3: BPF prologue generation : Ok
40.4: BPF relocation checker : FAILED!
After:
# perf test "builtin clang support"
55: builtin clang support :
55.1: builtin clang compile C source to IR : Ok
55.2: builtin clang compile C source to ELF object : Ok
# perf test "LLVM search and compile"
38: LLVM search and compile :
38.1: Basic BPF llvm compile : Ok
38.2: kbuild searching : Ok
38.3: Compile source for BPF prologue generation : Ok
38.4: Compile source for BPF relocation : Ok
# perf test "BPF filter"
40: BPF filter :
40.1: Basic BPF filtering : Ok
40.2: BPF pinning : Ok
40.3: BPF prologue generation : Ok
40.4: BPF relocation checker : Ok
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Fixes: 9ef0112442 ("perf test: Fix subtest number when showing results")
Link: http://lkml.kernel.org/r/20180726171733.33208-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2278446e2b ]
Hub driver will try to disable a USB3 device twice at logical disconnect,
racing with xhci_free_dev() callback from the first port disable.
This can be triggered with "udisksctl power-off --block-device <disk>"
or by writing "1" to the "remove" sysfs file for a USB3 device
in 4.17-rc4.
USB3 devices don't have a similar disabled link state as USB2 devices,
and use a U3 suspended link state instead. In this state the port
is still enabled and connected.
hub_port_connect() first disconnects the device, then later it notices
that device is still enabled (due to U3 states) it will try to disable
the port again (set to U3).
The xhci_free_dev() called during device disable is async, so checking
for existing xhci->devs[i] when setting link state to U3 the second time
was successful, even if device was being freed.
The regression was caused by, and whole thing revealed by,
Commit 44a182b9d1 ("xhci: Fix use-after-free in xhci_free_virt_device")
which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned.
and causes a NULL pointer dereference the second time we try to set U3.
Fix this by checking xhci->devs[i]->udev exists before setting link state.
The original patch went to stable so this fix needs to be applied there as
well.
Fixes: 44a182b9d1 ("xhci: Fix use-after-free in xhci_free_virt_device")
Cc: <stable@vger.kernel.org>
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f666675cd ]
When powering down a SDIO connected card during suspend, make sure to call
into the generic lbs_suspend() function before pulling the plug. This will
make sure the card is successfully deregistered from the system to avoid
communication to the card starving out.
Fixes: 7444a80929 ("libertas: fix suspend and resume for SDIO connected cards")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3dc7c7badb ]
Before returning -EPERM we should release some resources, as already done
in the other error handling path of the function.
Fixes: d8f9cc328c ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c739969849 ]
Commit 42de82a8b5 previously attempted to fix this, and it did
correctly pad the MN and FR fields with spaces, but the SN field still
contains 0 bytes. The current code fills out the first 16 bytes with
hex2bin, leaving the last 4 bytes zeroed. Rather than adding a lot of
error-prone math to avoid overwriting SN twice, just set the whole thing
to spaces up front (it's only 20 bytes).
Fixes: 42de82a8b5 ("nvmet: don't report 0-bytes in serial number")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11e9d7829d ]
bond_miimon_commit() handles the UP transition for each slave of a bond
in the case of MII. It is triggered 10 times per second for the default
MII Polling interval of 100ms. For device drivers that do not implement
__ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
fails persistently while the MII status could remain UP. That is, in
this and other cases where the speed/duplex update keeps failing over a
longer period of time while the MII state is UP, a warning is printed
every MII polling interval.
To address these excessive warnings net_ratelimit() should be used.
Printing a warning once would not be sufficient since the call to
bond_update_speed_duplex() could recover to succeed and fail again
later. In that case there would be no new indication what went wrong.
Fixes: b5bf0f5b16 (bonding: correctly update link status during mii-commit phase)
Signed-off-by: Andreas Born <futur.andy@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56f772279a ]
In failure path, we overwrite err to what vnic_rq_disable() returns. In
case it returns 0, enic_open() returns success in case of error.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Fixes: e8588e2685 ("enic: enable rq before updating rq descriptors")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfb61b5e3e ]
pmdp_invalidate() was changed to update the pmd atomically
(to not lose dirty/access bits) and return the original pmd
value.
However, in doing so, we lost a lot of the essential work that
set_pmd_at() does, namely to update hugepage mapping counts and
queuing up the batched TLB flush entry.
Thus we were not flushing entries out of the TLB when making
such PMD changes.
Fix this by abstracting the accounting work of set_pmd_at() out into a
separate function, and call it from pmdp_establish().
Fixes: a8e654f01c ("sparc64: update pmdp_invalidate() to return old pmd value")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45c8184c1b ]
Update the features after calling register_netdev() otherwise the
device features are not set up correctly and it not possible to change
the MTU of the device. After this change, the features reported by
ethtool match the device's features before the commit which introduced
the issue and it is possible to change the device's MTU.
Fixes: f599c64fdf ("xen-netfront: Fix race between device setup and open")
Reported-by: Liam Shepherd <liam@dancer.es>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52fda36d63 ]
Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
case.
However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.
After this fix, test_bpf loads fine on both x86_64 and ppc64el.
Fixes: 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a30abaa40 ]
The commit c469652bb5 ("ALSA: hda - Use IS_REACHABLE() for
dependency on input") simplified the dependencies with IS_REACHABLE()
macro, but it broke due to its incorrect usage: it should have been
IS_REACHABLE(CONFIG_INPUT) instead of IS_REACHABLE(INPUT).
Fixes: c469652bb5 ("ALSA: hda - Use IS_REACHABLE() for dependency on input")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e78c38f6bd ]
In commit 30d6e0a419 ("futex: Remove duplicated code and fix undefined
behaviour"), I let FUTEX_WAKE_OP to fail on invalid op. Namely when op
should be considered as shift and the shift is out of range (< 0 or > 31).
But strace's test suite does this madness:
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);
When I pick the first 0xa0caffee, it decodes as:
0x80000000 & 0xa0caffee: oparg is shift
0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18
That means the op tries to do this:
(futex |= (1 << (-849))) == -18
which is completely bogus. The new check of op in the code is:
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
if (oparg < 0 || oparg > 31)
return -EINVAL;
oparg = 1 << oparg;
}
which results obviously in the "Invalid argument" errno:
FAIL: futex
===========
futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
futex.test: failed test: ../futex failed with code 1
So let us soften the failure to print only a (ratelimited) message, crop
the value and continue as if it were right. When userspace keeps up, we
can switch this to return -EINVAL again.
[v2] Do not return 0 immediatelly, proceed with the cropped value.
Fixes: 30d6e0a419 ("futex: Remove duplicated code and fix undefined behaviour")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3995bbf53b ]
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
fs/cifs/inode.c: In function ‘simple_hashstr’:
fs/cifs/inode.c:713: warning: integer constant is too large for ‘long’ type
Fixes: 7ea884c77e ("smb3: Fix root directory when server returns inode number of zero")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2aeb188354 ]
We're missing ctx lock when iterating children siblings
within the perf_read path for group reading. Following
race and crash can happen:
User space doing read syscall on event group leader:
T1:
perf_read
lock event->ctx->mutex
perf_read_group
lock leader->child_mutex
__perf_read_group_add(child)
list_for_each_entry(sub, &leader->sibling_list, group_entry)
----> sub might be invalid at this point, because it could
get removed via perf_event_exit_task_context in T2
Child exiting and cleaning up its events:
T2:
perf_event_exit_task_context
lock ctx->mutex
list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
perf_event_exit_event(child)
lock ctx->lock
perf_group_detach(child)
unlock ctx->lock
----> child is removed from sibling_list without any sync
with T1 path above
...
free_event(child)
Before the child is removed from the leader's child_list,
(and thus is omitted from perf_read_group processing), we
need to ensure that perf_read_group touches child's
siblings under its ctx->lock.
Peter further notes:
| One additional note; this bug got exposed by commit:
|
| ba5213ae6b ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
|
| which made it possible to actually trigger this code-path.
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ba5213ae6b ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 900631ee6a ]
If L2TP_ATTR_OFFSET is set to a non-zero value in L2TPv3 tunnels, it
results in L2TPv3 packets being transmitted which might not be
compliant with the L2TPv3 RFC. This patch has l2tp ignore the offset
setting and send all packets with no offset.
In more detail:
L2TPv2 supports a variable offset from the L2TPv2 header to the
payload. The offset value is indicated by an optional field in the
L2TP header. Our L2TP implementation already detects the presence of
the optional offset and skips that many bytes when handling data
received packets. All transmitted packets are always transmitted with
no offset.
L2TPv3 has no optional offset field in the L2TPv3 packet
header. Instead, L2TPv3 defines optional fields in a "Layer-2 Specific
Sublayer". At the time when the original L2TP code was written, there
was talk at IETF of offset being implemented in a new Layer-2 Specific
Sublayer. A L2TP_ATTR_OFFSET netlink attribute was added so that this
offset could be configured and the intention was to allow it to be
also used to set the tx offset for L2TPv2. However, no L2TPv3 offset
was ever specified and the L2TP_ATTR_OFFSET parameter was forgotten
about.
Setting L2TP_ATTR_OFFSET results in L2TPv3 packets being transmitted
with the specified number of bytes padding between L2TPv3 header and
payload. This is not compliant with L2TPv3 RFC3931. This change
removes the configurable offset altogether while retaining
L2TP_ATTR_OFFSET for backwards compatibility. Any L2TP_ATTR_OFFSET
value is ignored.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f61dfff2f5 ]
With gcc 4.1.2:
drivers/iio/pressure/zpa2326.c: In function ‘zpa2326_wait_oneshot_completion’:
drivers/iio/pressure/zpa2326.c:868: warning: ‘ret’ may be used uninitialized in this function
When testing for "timeout < 0", timeout is already guaranteed to be
strict negative, so the branch is always taken, and ret is thus always
initialized. But (some version of) gcc is not smart enough to notice.
Remove the check to fix this.
As there is no other code in between assigning the error codes and
returning them, the error codes can be returned immediately, and the
intermediate variable can be dropped.
Drop the "else" to please checkpatch.
Fixes: e7215fe4d5 ("iio: pressure: zpa2326: report interrupted case as failure")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d217a5adc ]
The newly added 'rodata_enabled' global variable is protected by
the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
is turned on:
kernel/module.o: In function `disable_ro_nx':
module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
kernel/module.o: In function `module_disable_ro':
module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
kernel/module.o: In function `module_enable_ro':
module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'
CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.
Fixes: 39290b389e ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3976626ea3 ]
Commit 62e3a3e342 changed get_pages() to initialise
msm_gem_object::pages before trying to initialise msm_gem_object::sgt,
so that put_pages() would properly clean up pages in the failure
case.
However, this means that put_pages() now needs to check that
msm_gem_object::sgt is not null before trying to clean it up, and
this check was only applied to part of the cleanup code. Move
it all into the conditional block. (Strictly speaking we don't
need to make the kfree() conditional, but since we can't avoid
checking for null ourselves we may as well do so.)
Fixes: 62e3a3e342 ("drm/msm: fix leak in failed get_pages")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24e52b11e0 ]
When doing an incremental send, while processing an extent that changed
between the parent and send snapshots and that extent was an inline extent
in the parent snapshot, it's possible to access a memory region beyond
the end of leaf if the inline extent is very small and it is the first
item in a leaf.
An example scenario is described below.
The send snapshot has the following leaf:
leaf 33865728 items 33 free space 773 generation 46 owner 5
fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7
chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f
(...)
item 14 key (335 EXTENT_DATA 0) itemoff 3052 itemsize 53
generation 36 type 1 (regular)
extent data disk byte 12791808 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 15 key (335 EXTENT_DATA 8192) itemoff 2999 itemsize 53
generation 36 type 1 (regular)
extent data disk byte 138170368 nr 225280
extent data offset 0 nr 225280 ram 225280
extent compression 0 (none)
(...)
And the parent snapshot has the following leaf:
leaf 31272960 items 17 free space 17 generation 31 owner 5
fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7
chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f
item 0 key (335 EXTENT_DATA 0) itemoff 3951 itemsize 44
generation 31 type 0 (inline)
inline extent data size 23 ram_bytes 613 compression 1 (zlib)
(...)
When computing the send stream, it is detected that the extent of inode
335, at file offset 0, and at fs/btrfs/send.c:is_extent_unchanged() we
grab the leaf from the parent snapshot and access the inline extent item.
However, before jumping to the 'out' label, we access the 'offset' and
'disk_bytenr' fields of the extent item, which should not be done for
inline extents since the inlined data starts at the offset of the
'disk_bytenr' field and can be very small. For example accessing the
'offset' field of the file extent item results in the following trace:
[ 599.705368] general protection fault: 0000 [#1] PREEMPT SMP
[ 599.706296] Modules linked in: btrfs psmouse i2c_piix4 ppdev acpi_cpufreq serio_raw parport_pc i2c_core evdev tpm_tis tpm_tis_core sg pcspkr parport tpm button su$
[ 599.709340] CPU: 7 PID: 5283 Comm: btrfs Not tainted 4.10.0-rc8-btrfs-next-46+ #1
[ 599.709340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 599.709340] task: ffff88023eedd040 task.stack: ffffc90006658000
[ 599.709340] RIP: 0010:read_extent_buffer+0xdb/0xf4 [btrfs]
[ 599.709340] RSP: 0018:ffffc9000665ba00 EFLAGS: 00010286
[ 599.709340] RAX: db73880000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 599.709340] RDX: ffffc9000665ba60 RSI: db73880000000000 RDI: ffffc9000665ba5f
[ 599.709340] RBP: ffffc9000665ba30 R08: 0000000000000001 R09: ffff88020dc5e098
[ 599.709340] R10: 0000000000001000 R11: 0000160000000000 R12: 6db6db6db6db6db7
[ 599.709340] R13: ffff880000000000 R14: 0000000000000000 R15: ffff88020dc5e088
[ 599.709340] FS: 00007f519555a8c0(0000) GS:ffff88023f3c0000(0000) knlGS:0000000000000000
[ 599.709340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 599.709340] CR2: 00007f1411afd000 CR3: 0000000235f8e000 CR4: 00000000000006e0
[ 599.709340] Call Trace:
[ 599.709340] btrfs_get_token_64+0x93/0xce [btrfs]
[ 599.709340] ? printk+0x48/0x50
[ 599.709340] btrfs_get_64+0xb/0xd [btrfs]
[ 599.709340] process_extent+0x3a1/0x1106 [btrfs]
[ 599.709340] ? btree_read_extent_buffer_pages+0x5/0xef [btrfs]
[ 599.709340] changed_cb+0xb03/0xb3d [btrfs]
[ 599.709340] ? btrfs_get_token_32+0x7a/0xcc [btrfs]
[ 599.709340] btrfs_compare_trees+0x432/0x53d [btrfs]
[ 599.709340] ? process_extent+0x1106/0x1106 [btrfs]
[ 599.709340] btrfs_ioctl_send+0x960/0xe26 [btrfs]
[ 599.709340] btrfs_ioctl+0x181b/0x1fed [btrfs]
[ 599.709340] ? trace_hardirqs_on_caller+0x150/0x1ac
[ 599.709340] vfs_ioctl+0x21/0x38
[ 599.709340] ? vfs_ioctl+0x21/0x38
[ 599.709340] do_vfs_ioctl+0x611/0x645
[ 599.709340] ? rcu_read_unlock+0x5b/0x5d
[ 599.709340] ? __fget+0x6d/0x79
[ 599.709340] SyS_ioctl+0x57/0x7b
[ 599.709340] entry_SYSCALL_64_fastpath+0x18/0xad
[ 599.709340] RIP: 0033:0x7f51945eec47
[ 599.709340] RSP: 002b:00007ffc21c13e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 599.709340] RAX: ffffffffffffffda RBX: ffffffff81096459 RCX: 00007f51945eec47
[ 599.709340] RDX: 00007ffc21c13f20 RSI: 0000000040489426 RDI: 0000000000000004
[ 599.709340] RBP: ffffc9000665bf98 R08: 00007f519450d700 R09: 00007f519450d700
[ 599.709340] R10: 00007f519450d9d0 R11: 0000000000000202 R12: 0000000000000046
[ 599.709340] R13: ffffc9000665bf78 R14: 0000000000000000 R15: 00007f5195574040
[ 599.709340] ? trace_hardirqs_off_caller+0x43/0xb1
[ 599.709340] Code: 29 f0 49 39 d8 4c 0f 47 c3 49 03 81 58 01 00 00 44 89 c1 4c 01 c2 4c 29 c3 48 c1 f8 03 49 0f af c4 48 c1 e0 0c 4c 01 e8 48 01 c6 <f3> a4 31 f6 4$
[ 599.709340] RIP: read_extent_buffer+0xdb/0xf4 [btrfs] RSP: ffffc9000665ba00
[ 599.762057] ---[ end trace fe00d7af61b9f49e ]---
This is because the 'offset' field starts at an offset of 37 bytes
(offsetof(struct btrfs_file_extent_item, offset)), has a length of 8
bytes and therefore attemping to read it causes a 1 byte access beyond
the end of the leaf, as the first item's content in a leaf is located
at the tail of the leaf, the item size is 44 bytes and the offset of
that field plus its length (37 + 8 = 45) goes beyond the item's size
by 1 byte.
So fix this by accessing the 'offset' and 'disk_bytenr' fields after
jumping to the 'out' label if we are processing an inline extent. We
move the reading operation of the 'disk_bytenr' field too because we
have the same problem as for the 'offset' field explained above when
the inline data is less then 8 bytes. The access to the 'generation'
field is also moved but just for the sake of grouping access to all
the fields.
Fixes: e1cbfd7bf6 ("Btrfs: send, fix file hole not being preserved due to inline extent")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 612601d001 ]
commit 9a9b811269 will cause core to fail UD QP from being destroyed
on ipoib unload, therefore cause resources leakage.
On pkey change event above patch modifies mgid before calling underlying
driver to detach it from QP. Drivers' detach_mcast() will fail to find
modified mgid it was never given to attach in a first place.
Core qp->usecnt will never go down, so ib_destroy_qp() will fail.
IPoIB driver actually does take care of new broadcast mgid based on new
pkey by destroying an old mcast object in ipoib_mcast_dev_flush())
....
if (priv->broadcast) {
rb_erase(&priv->broadcast->rb_node, &priv->multicast_tree);
list_add_tail(&priv->broadcast->list, &remove_list);
priv->broadcast = NULL;
}
...
then in restarted ipoib_macst_join_task() creating a new broadcast mcast
object, sending join request and on completion tells the driver to attach
to reinitialized QP:
...
if (!priv->broadcast) {
...
broadcast = ipoib_mcast_alloc(dev, 0);
...
memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid));
priv->broadcast = broadcast;
...
Fixes: 9a9b811269 ("IB/ipoib: Update broadcast object if PKey value was changed in index 0")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09f79fd49d ]
X722 devices use the AdminQ to access the NVM, and this requires taking
the AdminQ lock. Because of this, we lock the AdminQ during
i40e_read_nvm(), which is also called in places where the lock is
already held, such as the firmware update path which wants to lock once
and then unlock when finished after performing several tasks.
Although this should have only affected X722 devices, commit
96a39aed25 ("i40e: Acquire NVM lock before reads on all devices",
2016-12-02) added locking for all NVM reads, regardless of device
family.
This resulted in us accidentally causing NVM acquire timeouts on all
devices, causing failed firmware updates which left the eeprom in
a corrupt state.
Create unsafe non-locked variants of i40e_read_nvm_word and
i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer
respectively. These variants will not take the NVM lock and are expected
to only be called in places where the NVM lock is already held if
needed.
Since the only caller of i40e_read_nvm_buffer() was in such a path,
remove it entirely in favor of the unsafe version. If necessary we can
always add it back in the future.
Additionally, we now need to hold the NVM lock in i40e_validate_checksum
because the call to i40e_calc_nvm_checksum now assumes that the NVM lock
is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD
up a bit so that we do not need to acquire the NVM lock twice.
This should resolve firmware updates and also fix potential raise that
could have caused the driver to report an invalid NVM checksum upon
driver load.
Reported-by: Stefan Assmann <sassmann@kpanic.de>
Fixes: 96a39aed25 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02)
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a9910d7b6 ]
qla2x00_tmf_sp_done() now deletes the timer that will run
qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer already
expired. Check the return value from del_timer() to avoid calling
complete() a second time.
Fixes: 4440e46d5d ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous ...")
Fixes: 1514839b36 ("scsi: qla2xxx: Fix NULL pointer crash due to active ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7b169f344 ]
During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.
Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.
Fixes: 09a7d9eca1 ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d61b7f972d ]
A user noticed that write performance was horrible over loopback and we
traced it to an inversion of when we need to set MSG_MORE. It should be
set when we have more bvec's to send, not when we are on the last bvec.
This patch made the test go from 20 iops to 78k iops.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: 429a787be6 ("nbd: fix use-after-free of rq/bio in the xmit path")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b9f8970cd ]
If the allocation of elem fails, it is not sufficient to simply check
for NULL and return. We need to also put our reference on the pool or
else we will leave the pool with a permanent ref count and we will never
be able to free it.
Fixes: 4831ca9e4a ("IB/rxe: check for allocation failure on elem")
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f80bd6a6c ]
The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B),
contradicts other flows such as ipoib_open possibly causing a deadlock.
To prevent this deadlock heavy flush is called with RTNL locked and
only then tries to acquire vlan_rwsem.
This deadlock is possible only when there are child interfaces.
[ 140.941758] ======================================================
[ 140.946276] WARNING: possible circular locking dependency detected
[ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
[ 140.954797] ------------------------------------------------------
[ 140.959424] kworker/u32:1/146 is trying to acquire lock:
[ 140.963450] (rtnl_mutex){+.+.}, at: [<ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 140.970006]
but task is already holding lock:
[ 140.975141] (&priv->vlan_rwsem){++++}, at: [<ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
[ 140.982105]
which lock already depends on the new lock.
[ 140.990023]
the existing dependency chain (in reverse order) is:
[ 140.998650]
-> #1 (&priv->vlan_rwsem){++++}:
[ 141.005276] down_read+0x4d/0xb0
[ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
[ 141.014400] __dev_open+0xcb/0x140
[ 141.017919] __dev_change_flags+0x1a4/0x1e0
[ 141.022133] dev_change_flags+0x23/0x60
[ 141.025695] devinet_ioctl+0x704/0x7d0
[ 141.029156] sock_do_ioctl+0x20/0x50
[ 141.032526] sock_ioctl+0x221/0x300
[ 141.036079] do_vfs_ioctl+0xa6/0x6d0
[ 141.039656] SyS_ioctl+0x74/0x80
[ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 141.046891]
-> #0 (rtnl_mutex){+.+.}:
[ 141.051701] lock_acquire+0xd4/0x220
[ 141.055212] __mutex_lock+0x88/0x970
[ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
[ 141.067648] process_one_work+0x1f5/0x610
[ 141.071429] worker_thread+0x4a/0x3f0
[ 141.074890] kthread+0x141/0x180
[ 141.078085] ret_from_fork+0x24/0x30
[ 141.081559]
other info that might help us debug this:
[ 141.088967] Possible unsafe locking scenario:
[ 141.094280] CPU0 CPU1
[ 141.097953] ---- ----
[ 141.101640] lock(&priv->vlan_rwsem);
[ 141.104771] lock(rtnl_mutex);
[ 141.109207] lock(&priv->vlan_rwsem);
[ 141.114032] lock(rtnl_mutex);
[ 141.116800]
*** DEADLOCK ***
Fixes: b4b678b06f ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afe49de44c ]
Commit 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
moved the cleanup label for ipmr_fail, but should have changed the
contents of the cleanup labels as well. Now we can end up cleaning up
icmpv6 even though it hasn't been initialized (jump to icmp_fail or
ipmr_fail).
Simply undo things in the reverse order of their initialization.
Example of panic (triggered by faking a failure of icmpv6_init):
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:__list_del_entry_valid+0x79/0x160
[...]
Call Trace:
? lock_release+0x8a0/0x8a0
unregister_pernet_operations+0xd4/0x560
? ops_free_list+0x480/0x480
? down_write+0x91/0x130
? unregister_pernet_subsys+0x15/0x30
? down_read+0x1b0/0x1b0
? up_read+0x110/0x110
? kmem_cache_create_usercopy+0x1b4/0x240
unregister_pernet_subsys+0x1d/0x30
icmpv6_cleanup+0x1d/0x30
inet6_init+0x1b5/0x23f
Fixes: 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7be52c03bb ]
Currently ath10k unncessarily warns about board id not available from OTP:
ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1
ath10k_pci 0000:02:00.0: firmware ver 10.2.4.70.9-2 api 5 features no-p2p,raw-mode crc32 b8d50af5
ath10k_pci 0000:02:00.0: board id is not exist in otp, ignore it
ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
But not all boards have the board id in OTP so this is not a problem and no
need to confuse the user with that info. So this can be safely changed to a
debug message.
Also fix grammar in the debug message.
Fixes: d2e202c06c ("ath10k: ignore configuring the incorrect board_id")
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8dd397903 ]
Commit d04adf1b35 ("sctp: reset owner sk for data chunks on out queues
when migrating a sock") made a mistake that using 'list' as the param of
list_for_each_entry to traverse the retransmit, sacked and abandoned
queues, while chunks are using 'transmitted_list' to link into these
queues.
It could cause NULL dereference panic if there are chunks in any of these
queues when peeling off one asoc.
So use the chunk member 'transmitted_list' instead in this patch.
Fixes: d04adf1b35 ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6314dab4b8 ]
The GetNtbFormat and SetNtbFormat requests operate on 16 bit little
endian values. We get away with ignoring this most of the time, because
we only care about USB_CDC_NCM_NTB16_FORMAT which is 0x0000. This
fails for USB_CDC_NCM_NTB32_FORMAT.
Fix comparison between LE value from device and constant by converting
the constant to LE.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Fixes: 2b02c20ce0 ("cdc_ncm: Set NTB format again after altsetting switch for Huawei devices")
Cc: Enrico Mioso <mrkiko.rs@gmail.com>
Cc: Christian Panton <christian@panton.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-By: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8818efaaac ]
Another deadlock path caused by recursive locking is reported. This
kind of issue was introduced since commit 743b5f1434 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been
fixed by commit b891fa5024 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points"). Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.
This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run
setfacl/getfacl. Both nodes will hang up there. The backtraces:
On node1:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_write_begin+0x43/0x1a0 [ocfs2]
generic_perform_write+0xa9/0x180
__generic_file_write_iter+0x1aa/0x1d0
ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
__vfs_write+0xc3/0x130
vfs_write+0xb1/0x1a0
SyS_write+0x46/0xa0
On node2:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
ocfs2_set_acl+0x22d/0x260 [ocfs2]
ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
set_posix_acl+0x75/0xb0
posix_acl_xattr_set+0x49/0xa0
__vfs_setxattr+0x69/0x80
__vfs_setxattr_noperm+0x72/0x1a0
vfs_setxattr+0xa7/0xb0
setxattr+0x12d/0x190
path_setxattr+0x9f/0xb0
SyS_setxattr+0x14/0x20
Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").
Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a50d3518d ]
PTT entries are per-hwfn; If some errneous flow is trying
to use a PTT belonging to a differnet hwfn warn user, as this
can break every register accessing flow later and is very hard
to root-cause.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 631b010abc ]
Inheriting the ADC BIAS current settings from the BIOS instead of
hardcoding then causes the AXP288 to disable charging (I think it
mis-detects an overheated battery) on at least one model tablet.
So lets go back to hard coding the values, this reverts
commit fa2849e964 ("iio: adc: axp288: Drop bogus
AXP288_ADC_TS_PIN_CTRL register modifications"), fixing charging not
working on the model tablet in question.
The exact cause is not fully understood, hence the revert to a known working
state.
Cc: stable@vger.kernel.org
Reported-by: Umberto Ixxo <sfumato1977@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91a825290c ]
The function rds_ib_setup_qp is calling rds_ib_get_client_data and
should correspondingly call rds_ib_dev_put. This call was lost in
the non-error path with the introduction of error handling done in
commit 3b12f73a5c ("rds: ib: add error handle")
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e490657c7 ]
The vif->idx value is always 0 for two interfaces.
wl->vif_num = 0;
loop {
...
vif->idx = wl->vif_num;
...
wl->vif_num = i;
....
i++;
...
}
At present, vif->idx is assigned the value of wl->vif_num
at the beginning of this block and device is initialized
based on this index value.
In the next iteration, wl->vif_num is still 0 as it is only updated
later but gets assigned to vif->idx in the beginning. This causes problems
later when we try to reference a particular interface and also while
configuring the firmware.
This patch moves the assignment to vif->idx from the beginning
of the block to after wl->vif_num is updated with latest value of i.
Fixes: commit 735bb39ca3 ("staging: wilc1000: simplify vif[i]->ndev accesses")
Cc: <stable@vger.kernel.org>
Signed-off-by: Aditya Shankar <aditya.shankar@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5790eabc6e ]
Add more stubs to make it build.
Fixes: 81fbfe8a ("ptr_ring: use kmalloc_array()")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c07c1a0f68 ]
The TOP "aclk400_mscl" clock should be kept enabled all the time
to allow proper access to power management control for MSC power
domain and devices that are a part of it. This change is required
for the scaler to work properly after domain power on/off sequence.
Fixes: 318fa46cc6 ("clk/samsung: exynos542x: mark some clocks as critical")
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee249b4554 ]
IRQ_NOAUTOEN cannot be used with shared IRQs, since commit 04c848d398
("genirq: Warn when IRQ_NOAUTOEN is used with shared interrupts") and
kernel now throws a warn dump. But OMAP DWC3 driver uses this flag. As
per commit 12a7f17fac ("usb: dwc3: omap: fix race of pm runtime with
irq handler in probe") that introduced this flag, PM runtime can race
with IRQ handler when deferred probing happens due to extcon,
therefore IRQ_NOAUTOEN needs to be set so that irq is not enabled until
extcon is registered.
Remove setting of IRQ_NOAUTOEN and move the registration of
shared irq to a point after dwc3_omap_extcon_register() and
of_platform_populate(). This avoids possibility of probe deferring and
above said race condition.
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7d44c36a6 ]
The commit b8b9c974af ("usb: renesas_usbhs: gadget: disable all eps
when the driver stops") causes the unused-but-set-variable warning.
But, if the usbhsg_ep_disable() will return non-zero value, udc/core.c
doesn't clear the ep->enabled flag. So, this driver should not return
non-zero value, if the pipe is zero because this means the pipe is
already disabled. Otherwise, the ep->enabled flag is never cleared
when the usbhsg_ep_disable() is called by the renesas_usbhs driver first.
Fixes: b8b9c974af ("usb: renesas_usbhs: gadget: disable all eps when the driver stops")
Fixes: 11432050f0 ("usb: renesas_usbhs: gadget: fix NULL pointer dereference in ep_disable()")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14a8d4bfc2 ]
This patch fixes an issue that the spin_lock_init() is not called
for almost all pipes. Otherwise, the lockdep output the following
message when we connect a usb cable using g_ncm:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Fixes: b8b9c974af ("usb: renesas_usbhs: gadget: disable all eps when the driver stops")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6377ed0bba ]
spin_lock/unlock of health->wq_lock should be IRQ safe.
It was changed to spin_lock_irqsave since adding commit 0179720d6b
("net/mlx5: Introduce trigger_health_work function") which uses
spin_lock from asynchronous event (IRQ) context.
Thus, all spin_lock/unlock of health->wq_lock should have been moved
to IRQ safe mode.
However, one occurrence on new code using this lock missed that
change, resulting in possible deadlock:
kernel: Possible unsafe locking scenario:
kernel: CPU0
kernel: ----
kernel: lock(&(&health->wq_lock)->rlock);
kernel: <Interrupt>
kernel: lock(&(&health->wq_lock)->rlock);
kernel: #012 *** DEADLOCK ***
Fixes: 2a0165a034 ("net/mlx5: Cancel delayed recovery work when unloading the driver")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7598f8bc13 ]
In commit 613f050d68 ("perf probe: Fix to probe on gcc generated
functions in modules"), the offset from symbol is, incorrectly, added
to the trace point address. This leads to incorrect probe trace points
for inlined functions and when using relative line number on symbols.
Prior this patch:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
After:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
Committer testing:
Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
the functions that while not being explicitely marked as inline, were inlined
by the compiler:
# pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
__ew32
e1000_regdump
e1000e_dump_ps_pages
e1000_desc_unused
e1000e_systim_to_hwtstamp
e1000e_rx_hwtstamp
e1000e_update_rdt_wa
e1000e_update_tdt_wa
e1000_put_txbuf
e1000_consume_page
Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
them:
# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
that function:
$ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
<SNIP>
<1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
<13e27c> DW_AT_name : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
<13e287> DW_AT_low_pc : 0x17a30
<3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e6f0> DW_AT_abstract_origin: <0x13ed2c>
<13e6f4> DW_AT_low_pc : 0x17be6
<SNIP>
<1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
<13ed2e> DW_AT_name : (indirect string, offset: 0xa54c3): e1000_consume_page
So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
address, gives us the offset we should use in the probe definition:
0x17be6 - 0x17a30 = 438
but above we have 876, which is twice as much.
Lets see the second inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():
<3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e86f> DW_AT_abstract_origin: <0x13ed2c>
<13e873> DW_AT_low_pc : 0x17d21
0x17d21 - 0x17a30 = 753
So we where adding it at twice the offset from the containing function as we
should.
And then after this patch:
# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
#
Which matches the two first expansions and shows that because we were
doubling the offset it would spill over the next function:
readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
673: 0000000000017a30 1626 FUNC LOCAL DEFAULT 2 e1000_clean_jumbo_rx_irq
674: 0000000000018090 2013 FUNC LOCAL DEFAULT 2 e1000_clean_rx_irq_ps
This is the 3rd inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():
<3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13ec78> DW_AT_abstract_origin: <0x13ed2c>
<13ec7c> DW_AT_low_pc : 0x17f79
0x17f79 - 0x17a30 = 1353
So:
0x17a30 + 2 * 1353 = 0x184c2
And:
0x184c2 - 0x18090 = 1074
Which explains the bogus third expansion for e1000_consume_page() to end up at:
p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
All fixed now :-)
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 613f050d68 ("perf probe: Fix to probe on gcc generated functions in modules")
Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a1ac110c2 ]
Since commit 18e7a45af9 ("perf/x86: Reject non sampling events with
precise_ip") returns -EINVAL for sys_perf_event_open() with an attribute
with (attr.precise_ip > 0 && attr.sample_period == 0), just like is done
in the routine used to probe the max precise level when no events were
passed to 'perf record' or 'perf top', i.e.:
perf_evsel__new_cycles()
perf_event_attr__set_max_precise_ip()
The x86 code, in x86_pmu_hw_config(), which is called all the way from
sys_perf_event_open() did, starting with the aforementioned commit:
/* There's no sense in having PEBS for non sampling events: */
if (!is_sampling_event(event))
return -EINVAL;
Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
just the non precise cycles variant.
To make sure that this is the case, I tested it, before this patch,
with:
# perf probe -L x86_pmu_hw_config
<x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
0 int x86_pmu_hw_config(struct perf_event *event)
1 {
2 if (event->attr.precise_ip) {
<SNIP>
17 if (event->attr.precise_ip > precise)
18 return -EOPNOTSUPP;
/* There's no sense in having PEBS for non sampling events: */
21 if (!is_sampling_event(event))
22 return -EINVAL;
}
<SNIP>
# perf probe x86_pmu_hw_config:22
Added new events:
probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
You can now use it in all perf tools, such as:
perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
# perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1
0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ...
0.015 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
x86_pmu_hw_config ([kernel.kallsyms])
hsw_hw_config ([kernel.kallsyms])
x86_pmu_event_init ([kernel.kallsyms])
perf_try_init_event ([kernel.kallsyms])
perf_event_alloc ([kernel.kallsyms])
SYSC_perf_event_open ([kernel.kallsyms])
sys_perf_event_open ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
syscall (/usr/lib64/libc-2.24.so)
perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
perf_evsel__new_cycles (/home/acme/bin/perf)
perf_evlist__add_default (/home/acme/bin/perf)
cmd_record (/home/acme/bin/perf)
run_builtin (/home/acme/bin/perf)
handle_internal_command (/home/acme/bin/perf)
0.000 ( 0.021 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ...
0.025 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
x86_pmu_hw_config ([kernel.kallsyms])
hsw_hw_config ([kernel.kallsyms])
x86_pmu_event_init ([kernel.kallsyms])
perf_try_init_event ([kernel.kallsyms])
perf_event_alloc ([kernel.kallsyms])
SYSC_perf_event_open ([kernel.kallsyms])
sys_perf_event_open ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
syscall (/usr/lib64/libc-2.24.so)
perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
perf_evsel__new_cycles (/home/acme/bin/perf)
perf_evlist__add_default (/home/acme/bin/perf)
cmd_record (/home/acme/bin/perf)
run_builtin (/home/acme/bin/perf)
handle_internal_command (/home/acme/bin/perf)
0.023 ( 0.004 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ...
0.030 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
x86_pmu_hw_config ([kernel.kallsyms])
hsw_hw_config ([kernel.kallsyms])
x86_pmu_event_init ([kernel.kallsyms])
perf_try_init_event ([kernel.kallsyms])
perf_event_alloc ([kernel.kallsyms])
SYSC_perf_event_open ([kernel.kallsyms])
sys_perf_event_open ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
syscall (/usr/lib64/libc-2.24.so)
perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
perf_evsel__new_cycles (/home/acme/bin/perf)
perf_evlist__add_default (/home/acme/bin/perf)
cmd_record (/home/acme/bin/perf)
run_builtin (/home/acme/bin/perf)
handle_internal_command (/home/acme/bin/perf)
0.028 ( 0.004 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
#
I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
So fix it by just setting attr.sample_period
Now, after this patch:
# perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
[ perf record: Woken up 1 times to write data ]
0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
syscall (/usr/lib64/libc-2.24.so)
perf_event_open_cloexec_flag (/home/acme/bin/perf)
0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
syscall (/usr/lib64/libc-2.24.so)
perf_evlist__config (/home/acme/bin/perf)
0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
syscall (/usr/lib64/libc-2.24.so)
perf_evlist__config (/home/acme/bin/perf)
0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1 ) = 4
syscall (/usr/lib64/libc-2.24.so)
perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
syscall (/usr/lib64/libc-2.24.so)
perf_evsel__open (/home/acme/bin/perf)
0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
syscall (/usr/lib64/libc-2.24.so)
perf_evsel__open (/home/acme/bin/perf)
0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
syscall (/usr/lib64/libc-2.24.so)
perf_evsel__open (/home/acme/bin/perf)
0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
syscall (/usr/lib64/libc-2.24.so)
perf_evsel__open (/home/acme/bin/perf)
[ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
#
The probe one called from perf_event_attr__set_max_precise_ip() works
the first time, with attr.precise_ip = 3, wit hthe next ones being the
per cpu ones for the cycles:ppp event.
And here is the text from a report and alternative proposed patch by
Thomas-Mich Richter:
---
On s390 the counter and sampling facility do not support a precise IP
skid level and sometimes returns EOPNOTSUPP when structure member
precise_ip in struct perf_event_attr is not set to zero.
On s390 commnd 'perf record -- true' fails with error EOPNOTSUPP. This
happens only when no events are specified on command line.
The functions called are
...
--> perf_evlist__add_default
--> perf_evsel__new_cycles
--> perf_event_attr__set_max_precise_ip
The last function determines the value of structure member precise_ip by
invoking the perf_event_open() system call and checking the return code.
The first successful open is the value for precise_ip.
However the value is determined without setting member sample_period and
indicates no sampling.
On s390 the counter facility and sampling facility are different. The
above procedure determines a precise_ip value of 3 using the counter
facility. Later it uses the sampling facility with a value of 3 and
fails with EOPNOTSUPP.
---
v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
of unnamed union members in the container struct initialization, so
move from:
struct perf_event_attr attr = {
...
.sample_period = 1,
};
to right after it as:
struct perf_event_attr attr = {
...
};
attr.sample_period = 1;
v3: We need to reset .sample_period to 0 to let the users of
perf_evsel__new_cycles() to properly setup attr.sample_period or
attr.sample_freq. Reported by Ingo Molnar.
Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 18e7a45af9 ("perf/x86: Reject non sampling events with precise_ip")
Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ce59b16b4 ]
When wait for firmware init fails, previous code would mistakenly
return success and cause inconsistency in the driver state.
Fixes: 6c780a0267 ("net/mlx5: Wait for FW readiness before initializing command interface")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e58edaa486 ]
Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.
When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.
This patch prevent this division for both ppms and epms.
Fixes: c3164d2fc4 ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 452e62b71f ]
Before this, we use 'filled' mode here, ie. if all range has been
filled with EXTENT_DEFRAG bits, get to clear it, but if the defrag
range joins the adjacent delalloc range, then we'll have EXTENT_DEFRAG
bits in extent_state until releasing this inode's pages, and that
prevents extent_data from being freed.
This clears the bit if any was found within the ordered extent.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 594238158b ]
The current comparison of entry < 0 will never be true since entry is an
unsigned integer. Make entry an int to ensure -ve error return values
from the call to jumbo_frm are correctly being caught.
Detected by CoverityScan, CID#1238760 ("Macro compares unsigned to 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bd2bbc01d ]
gcc 7.1 reports the following warning:
block/elevator.c: In function ‘elv_register’:
block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
"%s_io_cq", e->elevator_name);
^~~~~~~~~~
block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s_io_cq", e->elevator_name);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bug is that the name of the icq_cache is 6 characters longer than
the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
for it --- so in the case of a maximum-length elevator name, the 'q'
character in "_io_cq" would be truncated by snprintf(). Fix it by
reserving ELV_NAME_MAX + 6 characters instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7dfee2433 ]
The description of the CSI_SEL bit in the i.MX6 reference manual is
incorrect. It states "This bit defines which CSI is the input to the
IC. This bit is effective only if IC_INPUT is bit cleared".
From experiment it was found this is in fact not correct. The CSI_SEL
bit selects which CSI is input to _both_ the VDIC _and_ the IC. If the
IC_INPUT bit is set so that the IC is receiving from the VDIC, the IC
ignores the CSI_SEL bit, but CSI_SEL still selects which CSI the VDIC
receives from in that case.
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06a4b6d009 ]
As reported by Patrice, the header layout of the decompressor is
incorrect when building for v7-M. In this case, the __nop macro
resolves to 'mov r0, r0', which is emitted as a narrow encoding,
resulting in the header data fields to end up at lower offsets than
required.
Given the variety of targets we need to support with the same code,
the startup sequence is a bit of a jumble, and uses instructions
and macros whose encoding widths cannot be specified (badr), or only
exist in a narrow encoding (bx)
So force the use of a wide encoding in __nop, and replace the start
sequence with a simple jump to the label marking the start of code,
preceded by a Thumb2 mode switch if required (using explicit wide
encodings where appropriate). The label itself can be moved to the
start of code [where it belongs] due to the larger range of branch
instructions as compared to adr instructions.
Reported-by: Patrice CHOTARD <patrice.chotard@st.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae1d557d8f ]
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
report an inverted Device ID in its DIR0 configuration register,
specifically 0xb instead of the expected 0x4.
Catch this presumably quirky version so it's properly recognized
as GX1 and has its cache switched to write-back mode, which provides
a significant performance boost in most workloads.
SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
states in section 1.1.7.1 "Device ID" that device identification
values are specified in SC1100's device errata. These, however,
seem to not have been publicly released.
Wading through a number of boot logs and /proc/cpuinfo dumps found on
pastebin and blogs, this patch should mostly be relevant for a number
of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
the latter being the platform this issue was discovered on.
Performance impact was verified using "openssl speed", with
write-back caching scaling throughput between -3% and +41%.
Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bd7ef0b03 ]
Qlogic's 82xx series adapter doesn't support
tunnel offloads, driver incorrectly assumes that it is
supported and causes firmware hang while running tunnel IO.
This patch fixes this by not advertising tunnel offloads
for 82xx adapters.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7cf69ae17 ]
ata_parse_force_one() was incorrectly comparing @p to @endp when it
should have been comparing @id. The only consequence is that it may
end up using an invalid port number in "libata.force" module param
instead of rejecting it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Petru-Florin Mihancea <petrum@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195785
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a7c0a6438 ]
When starting or stopping an aggregation session, one of the steps
is that the driver calls back to mac80211 that the start/stop can
proceed. This is handled by queueing up a fake SKB and processing
it from the normal iface/sdata work. Since this isn't flushed when
disassociating, the following race is possible:
* associate
* start aggregation session
* driver callback
* disassociate
* associate again to the same AP
* callback processing runs, leading to a WARN_ON() that
the TID hadn't requested aggregation
If the second association isn't to the same AP, there would only
be a message printed ("Could not find station: <addr>"), but the
same race could happen.
Fix this by not going the whole detour with a fake SKB etc. but
simply looking up the aggregation session in the driver callback,
marking it with a START_CB/STOP_CB bit and then scheduling the
regular aggregation work that will now process these bits as well.
This also simplifies the code and gets rid of the whole problem
with allocation failures of said skb, which could have left the
session in limbo.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a71677691 ]
Element size in the manifest should be updated for each token, so that the
loop can parse all the string elements in the manifest. This was not
happening when more than two string elements appear consecutively, as it is
not updated with correct string element size. Fixed with this patch.
Signed-off-by: Shreyas NC <shreyas.nc@intel.com>
Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4497a224f7 ]
The hi6220_reset driver can be built as a standalone module
yet it cannot be loaded because it depends on GPL exported symbols.
Lets set the module license so that the module loads, and things like
the on-board kirin drm starts working.
Signed-off-by: Jeremy Linton <lintonrjeremy@gmail.com>
Reviewed-by: Xinliang Liu <xinliang.liu@linaro.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4751832da9 ]
[BUG]
Cycle mount btrfs can cause fiemap to return different result.
Like:
# mount /dev/vdb5 /mnt/btrfs
# dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 25088..25215 128 0x1
# umount /mnt/btrfs
# mount /dev/vdb5 /mnt/btrfs
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..31]: 25088..25119 32 0x0
1: [32..63]: 25120..25151 32 0x0
2: [64..95]: 25152..25183 32 0x0
3: [96..127]: 25184..25215 32 0x1
But after above fiemap, we get correct merged result if we call fiemap
again.
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 25088..25215 128 0x1
[REASON]
Btrfs will try to merge extent map when inserting new extent map.
btrfs_fiemap(start=0 len=(u64)-1)
|- extent_fiemap(start=0 len=(u64)-1)
|- get_extent_skip_holes(start=0 len=64k)
| |- btrfs_get_extent_fiemap(start=0 len=64k)
| |- btrfs_get_extent(start=0 len=64k)
| | Found on-disk (ino, EXTENT_DATA, 0)
| |- add_extent_mapping()
| |- Return (em->start=0, len=16k)
|
|- fiemap_fill_next_extent(logic=0 phys=X len=16k)
|
|- get_extent_skip_holes(start=0 len=64k)
| |- btrfs_get_extent_fiemap(start=0 len=64k)
| |- btrfs_get_extent(start=16k len=48k)
| | Found on-disk (ino, EXTENT_DATA, 16k)
| |- add_extent_mapping()
| | |- try_merge_map()
| | Merge with previous em start=0 len=16k
| | resulting em start=0 len=32k
| |- Return (em->start=0, len=32K) << Merged result
|- Stripe off the unrelated range (0~16K) of return em
|- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K)
^^^ Causing split fiemap extent.
And since in add_extent_mapping(), em is already merged, in next
fiemap() call, we will get merged result.
[FIX]
Here we introduce a new structure, fiemap_cache, which records previous
fiemap extent.
And will always try to merge current fiemap_cache result before calling
fiemap_fill_next_extent().
Only when we failed to merge current fiemap extent with cached one, we
will call fiemap_fill_next_extent() to submit cached one.
So by this method, we can merge all fiemap extents.
It can also be done in fs/ioctl.c, however the problem is if
fieinfo->fi_extents_max == 0, we have no space to cache previous fiemap
extent.
So I choose to merge it in btrfs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0804ed0ca ]
The Raspberry Pi startup stub files for multi-core BCM283X processors
make the secondary CPUs spin until the corresponding mailbox is
written. These stubs are loaded at physical address 0x00000xxx (as seen
by the ARMs), but this page will be reused by the kernel unless it is
explicitly reserved, causing the waiting cores to execute random code.
Use the /memreserve/ Device Tree directive to mark the first page as
off-limits to the kernel.
See: https://github.com/raspberrypi/linux/issues/1989
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7595a820b ]
Move NAPI enable to 'ath10k_ahb_hif_start' from
'ath10k_ahb_hif_power_up'. This is to maintain the symmetry
of calling napi_enable() from ath10k_ahb_hif_start() so that it
matches with napi_disable() being called from ath10k_pci_hif_stop().
This change is based on the crash fix from Kalle for PCI interface in
commit 1427228d58 ("ath10k: fix napi crash during rmmod when probe
firmware fails").
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2dc9b6e38 ]
As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic. With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions. If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause timeout. This patch fixes these
timeouts.
Fixes: 90c694bb71 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd849b7c41 ]
No matter whether a request is inserted into workqueue as a work item
to cancel a subscription or to delete a subscription's subscriber
asynchronously, the work items may be executed in different workers.
As a result, it doesn't mean that one request which is raised prior to
another request is definitely handled before the latter. By contrast,
if the latter request is executed before the former request, below
error may happen:
[ 656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117
[ 656.184487] general protection fault: 0000 [#1] SMP
[ 656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel]
[ 656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6
[ 656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc]
[ 656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000
[ 656.190157] RIP: 0010:spin_bug+0xdd/0xf0
[ 656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202
[ 656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000
[ 656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08
[ 656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b
[ 656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b
[ 656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0
[ 656.196101] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
[ 656.197172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0
[ 656.198873] Call Trace:
[ 656.199210] do_raw_spin_lock+0x66/0xa0
[ 656.199735] _raw_spin_lock_bh+0x19/0x20
[ 656.200258] tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc]
[ 656.200990] tipc_subscrb_rcv_cb+0x45/0x260 [tipc]
[ 656.201632] tipc_receive_from_sock+0xaf/0x100 [tipc]
[ 656.202299] tipc_recv_work+0x2b/0x60 [tipc]
[ 656.202872] process_one_work+0x157/0x420
[ 656.203404] worker_thread+0x69/0x4c0
[ 656.203898] kthread+0x138/0x170
[ 656.204328] ? process_one_work+0x420/0x420
[ 656.204889] ? kthread_create_on_node+0x40/0x40
[ 656.205527] ret_from_fork+0x29/0x40
[ 656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f
[ 656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8
[ 656.209798] ---[ end trace e2a800e6eb0770be ]---
In above scenario, the request of deleting subscriber was performed
earlier than the request of canceling a subscription although the
latter was issued before the former, which means tipc_subscrb_delete()
was called before tipc_subscrp_cancel(). As a result, when
tipc_subscrb_subscrp_delete() called by tipc_subscrp_cancel() was
executed to cancel a subscription, the subscription's subscriber
refcnt had been decreased to 1. After tipc_subscrp_delete() where
the subscriber was freed because its refcnt was decremented to zero,
but the subscriber's lock had to be released, as a consequence, panic
happened.
By contrast, if we increase subscriber's refcnt before
tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(),
the panic issue can be avoided.
Fixes: d094c4d5f5 ("tipc: add subscription refcount to avoid invalid delete")
Reported-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c7e983b22 ]
In 9dbbfb0ab6 function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.
However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:
* ...Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.
This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 061870800e ]
Completion on timeout should not free the driver command entry structure
as it will need to access it again once real completion event from FW
will occur.
Fixes: 73dd3a4839 ('net/mlx5: Avoid using pending command interface slots')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c78f7735b ]
Currently we create the sysfs entry even if we fail mapping
it. In that case, the unmapping will not remove the sysfs created
file. There is no good reason to create a sysfs entry for a non
working CMB and show his characteristics.
Fixes: f63572dff ("nvme: unmap CMB and remove sysfs file in reset path")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07d432bb97 ]
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
init_send_wqe
copy_from_user --> may sleep
There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f13e58767 ]
A previous patch which claimed to remove off by ones actually introduced
them.
strlen() returns the length of the string not including the NUL
character. We are using strcpy() to copy "name" into a buffer which is
ORANGEFS_MAX_XATTR_NAMELEN characters long. We should make sure to
leave space for the NUL, otherwise we're writing one character beyond
the end of the buffer.
Fixes: e675c5ec51 ("orangefs: clean up oversize xattr validation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5236333592 ]
RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a
multicast group, if the QP is associated with a RoCE port, the
multicast group MLID is unused and is ignored.
During attach or detach multicast, when the QP is associated with a
port, it is enough to check the port's link layer and validate the
LID only if it is Infiniband. Otherwise, avoid validating the
multicast LID.
Fixes: 8561eae60f ("IB/core: For multicast functions, verify that LIDs are multicast LIDs")
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14fa91e0fe ]
netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event
multiple times until all refs are gone, which will result in calling
ipoib_delete_debug_files multiple times and printing a warning.
Remove the WARN_ONCE since checks of NULL pointers before calling
debugfs_remove are not needed.
Fixes: 771a525840 ("IB/IPoIB: ibX: failed to create mcg debug file")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9ac89f5ad ]
The 98d610c373 patch was introduced since v4.11-rc1 that it causes
that the accelerometer input device will not be created on workable
machines because the HID string comparing logic is wrong.
And, the patch doesn't prevent that the accelerometer input device
be created on the machines that have no BST0001. That's because
the acpi_get_devices() returns success even it didn't find any
match device.
This patch fixed the HID string comparing logic of BST0001 device.
And, it also makes sure that the acpi_get_devices() returns
acpi_handle for BST0001.
Fixes: 98d610c373 ("acer-wmi: setup accelerometer when machine has appropriate notify event")
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=193761
Reported-by: Samuel Sieb <samuel-kbugs@sieb.net>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b91d532928 ]
After commit c2ed1880fd ("net: ipv6: check route protocol when
deleting routes"), ipv6 route checks rt protocol when trying to
remove a rt entry.
It introduced a side effect causing 'ip -6 route flush cache' not
to work well. When flushing caches with iproute, all route caches
get dumped from kernel then removed one by one by sending DELROUTE
requests to kernel for each cache.
The thing is iproute sends the request with the cache whose proto
is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
it. But in kernel the rt_cache protocol is still 0, which causes
the cache not to be matched and removed.
So the real reason is rt6i_protocol in the route is not set when
it is allocated. As David Ahern's suggestion, this patch is to
set rt6i_protocol properly in the route when it is installed and
remove the codes setting rtm_protocol according to rt6i_flags in
rt6_fill_node.
This is also an improvement to keep rt6i_protocol consistent with
rtm_protocol.
Fixes: c2ed1880fd ("net: ipv6: check route protocol when deleting routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92a16c8629 ]
PCI_STD_RESOURCE_END is (confusingly) the index of the last valid BAR, not
the *number* of BARs. To iterate through all possible BARs, we need to
include PCI_STD_RESOURCE_END.
Fixes: 55d728a40d ("efi/fb: Avoid reconfiguration of BAR that covers the framebuffer")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ababb08938 ]
Since commit e247454103 ("bcm2835: Fix hang for writing messages
larger than 16 bytes") the interrupt handler is prone to a possible
NULL pointer dereference. This could happen if an interrupt fires
before curr_msg is set by bcm2835_i2c_xfer_msg() and randomly occurs
on the RPi 3. Even this is an unexpected behavior the driver must
handle that with an error instead of a crash.
Reported-by: Peter Robinson <pbrobinson@gmail.com>
Fixes: e247454103 ("bcm2835: Fix hang for writing messages larger than 16 bytes")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit deb8699932 ]
HiSilicon Hip06/Hip07 can operate as either a Root Port or an Endpoint. It
always advertises an MSI capability, but it can only generate MSIs when in
Endpoint mode.
The device has the same Vendor and Device IDs in both modes, so check the
Class Code and disable MSI only when operating as a Root Port.
[bhelgaas: changelog]
Fixes: 72f2ff0deb ("PCI: Disable MSI for HiSilicon Hip06/Hip07 Root Ports")
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f27cff859 ]
The acpi_mask_gpe= kernel parameter documentation states that the range
of mask is 128 GPEs (0x00 to 0x7F). The acpi_masked_gpes mask is a u64 so
only 64 GPEs (0x00 to 0x3F) can really be masked.
Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256
GPEs can be masked.
Fixes: 9c4aa1eecb (ACPI / sysfs: Provide quirk mechanism to prevent GPE flooding)
Signed-off-by: Prarit Bharava <prarit@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a83fba6ca ]
This patch reverts two previous applied patches to fix an issue
that appeared when using SGMII based SFP modules. In the current
state the driver will try to reset the PHY before obtaining the
phy_addr of the SGMII attached PHY. That leads to an error in
e1000_write_phy_reg_sgmii_82575. Causing the initialization to
fail:
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
igb: Copyright (c) 2007-2014 Intel Corporation.
igb: probe of ????:??:??.? failed with error -3
The patches being reverted are:
commit 1827853354
Author: Aaron Sierra <asierra@xes-inc.com>
Date: Tue Nov 29 10:03:56 2016 -0600
igb: reset the PHY before reading the PHY ID
commit 440aeca4b9
Author: Matwey V Kornilov <matwey@sai.msu.ru>
Date: Thu Nov 24 13:32:48 2016 +0300
igb: Explicitly select page 0 at initialization
The first reverted patch directly causes the problem mentioned above.
In case of SGMII the phy_addr is not known at this point and will
only be obtained by 'igb_get_phy_id_82575' further down in the code.
The second removed patch selects forces selection of page 0 in the
PHY. Something that the reset tries to address as well.
As pointed out by Alexander Duzck, the patch below fixes the same
issue but in the proper location:
commit 4e684f59d7
Author: Chris J Arges <christopherarges@gmail.com>
Date: Wed Nov 2 09:13:42 2016 -0500
igb: Workaround for igb i210 firmware issue
Reverts: 440aeca4b9.
Reverts: 1827853354.
Signed-off-by: Christian Grönke <c.groenke@infodas.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3bb910c15 ]
Commit 88c5c13a50 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
lookup dentries one more time.
This patch fixes it by moving the assigment of current task to a upper
level to cover both normal and inline dentry.
Cc: <stable@vger.kernel.org>
Fixes: 88c5c13a50 (f2fs: fix multiple f2fs_add_link() calls having same name)
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11887ed172 ]
Commit 34c2f668d0 ("MIPS: microMIPS: Add unaligned access support.")
added fairly broken support for handling 16bit microMIPS instructions in
get_frame_info(). It adjusts the instruction pointer by 16bits in the
case of a 16bit sp move instruction, but not any other 16bit
instruction.
Commit b6c7a324df ("MIPS: Fix get_frame_info() handling of microMIPS
function size") goes some way to fixing get_frame_info() to iterate over
microMIPS instuctions, but the instruction pointer is still manipulated
using a postincrement, and is of union mips_instruction type. Since the
union is sized to the largest member (a word), but microMIPS
instructions are a mix of halfword and word sizes, the function does not
always iterate correctly, ending up misaligned with the instruction
stream and interpreting it incorrectly.
Since the instruction modifying the stack pointer is usually the first
in the function, that one is usually handled correctly. But the
instruction which saves the return address to the sp is some variable
number of instructions into the frame and is frequently missed due to
not being on a word boundary, leading to incomplete walking of the
stack.
Fix this by incrementing the instruction pointer based on the size of
the previously decoded instruction (& remove the hack introduced by
commit 34c2f668d0 ("MIPS: microMIPS: Add unaligned access support.")
which adjusts the instruction pointer in the case of a 16bit sp move
instruction, but not any other).
Fixes: 34c2f668d0 ("MIPS: microMIPS: Add unaligned access support.")
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16953/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6d8c8a482 ]
When mainline introduced commit a96dfddbcc ("base/memory, hotplug: fix
a kernel oops in show_valid_zones()"), it obtained the valid start and
end pfn from the given pfn range. The valid start pfn can fix the
actual issue, but it introduced another issue. The valid end pfn will
may exceed the given end_pfn.
Although the incorrect overflow will not result in actual problem at
present, but I think it need to be fixed.
[toshi.kani@hpe.com: remove assumption that end_pfn is aligned by MAX_ORDER_NR_PAGES]
Fixes: a96dfddbcc ("base/memory, hotplug: fix a kernel oops in show_valid_zones()")
Link: http://lkml.kernel.org/r/1486467299-22648-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 331c7cb307 ]
Perf top is often crashing at very random locations on powerpc. After
investigating, I found the crash only happens when sample is of zero
length symbol. Powerpc kernel has many such symbols which does not
contain length details in vmlinux binary and thus start and end
addresses of such symbols are same.
Structure
struct sym_hist {
u64 nr_samples;
u64 period;
struct sym_hist_entry addr[0];
};
has last member 'addr[]' of size zero. 'addr[]' is an array of addresses
that belongs to one symbol (function). If function consist of 100
instructions, 'addr' points to an array of 100 'struct sym_hist_entry'
elements. For zero length symbol, it points to the *empty* array, i.e.
no members in the array and thus offset 0 is also invalid for such
array.
static int __symbol__inc_addr_samples(...)
{
...
offset = addr - sym->start;
h = annotation__histogram(notes, evidx);
h->nr_samples++;
h->addr[offset].nr_samples++;
h->period += sample->period;
h->addr[offset].period += sample->period;
...
}
Here, when 'addr' is same as 'sym->start', 'offset' becomes 0, which is
valid for normal symbols but *invalid* for zero length symbols and thus
updating h->addr[offset] causes memory corruption.
Fix this by adding one dummy element for zero length symbols.
Link: https://lkml.org/lkml/2016/10/10/148
Fixes: edee44be59 ("perf annotate: Don't throw error for zero length symbols")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1508854806-10542-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c05d88818 ]
In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
the user-space buffer 'useraddr' to 'cmd' and checked through the
switch statement. If the command is not as expected, an error code
EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
switch statement, the whole buffer of 'useraddr' is copied again to a
specific data structure, according to what kind of command is requested.
However, after the second copy, there is no re-check on the newly-copied
command. Given that the buffer 'useraddr' is in the user space, a malicious
user can race to change the command between the two copies. By doing so,
the attacker can supply malicious data to the kernel and cause undefined
behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL is returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe3a83af6a ]
Fix a commit 4bcc595ccd ("printk: reinstate KERN_CONT for printing
continuation lines") regression with the `declance' driver, which caused
the adapter identification message to be split between two lines, e.g.:
declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA
, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.
Address that properly, by printing identification with a single call,
making the messages now look like:
declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: 4bcc595ccd ("printk: reinstate KERN_CONT for printing continuation lines")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 657ade07df ]
During certain heavy network loads TX could time out
with TX ring dump.
TX is sometimes never restarted after reaching
"tx_stop_threshold" because function "fec_enet_tx_queue"
only tests the first queue.
In addition the TX timeout callback function failed to
recover because it also operated only on the first queue.
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cd6fb677ce ]
Some of the scheduling tracepoints allow the perf_tp_event
code to write to ring buffer under different cpu than the
code is running on.
This results in corrupted ring buffer data demonstrated in
following perf commands:
# perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.383 [sec]
[ perf record: Woken up 8 times to write data ]
0x42b890 [0]: failed to process type: -1765585640
[ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]
# perf report --stdio
0x42b890 [0]: failed to process type: -1765585640
The reason for the corruption are some of the scheduling tracepoints,
that have __perf_task dfined and thus allow to store data to another
cpu ring buffer:
sched_waking
sched_wakeup
sched_wakeup_new
sched_stat_wait
sched_stat_sleep
sched_stat_iowait
sched_stat_blocked
The perf_tp_event function first store samples for current cpu
related events defined for tracepoint:
hlist_for_each_entry_rcu(event, head, hlist_entry)
perf_swevent_event(event, count, &data, regs);
And then iterates events of the 'task' and store the sample
for any task's event that passes tracepoint checks:
ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);
list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (event->attr.type != PERF_TYPE_TRACEPOINT)
continue;
if (event->attr.config != entry->type)
continue;
perf_swevent_event(event, count, &data, regs);
}
Above code can race with same code running on another cpu,
ending up with 2 cpus trying to store under the same ring
buffer, which is specifically not allowed.
This patch prevents the problem, by allowing only events with the same
current cpu to receive the event.
NOTE: this requires the use of (per-task-)per-cpu buffers for this
feature to work; perf-record does this.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
[peterz: small edits to Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: e6dab5ffab ("perf/trace: Add ability to set a target task for events")
Link: http://lkml.kernel.org/r/20180923161343.GB15054@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c530c471ba ]
The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.
Fixes: e0e474a83c ("smsc95xx: add wol magic packet support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c734b2769 ]
The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.
Fixes: 6c63650326 ("smsc75xx: add wol magic packet support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2750df154 ]
The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.
Fixes: 21ff2e8976 ("r8152: support WOL")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5cb93e994 ]
The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.
Fixes: 19a38d8e0a ("USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb9ad088f9 ]
The driver supports a fair amount of Wake-on-LAN modes, but is not
checking that the user specified one that is supported.
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Woojung Huh <Woojung.Huh@Microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ba6b4aa9a ]
The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.
Fixes: e2ca90c276 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4ce446e33 ]
The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.
Fixes: 2e55cc7210 ("[PATCH] USB: usbnet (3/9) module for ASIX Ethernet adapters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c492a9d55 ]
Clang warns when a constant is used in a boolean context as it thinks a
bitwise operation may have been intended.
drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: warning: use of logical
'&&' with constant operand [-Wconstant-logical-operand]
if (!p_iov->b_pre_fp_hsi &&
^
drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: note: use '&' for a
bitwise operation
if (!p_iov->b_pre_fp_hsi &&
^~
&
drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: note: remove constant
to silence this warning
if (!p_iov->b_pre_fp_hsi &&
~^~
1 warning generated.
This has been here since commit 1fe614d10f ("qed: Relax VF firmware
requirements") and I am not entirely sure why since 0 isn't a special
case. Just remove the statement causing Clang to warn since it isn't
required.
Link: https://github.com/ClangBuiltLinux/linux/issues/126
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3a315795b ]
Clang warns when one enumerated type is implicitly converted to another.
drivers/net/ethernet/qlogic/qed/qed_roce.c:153:12: warning: implicit
conversion from enumeration type 'enum roce_mode' to different
enumeration type 'enum roce_flavor' [-Wenum-conversion]
flavor = ROCE_V2_IPV6;
~ ^~~~~~~~~~~~
drivers/net/ethernet/qlogic/qed/qed_roce.c:156:12: warning: implicit
conversion from enumeration type 'enum roce_mode' to different
enumeration type 'enum roce_flavor' [-Wenum-conversion]
flavor = MAX_ROCE_MODE;
~ ^~~~~~~~~~~~~
2 warnings generated.
Use the appropriate values from the expected type, roce_flavor:
ROCE_V2_IPV6 = RROCE_IPV6 = 2
MAX_ROCE_MODE = MAX_ROCE_FLAVOR = 3
While we're add it, ditch the local variable flavor, we can just return
the value directly from the switch statement.
Link: https://github.com/ClangBuiltLinux/linux/issues/125
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28ef8b49a3 ]
The allocation of hwsim radio identifiers uses a post-increment from 0,
so the first radio has idx 0. This idx is explicitly excluded from
multicast announcements ever since, but it is unclear why.
Drop that idx check and announce the first radio as well. This makes
userspace happy if it relies on these events.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96fc74333f ]
There is a copy and paste bug so we accidentally use the RX_ shift when
we're in TX_ mode.
Fixes: bb8b2062af ("fsl/qe: setup clock source for TDM mode")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
(cherry picked from commit 3cb31b634052ed458922e0c8e2b4b093d7fb60b9)
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64e9e22e68 ]
If the qman driver didn't probe, calling qman_alloc_fqid_range,
qman_alloc_pool_range or qman_alloc_cgrid_range (as done in dpaa_eth) will
pass a NULL pointer to gen_pool_alloc, leading to a NULL pointer
dereference.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
(cherry picked from commit f72487a2788aa70c3aee1d0ebd5470de9bac953a)
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1e5d8a9fe ]
Clear ADDR64 dma bit in DMACFG register in case that HW_DMA_CAP_64B is
not detected on 64bit system.
The issue was observed when bootloader(u-boot) does not check macb
feature at DCFG6 register (DAW64_OFFSET) and enabling 64bit dma support
by default. Then macb driver is reading DMACFG register back and only
adding 64bit dma configuration but not cleaning it out.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ab97942d0 ]
A number of our interrupts were incorrectly specified, fix both the PPI
and SPI interrupts to be correct.
Fixes: b5762cacc4 ("ARM: bcm63138: add NAND DT support")
Fixes: 46d4bca044 ("ARM: BCM63XX: add BCM63138 minimal Device Tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a58ac65e2 ]
IO_SPACE_LIMIT is the ending address of the PCI IO space, i.e
something like 0xfffff (and not 0x100000).
Therefore, when offset = 0xf0000 is passed as argument, this function
fails even though the offset + SZ_64K fits below the
IO_SPACE_LIMIT. This makes the last chunk of 64 KB of the I/O space
not usable as it cannot be mapped.
This patch fixes that by substracing 1 to offset + SZ_64K, so that we
compare the addrss of the last byte of the I/O space against
IO_SPACE_LIMIT instead of the address of the first byte of what is
after the I/O space.
Fixes: c279443709 ("ARM: Add fixed PCI i/o mapping")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb59bc14e8 ]
If the TDLS setup happens over a connection to an AP that
doesn't have QoS, we nevertheless assign a non-zero TID
(skb->priority) and queue mapping, which may confuse us or
drivers later.
Fix it by just assigning the special skb->priority and then
using ieee80211_select_queue() just like other data frames
would go through.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 119f94a6fe ]
cfg80211_get_bss_channel() is used to update the RX channel based on the
available frame payload information (channel number from DSSS Parameter
Set element or HT Operation element). This is needed on 2.4 GHz channels
where frames may be received on neighboring channels due to overlapping
frequency range.
This might of some use on the 5 GHz band in some corner cases, but
things are more complex there since there is no n:1 or 1:n mapping
between channel numbers and frequencies due to multiple different
starting frequencies in different operating classes. This could result
in ieee80211_channel_to_frequency() returning incorrect frequency and
ieee80211_get_channel() returning incorrect channel information (or
indication of no match). In the previous implementation, this could
result in some scan results being dropped completely, e.g., for the 4.9
GHz channels. That prevented connection to such BSSs.
Fix this by using the driver-provided channel pointer if
ieee80211_get_channel() does not find matching channel data for the
channel number in the frame payload and if the scan is done with 5 MHz
or 10 MHz channel bandwidth. While doing this, also add comments
describing what the function is trying to achieve to make it easier to
understand what happens here and why.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6eae4a6c2b ]
In our environment running lots of mesh nodes, we are seeing the
pending queue hang periodically, with the debugfs queues file showing
lines such as:
00: 0x00000000/348
i.e. there are a large number of frames but no stop reason set.
One way this could happen is if queue processing from the pending
tasklet exited early without processing all frames, and without having
some future event (incoming frame, stop reason flag, ...) to reschedule
it.
Exactly this can occur today if ieee80211_tx() returns false due to
packet drops or power-save buffering in the tx handlers. In the
past, this function would return true in such cases, and the change
to false doesn't seem to be intentional. Fix this case by reverting
to the previous behavior.
Fixes: bb42f2d13f ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
Signed-off-by: Bob Copeland <bobcopeland@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24f33e64fc ]
Core regulatory hints didn't set wiphy_idx to WIPHY_IDX_INVALID. Since
the regulatory request is zeroed, wiphy_idx was always implicitly set to
0. This resulted in updating only phy #0.
Fix that.
Fixes: 806a9e3967 ("cfg80211: make regulatory_request use wiphy_idx instead of wiphy")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[add fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8682250b3c ]
If a frame is dropped for any reason, mac80211 wouldn't report the TX
status back to user space.
As the user space may rely on the TX_STATUS to kick its state
machines, resends etc, it's better to just report this frame as not
acked instead.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 215ab0f021 ]
After commit d6990976af ("vti6: fix PMTU caching
and reporting on xmit"), some too big skbs might be potentially passed down to
__xfrm6_output, causing it to fail to transmit but not free the skb, causing a
leak of skb, and consequentially a leak of dst references.
After running pmtu.sh, that shows as failure to unregister devices in a namespace:
[ 311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1
The fix is to call kfree_skb in case of transmit failures.
Fixes: dd767856a3 ("xfrm6: Don't call icmpv6_send on local error")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07bf790895 ]
We don't validate the address prefix lengths in the xfrm
selector we got from userspace. This can lead to undefined
behaviour in the address matching functions if the prefix
is too big for the given address family. Fix this by checking
the prefixes and refuse SA/policy insertation when a prefix
is invalid.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Air Icy <icytxw@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a3ade8cc47 upstream.
The host may send multiple negotiation packets
(due to timeout) before the KVP user-mode daemon
is connected. KVP user-mode daemon is connected.
We need to defer processing those packets
until the daemon is negotiated and connected.
It's okay for guest to respond
to all negotiation packets.
In addition, the host may send multiple staged
KVP requests as soon as negotiation is done.
We need to properly process those packets using one
tasklet for exclusive access to ring buffer.
This patch is based on the work of
Nick Meier <Nick.Meier@microsoft.com>.
The above is the original changelog of
a3ade8cc47 ("HV: properly delay KVP packets when negotiation is in progress"
Here I re-worked the original patch because the mainline version
can't work for the linux-4.4.y branch, on which channel->callback_event
doesn't exist yet. In the mainline, channel->callback_event was added by:
631e63a9f3 ("vmbus: change to per channel tasklet"). Here we don't want
to backport it to v4.4, as it requires extra supporting changes and fixes,
which are unnecessary as to the KVP bug we're trying to resolve.
NOTE: before this patch is used, we should cherry-pick the other related
3 patches from the mainline first:
The background of this backport request is that: recently Wang Jian reported
some KVP issues: https://github.com/LIS/lis-next/issues/593:
e.g. the /var/lib/hyperv/.kvp_pool_* files can not be updated, and sometimes
if the hv_kvp_daemon doesn't timely start, the host may not be able to query
the VM's IP address via KVP.
Reported-by: Wang Jian <jianjian.wang1@gmail.com>
Tested-by: Wang Jian <jianjian.wang1@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8bc1379b82 upstream.
Use a separate journal transaction if it turns out that we need to
convert an inline file to use an data block. Otherwise we could end
up failing due to not having journal credits.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
[fengc@google.com: 4.4 and 4.9 backport: adjust context]
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25e2d8c1b9 upstream.
irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.
But this behaviour got broken by:
a499a5a14d ("sched/cputime: Increment kcpustat directly on irqtime account")
... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().
This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.
ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().
To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.
Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit daa35bd956 upstream.
When the gadget serial device has no associated TTY, do not pass any
received data into the TTY layer for processing; simply drop it instead.
This prevents the TTY layer from calling back into the gadget serial
driver, which will then crash in e.g. gs_write_room() due to lack of
gadget serial device to TTY association (i.e. a NULL pointer dereference).
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b6cc0ba2cb (HID: add support for Apple Magic Keyboards)
backported support for the Magic Keyboard over Bluetooth, but did not
add the BT_VENDOR_ID_APPLE to hid_have_special_driver[] so the hid-apple
driver is never loaded and Fn key does not work at all.
Adding BT_VENDOR_ID_APPLE to hid_have_special_driver[] is not needed
after commit e04a0442d3 (HID: core: remove the absolute need of
hid_have_special_driver[]), so 4.16 kernels and newer does not need it.
Fixes: b6cc0ba2cb (HID: add support for Apple Magic Keyboards)
Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=99881
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40660f1fce upstream.
There's not much sense in doing that because if user or
his build-system didn't set CROSS_COMPILE we still may
very well make incorrect guess.
But as it turned out setting CROSS_COMPILE is not as harmless
as one may think: with recent changes that implemented automatic
discovery of __host__ gcc features unconditional setup of
CROSS_COMPILE leads to failures on execution of "make xxx_defconfig"
with absent cross-compiler, for more info see [1].
Set CROSS_COMPILE as well gets in the way if we want only to build
.dtb's (again with absent cross-compiler which is not really needed
for building .dtb's), see [2].
Note, we had to change LIBGCC assignment type from ":=" to "="
so that is is resolved on its usage, otherwise if it is resolved
at declaration time with missing CROSS_COMPILE we're getting this
error message from host GCC:
| gcc: error: unrecognized command line option -mmedium-calls
| gcc: error: unrecognized command line option -mno-sdata
[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004308.html
[2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004320.html
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 615f64458a upstream.
This check is very naive: we simply test if GCC invoked without
"-mcpu=XXX" has ARC700 define set. In that case we think that GCC
was built with "--with-cpu=arc700" and has libgcc built for ARC700.
Otherwise if ARC700 is not defined we think that everythng was built
for ARCv2.
But in reality our life is much more interesting.
1. Regardless of GCC configuration (i.e. what we pass in "--with-cpu"
it may generate code for any ARC core).
2. libgcc might be built with explicitly specified "--mcpu=YYY"
That's exactly what happens in case of multilibbed toolchains:
- GCC is configured with default settings
- All the libs built for many different CPU flavors
I.e. that check gets in the way of usage of multilibbed
toolchains. And even non-multilibbed toolchains are affected.
OpenEmbedded also builds GCC without "--with-cpu" because
each and every target component later is compiled with explicitly
set "-mcpu=ZZZ".
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab6dd1beac upstream.
Commit 4440a2ab3b ("netfilter: synproxy: Check oom when adding synproxy
and seqadj ct extensions") wanted to drop the packet when it fails to add
seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns
NULL.
But that nfct_seqadj_ext_add returns NULL can also happen when seqadj ext
already exists in a nf_conn. It will cause that userspace protocol doesn't
work when both dnat and snat are configured.
Li Shuang found this issue in the case:
Topo:
ftp client router ftp server
10.167.131.2 <-> 10.167.131.254 10.167.141.254 <-> 10.167.141.1
Rules:
# iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp --dport 21 -j \
DNAT --to-destination 10.167.141.1
# iptables -t nat -A POSTROUTING -o eth2 -p tcp -m tcp --dport 21 -j \
SNAT --to-source 10.167.141.254
In router, when both dnat and snat are added, nf_nat_setup_info will be
called twice. The packet can be dropped at the 2nd time for DNAT due to
seqadj ext is already added at the 1st time for SNAT.
This patch is to fix it by checking for seqadj ext existence before adding
it, so that the packet will not be dropped if seqadj ext already exists.
Note that as Florian mentioned, as a long term, we should review ext_add()
behaviour, it's better to return a pointer to the existing ext instead.
Fixes: 4440a2ab3b ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions")
Reported-by: Li Shuang <shuali@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4628a64591 upstream.
Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a
result we will see warnings such as:
BUG: Bad page map in process JobWrk0013 pte:800001803875ea25 pmd:7624381067
addr:00007f0930720000 vm_flags:280000f9 anon_vma: (null) mapping:ffff97f2384056f0 index:0
file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage: (null)
CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G W 4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
Call Trace:
dump_stack+0x5a/0x75
print_bad_pte+0x217/0x2c0
? enqueue_task_fair+0x76/0x9f0
_vm_normal_page+0xe5/0x100
zap_pte_range+0x148/0x740
unmap_page_range+0x39a/0x4b0
unmap_vmas+0x42/0x90
unmap_region+0x99/0xf0
? vma_gap_callbacks_rotate+0x1a/0x20
do_munmap+0x255/0x3a0
vm_munmap+0x54/0x80
SyS_munmap+0x1d/0x30
do_syscall_64+0x74/0x150
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
...
when mprotect(2) gets used on DAX mappings. Also there is a wide variety
of other failures that can result from the missing _PAGE_DEVMAP flag
when the area gets used by get_user_pages() later.
Fix the problem by including _PAGE_DEVMAP in a set of flags that get
preserved by mprotect(2).
Fixes: 69660fd797 ("x86, mm: introduce _PAGE_DEVMAP")
Fixes: ebd3119793 ("powerpc/mm: Add devmap support for ppc64")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb66ae0308 upstream.
Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case. What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages". No, mremap() obviously just _moves_ the page from one page
table location to another.
That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.
As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).
This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ebb1bc2d6 ]
ACPI HID devices do not actually have an alias for
them in the IVRS. But dev_data->alias is still used
for indexing into the IOMMU device table for devices
being handled by the IOMMU. So for ACPI HID devices,
we simply return the corresponding devid as an alias,
as parsed from IVRS table.
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Fixes: 2bf9a0a127 ('iommu/amd: Add iommu support for ACPI HID devices')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96dc89d526 ]
Current we store the userspace r1 to PACATMSCRATCH before finally
saving it to the thread struct.
In theory an exception could be taken here (like a machine check or
SLB miss) that could write PACATMSCRATCH and hence corrupt the
userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but
others do.
We've never actually seen this happen but it's theoretically
possible. Either way, the code is fragile as it is.
This patch saves r1 to the kernel stack (which can't fault) before we
turn MSR[RI] back on. PACATMSCRATCH is still used but only with
MSR[RI] off. We then copy r1 from the kernel stack to the thread
struct once we have MSR[RI] back on.
Suggested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf13435b73 ]
When we treclaim we store the userspace checkpointed r13 to a scratch
SPR and then later save the scratch SPR to the user thread struct.
Unfortunately, this doesn't work as accessing the user thread struct
can take an SLB fault and the SLB fault handler will write the same
scratch SPRG that now contains the userspace r13.
To fix this, we store r13 to the kernel stack (which can't fault)
before we access the user thread struct.
Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
as a random userspace segfault with r13 looking like a kernel address.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8ac1ee6f4d ]
Clang warns that the address of a pointer will always evaluated as true
in a boolean context:
drivers/net/ethernet/mellanox/mlx4/eq.c:243:11: warning: address of
array 'eq->affinity_mask' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (!eq->affinity_mask || cpumask_empty(eq->affinity_mask))
~~~~~^~~~~~~~~~~~~
1 warning generated.
Use cpumask_available, introduced in commit f7e30f01a9 ("cpumask: Add
helper cpumask_available()"), which does the proper checking and avoids
this warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/86
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1f1fadaca ]
When sd_init_command() get's a command with a unknown req_op() it crashes the
system via BUG().
This makes debugging the actual reason for the broken request cmd_flags pretty
hard as the system is down before it's able to write out debugging data on the
serial console or the trace buffer.
Change the BUG() to a WARN_ON() and return BLKPREP_KILL to fail gracefully and
return an I/O error to the producer of the request.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69be1984de ]
Currently, if userspace calls drm_wait_vblank before the crtc is
activated the crtc vblank_enable hook is called, which in case of
malidp driver triggers some warninngs. This happens because on
device init we don't inform the drm core about the vblank state
by calling drm_crtc_vblank_on/off/reset which together with
drm_vblank_get have some magic that prevents calling drm_vblank_enable
when crtc is off.
Signed-off-by: Alexandru Gheorghe <alexandru-cosmin.gheorghe@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2fe397a395 ]
EtherAVB hardware requires 0 to be written to status register bits in
order to clear them, however, care must be taken not to:
1. Clear other bits, by writing zero to them
2. Write one to reserved bits
This patch corrects the ravb driver with respect to the second point above.
This is done by defining reserved bit masks for the affected registers and,
after auditing the code, ensure all sites that may write a one to a
reserved bit use are suitably masked.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d792d4c4fc ]
There's currently a warning about string overflow with strncat:
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c: In function 'ibmvscsis_probe':
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c:3479:2: error: 'strncat' specified
bound 64 equals destination size [-Werror=stringop-overflow=]
strncat(vscsi->eye, vdev->name, MAX_EYE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Switch to a single snprintf instead of a strcpy + strcat to handle this
cleanly.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c4af69008 ]
The hardif_neigh refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5af96b9c59 ]
The backbone_gw refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae3cdc97dc ]
The function batadv_tvlv_handler_register is responsible for adding new
tvlv_handler to the handler_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.
But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.
The check and the manipulation of the list must therefore be in the same
locked code section.
Fixes: ef26157747 ("batman-adv: tvlv - basic infrastructure")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e7136e48ff ]
The function batadv_tt_global_orig_entry_add is responsible for adding new
tt_orig_list_entry to the orig_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.
But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.
The check and the manipulation of the list must therefore be in the same
locked code section.
Fixes: d657e621a0 ("batman-adv: add reference counting for type batadv_tt_orig_list_entry")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 94cb82f594 ]
The function batadv_softif_vlan_get is responsible for adding new
softif_vlan to the softif_vlan_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.
But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.
The check and the manipulation of the list must therefore be in the same
locked code section.
Fixes: 5d2c05b213 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fa122fec86 ]
The function batadv_nc_get_nc_node is responsible for adding new nc_nodes
to the in_coding_list and out_coding_list. It first checks whether the
entry already is in the list or not. If it is, then the creation of a new
entry is aborted.
But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.
The check and the manipulation of the list must therefore be in the same
locked code section.
Fixes: d56b1705e2 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a25bab9d72 ]
The per hardif sysfs file "batman_adv/elp_interval" is using the generic
functions to store/show uint values. The helper __batadv_store_uint_attr
requires the softif net_device as parameter to print the resulting change
as info text when the users writes to this file. It uses the helper
function batadv_info to add it at the same time to the kernel ring buffer
and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG is enabled).
The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.
Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.
Fixes: 0744ff8fa8 ("batman-adv: Add hard_iface specific sysfs wrapper macros for UINT")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b9fd14c208 ]
The per hardif sysfs file "batman_adv/throughput_override" prints the
resulting change as info text when the users writes to this file. It uses
the helper function batadv_info to add it at the same time to the kernel
ring buffer and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG
is enabled).
The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.
Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.
Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 312f73b648 ]
When less than 3 bytes are written to the device, memcpy is called with
negative array size which leads to buffer overflow and kernel panic. This
patch adds a condition and returns -EOPNOTSUPP instead.
Fixes bugzilla issue 64871
[mchehab+samsung@kernel.org: fix a merge conflict and changed the
condition to match the patch's comment, e. g. len == 3 could
also be valid]
Signed-off-by: Jozef Balga <jozef.balga@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(commit 70837ffe30 upstream)
We accidentally removed the parentheses here, but they are required
because '!' has higher precedence than '&'.
Fixes: fa0f527358 ("ip: use rb trees for IP frag queue.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch changes the runtime behavior of IP defrag queue:
incoming in-order fragments are added to the end of the current
list/"run" of in-order fragments at the tail.
On some workloads, UDP stream performance is substantially improved:
RX: ./udp_stream -F 10 -T 2 -l 60
TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60
with this patchset applied on a 10Gbps receiver:
throughput=9524.18
throughput_units=Mbit/s
upstream (net-next):
throughput=4608.93
throughput_units=Mbit/s
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a4fd284a1f)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.
The new logic (fully implemented in the second patch) is as follows:
* Nodes in the rb-tree will now contain not single fragments, but lists
of consecutive fragments ("runs").
* At each point in time, the current "active" run at the tail is
maintained/tracked. Fragments that arrive in-order, adjacent
to the previous tail fragment, are added to this tail run without
triggering the re-balancing of the rb-tree.
* If a fragment arrives out of order with the offset _before_ the tail run,
it is inserted into the rb-tree as a single fragment.
* If a fragment arrives after the current tail fragment (with a gap),
it starts a new "tail" run, as is inserted into the rb-tree
at the end as the head of the new run.
skb->cb is used to store additional information
needed here (suggested by Eric Dumazet).
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 353c9cb360)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(commit fa0f527358 upstream)
Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.
Tested:
- a follow-up patch contains a rather comprehensive ip defrag
self-test (functional)
- ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
netstat --statistics
Ip:
282078937 total packets received
0 forwarded
0 incoming packets discarded
946760 incoming packets delivered
18743456 requests sent out
101 fragments dropped after timeout
282077129 reassemblies required
944952 packets reassembled ok
262734239 packet reassembles failed
(The numbers/stats above are somewhat better re:
reassemblies vs a kernel without this patchset. More
comprehensive performance testing TBD).
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geeralize private netem_rb_to_skb()
TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18a4c0eab2)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.
While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.
We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88078d98d1)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
don't bother with pathological cases, they only waste cycles.
IPv6 requires a minimum MTU of 1280 so we should never see fragments
smaller than this (except last frag).
v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
There were concerns that there could be even smaller frags
generated by intermediate nodes, e.g. on radio networks.
Cc: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0ed4229b08)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As measured in my prior patch ("sch_netem: faster rb tree removal"),
rbtree_postorder_for_each_entry_safe() is nice looking but much slower
than using rb_next() directly, except when tree is small enough
to fit in CPU caches (then the cost is the same)
Also note that there is not even an increase of text size :
$ size net/core/skbuff.o.before net/core/skbuff.o
text data bss dec hex filename
40711 1298 0 42009 a419 net/core/skbuff.o.before
40711 1298 0 42009 a419 net/core/skbuff.o
From: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7c90584c66)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.
Tested: ran ip_defrag selftest (not yet available uptream).
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7969e5c40d)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.
ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.
Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.
Fixes: 6e00f7dd5e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3d23401283)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb->next,
meaning that we use two cache lines per skb when finding the insertion point.
By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
in a single cache line and save precious memory bandwidth.
Note that after the fast path added by Changli Gao in commit
d6bebca92c ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev->len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.
Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf66337140)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce
false sharing noticed in inet_frag_kill()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c2615cf5a7)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While under frags DDOS I noticed unfortunate false sharing between
@nelems and @params.automatic_shrinking
Move @nelems at the end of struct rhashtable so that first cache line
is shared between all cpus, because almost never dirtied.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5d672a078)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An skb_clone() was added in commit ec4fbd6475 ("inet: frag: release
spinlock before calling icmp_send()")
While fixing the bug at that time, it also added a very high cost
for DDOS frags, as the ICMP rate limit is applied after this
expensive operation (skb_clone() + consume_skb(), implying memory
allocations, copy, and freeing)
We can use skb_get(head) here, all we want is to make sure skb wont
be freed by another cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1eec5d5670)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonnably well under pressure.
Current memory tracking is using one atomic_t and integers.
Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.
Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :
if (... || frag_mem_limit(nf) > nf->high_thresh)
Tested:
$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh
<frag DDOS>
$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880
$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds 3317150 0.0
IpReasmFails 3317112 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3e67f106f6)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This refactors ip_expire() since one indentation level is removed.
Note: in the future, we should try hard to avoid the skb_clone()
since this is a serious performance cost.
Under DDOS, the ICMP message wont be sent because of rate limits.
Fact that ip6_expire_frag_queue() does not use skb_clone() is
disturbing too. Presumably IPv6 should have the same
issue than the one we fixed in commit ec4fbd6475
("inet: frag: release spinlock before calling icmp_send()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 399d1404be)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()
Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405 ("inet: frag: don't account number
of fragment queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6befe4a78b)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some applications still rely on IP fragmentation, and to be fair linux
reassembly unit is not working under any serious load.
It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
A work queue is supposed to garbage collect items when host is under memory
pressure, and doing a hash rebuild, changing seed used in hash computations.
This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.
Then there is the problem of sharing this hash table for all netns.
It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.
Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.
Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.
After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)
$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608
A followup patch will change the limits for 64bit arches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 648700f76b)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rehashing and destroying large hash table takes a lot of time,
and happens in process context. It is safe to add cond_resched()
in rhashtable_rehash_table() and rhashtable_free_and_destroy()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ae6da1f503)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IPv4 was changed in commit 52a773d645 ("net: Export ip fragment
sysctl to unprivileged users")
The only sysctl that is not per-netns is not used :
ip6frag_secret_interval
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18dcbe12fe)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We want to call lowpan_net_frag_init() earlier.
Similar to commit "inet: frags: refactor ipv6_frag_init()"
This is a prereq to "inet: frags: use rhashtables for reassembly units"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 807f1844df)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We want to call inet_frags_init() earlier.
This is a prereq to "inet: frags: use rhashtables for reassembly units"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b975bab23)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to call inet_frags_init() before register_pernet_subsys(),
as a prereq for following patch ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 483a6e4fa0)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to simplify the API, add a pointer to struct inet_frags.
This will allow us to make things less complex.
These functions no longer have a struct inet_frags parameter :
inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 093ba72914)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We will soon initialize one rhashtable per struct netns_frags
in inet_frags_init_net().
This patch changes the return value to eventually propagate an
error.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 787bea7748)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2ab2ddd301 ]
Timer handlers do not imply rcu_read_lock(), so my recent fix
triggered a LOCKDEP warning when SYNACK is retransmit.
Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
usages instead of guessing what is done by callers, since it is
not worth the pain.
Get rid of ireq_opt_deref() helper since it hides the logic
without real benefit, since it is now a standard rcu_dereference().
Fixes: 1ad98e9d1b ("tcp/dccp: fix lockdep issue when SYN is backlogged")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ad98e9d1b ]
In normal SYN processing, packets are handled without listener
lock and in RCU protected ingress path.
But syzkaller is known to be able to trick us and SYN
packets might be processed in process context, after being
queued into socket backlog.
In commit 06f877d613 ("tcp/dccp: fix other lockdep splats
accessing ireq_opt") I made a very stupid fix, that happened
to work mostly because of the regular path being RCU protected.
Really the thing protecting ireq->ireq_opt is RCU read lock,
and the pseudo request refcnt is not relevant.
This patch extends what I did in commit 449809a66c ("tcp/dccp:
block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
pair in the paths that might be taken when processing SYN from
socket backlog (thus possibly in process context)
Fixes: 06f877d613 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e1d6eca51 ]
We have an impressive number of syzkaller bugs that are linked
to the fact that syzbot was able to create a networking device
with millions of TX (or RX) queues.
Let's limit the number of RX/TX queues to 4096, this really should
cover all known cases.
A separate patch will add various cond_resched() in the loops
handling sysfs entries at device creation and dismantle.
Tested:
lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
RTNETLINK answers: Invalid argument
lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap
real 0m0.180s
user 0m0.000s
sys 0m0.107s
Fixes: 76ff5cc919 ("rtnl: allow to specify number of rx and tx queues on device creation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 45ec318578 ]
The AON_PM_L2 is normally used to trigger and identify the source of a
wake-up event. Since the RX_SYS clock is no longer turned off, we also
have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and
that interrupt remains active up until the magic packet detector is
disabled which happens much later during the driver resumption.
The race happens if we have a CPU that is entering the SYSTEMPORT
INTRL2_0 handler during resume, and another CPU has managed to clear the
wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we
have the first CPU stuck in the interrupt handler with an interrupt
cause that has been cleared under its feet, and so we keep returning
IRQ_NONE and we never make any progress.
This was not a problem before because we would always turn off the
RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned
off as well, thus not latching the interrupt.
The fix is to make sure we do not enable either the MPD or
BRCM_TAG_MATCH interrupts since those are redundant with what the
AON_PM_L2 interrupt controller already processes and they would cause
such a race to occur.
Fixes: bb9051a2b2 ("net: systemport: Add support for WAKE_FILTER")
Fixes: 83e82f4c70 ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35f3625c21 ]
When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.
The actual driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.
This commit makes use of vlan_protocol_get() to get the correct ethtype
regardless the presence of a vlan tag.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf3b452b7a ]
The order in which we release resources is unfortunately leading to bus
errors while dismantling the port. This is because we set
priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now
permissible to clock gate the switch. Later on, when dsa_slave_destroy()
comes in from dsa_unregister_switch() and calls
dsa_switch_ops::port_disable, we perform the same dismantling again, and
this time we hit registers that are clock gated.
Make sure that dsa_unregister_switch() is the first thing that happens,
which takes care of releasing all user visible resources, then proceed
with clock gating hardware. We still need to set priv->wol_ports_mask to
0 to make sure that an enabled port properly gets disabled in case it
was previously used as part of Wake-on-LAN.
Fixes: d9338023fb ("net: dsa: bcm_sf2: Make it a real platform device driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f7617705b ]
Added support for Gemalto's Cinterion ALASxx WWAN interfaces
by adding QMI_FIXED_INTF with Cinterion's VID and PID.
Signed-off-by: Giacinto Cifelli <gciofono@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c333fa0c4f ]
In regular NIC transmission flow, driver always configures MAC using
Tx queue zero descriptor as a part of MAC learning flow.
But with multi Tx queue supported NIC, regular transmission can occur on
any non-zero Tx queue and from that context it uses
Tx queue zero descriptor to configure MAC, at the same time TX queue
zero could be used by another CPU for regular transmission
which could lead to Tx queue zero descriptor corruption and cause FW
abort.
This patch fixes this in such a way that driver always configures
learned MAC address from the same Tx queue which is used for
regular transmission.
Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f88b4c01b9 ]
netlbl_unlabel_addrinfo_get() assumes that if it finds the
NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is
not necessarily the case as the current checks in
netlbl_unlabel_staticadd() and friends are not sufficent to
enforce this.
If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
these functions will all call netlbl_unlabel_addrinfo_get() which
will then attempt dereference NULL when fetching the non-existent
NLBL_UNLABEL_A_IPV4MASK attribute:
Unable to handle kernel NULL pointer dereference at virtual address 0
Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
Call trace:
netlbl_unlabel_addrinfo_get+0x44/0xd8
netlbl_unlabel_staticremovedef+0x98/0xe0
genl_rcv_msg+0x354/0x388
netlink_rcv_skb+0xac/0x118
genl_rcv+0x34/0x48
netlink_unicast+0x158/0x1f0
netlink_sendmsg+0x32c/0x338
sock_sendmsg+0x44/0x60
___sys_sendmsg+0x1d0/0x2a8
__sys_sendmsg+0x64/0xb4
SyS_sendmsg+0x34/0x4c
el0_svc_naked+0x34/0x38
Code: 51001149 7100113f 540000a0 f9401508 (79400108)
---[ end trace f6438a488e737143 ]---
Kernel panic - not syncing: Fatal exception
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 86f9bd1ff6 ]
The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
handle starting/stopping the iteration. The problem is that at some point
during the iteration, an overflow is detected and the process is
subsequently stopped. The item being shown via seq_printf() when the
overflow occurs is not actually shown, though. When start() is
subsequently called to resume iterating, it returns the next item, and
thus the item that was being processed when the overflow occurred never
gets printed.
Alter the meaning of the private data member "offset". Currently, when it
is not 0 (which only happens at the very beginning), "offset" represents
the next hlist item to be printed. After this change, "offset" always
represents the current item.
This is also consistent with the private data member "bucket", which
represents the current bucket, and also the use of "pos" as defined in
seq_file.txt:
The pos passed to start() will always be either zero, or the most
recent pos used in the previous session.
Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit af7d6cce53 ]
Since commit 5aad1de5ea ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.
As Stefano described in commit e9fa1495d7 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
- if the local MTU is now lower than the PMTU, that PMTU is now
incorrect
- if the local MTU was the lowest value in the path, and is increased,
we might discover a higher PMTU
Similarly to what commit e9fa1495d7 did for IPv6, update PMTU in those
cases.
If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.
To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_u32() function.
Fixes: 5aad1de5ea ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e9361efa7 ]
If SMMU is on, there is more likely that skb_shinfo(skb)->frags[i]
can not send by a single BD. when this happen, the
hns_nic_net_xmit_hw function map the whole data in a frags using
skb_frag_dma_map, but unmap each BD' data individually when tx is
done, which causes problem when SMMU is on.
This patch fixes this problem by ummapping the whole data in a
frags when tx is done.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 54baca0963 ]
There is no reason to open code what the switch setup function does, in
fact, because we just issued a switch reset, we would make all the
register get their default values, including for instance, having unused
port be enabled again and wasting power and leading to an inappropriate
switch core clock being selected.
Fixes: 8cfa94984c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 73f21c653f ]
The current netpoll implementation in the bnxt_en driver has problems
that may miss TX completion events. bnxt_poll_work() in effect is
only handling at most 1 TX packet before exiting. In addition,
there may be in flight TX completions that ->poll() may miss even
after we fix bnxt_poll_work() to handle all visible TX completions.
netpoll may not call ->poll() again and HW may not generate IRQ
because the driver does not ARM the IRQ when the budget (0 for netpoll)
is reached.
We fix it by handling all TX completions and to always ARM the IRQ
when we exit ->poll() with 0 budget.
Also, the logic to ACK the completion ring in case it is almost filled
with TX completions need to be adjusted to take care of the 0 budget
case, as discussed with Eric Dumazet <edumazet@google.com>
Reported-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1208d8a84f upstream.
When disabling a USB3 port the hub driver will set the port link state to
U3 to prevent "ejected" or "safely removed" devices that are still
physically connected from immediately re-enumerating.
If the device was really unplugged, then error messages were printed
as the hub tries to set the U3 link state for a port that is no longer
enabled.
xhci-hcd ee000000.usb: Cannot set link state.
usb usb8-port1: cannot disable (err = -32)
Don't print error message in xhci-hub if hub tries to set port link state
for a disabled port. Return -ENODEV instead which also silences hub driver.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08d9db00fe upstream.
The i2c-scmi driver crashes when the SMBus Write Block transaction is
executed:
WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0
Call Trace:
? get_page_from_freelist+0x49d/0x11f0
? alloc_pages_current+0x6a/0xe0
? new_slab+0x499/0x690
__alloc_pages_nodemask+0x265/0x280
alloc_pages_current+0x6a/0xe0
kmalloc_order+0x18/0x40
kmalloc_order_trace+0x24/0xb0
? acpi_ut_allocate_object_desc_dbg+0x62/0x10c
__kmalloc+0x203/0x220
acpi_os_allocate_zeroed+0x34/0x36
acpi_ut_copy_eobject_to_iobject+0x266/0x31e
acpi_evaluate_object+0x166/0x3b2
acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi]
i2c_smbus_xfer+0xda/0x370
i2cdev_ioctl_smbus+0x1bd/0x270
i2cdev_ioctl+0xaa/0x250
do_vfs_ioctl+0xa4/0x600
SyS_ioctl+0x79/0x90
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185)
This problem occurs because the length of ACPI Buffer object is not
defined/initialized in the code before a corresponding ACPI method is
called. The obvious patch below fixes this issue.
Signed-off-by: Edgar Cherkasov <echerkasov@dev.rtsoft.ru>
Acked-by: Viktor Krasnov <vkrasnov@dev.rtsoft.ru>
Acked-by: Michael Brunner <Michael.Brunner@kontron.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76ebebd246 upstream.
On Sun Ultra 5, it happens that the dot clock is not set up properly for
some videomodes. For example, if we set the videomode "r1024x768x60" in
the firmware, Linux would incorrectly set a videomode with refresh rate
180Hz when booting (suprisingly, my LCD monitor can display it, although
display quality is very low).
The reason is this: Older mach64 cards set the divider in the register
VCLK_POST_DIV. The register has four 2-bit fields (the field that is
actually used is specified in the lowest two bits of the register
CLOCK_CNTL). The 2 bits select divider "1, 2, 4, 8". On newer mach64 cards,
there's another bit added - the top four bits of PLL_EXT_CNTL extend the
divider selection, so we have possible dividers "1, 2, 4, 8, 3, 5, 6, 12".
The Linux driver clears the top four bits of PLL_EXT_CNTL and never sets
them, so it can work regardless if the card supports them. However, the
sparc64 firmware may set these extended dividers during boot - and the
mach64 driver detects incorrect dot clock in this case.
This patch makes the driver read the additional divider bit from
PLL_EXT_CNTL and calculate the initial refresh rate properly.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d176620277 ]
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014
The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.
When APIC access is virtualized we correctly manipulate with VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.
Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.
The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC modes all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 321cc359d8 ]
We need this new compatibility string as we experienced different behavior
for this 10/100Mbits/s macb interface on this particular SoC.
Backward compatibility is preserved as we keep the alternative strings.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eb4ed8e2d7 ]
Create a new configuration for the sama5d3-macb new compatibility string.
This configuration disables scatter-gather because we experienced lock down
of the macb interface of this particular SoC under very high load.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit edf2ef7242 ]
Synopsys DWC Ethernet MAC can be configured to have 1..32, 64, or
128 unicast filter entries. (Table 7-8 MAC Address Registers from
databook) Fix dwmac1000_validate_ucast_entries() to accept values
between 1 and 32 in addition.
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b61749a89f ]
In snd_hdac_bus_init_chip(), we enable interrupt before
snd_hdac_bus_init_cmd_io() initializing dma buffers. If irq has
been acquired and irq handler uses the dma buffer, kernel may crash
when interrupt comes in.
Fix the problem by postponing enabling irq after dma buffer
initialization. And warn once on null dma buffer pointer during the
initialization.
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 679fcae46c ]
Fedora got a bug report of a crash with iSCSI:
kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod]
...
Call Trace:
? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod]
? native_sched_clock+0x3e/0xa0
? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod]
kthread+0x120/0x140
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x3a/0x50
This is a BUG_ON for using a stack buffer with a scatterlist. There
are two cases that trigger this bug. Switch to using a dynamically
allocated buffer for one case and do not assign a NULL buffer in
another case.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 10492ee8ed ]
It currently only works if the parent bus uses "simple-bus". We
currently try to probe children with non-existing compatible values.
And we're missing .probe.
I noticed this while testing devices configured to probe using ti-sysc
interconnect target module driver. For that we also may want to rebind
the driver, so let's remove __init and __exit.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d85af102a ]
add CONFIG_MEMORY_HOTREMOVE=y in config
without this config, /sys/devices/system/memory/memory*/removable
always return 0, I endup getting an early skip during test
Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c953d63548 upstream.
The info->target comes from userspace and it would be used directly.
So we need to add the sanity check to make sure it is a valid standard
target, although the ebtables tool has already checked it. Kernel needs
to validate anything coming from userspace.
If the target is set as an evil value, it would break the ebtables
and cause a panic. Because the non-standard target is treated as one
offset.
Now add one helper function ebt_invalid_target, and we would replace
the macro INVALID_TARGET later.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Loic <hackurx@opensec.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37f31b6ca4 upstream.
The requested device name can be NULL or an empty string.
Check for that and refuse to continue. UBIFS has to do this manually
since we cannot use mount_bdev(), which checks for this condition.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fe23f262e upstream.
There is a race condition between ucma_close() and ucma_resolve_ip():
CPU0 CPU1
ucma_resolve_ip(): ucma_close():
ctx = ucma_get_ctx(file, cmd.id);
list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
mutex_lock(&mut);
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&mut);
...
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
rdma_destroy_id(ctx->cm_id);
...
ucma_free_ctx(ctx);
ret = rdma_resolve_addr();
ucma_put_ctx(ctx);
Before idr_remove(), ucma_get_ctx() could still find the ctx
and after rdma_destroy_id(), rdma_resolve_addr() may still
access id_priv pointer. Also, ucma_put_ctx() may use ctx after
ucma_free_ctx() too.
ucma_close() should call ucma_put_ctx() too which tests the
refcnt and waits for the last one releasing it. The similar
pattern is already used by ucma_destroy_id().
Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c58a584f05 upstream.
Per ARC TLS ABI, r25 is designated TP (thread pointer register).
However so far kernel didn't do any special treatment, like setting up
usermode r25, even for CLONE_SETTLS. We instead relied on libc runtime
to do this, in say clone libc wrapper [1]. This was deliberate to keep
kernel ABI agnostic (userspace could potentially change TP, specially
for different ARC ISA say ARCompact vs. ARCv2 with different spare
registers etc)
However userspace setting up r25, after clone syscall opens a race, if
child is not scheduled and gets a signal instead. It starts off in
userspace not in clone but in a signal handler and anything TP sepcific
there such as pthread_self() fails which showed up with uClibc
testsuite nptl/tst-kill6 [2]
Fix this by having kernel populate r25 to TP value. So this locks in
ABI, but it was not going to change anyways, and fwiw is same for both
ARCompact (arc700 core) and ARCvs (HS3x cores)
[1] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/arc/clone.S
[2] https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-kill6.c
Fixes: ARC STAR 9001378481
Cc: stable@vger.kernel.org
Reported-by: Nikita Sobolev <sobolev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98b8cd7f75 upstream.
- log an error message when registration fails and no error code listed
in the switch is returned
- translate the hv error code to posix error code and return it from
fw_register
- return the posix error code from fw_register to the process writing
to sysfs
- return EEXIST on re-registration
- return success on deregistration when fadump is not registered
- return ENODEV when no memory is reserved for fadump
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Tested-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Use pr_err() to shrink the error print]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ef0f58ed7 upstream.
The skb may be freed in tx completion context before
trace_ath10k_wmi_cmd is called. This can be easily captured when
KASAN(Kernel Address Sanitizer) is enabled. The fix is to move
trace_ath10k_wmi_cmd before the send operation. As the ret has no
meaning in trace_ath10k_wmi_cmd then, so remove this parameter too.
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 116d2f7496 upstream.
Deadlock during cgroup migration from cpu hotplug path when a task T is
being moved from source to destination cgroup.
kworker/0:0
cpuset_hotplug_workfn()
cpuset_hotplug_update_tasks()
hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
cgroup_migrate_add_task()
In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.
Task T
do_exit()
exit_signals() // sets PF_EXITING
exit_task_namespaces()
switch_task_namespaces()
free_nsproxy()
put_mnt_ns()
drop_collected_mounts()
namespace_unlock()
synchronize_rcu()
_synchronize_rcu_expedited()
schedule_work() // on cpu0 low priority worker pool
wait_event() // waiting for work item to execute
Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().
kworker/0:0 ==> Task T ==>kworker/0:0
In case of PF_EXITING task being migrated from source to destination
cgroup, migrate next available task in source cgroup.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
[AmitP: Upstream commit cherry-pick failed, so I picked the
backported changes from CAF/msm-4.9 tree instead:
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=49b74f1696417b270c89cd893ca9f37088928078]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 513f86d738 upstream.
If there an inode points to a block which is also some other type of
metadata block (such as a block allocation bitmap), the
buffer_verified flag can be set when it was validated as that other
metadata block type; however, it would make a really terrible external
attribute block. The reason why we use the verified flag is to avoid
constantly reverifying the block. However, it doesn't take much
overhead to make sure the magic number of the xattr block is correct,
and this will avoid potential crashes.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
[Backported to 4.9: adjust context]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5369a762c8 upstream.
In theory this should have been caught earlier when the xattr list was
verified, but in case it got missed, it's simple enough to add check
to make sure we don't overrun the xattr buffer.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
[bwh: Backported to 3.16:
- Add inode parameter to ext4_xattr_set_entry() and update callers
- Return -EIO instead of -EFSCORRUPTED on error
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[adjusted context for 4.9]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8894891446 upstream.
On systems with OF_IMAP_OLDWORLD_MAC set in of_irq_workarounds, the
devicetree interrupt parsing code is different, causing unit tests of
devicetree interrupt nodes to fail. Due to a bug in unittest code, which
tries to dereference an uninitialized pointer, this results in a crash.
OF: /testcase-data/phandle-tests/consumer-a: arguments longer than property
Unable to handle kernel paging request for data at address 0x00bc616e
Faulting instruction address: 0xc08e9468
Oops: Kernel access of bad area, sig: 11 [#1]
BE PREEMPT PowerMac
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.72-rc1-yocto-standard+ #1
task: cf8e0000 task.stack: cf8da000
NIP: c08e9468 LR: c08ea5bc CTR: c08ea5ac
REGS: cf8dbb50 TRAP: 0300 Not tainted (4.14.72-rc1-yocto-standard+)
MSR: 00001032 <ME,IR,DR,RI> CR: 82004044 XER: 00000000
DAR: 00bc616e DSISR: 40000000
GPR00: c08ea5bc cf8dbc00 cf8e0000 c13ca517c13ca517 c13ca8a0 00000066 00000002
GPR08: 00000063 00bc614e c0b05865 000affff 82004048 00000000 c00047f0 00000000
GPR16: c0a80000 c0a9cc34 c13ca517 c0ad1134 05ffffff 000affff c0b05860 c0abeef8
GPR24: cecec278 cecec278 c0a8c4d0 c0a885e0 c13ca8a0 05ffffff c13ca8a0 c13ca517
NIP [c08e9468] device_node_gen_full_name+0x30/0x15c
LR [c08ea5bc] device_node_string+0x190/0x3c8
Call Trace:
[cf8dbc00] [c007f670] trace_hardirqs_on_caller+0x118/0x1fc (unreliable)
[cf8dbc40] [c08ea5bc] device_node_string+0x190/0x3c8
[cf8dbcb0] [c08eb794] pointer+0x25c/0x4d0
[cf8dbd00] [c08ebcbc] vsnprintf+0x2b4/0x5ec
[cf8dbd60] [c08ec00c] vscnprintf+0x18/0x48
[cf8dbd70] [c008e268] vprintk_store+0x4c/0x22c
[cf8dbda0] [c008ecac] vprintk_emit+0x94/0x130
[cf8dbdd0] [c008ff54] printk+0x5c/0x6c
[cf8dbe10] [c0b8ddd4] of_unittest+0x2220/0x26f8
[cf8dbea0] [c0004434] do_one_initcall+0x4c/0x184
[cf8dbf00] [c0b4534c] kernel_init_freeable+0x13c/0x1d8
[cf8dbf30] [c0004814] kernel_init+0x24/0x118
[cf8dbf40] [c0013398] ret_from_kernel_thread+0x5c/0x64
The problem was observed when running a qemu test for the g3beige machine
with devicetree unittests enabled.
Disable interrupt node tests on affected systems to avoid both false
unittest failures and the crash.
With this patch in place, unittest on the affected system passes with
the following message.
dt-test ### end of unittest - 144 passed, 0 failed
Fixes: 53a42093d9 ("of: Add device tree selftests")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffe84e01bb upstream.
The workaround for missing CAS bit is also needed for xHC on Intel
sunrisepoint PCH. For more details see:
Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d07384a66 upstream.
A reload of the cache's DM table is needed during resize because
otherwise a crash will occur when attempting to access smq policy
entries associated with the portion of the cache that was recently
extended.
The reason is cache-size based data structures in the policy will not be
resized, the only way to safely extend the cache is to allow for a
proper cache policy initialization that occurs when the cache table is
loaded. For example the smq policy's space_init(), init_allocator(),
calc_hotspot_params() must be sized based on the extended cache size.
The fix for this is to disallow cache resizes of this pattern:
1) suspend "cache" target's device
2) resize the fast device used for the cache
3) resume "cache" target's device
Instead, the last step must be a full reload of the cache's DM table.
Fixes: 66a636356 ("dm cache: add stochastic-multi-queue (smq) policy")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4561ffca88 upstream.
Commit fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to
on-disk superblock") enabled previously written policy hints to be
used after a cache is reactivated. But in doing so the cache
metadata's hint array was left exposed to out of bounds access because
on resize the metadata's on-disk hint array wasn't ever extended.
Fix this by ignoring that there are no on-disk hints associated with the
newly added cache blocks. An expanded on-disk hint array is later
rewritten upon the next clean shutdown of the cache.
Fixes: fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to on-disk superblock")
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69e445ab8b upstream.
If __device_suspend() runs asynchronously (in which case the device
passed to it is in dpm_suspended_list at that point) and it returns
early on an error or pending wakeup, and the power.direct_complete
flag has been set for the device already, the subsequent
device_resume() will be confused by that and it will call
pm_runtime_enable() incorrectly, as runtime PM has not been
disabled for the device by __device_suspend().
To avoid that, clear power.direct_complete if __device_suspend()
is not going to disable runtime PM for the device before returning.
Fixes: aae4518b31 (PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily)
Reported-by: Al Cooper <alcooperx@gmail.com>
Tested-by: Al Cooper <alcooperx@gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 211710ca74 upstream.
key->sta is only valid after ieee80211_key_link, which is called later
in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is
never set when management frame protection is enabled.
Fixes: e548c49e6d ("mac80211: add key flag for management keys")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 083874549f upstream.
On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
suspend/resume. The affected products include multiple generations of
NVIDIA GPUs and Intel SoCs. After resume, nouveau logs many errors such
as:
fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
DRM: failed to idle channel 0 [DRM]
Similarly, the NVIDIA proprietary driver also fails after resume (black
screen, 100% CPU usage in Xorg process). We shipped a sample to NVIDIA for
diagnosis, and their response indicated that it's a problem with the parent
PCI bridge (on the Intel SoC), not the GPU.
Runtime suspend/resume works fine, only S3 suspend is affected.
We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In the
cases that I checked, this register has value 0 and we just have to rewrite
that value.
Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already has
value 0 (the correct, pre-suspend value).
Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register:
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
Based on that, rewrite the prefetch register values even when that appears
unnecessary.
We have confirmed this solution on all the affected models we have in-hands
(X542UQ, UX533FD, X530UN, V272UN).
Additionally, this solves an issue where r8169 MSI-X interrupts were broken
after S3 suspend/resume on ASUS X441UAR. This issue was recently worked
around in commit 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e"). It
also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
that we had not yet patched. I suspect it will also fix the issue that was
worked around in commit 7c53a72245 ("r8169: don't use MSI-X on
RTL8168g").
Thomas Martitz reports that this change also solves an issue where the AMD
Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
suspend/resume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-By: Peter Wu <peter@lekensteyn.nl>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 715bd9d12f upstream.
The syscall fallbacks in the vDSO have incorrect asm constraints.
They are not marked as writing to their outputs -- instead, they are
marked as clobbering "memory", which is useless. In particular, gcc
is smart enough to know that the timespec parameter hasn't escaped,
so a memory clobber doesn't clobber it. And passing a pointer as an
asm *input* does not tell gcc that the pointed-to value is changed.
Add in the fact that the asm instructions weren't volatile, and gcc
was free to omit them entirely unless their sole output (the return
value) is used. Which it is (phew!), but that stops happening with
some upcoming patches.
As a trivial example, the following code:
void test_fallback(struct timespec *ts)
{
vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
}
compiles to:
00000000000000c0 <test_fallback>:
c0: c3 retq
To add insult to injury, the RCX and R11 clobbers on 64-bit
builds were missing.
The "memory" clobber is also unnecessary -- no ordering with respect to
other memory operations is needed, but that's going to be fixed in a
separate not-for-stable patch.
Fixes: 2aae950b21 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 780e83c259 upstream.
Both len and off are frontend specified values, so we need to make
sure there's no overflow when adding the two for the bounds check. We
also want to avoid undefined behavior and hence use off to index into
->hash.mapping[] only after bounds checking. This at the same time
allows to take care of not applying off twice for the bounds checking
against vif->num_queues.
It is also insufficient to bounds check copy_op.len, as this is len
truncated to 16 bits.
This is XSA-270 / CVE-2018-15471.
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
Cc: stable@vger.kernel.org [4.7 onwards]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bafcbf59f upstream.
OMAPFB_MEMORY_READ ioctl reads pixels from the LCD's memory and copies
them to a userspace buffer. The code has two issues:
- The user provided width and height could be large enough to overflow
the calculations
- The copy_to_user() can copy uninitialized memory to the userspace,
which might contain sensitive kernel information.
Fix these by limiting the width & height parameters, and only copying
the amount of data that we actually received from the LCD.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Cc: security@kernel.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 013ad04390 upstream.
sector_div() is only viable for use with sector_t.
dm_block_t is typedef'd to uint64_t -- so use div_u64() instead.
Fixes: 3ab918281 ("dm thin metadata: try to avoid ever aborting transactions")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8a00cef17 upstream.
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.
Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.
There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.
Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d80771c083 upstream.
When compiling with CONFIG_DEBUG_ATOMIC_SLEEP=y the mxs-dcp driver
prints warnings such as:
WARNING: CPU: 0 PID: 120 at kernel/sched/core.c:7736 __might_sleep+0x98/0x9c
do not call blocking ops when !TASK_RUNNING; state=1 set at [<8081978c>] dcp_chan_thread_sha+0x3c/0x2ec
The problem is that blocking ops will manipulate current->state
themselves so it is not allowed to call them between
set_current_state(TASK_INTERRUPTIBLE) and schedule().
Fix this by converting the per-chan mutex to a spinlock (it only
protects tiny list ops anyway) and rearranging the wait logic so that
callbacks are called current->state as TASK_RUNNING. Those callbacks
will indeed call blocking ops themselves so this is required.
Cc: <stable@vger.kernel.org>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0595751f26 upstream.
When mounting a Windows share that is the root of a drive (eg. C$)
the server does not return . and .. directory entries. This results in
the smb2 code path erroneously skipping the 2 first entries.
Pseudo-code of the readdir() code path:
cifs_readdir(struct file, struct dir_context)
initiate_cifs_search <-- if no reponse cached yet
server->ops->query_dir_first
dir_emit_dots
dir_emit <-- adds "." and ".." if we're at pos=0
find_cifs_entry
initiate_cifs_search <-- if pos < start of current response
(restart search)
server->ops->query_dir_next <-- if pos > end of current response
(fetch next search res)
for(...) <-- loops over cur response entries
starting at pos
cifs_filldir <-- skip . and .., emit entry
cifs_fill_dirent
dir_emit
pos++
A) dir_emit_dots() always adds . & ..
and sets the current dir pos to 2 (0 and 1 are done).
Therefore we always want the index_to_find to be 2 regardless of if
the response has . and ..
B) smb1 code initializes index_of_last_entry with a +2 offset
in cifssmb.c CIFSFindFirst():
psrch_inf->index_of_last_entry = 2 /* skip . and .. */ +
psrch_inf->entries_in_buffer;
Later in find_cifs_entry() we want to find the next dir entry at pos=2
as a result of (A)
first_entry_in_buffer = cfile->srch_inf.index_of_last_entry -
cfile->srch_inf.entries_in_buffer;
This var is the dir pos that the first entry in the buffer will
have therefore it must be 2 in the first call.
If we don't offset index_of_last_entry by 2 (like in (B)),
first_entry_in_buffer=0 but we were instructed to get pos=2 so this
code in find_cifs_entry() skips the 2 first which is ok for non-root
shares, as it skips . and .. from the response but is not ok for root
shares where the 2 first are actual files
pos_in_buf = index_to_find - first_entry_in_buffer;
// pos_in_buf=2
// we skip 2 first response entries :(
for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) {
/* go entry by entry figuring out which is first */
cur_ent = nxt_dir_entry(cur_ent, end_of_smb,
cfile->srch_inf.info_level);
}
C) cifs_filldir() skips . and .. so we can safely ignore them for now.
Sample program:
int main(int argc, char **argv)
{
const char *path = argc >= 2 ? argv[1] : ".";
DIR *dh;
struct dirent *de;
printf("listing path <%s>\n", path);
dh = opendir(path);
if (!dh) {
printf("opendir error %d\n", errno);
return 1;
}
while (1) {
de = readdir(dh);
if (!de) {
if (errno) {
printf("readdir error %d\n", errno);
return 1;
}
printf("end of listing\n");
break;
}
printf("off=%lu <%s>\n", de->d_off, de->d_name);
}
return 0;
}
Before the fix with SMB1 on root shares:
<.> off=1
<..> off=2
<$Recycle.Bin> off=3
<bootmgr> off=4
and on non-root shares:
<.> off=1
<..> off=4 <-- after adding .., the offsets jumps to +2 because
<2536> off=5 we skipped . and .. from response buffer (C)
<411> off=6 but still incremented pos
<file> off=7
<fsx> off=8
Therefore the fix for smb2 is to mimic smb1 behaviour and offset the
index_of_last_entry by 2.
Test results comparing smb1 and smb2 before/after the fix on root
share, non-root shares and on large directories (ie. multi-response
dir listing):
PRE FIX
=======
pre-1-root VS pre-2-root:
ERR pre-2-root is missing [bootmgr, $Recycle.Bin]
pre-1-nonroot VS pre-2-nonroot:
OK~ same files, same order, different offsets
pre-1-nonroot-large VS pre-2-nonroot-large:
OK~ same files, same order, different offsets
POST FIX
========
post-1-root VS post-2-root:
OK same files, same order, same offsets
post-1-nonroot VS post-2-nonroot:
OK same files, same order, same offsets
post-1-nonroot-large VS post-2-nonroot-large:
OK same files, same order, same offsets
REGRESSION?
===========
pre-1-root VS post-1-root:
OK same files, same order, same offsets
pre-1-nonroot VS post-1-nonroot:
OK same files, same order, same offsets
BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Paulo Alcantara <palcantara@suse.deR>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffc4c92227 upstream.
Commit 786534b92f introduced a regression that caused listxattr to
return the POSIX ACL attribute names even though sysfs doesn't support
POSIX ACLs. This happens because simple_xattr_list checks for NULL
i_acl / i_default_acl, but inode_init_always initializes those fields
to ACL_NOT_CACHED ((void *)-1). For example:
$ getfattr -m- -d /sys
/sys: system.posix_acl_access: Operation not supported
/sys: system.posix_acl_default: Operation not supported
Fix this in simple_xattr_list by checking if the filesystem supports POSIX ACLs.
Fixes: 786534b92f ("tmpfs: listxattr should include POSIX ACL xattrs")
Reported-by: Marc Aurèle La France <tsi@tuyoix.net>
Tested-by: Marc Aurèle La France <tsi@tuyoix.net>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3366cdb6d3 ]
The command 'xl vcpu-set 0 0', issued in dom0, will crash dom0:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 65 Comm: xenwatch Not tainted 4.19.0-rc2-1.ga9462db-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: Intel Corporation S5520UR/S5520UR, BIOS S5500.86B.01.00.0050.050620101605 05/06/2010
RIP: e030:device_offline+0x9/0xb0
Code: 77 24 00 e9 ce fe ff ff 48 8b 13 e9 68 ff ff ff 48 8b 13 e9 29 ff ff ff 48 8b 13 e9 ea fe ff ff 90 66 66 66 66 90 41 54 55 53 <f6> 87 d8 02 00 00 01 0f 85 88 00 00 00 48 c7 c2 20 09 60 81 31 f6
RSP: e02b:ffffc90040f27e80 EFLAGS: 00010203
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8801f3800000 RSI: ffffc90040f27e70 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff820e47b3 R09: 0000000000000000
R10: 0000000000007ff0 R11: 0000000000000000 R12: ffffffff822e6d30
R13: dead000000000200 R14: dead000000000100 R15: ffffffff8158b4e0
FS: 00007ffa595158c0(0000) GS:ffff8801f39c0000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002d8 CR3: 00000001d9602000 CR4: 0000000000002660
Call Trace:
handle_vcpu_hotplug_event+0xb5/0xc0
xenwatch_thread+0x80/0x140
? wait_woken+0x80/0x80
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x3a/0x50
This happens because handle_vcpu_hotplug_event is called twice. In the
first iteration cpu_present is still true, in the second iteration
cpu_present is false which causes get_cpu_device to return NULL.
In case of cpu#0, cpu_online is apparently always true.
Fix this crash by checking if the cpu can be hotplugged, which is false
for a cpu that was just removed.
Also check if the cpu was actually offlined by device_remove, otherwise
leave the cpu_present state as it is.
Rearrange to code to do all work with device_hotplug_lock held.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 87dffe86d4 ]
When guest receives a sysrq request from the host it acknowledges it by
writing '\0' to control/sysrq xenstore node. This, however, make xenstore
watch fire again but xenbus_scanf() fails to parse empty value with "%c"
format string:
sysrq: SysRq : Emergency Sync
Emergency Sync complete
xen:manage: Error -34 reading sysrq code in control/sysrq
Ignore -ERANGE the same way we already ignore -ENOENT, empty value in
control/sysrq is totally legal.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ac1487c4b ]
For inbound data with an unsupported HW header format, only dump the
actual HW header. We have no idea how much payload follows it, and what
it contains. Worst case, we dump past the end of the Inbound Buffer and
access whatever is located next in memory.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit aec45e857c ]
qeth_query_oat_command() currently allocates the kernel buffer for
the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with
fragmented memory, large allocations may fail (eg. the qethqoat tool by
default uses 132KB).
Solve this issue by using vzalloc, backing the allocation with
non-contiguous memory.
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ad5690199 ]
After system suspend, sometimes the r8169 doesn't work when ethernet
cable gets pluggued.
This issue happens because rtl_reset_work() doesn't get called from
rtl8169_runtime_resume(), after system suspend.
In rtl_task(), RTL_FLAG_TASK_* only gets cleared if this condition is
met:
if (!netif_running(dev) ||
!test_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags))
...
If RTL_FLAG_TASK_ENABLED was cleared during system suspend while
RTL_FLAG_TASK_RESET_PENDING was set, the next rtl_schedule_task() won't
schedule task as the flag is still there.
So in addition to clearing RTL_FLAG_TASK_ENABLED, also clears other
flags.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5c41aaad40 ]
Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
printk format build warning. This is due to hexagon's ffs() being
coded as returning long instead of int.
Fix the printk format warning by changing all of hexagon's ffs() and
fls() functions to return int instead of long. The variables that
they return are already int instead of long. This return type
matches the return type in <asm-generic/bitops/>.
../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
There are no ffs() or fls() allmodconfig build errors after making this
change.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Patch-mainline: linux-kernel @ 07/22/2018, 16:03
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 200f351e27 ]
Fix build warning in arch/hexagon/kernel/dma.c by casting a void *
to unsigned long to match the function parameter type.
../arch/hexagon/kernel/dma.c: In function 'arch_dma_alloc':
../arch/hexagon/kernel/dma.c:51:5: warning: passing argument 2 of 'gen_pool_add' makes integer from pointer without a cast [enabled by default]
../include/linux/genalloc.h:112:19: note: expected 'long unsigned int' but argument is of type 'void *'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Patch-mainline: linux-kernel @ 07/20/2018, 20:17
[rkuo@codeaurora.org: fixed architecture name]
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3ab9182816 ]
Committing a transaction can consume some metadata of it's own, we now
reserve a small amount of metadata to cover this. Free metadata
reported by the kernel will not include this reserve.
If any of the reserve has been used after a commit we enter a new
internal state PM_OUT_OF_METADATA_SPACE. This is reported as
PM_READ_ONLY, so no userland changes are needed. If the metadata
device is resized the pool will move back to PM_WRITE.
These changes mean we never need to abort and rollback a transaction due
to running out of metadata space. This is particularly important
because there have been a handful of reports of data corruption against
DM thin-provisioning that can all be attributed to the thin-pool having
ran out of metadata space.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16160c1946 ]
Problem: perf did not show branch predicted/mispredicted bit in brstack.
Output of perf -F brstack for profile collected
Before:
0x4fdbcd/0x4fdc03/-/-/-/0
0x45f4c1/0x4fdba0/-/-/-/0
0x45f544/0x45f4bb/-/-/-/0
0x45f555/0x45f53c/-/-/-/0
0x7f66901cc24b/0x45f555/-/-/-/0
0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0
After:
0x4fdbcd/0x4fdc03/P/-/-/0
0x45f4c1/0x4fdba0/P/-/-/0
0x45f544/0x45f4bb/P/-/-/0
0x45f555/0x45f53c/P/-/-/0
0x7f66901cc24b/0x45f555/P/-/-/0
0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0
Cause:
As mentioned in Software Development Manual vol 3, 17.4.8.1,
IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
its format. Despite that, registers containing FROM address of the branch,
do have MISPREDICT bit but because of the format indicated in
IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.
Solution:
Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.
Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ef5b0771d2 ]
The buffer length field in the ena rx descriptor is 16 bit, and the
current driver passes a full page in each ena rx descriptor.
When PAGE_SIZE equals 64kB or more, the buffer length field becomes
zero.
To solve this issue, limit the ena Rx descriptor to use 16kB even
when allocating 64kB kernel pages. This change would not impact ena
device functionality, as 16kB is still larger than maximum MTU.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bcfb84a996 ]
A powerpc build of cifs with gcc v8.2.0 produces this warning:
fs/cifs/cifssmb.c: In function ‘CIFSSMBNegotiate’:
fs/cifs/cifssmb.c:605:3: warning: ‘strncpy’ writing 16 bytes into a region of size 1 overflows the destination [-Wstringop-overflow=]
strncpy(pSMB->DialectsArray+count, protocols[i].name, 16);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since we are already doing a strlen() on the source, change the strncpy
to a memcpy().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c44a5ee803 ]
Update superblock when particular devices are requested via rebuild
(e.g. lvconvert --replace ...) to avoid spurious failure with the "New
device injected into existing raid set without 'delta_disks' or
'rebuild' parameter specified" error message.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a6986c659 ]
This Falcon application doesn't appear to be present on some newer
systems, so let's not fail init if we can't find it.
TBD: is there a way to determine whether it *should* be there?
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8407879c4e ]
Currently we always repost the recv buffer before we send a response
capsule back to the host. Since ordering is not guaranteed for send
and recv completions, it is posible that we will receive a new request
from the host before we got a send completion for the response capsule.
Today, we pre-allocate 2x rsps the length of the queue, but in reality,
under heavy load there is nothing that is really preventing the gap to
expand until we exhaust all our rsps.
To fix this, if we don't have any pre-allocated rsps left, we dynamically
allocate a rsp and make sure to free it when we are done. If under memory
pressure we fail to allocate a rsp, we silently drop the command and
wait for the host to retry.
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: dropped a superflous assignment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14427b8683 ]
snprintf() always returns the full length of the string it could have
printed, even if it was truncated because the buffer was too small.
So in case the counter value is truncated, we will over-read from
in_buffer and over-write to the caller's buffer.
I don't think it's actually possible for this to happen, but in case
truncation occurs, WARN and return -EIO.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d23ba6034 ]
The current code grabs the private_data of whatever file descriptor
userspace has supplied and implicitly casts it to a `struct ucma_file *`,
potentially causing a type confusion.
This is probably fine in practice because the pointer is only used for
comparisons, it is never actually dereferenced; and even in the
comparisons, it is unlikely that a file from another filesystem would have
a ->private_data pointer that happens to also be valid in this context.
But ->private_data is not always guaranteed to be a valid pointer to an
object owned by the file's filesystem; for example, some filesystems just
cram numbers in there.
Check the type of the supplied file descriptor to be safe, analogous to how
other places in the kernel do it.
Fixes: 88314e4dda ("RDMA/cma: add support for rdma_migrate_id()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c37bd52836 ]
There is no deallocation of fotg210->ep[i] elements, allocated at
fotg210_udc_probe.
The patch adds deallocation of fotg210->ep array elements and simplifies
error path of fotg210_udc_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee34549243 ]
USB device
Vendor 05ac (Apple)
Device 026c (Magic Keyboard with Numeric Keypad)
Bluetooth devices
Vendor 004c (Apple)
Device 0267 (Magic Keyboard)
Device 026c (Magic Keyboard with Numeric Keypad)
Support already exists for the Magic Keyboard over USB connection.
Add support for the Magic Keyboard over Bluetooth connection, and for
the Magic Keyboard with Numeric Keypad over Bluetooth and USB
connection.
Signed-off-by: Sean O'Brien <seobrien@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d41aa52523 upstream.
Reproducer, assuming 2M of hugetlbfs available:
Hugetlbfs mounted, size=2M and option user=testuser
# mount | grep ^hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
# sysctl vm.nr_hugepages=1
vm.nr_hugepages = 1
# grep Huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2048 kB
Code:
#include <sys/mman.h>
#include <stddef.h>
#define SIZE 2*1024*1024
int main()
{
void *ptr;
ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
madvise(ptr, SIZE, MADV_DONTDUMP);
madvise(ptr, SIZE, MADV_DODUMP);
}
Compile and strace:
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)
hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].
The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a
consequence of the large useage of VM_DONTEXPAND in device drivers.
A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.
A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the same
memory. We correct this omission by allowing madvice(MADV_DODUMP) on
hugetlbfs pages.
[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb ("mm: prepare VM_DONTDUMP for using in drivers")
Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com
Link: https://lists.launchpad.net/maria-discuss/msg05245.html
Fixes: 0103bd16fb ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c18b27d6e ]
If the driver fails to properly prepare for the channel
switch, mac80211 will disconnect. If the CSA IE had mode
set to 1, it means that the clients are not allowed to send
any Tx on the current channel, and that includes the
deauthentication frame.
Make sure that we don't send the deauthentication frame in
this case.
In iwlwifi, this caused a failure to flush queues since the
firmware already closed the queues after having parsed the
CSA IE. Then mac80211 would wait until the deauthentication
frame would go out (drv_flush(drop=false)) and that would
never happen.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0007e94355 ]
When performing a channel switch flow for a managed interface, the
flow did not update the bandwidth of the AP station and the rate
scale algorithm. In case of a channel width downgrade, this would
result with the rate scale algorithm using a bandwidth that does not
match the interface channel configuration.
Fix this by updating the AP station bandwidth and rate scaling algorithm
before the actual channel change in case of a bandwidth downgrade, or
after the actual channel change in case of a bandwidth upgrade.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f3ffb6c3a2 ]
We hit a problem with iwlwifi that was caused by a bug in
mac80211. A bug in iwlwifi caused the firwmare to crash in
certain cases in channel switch. Because of that bug,
drv_pre_channel_switch would fail and trigger the restart
flow.
Now we had the hw restart worker which runs on the system's
workqueue and the csa_connection_drop_work worker that runs
on mac80211's workqueue that can run together. This is
obviously problematic since the restart work wants to
reconfigure the connection, while the csa_connection_drop_work
worker does the exact opposite: it tries to disconnect.
Fix this by cancelling the csa_connection_drop_work worker
in the restart worker.
Note that this can sound racy: we could have:
driver iface_work CSA_work restart_work
+++++++++++++++++++++++++++++++++++++++++++++
|
<--drv_cs ---|
<FW CRASH!>
-CS FAILED-->
| |
| cancel_work(CSA)
schedule |
CSA work |
| |
Race between those 2
But this is not possible because we flush the workqueue
in the restart worker before we cancel the CSA worker.
That would be bullet proof if we could guarantee that
we schedule the CSA worker only from the iface_work
which runs on the workqueue (and not on the system's
workqueue), but unfortunately we do have an instance
in which we schedule the CSA work outside the context
of the workqueue (ieee80211_chswitch_done).
Note also that we should probably cancel other workers
like beacon_connection_loss_work and possibly others
for different types of interfaces, at the very least,
IBSS should suffer from the exact same problem, but for
now, do the minimum to fix the actual bug that was actually
experienced and reproduced.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8442938c3a ]
The "chandef->center_freq1" variable is a u32 but "freq" is a u16 so we
are truncating away the high bits. I noticed this bug because in commit
9cf0a0b4b6 ("cfg80211: Add support for 60GHz band channels 5 and 6")
we made "freq <= 56160 + 2160 * 6" a valid requency when before it was
only "freq <= 56160 + 2160 * 4" that was valid. It introduces a static
checker warning:
net/wireless/util.c:1571 ieee80211_chandef_to_operating_class()
warn: always true condition '(freq <= 56160 + 2160 * 6) => (0-u16max <= 69120)'
But really we probably shouldn't have been truncating the high bits
away to begin with.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c15e3f19a6 ]
When a Mac client saves an item containing a backslash to a file server
the backslash is represented in the CIFS/SMB protocol as as U+F026.
Before this change, listing a directory containing an item with a
backslash in its name will return that item with the backslash
represented with a true backslash character (U+005C) because
convert_sfm_character mapped U+F026 to U+005C when interpretting the
CIFS/SMB protocol response. However, attempting to open or stat the
path using a true backslash will result in an error because
convert_to_sfm_char does not map U+005C back to U+F026 causing the
CIFS/SMB request to be made with the backslash represented as U+005C.
This change simply prevents the U+F026 to U+005C conversion from
happenning. This is analogous to how the code does not do any
translation of UNI_SLASH (U+F000).
Signed-off-by: Jon Kuhn <jkuhn@barracuda.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16fe10cf92 ]
The kernel module may sleep with holding a spinlock.
The function call paths (from bottom to top) in Linux-4.16 are:
[FUNC] usleep_range
drivers/net/ethernet/cadence/macb_main.c, 648:
usleep_range in macb_halt_tx
drivers/net/ethernet/cadence/macb_main.c, 730:
macb_halt_tx in macb_tx_error_task
drivers/net/ethernet/cadence/macb_main.c, 721:
_raw_spin_lock_irqsave in macb_tx_error_task
To fix this bug, usleep_range() is replaced with udelay().
This bug is found by my static analysis tool DSAC.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c85609b08 ]
This driver currently emits a STOP if the next message is not
I2C_MD_RD. It should not do it because it disturbs the I2C_RDWR
ioctl, where read/write transactions are combined without STOP
between.
Issue STOP only when the message is the last one _or_ flagged with
I2C_M_STOP.
Fixes: 6a62974b66 ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 38f5d8d8cb ]
This driver currently emits a STOP if the next message is not
I2C_MD_RD. It should not do it because it disturbs the I2C_RDWR
ioctl, where read/write transactions are combined without STOP
between.
Issue STOP only when the message is the last one _or_ flagged with
I2C_M_STOP.
Fixes: dd6fd4a327 ("i2c: uniphier: add UniPhier FIFO-less I2C driver")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d0ffd2642 ]
In raid10 reshape_request it gets max_sectors in read_balance. If the underlayer disks
have bad blocks, the max_sectors is less than last. It will call goto read_more many
times. It calls raise_barrier(conf, sectors_done != 0) every time. In this condition
sectors_done is not 0. So the value passed to the argument force of raise_barrier is
true.
In raise_barrier it checks conf->barrier when force is true. If force is true and
conf->barrier is 0, it panic. In this case reshape_request submits bio to under layer
disks. And in the callback function of the bio it calls lower_barrier. If the bio
finishes before calling raise_barrier again, it can trigger the BUG_ON.
Add one pair of raise_barrier/lower_barrier to fix this bug.
Signed-off-by: Xiao Ni <xni@redhat.com>
Suggested-by: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d49b48f088 ]
gpiochip_add_data_with_key() adds the gpiochip to the gpio_devices list
before of_gpiochip_add() is called, but it's only the latter which sets
the ->of_xlate function pointer. gpiochip_find() can be called by
someone else between these two actions, and it can find the chip and
call of_gpiochip_match_node_and_xlate() which leads to the following
crash due to a NULL ->of_xlate().
Unhandled prefetch abort: page domain fault (0x01b) at 0x00000000
Modules linked in: leds_gpio(+) gpio_generic(+)
CPU: 0 PID: 830 Comm: insmod Not tainted 4.18.0+ #43
Hardware name: ARM-Versatile Express
PC is at (null)
LR is at of_gpiochip_match_node_and_xlate+0x2c/0x38
Process insmod (pid: 830, stack limit = 0x(ptrval))
(of_gpiochip_match_node_and_xlate) from (gpiochip_find+0x48/0x84)
(gpiochip_find) from (of_get_named_gpiod_flags+0xa8/0x238)
(of_get_named_gpiod_flags) from (gpiod_get_from_of_node+0x2c/0xc8)
(gpiod_get_from_of_node) from (devm_fwnode_get_index_gpiod_from_child+0xb8/0x144)
(devm_fwnode_get_index_gpiod_from_child) from (gpio_led_probe+0x208/0x3c4 [leds_gpio])
(gpio_led_probe [leds_gpio]) from (platform_drv_probe+0x48/0x9c)
(platform_drv_probe) from (really_probe+0x1d0/0x3d4)
(really_probe) from (driver_probe_device+0x78/0x1c0)
(driver_probe_device) from (__driver_attach+0x120/0x13c)
(__driver_attach) from (bus_for_each_dev+0x68/0xb4)
(bus_for_each_dev) from (bus_add_driver+0x1a8/0x268)
(bus_add_driver) from (driver_register+0x78/0x10c)
(driver_register) from (do_one_initcall+0x54/0x1fc)
(do_one_initcall) from (do_init_module+0x64/0x1f4)
(do_init_module) from (load_module+0x2198/0x26ac)
(load_module) from (sys_finit_module+0xe0/0x110)
(sys_finit_module) from (ret_fast_syscall+0x0/0x54)
One way to fix this would be to rework the hairy registration sequence
in gpiochip_add_data_with_key(), but since I'd probably introduce a
couple of new bugs if I attempted that, simply add a check for a
non-NULL of_xlate function pointer in
of_gpiochip_match_node_and_xlate(). This works since the driver looking
for the gpio will simply fail to find the gpio and defer its probe and
be reprobed when the driver which is registering the gpiochip has fully
completed its probe.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f0223bfe9 ]
nl80211_update_ft_ies() tried to validate NL80211_ATTR_IE with
is_valid_ie_attr() before dereferencing it, but that helper function
returns true in case of NULL pointer (i.e., attribute not included).
This can result to dereferencing a NULL pointer. Fix that by explicitly
checking that NL80211_ATTR_IE is included.
Fixes: 355199e02b ("cfg80211: Extend support for IEEE 802.11r Fast BSS Transition")
Signed-off-by: Arunk Khandavalli <akhandav@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 455c4401fe ]
If there are packets in hardware when changing the speed
or duplex, it may cause hardware hang up.
This patch adds netif_carrier_off before change speed and
duplex in ethtool_ops.set_link_ksettings, and adds
netif_carrier_on after complete the change.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1f631c3201 ]
IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says:
If it is a target mesh STA, it shall update its own HWMP SN to
maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1
immediately before it generates a PREP element in response to a
PREQ element.
Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7c863a2f6 ]
The mac80211_hwsim driver intends to say that it supports up to four
STBC receive streams, but instead it ends up saying something undefined.
The IEEE80211_VHT_CAP_RXSTBC_X macros aren't independent bits that can
be ORed together, but values. In this case, _4 is the appropriate one
to use.
Signed-off-by: Danek Duvall <duvall@comfychair.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 67d1ba8a6d ]
The mod mask for VHT capabilities intends to say that you can override
the number of STBC receive streams, and it does, but only by accident.
The IEEE80211_VHT_CAP_RXSTBC_X aren't bits to be set, but values (albeit
left-shifted). ORing the bits together gets the right answer, but we
should use the _MASK macro here instead.
Signed-off-by: Danek Duvall <duvall@comfychair.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46dec40fb7 ]
This fixes a bug which causes guest virtual addresses to get translated
to guest real addresses incorrectly when the guest is using the HPT MMU
and has more than 256GB of RAM, or more specifically has a HPT larger
than 2GB. This has showed up in testing as a failure of the host to
emulate doorbell instructions correctly on POWER9 for HPT guests with
more than 256GB of RAM.
The bug is that the HPTE index in kvmppc_mmu_book3s_64_hv_xlate()
is stored as an int, and in forming the HPTE address, the index gets
shifted left 4 bits as an int before being signed-extended to 64 bits.
The simple fix is to make the variable a long int, matching the
return type of kvmppc_hv_find_lock_hpte(), which is what calculates
the index.
Fixes: 697d3899dc ("KVM: PPC: Implement MMIO emulation support for Book3S HV guests")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 77cfaf52ec ]
The TXQ teardown code can reference the vif data structures that are
stored in the netdev private memory area if there are still packets on
the queue when it is being freed. Since the TXQ teardown code is run
after the netdevs are freed, this can lead to a use-after-free. Fix this
by moving the TXQ teardown code to earlier in ieee80211_unregister_hw().
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0bf2d4982 upstream.
Apparently, this driver (or the hardware) does not support character
length settings. It's apparently running in 8-bit mode, but it makes
userspace believe it's in 5-bit mode. That makes tcsetattr with CS8
incorrectly fail, breaking e.g. getty from busybox, thus the login shell
on ttyMVx.
Fix by hard-wiring CS8 into c_cflag.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Fixes: 30530791a7 ("serial: mvebu-uart: initial support for Armada-3700 serial port")
Cc: stable <stable@vger.kernel.org> # 4.6+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad608fbcf1 upstream.
The event subscriptions are added to the subscribed event list while
holding a spinlock, but that lock is subsequently released while still
accessing the subscription object. This makes it possible to unsubscribe
the event --- and freeing the subscription object's memory --- while
the subscription object is simultaneously accessed.
Prevent this by adding a mutex to serialise the event subscription and
unsubscription. This also gives a guarantee to the callback ops that the
add op has returned before the del op is called.
This change also results in making the elems field less special:
subscriptions are only added to the event list once they are fully
initialised.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: stable@vger.kernel.org # for 4.14 and up
Fixes: c3b5b0241f ("V4L/DVB: V4L: Events: Add backend")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7fd6d98b89 ]
Commit 7ae81952cda ("i2c: i801: Allow ACPI SystemIO OpRegion to conflict
with PCI BAR") made it possible for AML code to access SMBus I/O ports
by installing custom SystemIO OpRegion handler and blocking i80i driver
access upon first AML read/write to this OpRegion.
However, while ThinkPad T560 does have SystemIO OpRegion declared under
the SMBus device, it does not access any of the SMBus registers:
Device (SMBU)
{
...
OperationRegion (SMBP, PCI_Config, 0x50, 0x04)
Field (SMBP, DWordAcc, NoLock, Preserve)
{
, 5,
TCOB, 11,
Offset (0x04)
}
Name (TCBV, 0x00)
Method (TCBS, 0, NotSerialized)
{
If ((TCBV == 0x00))
{
TCBV = (\_SB.PCI0.SMBU.TCOB << 0x05)
}
Return (TCBV) /* \_SB_.PCI0.SMBU.TCBV */
}
OperationRegion (TCBA, SystemIO, TCBS (), 0x10)
Field (TCBA, ByteAcc, NoLock, Preserve)
{
Offset (0x04),
, 9,
CPSC, 1
}
}
Problem with the current approach is that it blocks all I/O port access
and because this system has touchpad connected to the SMBus controller
after first AML access (happens during suspend/resume cycle) the
touchpad fails to work anymore.
Fix this so that we allow ACPI AML I/O port access if it does not touch
the region reserved for the SMBus.
Fixes: 7ae81952cda ("i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200737
Reported-by: Yussuf Khalil <dev@pp3345.net>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d8f574708 ]
An unfortunate consequence of having a strong typing for the input
values to the SMC call is that it also affects the type of the
return values, limiting r0 to 32 bits and r{1,2,3} to whatever
was passed as an input.
Let's turn everything into "unsigned long", which satisfies the
requirements of both architectures, and allows for the full
range of return values.
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2ab4d0e742 ]
For SI/Kv, the power state is managed by function
amdgpu_pm_compute_clocks.
when dpm enabled, we should call amdgpu_pm_compute_clocks
to update current power state instand of set boot state.
this change can fix the oops when kfd driver was enabled on Kv.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee400a3f1b ]
In 'e1000_set_ringparam()', the tx_ring and rx_ring are updated with new value
and the old tx/rx rings are freed only when the device is up. There are resource
leaks on old tx/rx rings when the device is not up. This bug is reported by COD,
a tool for testing kernel module binaries I am building.
This patch fixes the bug by always calling 'kfree()' on old tx/rx rings in
'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf1acec008 ]
When the device is not up, the call to 'e1000_up()' from the error handling path
of 'e1000_set_ringparam()' causes a kernel oops with a null-pointer
dereference. The null-pointer dereference is triggered in function
'e1000_alloc_rx_buffers()' at line 'buffer_info = &rx_ring->buffer_info[i]'.
This bug was reported by COD, a tool for testing kernel module binaries I am
building. This bug was also detected by KFI from Dr. Kai Cong.
This patch fixes the bug by checking on 'netif_running()' before calling
'e1000_up()' in 'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b1ccd4c0ab ]
skb->truesize is not meant to be tracking amount of used bytes in a skb,
but amount of reserved/consumed bytes in memory.
For instance, if we use a single byte in last page fragment, we have to
account the full size of the fragment.
So skb_add_rx_frag needs to calculate the length of the entire buffer into
turesize.
Fixes: 9cbe9fd521 ("net: hns: optimize XGE capability by reducing cpu usage")
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3ed614dce3 ]
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length and page_offset are u16, they will
overflow. So change them to u32.
Fixes: 6fe6611ff2 ("net: add Hisilicon Network Subsystem hnae framework support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 152395fd03 ]
When thermal zone is in passive mode, disabling its mode from
sysfs is NOT taking effect at all, it is still polling the
temperature of the disabled thermal zone and handling all thermal
trips, it makes user confused. The disabling operation should
disable the thermal zone behavior completely, for both active and
passive mode, this patch clears the passive_delay when thermal
zone is disabled and restores it when it is enabled.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 76271809f4 ]
Successive iterations of halting and resuming the management chip (MCP)
might fail, since currently the driver doesn't wait for these operations to
actually take place.
This patch prevents the driver from moving forward before the operations
are reflected in the state register.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f00d25f315 ]
The MFW might be reset and re-update its shared memory.
Upon the detection of such a reset the driver rereads this memory, but it
has to wait till the data is valid.
This patch adds the missing wait for a data ready indication.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8cdb5240ec upstream.
When expanding the extra isize space, we must never move the
system.data xattr out of the inode body. For performance reasons, it
doesn't make any sense, and the inline data implementation assumes
that system.data xattr is never in the external xattr block.
This addresses CVE-2018-10880
https://bugzilla.kernel.org/show_bug.cgi?id=200005
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
[groeck: Context changes]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d26c25a9d1 upstream.
We currently allow userspace to access the core register file
in about any possible way, including straddling multiple
registers and doing unaligned accesses.
This is not the expected use of the ABI, and nobody is actually
using it that way. Let's tighten it by explicitly checking
the size and alignment for each field of the register file.
Cc: <stable@vger.kernel.org>
Fixes: 2f4a07c5f9 ("arm64: KVM: guest one-reg interface")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[maz: rewrote Dave's initial patch to be more easily backported]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c39e2699f upstream.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[plr.vincent@gmail.com: hunk context change for 4.4 and 4.9, no code change]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d623500b3c upstream.
If a packet stream uses an UnsupportedVL (virtual lane), the send
engine will not send the packet, and it will not indicate that an
error has occurred. This will cause the packet stream to block.
HFI has 8 virtual lanes available for packet streams. Each lane can
be enabled or disabled using the UnsupportedVL mask. If a lane is
disabled, adding a packet to the send context must be disallowed.
The current mask for determining unsupported VLs defaults to 0 (allow
all). This is incorrect. Only the VLs that are defined should be
allowed.
Determine which VLs are disabled (mtu == 0), and set the appropriate
unsupported bit in the mask. The correct mask will allow the send
engine to error on the invalid VL, and error recovery will work
correctly.
Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee92efe41c upstream.
Use different loop variables for the inner and outer loop. This avoids
that an infinite loop occurs if there are more RDMA channels than
target->req_ring_size.
Fixes: d92c0da71a ("IB/srp: Add multichannel support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c183813fce upstream.
usb_driver_claim_interface() disables and re-enables Link Power
Management, but it shouldn't do either one, for the reasons listed
below. This patch removes the two LPM-related function calls from the
routine.
The reason for disabling LPM in the analogous function
usb_probe_interface() is so that drivers won't have to deal with
unwanted LPM transitions in their probe routine. But
usb_driver_claim_interface() doesn't call the driver's probe routine
(or any other callbacks), so that reason doesn't apply here.
Furthermore, no driver other than usbfs will ever call
usb_driver_claim_interface() unless it is already bound to another
interface in the same device, which means disabling LPM here would be
redundant. usbfs doesn't interact with LPM at all.
Lastly, the error return from usb_unlocked_disable_lpm() isn't handled
properly; the code doesn't clean up its earlier actions before
returning.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 8306095fd2 ("USB: Disable USB 3.0 LPM in critical sections.")
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e871db8d78 upstream.
This reverts commit 6e22e3af7b.
The bug the patch describes to, has been already fixed in commit
2df6948428 ("USB: cdc-wdm: don't enable interrupts in USB-giveback")
so need to this, revert it.
Fixes: 6e22e3af7b ("usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a68d9fb85 upstream.
Requesting a ZERO_PACKET or not is sensible only for output.
In the input direction the device decides.
Likewise accepting short packets makes sense only for input.
This allows operation with panic_on_warn without opening up
a local DOS.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+843efa30c8821bd69f53@syzkaller.appspotmail.com
Fixes: 0cb54a3e47 ("USB: debugging code shouldn't alter control flow")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f620d1d7af upstream.
media: uvcvideo: Support UVC 1.5 video probe & commit controls
The length of UVC 1.5 video control is 48, and it is 34 for UVC 1.1.
Change it to 48 for UVC 1.5 device, and the UVC 1.5 device can be
recognized.
More changes to the driver are needed for full UVC 1.5 compatibility.
However, at least the UVC 1.5 Realtek RTS5847/RTS5852 cameras have been
reported to work well.
[laurent.pinchart@ideasonboard.com: Factor out code to helper function, update size checks]
Cc: stable@vger.kernel.org
Signed-off-by: ming_qian <ming_qian@realsil.com.cn>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Ana Guerrero Lopez <ana.guerrero@collabora.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9a4cb204e upstream.
usb_find_alt_setting() takes a pointer to a struct usb_host_config as
an argument; it searches for an interface with specified interface and
alternate setting numbers in that config. However, it crashes if the
usb_host_config pointer argument is NULL.
Since this is a general-purpose routine, available for use in many
places, we want to to be more robust. This patch makes it return NULL
whenever the config argument is NULL.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: syzbot+19c3aaef85a89d451eac@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd729f9d67 upstream.
The syzbot fuzzing project found a use-after-free bug in the USB
core. The bug was caused by usbfs not unbinding from an interface
when the USB device file was closed, which led another process to
attempt the unbind later on, after the private data structure had been
deallocated.
The reason usbfs did not unbind the interface at the appropriate time
was because it thought the interface had never been claimed in the
first place. This was caused by the fact that
usb_driver_claim_interface() does not clean up properly when
device_bind_driver() returns an error. Although the error code gets
passed back to the caller, the iface->dev.driver pointer remains set
and iface->condition remains equal to USB_INTERFACE_BOUND.
This patch adds proper error handling to usb_driver_claim_interface().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: syzbot+f84aa7209ccec829536f@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb6de923ca upstream.
dev_set_drvdata() needs to be called before device_register()
exposes device to userspace. Otherwise kernel crashes after it
gets null pointer from dev_get_drvdata() when userspace tries
to access sysfs entries.
[Removed backtrace for length -- broonie]
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dbbaa47b9 upstream.
When interrupted, wait_event_interruptible_timeout() returns
-ERESTARTSYS, and the SPI transfer in progress will fail, as expected:
m25p80 spi0.0: SPI transfer failed: -512
spi_master spi0: failed to transfer one message from queue
However, as the underlying DMA transfers may not have completed, all
subsequent SPI transfers may start to fail:
spi_master spi0: receive timeout
qspi_transfer_out_in() returned -110
m25p80 spi0.0: SPI transfer failed: -110
spi_master spi0: failed to transfer one message from queue
Fix this by calling dmaengine_terminate_all() not only for timeouts, but
also for errors.
This can be reproduced on r8a7991/koelsch, using "hd /dev/mtd0" followed
by CTRL-C.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1ca59c22c upstream.
If the SPI queue is running during system suspend, the system may lock
up.
Fix this by stopping/restarting the queue during system suspend/resume,
by calling spi_master_suspend()/spi_master_resume() from the PM
callbacks. In-kernel users will receive an -ESHUTDOWN error while
system suspend/resume is in progress.
Based on a patch for sh-msiof by Gaku Inami.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffa69d6a16 upstream.
If the SPI queue is running during system suspend, the system may lock
up.
Fix this by stopping/restarting the queue during system suspend/resume
by calling spi_master_suspend()/spi_master_resume() from the PM
callbacks. In-kernel users will receive an -ESHUTDOWN error while
system suspend/resume is in progress.
Signed-off-by: Gaku Inami <gaku.inami.xw@bp.renesas.com>
Signed-off-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com>
[geert: Cleanup, reword]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7001cab1da upstream.
Depending on the SPI instance one may get an interrupt storm upon
requesting resp. interrupt unless the clock is explicitly enabled
beforehand. This has been observed trying to bring up instance 4 on
T20.
Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3216c622a2 upstream.
The function tty_port_tty_get() gets a reference to the tty. Since
the code is not using tty_port_tty_set(), the reference is kept
even after closing the tty.
Avoid using tty_port_tty_get() by directly access the tty instance.
Since lpuart_start_rx_dma() is called from the .startup() and
.set_termios() callback, it is safe to assume the tty instance is
valid.
Cc: stable@vger.kernel.org # v4.9+
Fixes: 5887ad43ee ("tty: serial: fsl_lpuart: Use cyclic DMA for Rx")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65eea8edc3 upstream.
The final field of a floppy_struct is the field "name", which is a pointer
to a string in kernel memory. The kernel pointer should not be copied to
user memory. The FDGETPRM ioctl copies a floppy_struct to user memory,
including this "name" field. This pointer cannot be used by the user
and it will leak a kernel address to user-space, which will reveal the
location of kernel code and data and undermine KASLR protection.
Model this code after the compat ioctl which copies the returned data
to a previously cleared temporary structure on the stack (excluding the
name pointer) and copy out to userspace from there. As we already have
an inparam union with an appropriate member and that memory is already
cleared even for read only calls make use of that as a temporary store.
Based on an initial patch by Brian Belleville.
CVE-2018-7755
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Broke up long line.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5b7b15aee6 ]
We're encoding a single op in the reply but leaving the number of ops
zero, so the reply makes no sense.
Somewhat academic as this isn't a case any real client will hit, though
in theory perhaps that could change in a future protocol extension.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7279d99175 ]
men_z127_debounce() tries to round up and down, but uses functions which
are only suitable when the divider is a power of two, which is not the
case. Use the appropriate ones.
Found by static check. Compile tested.
Fixes: f436bc2726 ("gpio: add driver for MEN 16Z127 GPIO controller")
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f2d1e68cf ]
Livepatch modules are special in that we preserve their entire symbol
tables in order to be able to apply relocations after module load. The
unwanted side effect of this is that undefined (SHN_UNDEF) symbols of
livepatch modules are accessible via the kallsyms api and this can
confuse symbol resolution in livepatch (klp_find_object_symbol()) and
cause subtle bugs in livepatch.
Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols
are usually not available for normal modules anyway as we cut down their
symbol tables to just the core (non-undefined) symbols, so this should
really just affect livepatch modules. Note that this patch doesn't
affect the display of undefined symbols in /proc/kallsyms.
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e01b4f6242 ]
Sometime a component or topology may configure a DAI widget with no
private data leading to a dev_dbg() dereferencne of this data.
Fix this to check for non NULL private data and let users know if widget
is missing DAI.
Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0592e57b24 ]
LBR has a limited stack size. If a task has a deeper call stack than
LBR's stack size, only the overflowed part is reported. A complete call
stack may not be reconstructed by perf tool.
Current code doesn't access all LBR registers. It only read the ones
below the TOS. The LBR registers above the TOS will be discarded
unconditionally.
When a CALL is captured, the TOS is incremented by 1 , modulo max LBR
stack size. The LBR HW only records the call stack information to the
register which the TOS points to. It will not touch other LBR
registers. So the registers above the TOS probably still store the valid
call stack information for an overflowed call stack, which need to be
reported.
To retrieve complete call stack information, we need to start from TOS,
read all LBR registers until an invalid entry is detected.
0s can be used to detect the invalid entry, because:
- When a RET is captured, the HW zeros the LBR register which TOS points
to, then decreases the TOS.
- The LBR registers are reset to 0 when adding a new LBR event or
scheduling an existing LBR event.
- A taken branch at IP 0 is not expected
The context switch code is also modified to save/restore all valid LBR
registers. Furthermore, the LBR registers, which don't have valid call
stack information, need to be reset in restore, because they may be
polluted while swapped out.
Here is a small test program, tchain_deep.
Its call stack is deeper than 32.
noinline void f33(void)
{
int i;
for (i = 0; i < 10000000;) {
if (i%2)
i++;
else
i++;
}
}
noinline void f32(void)
{
f33();
}
noinline void f31(void)
{
f32();
}
... ...
noinline void f1(void)
{
f2();
}
int main()
{
f1();
}
Here is the test result on SKX. The max stack size of SKX is 32.
Without the patch:
$ perf record -e cycles --call-graph lbr -- ./tchain_deep
$ perf report --stdio
#
# Children Self Command Shared Object Symbol
# ........ ........ ........... ................ .................
#
100.00% 99.99% tchain_deep tchain_deep [.] f33
|
--99.99%--f30
f31
f32
f33
With the patch:
$ perf record -e cycles --call-graph lbr -- ./tchain_deep
$ perf report --stdio
# Children Self Command Shared Object Symbol
# ........ ........ ........... ................ ..................
#
99.99% 0.00% tchain_deep tchain_deep [.] f1
|
---f1
f2
f3
f4
f5
f6
f7
f8
f9
f10
f11
f12
f13
f14
f15
f16
f17
f18
f19
f20
f21
f22
f23
f24
f25
f26
f27
f28
f29
f30
f31
f32
f33
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lore.kernel.org/lkml/1528213126-4312-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d0d378ff45 ]
With CONFIG_FORTIFY_SOURCE, memcpy uses the declared size of operands to
detect buffer overflows. If src or dest is declared as a char, attempts to
copy more than byte will result in a fortify_panic().
Address this problem in mvebu_setup_boot_addr_wa() by declaring
mvebu_boot_wa_start and mvebu_boot_wa_end as character arrays. Also remove
a couple addressof operators to avoid "arithmetic on pointer to an
incomplete type" compiler error.
See commit 54a7d50b92 ("x86: mark kprobe templates as character arrays,
not single characters") for a similar fix.
Fixes "detected buffer overflow in memcpy" error during init on some mvebu
systems (armada-370-xp, armada-375):
(fortify_panic) from (mvebu_setup_boot_addr_wa+0xb0/0xb4)
(mvebu_setup_boot_addr_wa) from (mvebu_v7_cpu_pm_init+0x154/0x204)
(mvebu_v7_cpu_pm_init) from (do_one_initcall+0x7c/0x1a8)
(do_one_initcall) from (kernel_init_freeable+0x1bc/0x254)
(kernel_init_freeable) from (kernel_init+0x8/0x114)
(kernel_init) from (ret_from_fork+0x14/0x2c)
Signed-off-by: Ethan Tuttle <ethan@ethantuttle.com>
Tested-by: Ethan Tuttle <ethan@ethantuttle.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4ec7cece87 ]
Otherwise we can get:
WARNING: CPU: 0 PID: 55 at drivers/net/wireless/ti/wlcore/io.h:84
I've only seen this few times with the runtime PM patches enabled
so this one is probably not needed before that. This seems to
work currently based on the current PM implementation timer. Let's
apply this separately though in case others are hitting this issue.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae636fb155 ]
This is a static checker fix, not something I have tested. The issue
is that on the second iteration through the loop, we jump forward by
le32_to_cpu(auth_req->length) bytes. The problem is that if the length
is more than "buflen" then we end up with a negative "buflen". A
negative buflen is type promoted to a high positive value and the loop
continues but it's accessing beyond the end of the buffer.
I believe the "auth_req->length" comes from the firmware and if the
firmware is malicious or buggy, you're already toasted so the impact of
this bug is probably not very severe.
Fixes: 030645aceb ("rndis_wlan: handle 802.11 indications from device")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab4e6ee578 ]
Since a phy_device is added to the global mdio_bus list during
phy_device_register(), but a phy_device's phy_driver doesn't get
attached until phy_probe(). It's possible of_phy_find_device() in
xgmiitorgmii will return a valid phy with a NULL phy_driver. Leading to
a NULL pointer access during the memcpy().
Fixes this Oops:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.40 #1
Hardware name: Xilinx Zynq Platform
task: ce4c8d00 task.stack: ce4ca000
PC is at memcpy+0x48/0x330
LR is at xgmiitorgmii_probe+0x90/0xe8
pc : [<c074bc68>] lr : [<c0529548>] psr: 20000013
sp : ce4cbb54 ip : 00000000 fp : ce4cbb8c
r10: 00000000 r9 : 00000000 r8 : c0c49178
r7 : 00000000 r6 : cdc14718 r5 : ce762800 r4 : cdc14710
r3 : 00000000 r2 : 00000054 r1 : 00000000 r0 : cdc14718
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d Table: 0000404a DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xce4ca210)
...
[<c074bc68>] (memcpy) from [<c0529548>] (xgmiitorgmii_probe+0x90/0xe8)
[<c0529548>] (xgmiitorgmii_probe) from [<c0526a94>] (mdio_probe+0x28/0x34)
[<c0526a94>] (mdio_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbd58>] (__device_attach_driver+0xac/0x10c)
[<c04dbd58>] (__device_attach_driver) from [<c04d96f4>] (bus_for_each_drv+0x84/0xc8)
[<c04d96f4>] (bus_for_each_drv) from [<c04db5bc>] (__device_attach+0xd0/0x134)
[<c04db5bc>] (__device_attach) from [<c04dbdd4>] (device_initial_probe+0x1c/0x20)
[<c04dbdd4>] (device_initial_probe) from [<c04da8fc>] (bus_probe_device+0x98/0xa0)
[<c04da8fc>] (bus_probe_device) from [<c04d8660>] (device_add+0x43c/0x5d0)
[<c04d8660>] (device_add) from [<c0526cb8>] (mdio_device_register+0x34/0x80)
[<c0526cb8>] (mdio_device_register) from [<c0580b48>] (of_mdiobus_register+0x170/0x30c)
[<c0580b48>] (of_mdiobus_register) from [<c05349c4>] (macb_probe+0x710/0xc00)
[<c05349c4>] (macb_probe) from [<c04dd700>] (platform_drv_probe+0x44/0x80)
[<c04dd700>] (platform_drv_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbc58>] (__driver_attach+0x10c/0x118)
[<c04dbc58>] (__driver_attach) from [<c04d9600>] (bus_for_each_dev+0x8c/0xd0)
[<c04d9600>] (bus_for_each_dev) from [<c04db1fc>] (driver_attach+0x2c/0x30)
[<c04db1fc>] (driver_attach) from [<c04daa98>] (bus_add_driver+0x50/0x260)
[<c04daa98>] (bus_add_driver) from [<c04dc440>] (driver_register+0x88/0x108)
[<c04dc440>] (driver_register) from [<c04dd6b4>] (__platform_driver_register+0x50/0x58)
[<c04dd6b4>] (__platform_driver_register) from [<c0b31248>] (macb_driver_init+0x24/0x28)
[<c0b31248>] (macb_driver_init) from [<c010203c>] (do_one_initcall+0x60/0x1a4)
[<c010203c>] (do_one_initcall) from [<c0b00f78>] (kernel_init_freeable+0x15c/0x1f8)
[<c0b00f78>] (kernel_init_freeable) from [<c0763d10>] (kernel_init+0x18/0x124)
[<c0763d10>] (kernel_init) from [<c0112d74>] (ret_from_fork+0x14/0x20)
Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
---[ end trace 3e4ec21905820a1f ]---
Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 168f75f11f ]
While debugging driver crashes related to a buggy firmware
crashing under load, I noticed that ath10k_htt_rx_ring_free
could be called without being under lock. I'm not sure if this
is the root cause of the crash or not, but it seems prudent to
protect it.
Originally tested on 4.16+ kernel with ath10k-ct 10.4 firmware
running on 9984 NIC.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2ec7debd44 ]
The struct clk_init_data init variable is declared in the isp_xclk_init()
function so is an automatic variable allocated in the stack. But it's not
explicitly zero-initialized, so some init fields are left uninitialized.
This causes the data structure to have undefined values that may confuse
the common clock framework when the clock is registered.
For example, the uninitialized .flags field could have the CLK_IS_CRITICAL
bit set, causing the framework to wrongly prepare the clk on registration.
This leads to the isp_xclk_prepare() callback being called, which in turn
calls to the omap3isp_get() function that increments the isp dev refcount.
Since this omap3isp_get() call is unexpected, this leads to an unbalanced
omap3isp_get() call that prevents the requested IRQ to be later enabled,
due the refcount not being 0 when the correct omap3isp_get() call happens.
Fixes: 9b28ee3c91 ("[media] omap3isp: Use the common clock framework")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 22216ec41e ]
The banding filter ON/OFF is controlled via bit 5 of COM8 register. It
is attempted to be enabled in ov772x_set_params() by the following line.
ret = ov772x_mask_set(client, COM8, BNDF_ON_OFF, 1);
But this unexpectedly results disabling the banding filter, because the
mask and set bits are exclusive.
On the other hand, ov772x_s_ctrl() correctly sets the bit by:
ret = ov772x_mask_set(client, COM8, BNDF_ON_OFF, BNDF_ON_OFF);
The same fix was already applied to non-soc_camera version of ov772x
driver in the commit commit a024ee14cd ("media: ov772x: correct setting
of banding filter")
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 222bce5eb8 ]
Both calls to of_find_node_by_name() and of_get_next_child() return a
node pointer with refcount incremented thus it must be explicidly
decremented here after the last usage. As we are assured to have a
refcounted np either from the initial
of_find_node_by_name(NULL, name); or from the of_get_next_child(gpio, np)
in the while loop if we reached the error code path below, an
x of_node_put(np) is needed.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: commit f3d9478b2c ("[ALSA] snd-aoa: add snd-aoa")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b2ddf33ba ]
arch/s390/mm/extmem.c: In function '__segment_load':
arch/s390/mm/extmem.c:436:2: warning: 'strncat' specified bound 7 equals
source length [-Wstringop-overflow=]
strncat(seg->res_name, " (DCSS)", 7);
What gcc complains about here is the misuse of strncat function, which
in this case does not limit a number of bytes taken from "src", so it is
in the end the same as strcat(seg->res_name, " (DCSS)");
Keeping in mind that a res_name is 15 bytes, strncat in this case
would overflow the buffer and write 0 into alignment byte between the
fields in the struct. To avoid that increasing res_name size to 16,
and reusing strlcat.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5f936e19cc ]
Air Icy reported:
UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
signed integer overflow:
1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
Call Trace:
alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
__do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
__se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
__x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290
alarm_timer_nsleep() uses ktime_add() to add the current time and the
relative expiry value. ktime_add() has no sanity checks so the addition
can overflow when the relative timeout is large enough.
Use ktime_add_safe() which has the necessary sanity checks in place and
limits the result to the valid range.
Fixes: 9a7adcf5c6 ("timers: Posix interface for alarm-timers")
Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3d4ffaae4 ]
We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.
Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in other
words we can create a window as big as 1<<58 but DMA simply won't work.
This reduces the upper limit from 59 to 55 bits to let the userspace know
about the hardware limits.
Fixes: 7aafac11e3 "powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3ac5598c5 ]
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. usb_get_descriptor can return a
negative error code.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
int x;
expression e,e1;
identifier f;
@@
*x = f(...);
... when != x = e1
when != if (x < 0 || ...) { ... return ...; }
*x < sizeof(e)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1262dc09dc ]
Currently an open firmware property is copied into partition_name variable
without keeping a room for \0.
Later one, this variable (partition_name), which is 97 bytes long, is
strncpyed into ibmvcsci_host_data->madapter_info->partition_name, which is
96 bytes long, possibly truncating it 'again' and removing the \0.
This patch simply decreases the partition name to 96 and just copy using
strlcpy() which guarantees that the string is \0 terminated. I think there
is no issue if this there is a truncation in this very first copy, i.e,
when the open firmware property is read and copied into the driver for the
very first time;
This issue also causes the following warning on GCC 8:
drivers/scsi/ibmvscsi/ibmvscsi.c:281:2: warning: strncpy output may be truncated copying 96 bytes from a string of length 96 [-Wstringop-truncation]
...
inlined from ibmvscsi_probe at drivers/scsi/ibmvscsi/ibmvscsi.c:2221:7:
drivers/scsi/ibmvscsi/ibmvscsi.c:265:3: warning: strncpy specified bound 97 equals destination size [-Wstringop-truncation]
CC: Bart Van Assche <bart.vanassche@wdc.com>
CC: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 624fa7790f ]
In the scsi_transport_srp implementation it cannot be avoided to
iterate over a klist from atomic context when using the legacy block
layer instead of blk-mq. Hence this patch that makes it safe to use
klists in atomic context. This patch avoids that lockdep reports the
following:
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&k->k_lock)->rlock);
local_irq_disable();
lock(&(&q->__queue_lock)->rlock);
lock(&(&k->k_lock)->rlock);
<Interrupt>
lock(&(&q->__queue_lock)->rlock);
stack backtrace:
Workqueue: kblockd blk_timeout_work
Call Trace:
dump_stack+0xa4/0xf5
check_usage+0x6e6/0x700
__lock_acquire+0x185d/0x1b50
lock_acquire+0xd2/0x260
_raw_spin_lock+0x32/0x50
klist_next+0x47/0x190
device_for_each_child+0x8e/0x100
srp_timed_out+0xaf/0x1d0 [scsi_transport_srp]
scsi_times_out+0xd4/0x410 [scsi_mod]
blk_rq_timed_out+0x36/0x70
blk_timeout_work+0x1b5/0x220
process_one_work+0x4fe/0xad0
worker_thread+0x63/0x5a0
kthread+0x1c1/0x1e0
ret_from_fork+0x24/0x30
See also commit c9ddf73476 ("scsi: scsi_transport_srp: Fix shost to
rport translation").
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d609b35c8 ]
When the RTC lock and unlock functions were introduced it was likely
assumed that they would always be called from irq enabled context, hence
the use of local_irq_disable/enable. This is no longer true as the
RTC+DDR path makes a late call during the suspend path after irqs
have been disabled to enable the RTC hwmod which calls both unlock and
lock, leading to IRQs being reenabled through the local_irq_enable call
in omap_hwmod_rtc_lock call.
To avoid this change the local_irq_disable/enable to
local_irq_save/restore to ensure that from whatever context this is
called the proper IRQ configuration is maintained.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2d7c8ff89 ]
"nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
error code then it's type promoted to a high unsigned int which is
treated as success.
Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 010228e4a9 ]
When one node leaves cluster or stops the resyncing
(resync or recovery) array, then other nodes need to
call recover_bitmaps to continue the unfinished task.
But we need to clear suspend_area later after other
nodes copy the resync information to their bitmap
(by call bitmap_copy_from_slot). Otherwise, all nodes
could write to the suspend_area even the suspend_area
is not handled by any node, because area_resyncing
returns 0 at the beginning of raid1_write_request.
Which means one node could write suspend_area while
another node is resyncing the same area, then data
could be inconsistent.
So let's clear suspend_area later to avoid above issue
with the protection of bm lock. Also it is straightforward
to clear suspend_area after nodes have copied the resync
info to bitmap.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5bedf8aa03 ]
Since proc_dointvec does not perform value range control,
proc_dointvec_minmax should be used to limit value range, which is
clearly intended here, as the internal representation of the value:
unsigned int alloc_pgste:1;
In fact it currently works, since we have
mm->context.alloc_pgste = page_table_allocate_pgste || ...
... since commit 23fefe119c ("s390/kvm: avoid global config of vm.alloc_pgste=1")
Before that it was
mm->context.alloc_pgste = page_table_allocate_pgste;
which was broken. That was introduced with commit 0b46e0a3ec ("s390/kvm:
remove delayed reallocation of page tables for KVM").
Fixes: 0b46e0a3ec ("s390/kvm: remove delayed reallocation of page tables for KVM")
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 03bc05e1a4 ]
After decompression of 6lowpan socket data, an IPv6 header is inserted
before the existing socket payload. After this, we reset the
network_header value of the skb to account for the difference in payload
size from prior to decompression + the addition of the IPv6 header.
However, we fail to reset the mac_header value.
Leaving the mac_header value untouched here, can cause a calculation
error in net/packet/af_packet.c packet_rcv() function when an
AF_PACKET socket is opened in SOCK_RAW mode for use on a 6lowpan
interface.
On line 2088, the data pointer is moved backward by the value returned
from skb_mac_header(). If skb->data is adjusted so that it is before
the skb->head pointer (which can happen when an old value of mac_header
is left in place) the kernel generates a panic in net/core/skbuff.c
line 1717.
This panic can be generated by BLE 6lowpan interfaces (such as bt0) and
802.15.4 interfaces (such as lowpan0) as they both use the same 6lowpan
sources for compression and decompression.
Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
Acked-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a420b5d939 ]
Make sure to return -EIO in case of a short modem-status read request.
While at it, split the debug message to not include the (zeroed)
transfer-buffer content in case of errors.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c120143f5 ]
Although the mapping has already been removed in the page table, it maybe
still exist in TLB. Suppose the freed IOVAs is reused by others before the
flush operation completed, the new user can not correctly access to its
meomory.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Fixes: b1516a1465 ('iommu/amd: Implement flush queue')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 09bebb1adb ]
Vexpress platforms provide two different restart handlers: SYS_REBOOT
that restart the entire system, while DB_RESET only restarts the
daughter board containing the CPU. DB_RESET is overridden by SYS_REBOOT
if it exists.
notifier_chain_register used in register_restart_handler by design
relies on notifiers to be registered once only, however vexpress restart
notifier can get registered twice. When this happen it corrupts list
of notifiers, as result some notifiers can be not called on proper
event, traverse on list can be cycled forever, and second unregister
can access already freed memory.
So far, since this was the only restart handler in the system, no issue
was observed even if the same notifier was registered twice. However
commit 6c5c0d48b6 ("watchdog: sp805: add restart handler") added
support for SP805 restart handlers and since the system under test
contains two vexpress restart and two SP805 watchdog instances, it was
observed that during the boot traversing the restart handler list looped
forever as there's a cycle in that list resulting in boot hang.
This patch fixes the issues by ensuring that the notifier is installed
only once.
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Fixes: 46c99ac662 ("power/reset: vexpress: Register with kernel restart handler")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11b71782c1 ]
hwarc_probe() allocates memory for hwarc, but does not free it
if uwb_rc_add() or hwarc_get_version() fail.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5fae4f4fd ]
Currently the check on error return from the call to rtsx_write_register
is checking the error status from the previous call. Fix this by adding
in the missing assignment of retval.
Detected by CoverityScan, CID#709877
Fixes: fa590c222f ("staging: rts5208: add support for rts5208 and rts5288")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ce054546cc ]
ADC channel 0 photodiode detects both infrared + visible light,
but ADC channel 1 just detects infrared. However, the latter is a bit
more sensitive in that range so complete darkness or low light causes
a error condition in which the chan0 - chan1 is negative that
results in a -EAGAIN.
This patch changes the resulting lux1_input sysfs attribute message from
"Resource temporarily unavailable" to a user-grokable lux value of 0.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 308aa2b8f7 upstream.
Once the qp has been flushed, it cannot be flushed again. The user qp
flush logic wasn't enforcing it however. The bug can cause
touch-after-free crashes like:
Unable to handle kernel paging request for data at address 0x000001ec
Faulting instruction address: 0xc008000016069100
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
Call Trace:
[c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
[c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
[c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
[c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
[c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
[c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
[c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
[c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
[c000000000444da4] __fput+0xe4/0x2f0
So fix flush_qp() to only flush the wq once.
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91a2968e24 upstream.
The PCIE I/O and MEM resource allocation mechanism is that root bus
goes through the following steps:
1. Check PCI bridges' range and computes I/O and Mem base/limits.
2. Sort all subordinate devices I/O and MEM resource requirements and
allocate the resources and writes/updates subordinate devices'
requirements to PCI bridges I/O and Mem MEM/limits registers.
Currently, PCI Aardvark driver only handles the second step and lacks
the first step, so there is an I/O and MEM resource allocation failure
when using a PCI switch. This commit fixes that by sizing bridges
before doing the resource allocation.
Fixes: 8c39d71036 ("PCI: aardvark: Add Aardvark PCI host controller
driver")
Signed-off-by: Zachary Zhang <zhangzg@marvell.com>
[Thomas: edit commit log.]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de66a1a04c upstream.
Add support for USB based DS4 dongle device, which allows connecting
a DS4 through Bluetooth, but hides Bluetooth from the host system.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0cdb3ce88 upstream.
When a task which previously ran on a given CPU is remotely queued to
wake up on that same CPU, there is a period where the task's state is
TASK_WAKING and its vruntime is not normalized. This is not accounted
for in vruntime_normalized() which will cause an error in the task's
vruntime if it is switched from the fair class during this time.
For example if it is boosted to RT priority via rt_mutex_setprio(),
rq->min_vruntime will not be subtracted from the task's vruntime but
it will be added again when the task returns to the fair class. The
task's vruntime will have been erroneously doubled and the effective
priority of the task will be reduced.
Note this will also lead to inflation of all vruntimes since the doubled
vruntime value will become the rq's min_vruntime when other tasks leave
the rq. This leads to repeated doubling of the vruntime and priority
penalty.
Fix this by recognizing a WAKING task's vruntime as normalized only if
sched_remote_wakeup is true. This indicates a migration, in which case
the vruntime would have been normalized in migrate_task_rq_fair().
Based on a similar patch from John Dias <joaodias@google.com>.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Steve Muckle <smuckle@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: John Dias <joaodias@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel de Dios <migueldedios@google.com>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>
Cc: Patrick Bellasi <Patrick.Bellasi@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: kernel-team@android.com
Fixes: b5179ac70d ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe18d64989 upstream.
Marking mmp bh dirty before writing it will make writeback
pick up mmp block later and submit a write, we don't want the
duplicate write as kmmpd thread should have full control of
reading and writing the mmp block.
Another reason is we will also have random I/O error on
the writeback request when blk integrity is enabled, because
kmmpd could modify the content of the mmp block(e.g. setting
new seq and time) while the mmp block is under I/O requested
by writeback.
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f8c10936f upstream.
An online resize of a file system with the bigalloc feature enabled
and a 1k block size would be refused since ext4_resize_begin() did not
understand s_first_data_block is 0 for all bigalloc file systems, even
when the block size is 1k.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0a459dec5 upstream.
Avoid growing the file system to an extent so that the last block
group is too small to hold all of the metadata that must be stored in
the block group.
This problem can be triggered with the following reproducer:
umount /mnt
mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \
-E resize=1073741824 /tmp/foo.img 128M
mount /tmp/foo.img /mnt
truncate --size 1708M /tmp/foo.img
resize2fs /dev/loop0 295400
umount /mnt
e2fsck -fy /tmp/foo.img
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4274f516d4 upstream.
When mounting the superblock, ext4_fill_super() calculates the free
blocks and free inodes and stores them in the superblock. It's not
strictly necessary, since we don't use them any more, but it's nice to
keep them roughly aligned to reality.
Since it's not critical for file system correctness, the code doesn't
call ext4_commit_super(). The problem is that it's in
ext4_commit_super() that we recalculate the superblock checksum. So
if we're not going to call ext4_commit_super(), we need to call
ext4_superblock_csum_set() to make sure the superblock checksum is
consistent.
Most of the time, this doesn't matter, since we end up calling
ext4_commit_super() very soon thereafter, and definitely by the time
the file system is unmounted. However, it doesn't work in this
sequence:
mke2fs -Fq -t ext4 /dev/vdc 128M
mount /dev/vdc /vdc
cp xfstests/git-versions /vdc
godown /vdc
umount /vdc
mount /dev/vdc
tune2fs -l /dev/vdc
With this commit, the "tune2fs -l" no longer fails.
Reported-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d982e25d0 upstream.
A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in a inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault. Fix this by using the size of the inline directory instead of
dir->i_size.
Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero. (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)
https://bugzilla.kernel.org/show_bug.cgi?id=200933
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e97267cb4d upstream.
vsa.console is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/tty/vt/vt_ioctl.c:711 vt_ioctl() warn: potential spectre issue
'vc_cons' [r]
Fix this by sanitizing vsa.console before using it to index vc_cons
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79e765ad66 upstream.
On most systems with ACPI hotplugging support, it seems that we always
receive a hotplug event once we re-enable EC interrupts even if the GPU
hasn't even been resumed yet.
This can cause problems since even though we schedule hpd_work to handle
connector reprobing for us, hpd_work synchronizes on
pm_runtime_get_sync() to wait until the device is ready to perform
reprobing. Since runtime suspend/resume callbacks are disabled before
the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during
this period will grab a runtime PM ref and return immediately with
-EACCES. Because we schedule hpd_work from our ACPI HPD handler, and
hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch
a connector reprobe immediately even if the GPU isn't actually resumed
just yet. This causes various warnings in dmesg and occasionally, also
prevents some displays connected to the dedicated GPU from coming back
up after suspend. Example:
usb 1-4: USB disconnect, device number 14
usb 1-4.1: USB disconnect, device number 15
WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau]
CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1
Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018
Workqueue: events nouveau_display_hpd_work [nouveau]
RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau]
RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000
RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002
RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000
R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418
FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? _cond_resched+0x15/0x30
nouveau_connector_detect+0x2ce/0x520 [nouveau]
? _cond_resched+0x15/0x30
? ww_mutex_lock+0x12/0x40
drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper]
drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
nouveau_display_hpd_work+0x2a/0x60 [nouveau]
process_one_work+0x187/0x340
worker_thread+0x2e/0x380
? pwq_unbound_release_workfn+0xd0/0xd0
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff
---[ end trace 55d811b38fc8e71a ]---
So, to fix this we attempt to grab a runtime PM reference in the ACPI
handler itself asynchronously. If the GPU is already awake (it will have
normal hotplugging at this point) or runtime PM callbacks are currently
disabled on the device, we drop our reference without updating the
autosuspend delay. We only schedule connector reprobes when we
successfully managed to queue up a resume request with our asynchronous
PM ref.
This also has the added benefit of preventing redundant connector
reprobes from ACPI while the GPU is runtime resumed!
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <kherbst@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6833fb1ec1 upstream.
It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can however, prevent the autosuspend
timer from elapsing immediately if it hasn't already without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d77ef138ff upstream.
Turns out this part is my fault for not noticing when reviewing
9a2eba337c ("drm/nouveau: Fix drm poll_helper handling"). Currently
we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
This makes basically no sense however, because that means we're calling
drm_kms_helper_poll_enable() every time we schedule the hotplug
detection work. This is also against the advice mentioned in
drm_kms_helper_poll_enable()'s documentation:
Note that calls to enable and disable polling must be strictly ordered,
which is automatically the case when they're only call from
suspend/resume callbacks.
Of course, hotplugs can't really be ordered. They could even happen
immediately after we called drm_kms_helper_poll_disable() in
nouveau_display_fini(), which can lead to all sorts of issues.
Additionally; enabling polling /after/ we call
drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to
probe connectors so long as polling is disabled.
So; simply move this back into nouveau_display_init() again. The race
condition that both of these patches attempted to work around has
already been fixed properly in
d61a5c1063 ("drm/nouveau: Fix deadlock on runtime suspend")
Fixes: 9a2eba337c ("drm/nouveau: Fix drm poll_helper handling")
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f0e0d04413 ]
Update 'confirmed' timestamp when ARP packet is received. It shouldn't
affect locktime logic and anyway entry can be confirmed by any higher-layer
protocol. Thus it makes sense to confirm it when ARP packet is received.
Fixes: 77d7123342 ("neighbour: update neigh timestamps iff update is effective")
Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b5a921740 ]
commit 2abb7cdc0d ("udp: Add support for doing checksum
unnecessary conversion") left out the early demux path for
connected sockets. As a result IP_CMSG_CHECKSUM gives wrong
values for such socket when GRO is not enabled/available.
This change addresses the issue by moving the csum conversion to a
common helper and using such helper in both the default and the
early demux rx path.
Fixes: 2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a7f38002fb ]
The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value
that is always non-zero and hence the wait for the link to drop always
terminates prematurely. Fix this by using a logical not operator instead
of a bitwise complement. This issue has been in the driver since
pre-2.6.12-rc2.
Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bbd6528d28 ]
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.
Bring IPv6 in line with what we do in IPv4 to fix this.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c56cae23c6 ]
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.
This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.
Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83f365554e upstream.
When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.
After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.
Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com
Cc: stable@vger.kernel.org
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer <jbehmer@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad4f15dc2c upstream.
Commit 57f230ab04 ("xen/netfront: raise max number of slots in
xennet_get_responses()") raised the max number of allowed slots by one.
This seems to be problematic in some configurations with netback using
a larger MAX_SKB_FRAGS value (e.g. old Linux kernel with MAX_SKB_FRAGS
defined as 18 instead of nowadays 17).
Instead of BUG_ON() in this case just fall back to retransmission.
Fixes: 57f230ab04 ("xen/netfront: raise max number of slots in xennet_get_responses()")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 498fe23aad upstream.
Although private data of sound card instance is usually allocated in the
tail of the instance, drivers in ALSA firewire stack allocate the private
data before allocating the instance. In this case, the private data
should be released explicitly at .private_free callback of the instance.
This commit fixes memory leak following to the above design.
Fixes: 6c29230e2a ('ALSA: oxfw: delayed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce925f088b upstream.
After allocating model-dependent data, ALSA OXFW driver has memory leak
of the data at error path.
This commit releases the data at the error path.
Fixes: 6c29230e2a ('ALSA: oxfw: delayed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3b55e2ec9 upstream.
After allocating memory object for response buffer, ALSA fireworks
driver has leak of the memory object at error path.
This commit releases the object at the error path.
Fixes: 7d3c1d5901aa('ALSA: fireworks: delayed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d28277c06 upstream.
Although private data of sound card instance is usually allocated in the
tail of the instance, drivers in ALSA firewire stack allocate the private
data before allocating the instance. In this case, the private data
should be released explicitly at .private_free callback of the instance.
This commit fixes memory leak following to the above design.
Fixes: b610386c8a ('ALSA: firewire-tascam: deleyed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a49a83ab05 upstream.
Although private data of sound card instance is usually allocated in the
tail of the instance, drivers in ALSA firewire stack allocate the private
data before allocating the instance. In this case, the private data
should be released explicitly at .private_free callback of the instance.
This commit fixes memory leak following to the above design.
Fixes: 86c8dd7f4d ('ALSA: firewire-digi00x: delayed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 493626f2d8 upstream.
When executing 'fw_run_transaction()' with 'TCODE_WRITE_BLOCK_REQUEST',
an address of 'payload' argument is used for streaming DMA mapping by
'firewire_ohci' module if 'size' argument is larger than 8 byte.
Although in this case the address should not be on kernel stack, current
implementation of ALSA bebob driver uses data in kernel stack for a cue
to boot M-Audio devices. This often brings unexpected result, especially
for a case of CONFIG_VMAP_STACK=y.
This commit fixes the bug.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=201021
Reference: https://forum.manjaro.org/t/firewire-m-audio-410-driver-wont-load-firmware/51165
Fixes: a2b2a7798fb6('ALSA: bebob: Send a cue to load firmware for M-Audio Firewire series')
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1fbebd416 upstream.
After allocating model-dependent data for M-Audio FW1814 and ProjectMix
I/O, ALSA bebob driver has memory leak at error path.
This commit releases the allocated data at the error path.
Fixes: 04a2c73c97eb('ALSA: bebob: delayed registration of sound card')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e285d5bfb7 upstream.
According to ETSI TS 102 622 specification chapter 4.4 pipe identifier
is 7 bits long which allows for 128 unique pipe IDs. Because
NFC_HCI_MAX_PIPES is used as the number of pipes supported and not
as the max pipe ID, its value should be 128 instead of 127.
nfc_hci_recv_from_llc extracts pipe ID from packet header using
NFC_HCI_FRAGMENT(0x7F) mask which allows for pipe ID value of 127.
Same happens when NCI_HCP_MSG_GET_PIPE() is being used. With
pipes array having only 127 elements and pipe ID of 127 the OOB memory
access will result.
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Allen Pais <allen.pais@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 674d9de02a upstream.
When handling SHDLC I-Frame commands "pipe" field used for indexing
into an array should be checked before usage. If left unchecked it
might access memory outside of the array of size NFC_HCI_MAX_PIPES(127).
Malformed NFC HCI frames could be injected by a malicious NFC device
communicating with the device being attacked (remote attack vector),
or even by an attacker with physical access to the I2C bus such that
they could influence the data transfers on that bus (local attack vector).
skb->data is controlled by the attacker and has only been sanitized in
the most trivial ways (CRC check), therefore we can consider the
create_info struct and all of its members to tainted. 'create_info->pipe'
with max value of 255 (uint8) is used to take an offset of the
hdev->pipes array of 127 elements which can lead to OOB write.
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Allen Pais <allen.pais@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Suggested-by: Kevin Deus <kdeus@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2710dbf0d upstream.
Alex reported the following race condition:
/* link goes up... interrupt... schedule watchdog */
\ e1000_watchdog_task
\ e1000e_has_link
\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
\ e1000e_phy_has_link_generic(..., &link)
link = true
/* link goes down... interrupt */
\ e1000_msix_other
hw->mac.get_link_status = true
/* link is up */
mac->get_link_status = false
link_active = true
/* link_active is true, wrongly, and stays so because
* get_link_status is false */
Avoid this problem by making sure that we don't set get_link_status = false
after having checked the link.
It seems this problem has been present since the introduction of e1000e.
Link: https://lkml.org/lkml/2018/1/29/338
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3016e0a0c9 upstream.
This reverts commit 19110cfbb3.
This reverts commit 4110e02eb4.
This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98.
Commit 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
changed what happens to the link status when there is an error which
happens after "get_link_status = false" in the copper check_for_link
callbacks. Previously, such an error would be ignored and the link
considered up. After that commit, any error implies that the link is down.
Revert commit 19110cfbb3 ("e1000e: Separate signaling for link check/link
up") and its followups. After reverting, the race condition described in
the log of commit 19110cfbb3 is reintroduced. It may still be triggered
by LSC events but this should keep the link down in case the link is
electrically unstable, as discussed. The race may no longer be
triggered by RXO events because commit 4aea7a5c5e ("e1000e: Avoid
receiver overrun interrupt bursts") restored reading icr in the Other
handler.
Link: https://lkml.org/lkml/2018/3/1/789
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 116f4a640b upstream.
The 82574 specification update errata 12 states that interrupts may be
missed if ICR is read while INT_ASSERTED is not set. Avoid that problem by
setting all bits related to events that can trigger the Other interrupt in
IMS.
The Other interrupt is raised for such events regardless of whether or not
they are set in IMS. However, only when they are set is the INT_ASSERTED
bit also set in ICR.
By doing this, we ensure that INT_ASSERTED is always set when we read ICR
in e1000_msix_other() and steer clear of the errata. This also ensures that
ICR will automatically be cleared on read, therefore we no longer need to
clear bits explicitly.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[bwh: Backported to 4.9: adjust context]
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361a954e6a upstream.
Restores the ICS write for Rx/Tx queue interrupts which was present before
commit 16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
but was not restored in commit 4aea7a5c5e
("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1).
This re-raises the queue interrupts in case the txq or rxq bits were set in
ICR and the Other interrupt handler read and cleared ICR before the queue
interrupt was raised.
Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f0ea19722 upstream.
This partially reverts commit 4aea7a5c5e.
We keep the fix for the first part of the problem (1) described in the log
of that commit, that is to read ICR in the other interrupt handler. We
remove the fix for the second part of the problem (2), Other interrupt
throttling.
Bursts of "Other" interrupts may once again occur during rxo (receive
overflow) traffic conditions. This is deemed acceptable in the interest of
avoiding unforeseen fallout from changes that are not strictly necessary.
As discussed, the e1000e driver should be in "maintenance mode".
Link: https://www.spinics.net/lists/netdev/msg480675.html
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 745d0bd3af upstream.
It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[bwh: Backported to 4.9: adjust context]
Cc: Yanhui He <yanhuih@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f02cfbc3d upstream.
When a system suffers from dcache aliasing a user program may observe
stale VDSO data from an aliased cache line. Notably this can break the
expectation that clock_gettime(CLOCK_MONOTONIC, ...) is, as its name
suggests, monotonic.
In order to ensure that users observe updates to the VDSO data page as
intended, align the user mappings of the VDSO data page such that their
cache colouring matches that of the virtual address range which the
kernel will use to update the data page - typically its unmapped address
within kseg0.
This ensures that we don't introduce aliasing cache lines for the VDSO
data page, and therefore that userland will observe updates without
requiring cache invalidation.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Hauke Mehrtens <hauke@hauke-m.de>
Reported-by: Rene Nielsen <rene.nielsen@microsemi.com>
Reported-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Patchwork: https://patchwork.linux-mips.org/patch/20344/
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b40b3e9358 upstream.
We accidentally removed the check for negative returns
without considering the issue of type promotion.
The "if_version_length" variable is type size_t so if __mei_cl_recv()
returns a negative then "bytes_recv" is type promoted
to a high positive value and treated as success.
Cc: <stable@vger.kernel.org>
Fixes: 582ab27a06 ("mei: bus: fix received data size check in NFC fixup")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1cf86bc212 ]
If you do this on an sdm845 board:
grep "" /sys/kernel/debug/pinctrl/*spmi:pmic*/pinconf-groups
...it looks like nonsense. For every pin you see listed:
input bias disabled, input bias high impedance, input bias pull down, input bias pull up, ...
That's because pmic_gpio_config_get() isn't complying with the rules
that pinconf_generic_dump_one() expects. Specifically for boolean
parameters (anything with a "struct pin_config_item" where has_arg is
false) the function expects that the function should return its value
not through the "config" parameter but should return "0" if the value
is set and "-EINVAL" if the value isn't set.
Let's fix this.
>From a quick sample of other pinctrl drivers, it appears to be
tradition to also return 1 through the config parameter for these
boolean parameters when they exist. I'm not one to knock tradition,
so I'll follow tradition and return 1 in these cases. While I'm at
it, I'll also continue searching for four leaf clovers, kocking on
wood three times, and trying not to break mirrors.
NOTE: This also fixes an apparent typo for reading
PIN_CONFIG_BIAS_DISABLE where the old driver was accidentally
using "=" instead of "==" and thus was setting some internal
state when you tried to query PIN_CONFIG_BIAS_DISABLE. Oops.
Fixes: eadff30244 ("pinctrl: Qualcomm SPMI PMIC GPIO pin controller driver")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff2d6acdf6 ]
Without this commit the following intervals [x y), (x y) were be
replaced to (y-1 y) by snd_interval_refine_last(). This was also done
if y-1 is part of the previous interval.
With this changes it will be replaced with [y-1 y) in case of y-1 is
part of the previous interval. A similar behavior will be used for
snd_interval_refine_first().
This commit adapts the changes for alsa-lib of commit
9bb985c ("pcm: snd_interval_refine_first/last: exclude value only if
also excluded before")
Signed-off-by: Timo Wischer <twischer@de.adit-jv.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 193c2a07cf ]
Locking the root adapter for __i2c_transfer will deadlock if the
device sits behind a mux-locked I2C mux. Switch to the finer-grained
i2c_lock_bus with the I2C_LOCK_SEGMENT flag. If the device does not
sit behind a mux-locked mux, the two locking variants are equivalent.
Signed-off-by: Peter Rosin <peda@axentia.se>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c8f74f327 ]
Locking the root adapter for __i2c_transfer will deadlock if the
device sits behind a mux-locked I2C mux. Switch to the finer-grained
i2c_lock_bus with the I2C_LOCK_SEGMENT flag. If the device does not
sit behind a mux-locked mux, the two locking variants are equivalent.
Signed-off-by: Peter Rosin <peda@axentia.se>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b23ec59926 ]
Since we put static variable to a header file it's copied to each module
that includes the header. But not all of them are actually used it.
Mark gpio_suffixes array with __maybe_unused to hide a compiler warning:
In file included from
drivers/gpio/gpiolib-legacy.c:6:0:
drivers/gpio/gpiolib.h:95:27: warning: ‘gpio_suffixes’ defined but not used [-Wunused-const-variable=]
static const char * const gpio_suffixes[] = { "gpios", "gpio" };
^~~~~~~~~~~~~
In file included from drivers/gpio/gpiolib-devprop.c:17:0:
drivers/gpio/gpiolib.h:95:27: warning: ‘gpio_suffixes’ defined but not used [-Wunused-const-variable=]
static const char * const gpio_suffixes[] = { "gpios", "gpio" };
^~~~~~~~~~~~~
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9506755633 ]
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ccff2dface ]
Probing the TPIU driver under UBSan triggers an out-of-bounds shift
warning in coresight_timeout():
...
[ 5.677530] UBSAN: Undefined behaviour in drivers/hwtracing/coresight/coresight.c:929:16
[ 5.685542] shift exponent 64 is too large for 64-bit type 'long unsigned int'
...
On closer inspection things are exponentially out of whack because we're
passing a bitmask where a bit number should be. Amusingly, it seems that
both calls will find their expected values by sheer luck and appear to
succeed: 1 << FFCR_FON_MAN ends up at bit 64 which whilst undefined
evaluates as zero in practice, while 1 << FFSR_FT_STOPPED finds bit 2
(TCPresent) which apparently is usually tied high.
Following the examples of other drivers, define separate FOO and FOO_BIT
macros for masks vs. indices, and put things right.
CC: Robert Walker <robert.walker@arm.com>
CC: Mike Leach <mike.leach@linaro.org>
CC: Mathieu Poirier <mathieu.poirier@linaro.org>
Fixes: 11595db8e1 ("coresight: Fix disabling of CoreSight TPIU")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b59fb482b5 ]
Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.
As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation. One
exception to this are compressible buffers which need large pages. In
order to enable these large pages, multiple small pages will have to be
combined into one large (I/O virtually contiguous) mapping via the
IOMMU. However, that is a topic outside the scope of this fix and isn't
currently supported. An implementation will want to explicitly create
these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
mapping would still be required.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b5190c2e7 ]
For eMMC devices it is valid to only support 1.8V signaling. When
vqmmc is set to a fixed 1.8V regulator the stack tries to set 3.3V
initially and prints the following warning:
mmc1: Switching to 3.3V signalling voltage failed
Clear the MMC_SIGNAL_VOLTAGE_330 flag in case 3.3V is signaling is
not available. This prevents the stack from even trying to use
3.3V signaling and avoids the above warning.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 127407e36f ]
The stack assumes that SDHC controller which support SD3.0 (SDR104) do
support HS200. This is not the case for Tegra 3, which does support SD
3.0
but only supports eMMC spec 4.41.
Use SDHCI_QUIRK2_BROKEN_HS200 to indicate that the controller does not
support HS200.
Note that commit 156e14b126 ("mmc: sdhci: fix caps2 for HS200") added
the tie between SD3.0 (SDR104) and HS200. I don't think that this is
necessarly true. It is fully legitimate to support SD3.0 and not support
HS200. The quirk naming suggests something is broken in the controller,
but this is not the case: The controller simply does not support HS200.
Fixes: 7ad2ed1dfc ("mmc: tegra: enable UHS-I modes")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 81646a3d39 ]
of_find_compatible_node() returns a device node with refcount incremented
and thus needs an explicit of_node_put(). Further relying on an unchecked
of_iomap() which can return NULL is problematic here, after all ctrl_base
is critical enough for hix5hd2_set_cpu() to call BUG() if not available
so a check seems mandated here.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
0002 Fixes: commit 06cc5c1d4d ("ARM: hisi: enable hix5hd2 SoC")
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f30b5ae05 ]
of_iomap() can return NULL which seems critical here and thus should be
explicitly flagged so that the cause of system halting can be understood.
As of_find_compatible_node() is returning a device node with refcount
incremented it must be explicitly decremented here.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: commit 7fda91e731 ("ARM: hisi: enable smp for HiP01")
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d396cb185c ]
Relying on an unchecked of_iomap() which can return NULL is problematic
here, an explicit check seems mandatory. Also the call to
of_find_compatible_node() returns a device node with refcount incremented
therefor an explicit of_node_put() is needed here.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: commit 22bae42904 ("ARM: hi3xxx: add hotplug support")
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 61f0d55569 ]
The following commit:
7e1550b8f2 ("efi: Drop type and attribute checks in efi_mem_desc_lookup()")
refactored the implementation of efi_mem_desc_lookup() so that the type
check is moved to the callers, one of which is the x86 version of
efi_arch_mem_reserve(), where we added a modified check that only takes
EFI_BOOT_SERVICES_DATA regions into account.
This is reasonable, since it is the only memory type that requires this,
but doing so uncovered some unexpected behavior in the ESRT code, which
permits the ESRT table to reside in other types of memory than what the
UEFI spec mandates (i.e., EFI_BOOT_SERVICES_DATA), and unconditionally
calls efi_mem_reserve() on the region in question. This may result in
errors such as
esrt: Reserving ESRT space from 0x000000009c810318 to 0x000000009c810350.
efi: Failed to lookup EFI memory descriptor for 0x000000009c810318
when the ESRT table is not in EFI_BOOT_SERVICES_DATA memory, but we try
to reserve it nonetheless.
So make the call to efi_mem_reserve() conditional on the memory type.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc57c07343 ]
This patch fixes a bug where configfs_register_group had added
a group in a tree, and userspace has done a rmdir on a dir somewhere
above that group and we hit a kernel crash. The problem is configfs_rmdir
will detach everything under it and unlink groups on the default_groups
list. It will not unlink groups added with configfs_register_group so when
configfs_unregister_group is called to drop its references to the group/items
we crash when we try to access the freed dentrys.
The patch just adds a check for if a rmdir has been done above
us and if so just does the unlink part of unregistration.
Sorry if you are getting this multiple times. I thouhgt I sent
this to some of you and lkml, but I do not see it.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd87668d60 ]
The PCI_OHCI_INT_REG case in pci_ohci_read_reg() contains the following
if statement:
if ((lo & 0x00000f00) == CS5536_USB_INTR)
CS5536_USB_INTR expands to the constant 11, which gives us the following
condition which can never evaluate true:
if ((lo & 0xf00) == 11)
At least when using GCC 8.1.0 this falls foul of the tautoligcal-compare
warning, and since the code is built with the -Werror flag the build
fails.
Fix this by shifting lo right by 8 bits in order to match the
corresponding PCI_OHCI_INT_REG case in pci_ohci_write_reg().
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19861/
Cc: Huacai Chen <chenhc@lemote.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e2861fa716 ]
When EVM attempts to appraise a file signed with a crypto algorithm the
kernel doesn't have support for, it will cause the kernel to trigger a
module load. If the EVM policy includes appraisal of kernel modules this
will in turn call back into EVM - since EVM is holding a lock until the
crypto initialisation is complete, this triggers a deadlock. Add a
CRYPTO_NOLOAD flag and skip module loading if it's set, and add that flag
in the EVM case in order to fail gracefully with an error message
instead of deadlocking.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c6bc9ea84 ]
The first checks in mtdchar_read() and mtdchar_write() attempt to limit
`count` such that `*ppos + count <= mtd->size`. However, they ignore the
possibility of `*ppos > mtd->size`, allowing the calculation of `count` to
wrap around. `mtdchar_lseek()` prevents seeking beyond mtd->size, but the
pread/pwrite syscalls bypass this.
I haven't found any codepath on which this actually causes dangerous
behavior, but it seems like a sensible change anyway.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit baa2a4fdd5 ]
audit_add_watch stores locally krule->watch without taking a reference
on watch. Then, it calls audit_add_to_parent, and uses the watch stored
locally.
Unfortunately, it is possible that audit_add_to_parent updates
krule->watch.
When it happens, it also drops a reference of watch which
could free the watch.
How to reproduce (with KASAN enabled):
auditctl -w /etc/passwd -F success=0 -k test_passwd
auditctl -w /etc/passwd -F success=1 -k test_passwd2
The second call to auditctl triggers the use-after-free, because
audit_to_parent updates krule->watch to use a previous existing watch
and drops the reference to the newly created watch.
To fix the issue, we grab a reference of watch and we release it at the
end of the function.
Signed-off-by: Ronny Chevalier <ronny.chevalier@hp.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f819db565 ]
The regset API documented in <linux/regset.h> defines -ENODEV as the
result of the `->active' handler to be used where the feature requested
is not available on the hardware found. However code handling core file
note generation in `fill_thread_core_info' interpretes any non-zero
result from the `->active' handler as the regset requested being active.
Consequently processing continues (and hopefully gracefully fails later
on) rather than being abandoned right away for the regset requested.
Fix the problem then by making the code proceed only if a positive
result is returned from the `->active' handler.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 4206d3aa19 ("elf core dump: notes user_regset")
Patchwork: https://patchwork.linux-mips.org/patch/19332/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 994b15b983 upstream.
The previous fix broke recovery of delegated stateids because it assumes
that if we did not mark the delegation as suspect, then the delegation has
effectively been revoked, and so it removes that delegation irrespectively
of whether or not it is valid and still in use. While this is "mostly
harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
in an infinite loop while complaining that we're using an invalid stateid
(in this case the all-zero stateid).
What we rather want to do here is ensure that the delegation is always
correctly marked as needing testing when that is the case. So we want
to close the loophole offered by nfs4_schedule_stateid_recovery(),
which marks the state as needing to be reclaimed, but not the
delegation that may be backing it.
Fixes: 0e3d3e5df0 ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df3aa13c7b upstream.
This reverts commit a81cf9799a.
The patch causes a regression, which I cannot find the reason for.
So let's revert for now, as a revert hurts only performance.
Original report:
I was trying to resolve the problem with Oliver but we don't get any conclusion
for 5 months, so I am now sending this to mail list and cdc_acm authors.
I am using simple request-response protocol to obtain the boiller parameters
in constant intervals.
A simple one transaction is:
1. opening the /dev/ttyACM0
2. sending the following 10-bytes request to the device:
unsigned char req[] = {0x02, 0xfe, 0x01, 0x05, 0x08, 0x02, 0x01, 0x69, 0xab, 0x03};
3. reading response (frame of 74 bytes length).
4. closing the descriptor
I am doing this transaction with 5 seconds intervals.
Before the bad commit everything was working correctly: I've got a requests and
a responses in a timely manner.
After the bad commit more time I am using the kernel module, more problems I have.
The graph [2] is showing the problem.
As you can see after module load all seems fine but after about 30 minutes I've got
a plenty of EAGAINs when doing read()'s and trying to read back the data.
When I rmmod and insmod the cdc_acm module again, then the situation is starting
over again: running ok shortly after load, and more time it is running, more EAGAINs
I have when calling read().
As a bonus I can see the problem on the device itself:
The device is configured as you can see here on this screen [3].
It has two transmision LEDs: TX and RX. Blink duration is set for 100ms.
This is a recording before the bad commit when all is working fine: [4]
And this is with the bad commit: [5]
As you can see the TX led is blinking wrongly long (indicating transmission?)
and I have problems doing read() calls (EAGAIN).
Reported-by: Mariusz Bialonczyk <manio@skyboo.net>
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes: a81cf9799a ("cdc-acm: implement put_char() and flush_chars()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e22e3af7b upstream.
wdm_in_callback() is a completion handler function for the USB driver.
So it should not sleep. But it calls service_outstanding_interrupt(),
which calls usb_submit_urb() with GFP_KERNEL.
To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool DSAC.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5dfdd24eb3 upstream.
Similarly to a recently reported bug in io_ti, a malicious USB device
could set port_number to a negative value and we would underflow the
port array in the interrupt completion handler.
As these devices only have one or two ports, fix this by making sure we
only consider the seventh bit when determining the port number (and
ignore bits 0xb0 which are typically set to 0x30).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc8acc214d upstream.
async_complete() in uss720.c is a completion handler function for the
USB driver. So it should not sleep, but it is can sleep according to the
function call paths (from bottom to top) in Linux-4.16.
[FUNC] set_1284_register(GFP_KERNEL)
drivers/usb/misc/uss720.c, 372:
set_1284_register in parport_uss720_frob_control
drivers/parport/ieee1284.c, 560:
[FUNC_PTR]parport_uss720_frob_control in parport_ieee1284_ack_data_avail
drivers/parport/ieee1284.c, 577:
parport_ieee1284_ack_data_avail in parport_ieee1284_interrupt
./include/linux/parport.h, 474:
parport_ieee1284_interrupt in parport_generic_irq
drivers/usb/misc/uss720.c, 116:
parport_generic_irq in async_complete
[FUNC] get_1284_register(GFP_KERNEL)
drivers/usb/misc/uss720.c, 382:
get_1284_register in parport_uss720_read_status
drivers/parport/ieee1284.c, 555:
[FUNC_PTR]parport_uss720_read_status in parport_ieee1284_ack_data_avail
drivers/parport/ieee1284.c, 577:
parport_ieee1284_ack_data_avail in parport_ieee1284_interrupt
./include/linux/parport.h, 474:
parport_ieee1284_interrupt in parport_generic_irq
drivers/usb/misc/uss720.c, 116:
parport_generic_irq in async_complete
Note that [FUNC_PTR] means a function pointer call is used.
To fix these bugs, GFP_KERNEL is replaced with GFP_ATOMIC.
These bugs are found by my static analysis tool DSAC.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 691a03cfe8 upstream.
As reported by Dan Carpenter, a malicious USB device could set
port_number to a negative value and we would underflow the port array in
the interrupt completion handler.
As these devices only have one or two ports, fix this by making sure we
only consider the seventh bit when determining the port number (and
ignore bits 0xb0 which are typically set to 0x30).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dec3c23c9a upstream.
Commit f16443a034 ("USB: gadgetfs, dummy-hcd, net2280: fix locking
for callbacks") was based on a serious misunderstanding. It
introduced regressions into both the dummy-hcd and net2280 drivers.
The problem in dummy-hcd was fixed by commit 7dbd8f4cab ("USB:
dummy-hcd: Fix erroneous synchronization change"), but the problem in
net2280 remains. Namely: the ->disconnect(), ->suspend(), ->resume(),
and ->reset() callbacks must be invoked without the private lock held;
otherwise a deadlock will occur when the callback routine tries to
interact with the UDC driver.
This patch largely is a reversion of the relevant parts of
f16443a034. It also drops the private lock around the calls to
->suspend() and ->resume() (something the earlier patch forgot to do).
This is safe from races with device interrupts because it occurs
within the interrupt handler.
Finally, the patch changes where the ->disconnect() callback is
invoked when net2280_pullup() turns the pullup off. Rather than
making the callback from within stop_activity() at a time when dropping
the private lock could be unsafe, the callback is moved to a point
after the lock has already been dropped.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: f16443a034 ("USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks")
Reported-by: D. Ziesche <dziesche@zes.com>
Tested-by: D. Ziesche <dziesche@zes.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d4f268fa1 upstream.
i_usX2Y_subs_startup in usbusx2yaudio.c is a completion handler function
for the USB driver. So it should not sleep, but it is can sleep
according to the function call paths (from bottom to top) in Linux-4.16.
[FUNC] msleep
drivers/usb/host/u132-hcd.c, 2558:
msleep in u132_get_frame
drivers/usb/core/hcd.c, 2231:
[FUNC_PTR]u132_get_frame in usb_hcd_get_frame_number
drivers/usb/core/usb.c, 822:
usb_hcd_get_frame_number in usb_get_current_frame_number
sound/usb/usx2y/usbusx2yaudio.c, 303:
usb_get_current_frame_number in i_usX2Y_urb_complete
sound/usb/usx2y/usbusx2yaudio.c, 366:
i_usX2Y_urb_complete in i_usX2Y_subs_startup
Note that [FUNC_PTR] means a function pointer call is used.
To fix this bug, msleep() is replaced with mdelay().
This bug is found by my static analysis tool DSAC.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9a5b4f58b upstream.
The steps taken by usb core to set a new interface is very different from
what is done on the xHC host side.
xHC hardware will do everything in one go. One command is used to set up
new endpoints, free old endpoints, check bandwidth, and run the new
endpoints.
All this is done by xHC when usb core asks the hcd to check for
available bandwidth. At this point usb core has not yet flushed the old
endpoints, which will cause use-after-free issues in xhci driver as
queued URBs are cancelled on a re-allocated endpoint.
To resolve this add a call to usb_disable_interface() which will flush
the endpoints before calling usb_hcd_alloc_bandwidth()
Additional checks in xhci driver will also be implemented to gracefully
handle stale URB cancel on freed and re-allocated endpoints
Cc: <stable@vger.kernel.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42d1c6d4a0 upstream.
The hope that UAS devices would be less broken than old style storage
devices has turned out to be unfounded. Make UAS support more of the
quirk flags of the old driver.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f45681f9be upstream.
This device does not correctly handle the LPM operations.
Also, the device cannot handle ATA pass-through commands
and locks up when attempted while running in super speed.
This patch adds the equivalent quirk logic as found in uas.
Signed-off-by: Tim Anderson <tsa@biglakesoftware.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d2d8935d3 upstream.
Some of the ME clients are available only for BIOS operation and are
removed during hand off to an OS. However the removal is not instant.
A client may be visible on the client list when the mei driver requests
for enumeration, while the subsequent request for properties will be
answered with client not found error value. The default behavior
for an error is to perform client reset while this error is harmless and
the link reset should be prevented. This issue started to be visible due to
suspend/resume timing changes. Currently reported only on the Haswell
based system.
Fixes:
[33.564957] mei_me 0000:00:16.0: hbm: properties response: wrong status = 1 CLIENT_NOT_FOUND
[33.564978] mei_me 0000:00:16.0: mei_irq_read_handler ret = -71.
[33.565270] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000255 60002306 00000200 00004401 00000000 00000010
Cc: <stable@vger.kernel.org>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3dc41c5d2 upstream.
usb_hc_died() should only be called once, and with the primary HCD
as parameter. It will mark both primary and secondary hcd's dead.
Remove the extra call to usb_cd_died with the shared hcd as parameter.
Fixes: ff9d78b36f ("USB: Set usb_hcd->state and flags for shared roothubs")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de916736aa upstream.
val is indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/misc/hmc6352.c:54 compass_store() warn: potential spectre issue
'map' [r]
Fix this by sanitizing val before using it to index map
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c398f3c3b upstream.
after unbinding mmc I get things like this:
[ 185.294067] mmc1: card 0001 removed
[ 185.305206] omap_hsmmc 480b4000.mmc: wake IRQ with no resume: -13
The wakeirq stays in /proc-interrupts
rebinding shows this:
[ 289.795959] genirq: Flags mismatch irq 112. 0000200a (480b4000.mmc:wakeup) vs. 0000200a (480b4000.mmc:wakeup)
[ 289.808959] omap_hsmmc 480b4000.mmc: Unable to request wake IRQ
[ 289.815338] omap_hsmmc 480b4000.mmc: no SDIO IRQ support, falling back to polling
That bug seems to be introduced by switching from devm_request_irq()
to generic wakeirq handling.
So let us cleanup at removal.
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Fixes: 5b83b2234b ("mmc: omap_hsmmc: Change wake-up interrupt to use generic wakeirq")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 816e846c2e upstream.
Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which leaves
a window where cm_rep_handler can run, set the connection up, dequeue
pending packets and leave the subsequently queued packets by start_xmit()
sitting on neigh->queue until they're dropped when the connection is torn
down. This only applies to connected mode. These dropped packets can
really upset TCP, for example, and cause multi-minute delays in
transmission for open connections.
Here's the code in start_xmit where we check to see if the connection is
up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet in
start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch acquires the netif tx lock in cm_rep_handler for the section
where it sets the connection up and dequeues and retransmits deferred
skb's.
Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Knister <aaron.s.knister@nasa.gov>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8edfe2e992 upstream.
Commit 822fb18a82 ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.
Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).
As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.
Fixes: 822fb18a82 ("xen-netfront: wait xenbus state change when load module manually")
Cc: <stable@vger.kernel.org> #4.18
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 831b624df1 upstream.
persistent_ram_vmap() returns the page start vaddr.
persistent_ram_iomap() supports non-page-aligned mapping.
persistent_ram_buffer_map() always adds offset-in-page to the vaddr
returned from these two functions, which causes incorrect mapping of
non-page-aligned persistent ram buffer.
By default ftrace_size is 4096 and max_ftrace_cnt is nr_cpu_ids. Without
this patch, the zone_sz in ramoops_init_przs() is 4096/nr_cpu_ids which
might not be page aligned. If the offset-in-page > 2048, the vaddr will be
in next page. If the next page is not mapped, it will cause kernel panic:
[ 0.074231] BUG: unable to handle kernel paging request at ffffa19e0081b000
...
[ 0.075000] RIP: 0010:persistent_ram_new+0x1f8/0x39f
...
[ 0.075000] Call Trace:
[ 0.075000] ramoops_init_przs.part.10.constprop.15+0x105/0x260
[ 0.075000] ramoops_probe+0x232/0x3a0
[ 0.075000] platform_drv_probe+0x3e/0xa0
[ 0.075000] driver_probe_device+0x2cd/0x400
[ 0.075000] __driver_attach+0xe4/0x110
[ 0.075000] ? driver_probe_device+0x400/0x400
[ 0.075000] bus_for_each_dev+0x70/0xa0
[ 0.075000] driver_attach+0x1e/0x20
[ 0.075000] bus_add_driver+0x159/0x230
[ 0.075000] ? do_early_param+0x95/0x95
[ 0.075000] driver_register+0x70/0xc0
[ 0.075000] ? init_pstore_fs+0x4d/0x4d
[ 0.075000] __platform_driver_register+0x36/0x40
[ 0.075000] ramoops_init+0x12f/0x131
[ 0.075000] do_one_initcall+0x4d/0x12c
[ 0.075000] ? do_early_param+0x95/0x95
[ 0.075000] kernel_init_freeable+0x19b/0x222
[ 0.075000] ? rest_init+0xbb/0xbb
[ 0.075000] kernel_init+0xe/0xfc
[ 0.075000] ret_from_fork+0x3a/0x50
Signed-off-by: Bin Yang <bin.yang@intel.com>
[kees: add comments describing the mapping differences, updated commit log]
Fixes: 24c3d2f342 ("staging: android: persistent_ram: Make it possible to use memory outside of bootmem")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 954a8e3aea upstream.
When AF_IB addresses are used during rdma_resolve_addr() a lock is not
held. A cma device can get removed while list traversal is in progress
which may lead to crash. ie
CPU0 CPU1
==== ====
rdma_resolve_addr()
cma_resolve_ib_dev()
list_for_each() cma_remove_one()
cur_dev->device mutex_lock(&lock)
list_del();
mutex_unlock(&lock);
cma_process_remove();
Therefore, hold a lock while traversing the list which avoids such
situation.
Cc: <stable@vger.kernel.org> # 3.10
Fixes: f17df3b0de ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e7d4d932f ]
This patch fixes two typos related to unregistering algorithms supported by
SAHARAH 3. In sahara_register_algs the wrong algorithms are unregistered
in case of an error. In sahara_unregister_algs the wrong array is used to
determine the iteration count.
Signed-off-by: Michael Müller <michael@fds-team.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8bbafed8dd ]
The mv_xor_v2 driver uses a tasklet, initialized during the probe()
routine. However, it forgets to cleanup the tasklet using
tasklet_kill() function during the remove() routine, which this patch
fixes. This prevents the tasklet from potentially running after the
module has been removed.
Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Hanna Hawa <hannah@marvell.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3297c8fc65 ]
There is a race window in device_shutdown(), which may cause
-1. parent device shut down before child or
-2. no shutdown on a new probing device.
For 1st, taking the following scenario:
device_shutdown new plugin device
list_del_init(parent_dev);
spin_unlock(list_lock);
device_add(child)
probe child
shutdown parent_dev
--> now child is on the tail of devices_kset
For 2nd, taking the following scenario:
device_shutdown new plugin device
device_add(dev)
device_lock(dev);
...
device_unlock(dev);
probe dev
--> now, the new occurred dev has no opportunity to shutdown
To fix this race issue, just prevent the new probing request. With this
logic, device_shutdown() is more similar to dpm_prepare().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d47191de7 ]
The vgic_init function can race with kvm_arch_vcpu_create() which does
not hold kvm_lock() and we therefore have no synchronization primitives
to ensure we're doing the right thing.
As the user is trying to initialize or run the VM while at the same time
creating more VCPUs, we just have to refuse to initialize the VGIC in
this case rather than silently failing with a broken VCPU.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70551dc46f ]
After the subdriver's remove() routine has completed, the card's layer
mode is undetermined again. Reflect this in the layer2 field.
If qeth_dev_layer2_store() hits an error after remove() was called, the
card _always_ requires a setup(), even if the previous layer mode is
requested again.
But qeth_dev_layer2_store() bails out early if the requested layer mode
still matches the current one. So unless we reset the layer2 field,
re-probing the card back to its previous mode is currently not possible.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a702349a40 ]
By updating q->used_buffers only _after_ do_QDIO() has completed, there
is a potential race against the buffer's TX completion. In the unlikely
case that the TX completion path wins, qeth_qdio_output_handler() would
decrement the counter before qeth_flush_buffers() even incremented it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e53db01831 ]
Current LED trigger, 'bt', is not known/used by any existing driver.
Fix this by renaming it to 'bluetooth-power' trigger which is
controlled by the Bluetooth subsystem.
Fixes: 9943230c88 ("arm64: dts: qcom: Add apq8016-sbc board LED's related device nodes")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d408c0d45 ]
Commit f599c64fdf ("xen-netfront: Fix race between device setup and
open") changed the initialization order: xennet_create_queues() now
happens before we do register_netdev() so using netdev->name in
xennet_init_queue() is incorrect, we end up with the following in
/proc/interrupts:
60: 139 0 xen-dyn -event eth%d-q0-tx
61: 265 0 xen-dyn -event eth%d-q0-rx
62: 234 0 xen-dyn -event eth%d-q1-tx
63: 1 0 xen-dyn -event eth%d-q1-rx
and this looks ugly. Actually, using early netdev name (even when it's
already set) is also not ideal: nowadays we tend to rename eth devices
and queue name may end up not corresponding to the netdev name.
Use nodename from xenbus device for queue naming: this can't change in VM's
lifetime. Now /proc/interrupts looks like
62: 202 0 xen-dyn -event device/vif/0-q0-tx
63: 317 0 xen-dyn -event device/vif/0-q0-rx
64: 262 0 xen-dyn -event device/vif/0-q1-tx
65: 17 0 xen-dyn -event device/vif/0-q1-rx
Fixes: f599c64fdf ("xen-netfront: Fix race between device setup and open")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 07300f774f ]
After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers. This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1. Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.
This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed. Buffer leak is the only problem. Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or have already been reset.
nfp_net_clear_config_and_disable() is now fully idempotent.
Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3ea86495ae ]
The BGRT code validates the contents of the table against the UEFI
memory map, and so it expects it to be mapped when the code runs.
On ARM, this is currently not the case, since we tear down the early
mapping after efi_init() completes, and only create the permanent
mapping in arm_enable_runtime_services(), which executes as an early
initcall, but still leaves a window where the UEFI memory map is not
mapped.
So move the call to efi_memmap_unmap() from efi_init() to
arm_enable_runtime_services().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: fold in EFI_MEMMAP attribute check from Ard]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 129a998909 ]
A socket which has sk_family set to PF_INET6 is able to receive not
only IPv6 but also IPv4 traffic (IPv4-mapped IPv6 addresses).
Prior to this patch, the smk_skb_to_addr_ipv6() could have been
called for socket buffers containing IPv4 packets, in result such
traffic was allowed.
Signed-off-by: Piotr Sawicki <p.sawicki2@partner.samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 133bf90dbb ]
As explained in ieee80211_delayed_tailroom_dec(), during roam,
keys of the old AP will be destroyed and new keys will be
installed. Deletion of the old key causes
crypto_tx_tailroom_needed_cnt to go from 1 to 0 and the new key
installation causes a transition from 0 to 1.
Whenever crypto_tx_tailroom_needed_cnt transitions from 0 to 1,
we invoke synchronize_net(); the reason for doing this is to avoid
a race in the TX path as explained in increment_tailroom_need_count().
This synchronize_net() operation can be slow and can affect the station
roam time. To avoid this, decrementing the crypto_tx_tailroom_needed_cnt
is delayed for a while so that upon installation of new key the
transition would be from 1 to 2 instead of 0 to 1 and thereby
improving the roam time.
This is all correct for a STA iftype, but deferring the tailroom_needed
decrement for other iftypes may be unnecessary.
For example, let's consider the case of a 4-addr client connecting to
an AP for which AP_VLAN interface is also created, let the initial
value for tailroom_needed on the AP be 1.
* 4-addr client connects to the AP (AP: tailroom_needed = 1)
* AP will clear old keys, delay decrement of tailroom_needed count
* AP_VLAN is created, it takes the tailroom count from master
(AP_VLAN: tailroom_needed = 1, AP: tailroom_needed = 1)
* Install new key for the station, assume key is plumbed in the HW,
there won't be any change in tailroom_needed count on AP iface
* Delayed decrement of tailroom_needed count on AP
(AP: tailroom_needed = 0, AP_VLAN: tailroom_needed = 1)
Because of the delayed decrement on AP iface, tailroom_needed count goes
out of sync between AP(master iface) and AP_VLAN(slave iface) and
there would be unnecessary tailroom created for the packets going
through AP_VLAN iface.
Also, WARN_ONs were observed while trying to bring down the AP_VLAN
interface:
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (ieee80211_free_keys+0x114/0x1e4)
(ieee80211_free_keys) (ieee80211_del_virtual_monitor+0x51c/0x850)
(ieee80211_del_virtual_monitor) (ieee80211_stop+0x30/0x3c)
(ieee80211_stop) (__dev_close_many+0x94/0xb8)
(__dev_close_many) (dev_close_many+0x5c/0xc8)
Restricting delayed decrement to station interface alone fixes the problem
and it makes sense to do so because delayed decrement is done to improve
roam time which is applicable only for client devices.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c6ea7e9747 ]
Having the zload address at 0x8060.0000 means the size of the
uncompressed kernel cannot be bigger than around 6 MiB, as it is
deflated at address 0x8001.0000.
This limit is too small; a kernel with some built-in drivers and things
like debugfs enabled will already be over 6 MiB in size, and so will
fail to extract properly.
To fix this, we bump the zload address from 0x8060.0000 to 0x8100.0000.
This is fine, as all the boards featuring Ingenic JZ SoCs have at least
32 MiB of RAM, and use u-boot or compatible bootloaders which won't
hardcode the load address but read it from the uImage's header.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19787/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd90284cc6 ]
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.
This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd4806911c ]
For most of Exynos SoCs, Power Management Unit (PMU) address space is
mapped into global variable 'pmu_base_addr' very early when initializing
PMU interrupt controller. A lot of other machine code depends on it so
when doing iounmap() on this address, clear the global as well to avoid
usage of invalid value (pointing to unmapped memory region).
Properly mapped PMU address space is a requirement for all other machine
code so this fix is purely theoretical. Boot will fail immediately in
many other places after following this error path.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ba0a59cea ]
I discovered the problem when developing a frame buffer driver for the
PlayStation 2 (not yet merged), using the following video modes for the
PlayStation 3 in drivers/video/fbdev/ps3fb.c:
}, {
/* 1080if */
"1080if", 50, 1920, 1080, 13468, 148, 484, 36, 4, 88, 5,
FB_SYNC_BROADCAST, FB_VMODE_INTERLACED
}, {
/* 1080pf */
"1080pf", 50, 1920, 1080, 6734, 148, 484, 36, 4, 88, 5,
FB_SYNC_BROADCAST, FB_VMODE_NONINTERLACED
},
In ps3fb_probe, the mode_option module parameter is used with fb_find_mode
but it can only select the interlaced variant of 1920x1080 since the loop
matching the modes does not take the difference between interlaced and
progressive modes into account.
In short, without the patch, progressive 1920x1080 cannot be chosen as a
mode_option parameter since fb_find_mode (falsely) thinks interlace is a
perfect match.
Signed-off-by: Fredrik Noring <noring@nocrew.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
[b.zolnierkie: updated patch description]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b951d80aaf ]
When parsing the video modes from DT properties, make sure to zero out
memory before using it. This is important because not all fields in the mode
struct are explicitly initialized, even though they are used later on.
Fixes: 420a488278 ("video: fbdev: pxafb: initial devicetree conversion")
Reviewed-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9068533e4f ]
For powerpc64, perf will filter out the second entry in the callchain,
i.e. the LR value, if the return address of the function corresponding
to the probed location has already been saved on its caller's stack.
The state of the return address is determined using debug information.
At any point within a function, if the return address is already saved
somewhere, a DWARF expression can tell us about its location. If the
return address in still in LR only, no DWARF expression would exist.
Typically, the instructions in a function's prologue first copy the LR
value to R0 and then pushes R0 on to the stack. If LR has already been
copied to R0 but R0 is yet to be pushed to the stack, we can still get a
DWARF expression that says that the return address is in R0. This is
indicating that getting a DWARF expression for the return address does
not guarantee the fact that it has already been saved on the stack.
This can be observed on a powerpc64le system running Fedora 27 as shown
below.
# objdump -d /usr/lib64/libc-2.26.so | less
...
000000000015af20 <inet_pton>:
15af20: 0b 00 4c 3c addis r2,r12,11
15af24: e0 c1 42 38 addi r2,r2,-15904
15af28: a6 02 08 7c mflr r0
15af2c: f0 ff c1 fb std r30,-16(r1)
15af30: f8 ff e1 fb std r31,-8(r1)
15af34: 78 1b 7f 7c mr r31,r3
15af38: 78 23 83 7c mr r3,r4
15af3c: 78 2b be 7c mr r30,r5
15af40: 10 00 01 f8 std r0,16(r1)
15af44: c1 ff 21 f8 stdu r1,-64(r1)
15af48: 28 00 81 f8 std r4,40(r1)
...
# readelf --debug-dump=frames-interp /usr/lib64/libc-2.26.so | less
...
00027024 0000000000000024 00027028 FDE cie=00000000 pc=000000000015af20..000000000015af88
LOC CFA r30 r31 ra
000000000015af20 r1+0 u u u
000000000015af34 r1+0 c-16 c-8 r0
000000000015af48 r1+64 c-16 c-8 c+16
000000000015af5c r1+0 c-16 c-8 c+16
000000000015af78 r1+0 u u
...
# perf probe -x /usr/lib64/libc-2.26.so -a inet_pton+0x18
# perf record -e probe_libc:inet_pton -g ping -6 -c 1 ::1
# perf script
Before:
ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38)
7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so)
7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so)
12f152d70 _init+0xbfc (/usr/bin/ping)
7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
After:
ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38)
7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so)
7fff7e26fa54 gaih_inet.constprop.7+0xf44 (/usr/lib64/libc-2.26.so)
7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so)
12f152d70 _init+0xbfc (/usr/bin/ping)
7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maynard Johnson <maynard@us.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/66e848a7bdf2d43b39210a705ff6d828a0865661.1530724939.git.sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6566b47a6 ]
Fix a build warning in viafbdev.c when CONFIG_PROC_FS is not enabled
by marking the unused function as __maybe_unused.
../drivers/video/fbdev/via/viafbdev.c:1471:12: warning: 'viafb_sup_odev_proc_show' defined but not used [-Wunused-function]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e79e0e1428 ]
Before this patch, you could get into situations like this:
1. Process 1 searches for X free blocks, finds them, makes a reservation
2. Process 2 searches for free blocks in the same rgrp, but now the
bitmap is full because process 1's reservation is skipped over.
So it marks the bitmap as GBF_FULL.
3. Process 1 tries to allocate blocks from its own reservation, but
since the GBF_FULL bit is set, it skips over the rgrp and searches
elsewhere, thus not using its own reservation.
This patch adds an additional check to allow processes to use their
own reservations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d25e3eeed ]
Fix 2 printk format warnings (this driver is currently only used by
arch/sh/) by using "%pap" instead of "%lx".
Fixes these build warnings:
../drivers/mtd/maps/solutionengine.c: In function 'init_soleng_maps':
../include/linux/kern_levels.h:5:18: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
../drivers/mtd/maps/solutionengine.c:62:54: note: format string is defined here
printk(KERN_NOTICE "Solution Engine: Flash at 0x%08lx, EPROM at 0x%08lx\n",
~~~~^
%08x
../include/linux/kern_levels.h:5:18: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
../drivers/mtd/maps/solutionengine.c:62:72: note: format string is defined here
printk(KERN_NOTICE "Solution Engine: Flash at 0x%08lx, EPROM at 0x%08lx\n",
~~~~^
%08x
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: linux-mtd@lists.infradead.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 536ca245c5 ]
According to "Annex A16: RDMA over Converged Ethernet (RoCE)":
A16.4.3 MANAGEMENT INTERFACES
As defined in the base specification, a special Queue Pair, QP0 is defined
solely for communication between subnet manager(s) and subnet management
agents. Since such an IB-defined subnet management architecture is outside
the scope of this annex, it follows that there is also no requirement that
a port which conforms to this annex be associated with a QP0. Thus, for
end nodes designed to conform to this annex, the concept of QP0 is
undefined and unused for any port connected to an Ethernet network.
CA16-8: A packet arriving at a RoCE port containing a BTH with the
destination QP field set to QP0 shall be silently dropped.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e49756544a ]
In pl330_update() when checking if a channel has been aborted, the
channel's lock is not taken, only the overall pl330_dmac lock. But in
pl330_terminate_all() the aborted flag (req_running==-1) is set under
the channel lock and not the pl330_dmac lock.
With threaded interrupts, this leads to a potential race:
pl330_terminate_all pl330_update
------------------- ------------
lock channel
entry
lock pl330
_stop channel
unlock pl330
lock pl330
check req_running != -1
req_running = -1
_start channel
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a1a2f63d8 ]
The error path currently calls tw686x_video_free() which requires
vc->dev to be initialized, causing a NULL dereference on uninitizalized
channels.
Fix this by setting the vc->dev fields for all the channels first.
Fixes: f8afaa8dbc ("[media] tw686x: Introduce an interface to support multiple DMA modes")
Signed-off-by: Krzysztof Ha?asa <khalasa@piap.pl>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c2af1c737 ]
If Make gets a fatal signal while a shell is executing, it may delete
the target file that the recipe was supposed to update. This is needed
to make sure that it is remade from scratch when Make is next run; if
Make is interrupted after the recipe has begun to write the target file,
it results in an incomplete file whose time stamp is newer than that
of the prerequisites files. Make automatically deletes the incomplete
file on interrupt unless the target is marked .PRECIOUS.
The situation is just the same as when the shell fails for some reasons.
Usually when a recipe line fails, if it has changed the target file at
all, the file is corrupted, or at least it is not completely updated.
Yet the file’s time stamp says that it is now up to date, so the next
time Make runs, it will not try to update that file.
However, Make does not cater to delete the incomplete target file in
this case. We need to add .DELETE_ON_ERROR somewhere in the Makefile
to request it.
scripts/Kbuild.include seems a suitable place to add it because it is
included from almost all sub-makes.
Please note .DELETE_ON_ERROR is not effective for phony targets.
The external module building should never ever touch the kernel tree.
The following recipe fails if include/generated/autoconf.h is missing.
However, include/config/auto.conf is not deleted since it is a phony
target.
PHONY += include/config/auto.conf
include/config/auto.conf:
$(Q)test -e include/generated/autoconf.h -a -e $@ || ( \
echo >&2; \
echo >&2 " ERROR: Kernel configuration is invalid."; \
echo >&2 " include/generated/autoconf.h or $@ are missing.";\
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \
echo >&2 ; \
/bin/false)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f6dab4233d ]
Fixed factor clock has two initializations at of_clk_init() time
and during platform driver probe. Before of_clk_init() call,
node is marked as populated and so its probe never gets called.
During of_clk_init() fixed factor clock registration may fail if
any of its parent clock is not registered. In this case, it doesn't
get chance to retry registration from probe. Clear OF_POPULATED
flag if fixed factor clock registration fails so that clock
registration is attempted again from probe.
Signed-off-by: Rajan Vaja <rajan.vaja@xilinx.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11177e7a7a ]
of_find_compatible_node() is returning a device node with refcount
incremented and must be explicitly decremented after the last use
which is right after the us in of_iomap() here.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: 787b4271a6 ("clk: imx: add imx6ul clk tree support")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 776125785a ]
To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated. This assumption
is incorrect for preallocates files, but there are no negative
consequences as long as *some* space is still left on the filesystem.
One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.
However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated. To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd1cd0eb2c ]
AU0828_DEVICE() macro in quirks-table.h uses USB_DEVICE_VENDOR_SPEC()
for expanding idVendor and idProduct fields. However, the latter
macro adds also match_flags and bInterfaceClass, which are different
from the values AU0828_DEVICE() macro sets after that.
For fixing them, just expand idVendor and idProduct fields manually in
AU0828_DEVICE().
This fixes sparse warnings like:
sound/usb/quirks-table.h:2892:1: warning: Initializer entry defined twice
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5df816e7f4 ]
When initializing the device (procedure init_one), the driver
calls mlx5_pci_init to perform pci initialization. As part of this
initialization, mlx5_pci_init creates a debugfs directory.
If this creation fails, init_one aborts, returning failure to
the caller (which is the probe method caller).
The main reason for such a failure to occur is if the debugfs
directory already exists. This can happen if the last time
mlx5_pci_close was called, debugfs_remove (silently) failed due
to the debugfs directory not being empty.
Guarantee that such a debugfs_remove failure will not occur by
instead calling debugfs_remove_recursive in procedure mlx5_pci_close.
Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 76d5581c87 ]
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.
Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc4dfb7f70 ]
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.
Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.
The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.
Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a9cdebdcc upstream.
Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too. It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit. That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.
So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too. Win-win.
[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
also just goes away entirely with this ]
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e466af75c0 upstream.
syzkaller reports an out of bound read in strlcpy(), triggered
by xt_copy_counters_from_user()
Fix this by using memcpy(), then forcing a zero byte at the last position
of the destination, as Florian did for the non COMPAT code.
Fixes: d7591f0c41 ("netfilter: x_tables: introduce and use xt_copy_counters_from_user")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44a182b9d1 upstream.
KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e
where xhci_free_virt_device() sets slot id to 0 if udev exists:
if (dev->udev && dev->udev->slot_id)
dev->udev->slot_id = 0;
dev->udev will be true even if udev is freed because dev->udev is
not set to NULL.
set dev->udev pointer to NULL in xhci_free_dev()
The original patch went to stable so this fix needs to be applied
there as well.
Fixes: a400efe455 ("xhci: zero usb device slot_id member when disabling and freeing a xhci slot")
Cc: <stable@vger.kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 643d213a9a ]
Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.
Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.
Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd3d16a887 ]
If the client is sending a layoutget, but the server issues a callback
to recall what it thinks may be an outstanding layout, then we may find
an uninitialised layout attached to the inode due to the layoutget.
In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT
rather than NFS4ERR_DELAY, as the latter can end up deadlocking.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46583e8c48 ]
When attaching a device to an IOMMU group with
CONFIG_DEBUG_ATOMIC_SLEEP=y:
BUG: sleeping function called from invalid context at mm/slab.h:421
in_atomic(): 1, irqs_disabled(): 128, pid: 61, name: kworker/1:1
...
Call trace:
...
arm_lpae_alloc_pgtable+0x114/0x184
arm_64_lpae_alloc_pgtable_s1+0x2c/0x128
arm_32_lpae_alloc_pgtable_s1+0x40/0x6c
alloc_io_pgtable_ops+0x60/0x88
ipmmu_attach_device+0x140/0x334
ipmmu_attach_device() takes a spinlock, while arm_lpae_alloc_pgtable()
allocates memory using GFP_KERNEL. Originally, the ipmmu-vmsa driver
had its own custom page table allocation implementation using
GFP_ATOMIC, hence the spinlock was fine.
Fix this by replacing the spinlock by a mutex, like the arm-smmu driver
does.
Fixes: f20ed39f53 ("iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14cb2c8a6c ]
The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.
For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.
So, make the alloc_pvd() call conditional on their initialization.
This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.
[...] partition (null) (11 pp's found) is not contiguous
[...] partition (null) (2 pp's found) is not contiguous
[...] partition (null) (3 pp's found) is not contiguous
[...] partition (null) (64 pp's found) is not contiguous
Fixes: 6ceea22bbb ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d43fdae7ba ]
Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).
So, make sure the partition name string used in pr_warn() has
the null terminating character.
Fixes: 6ceea22bbb ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens <daniel.axtens@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4faeaf9c0f ]
Look up of buffers in s5p_mfc_handle_frame_new, s5p_mfc_handle_frame_copy_time
functions is not working properly for DMA addresses above 2 GiB. As a result
flags and timestamp of returned buffers are not set correctly and it breaks
operation of GStreamer/OMX plugins which rely on the CAPTURE buffer queue
flags.
Due to improper return type of the get_dec_y_adr, get_dspl_y_adr callbacks
and sign bit extension these callbacks return incorrect address values,
e.g. 0xfffffffffefc0000 instead of 0x00000000fefc0000. Then the statement:
"if (vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0) == dec_y_addr)"
is always false, which breaks looking up capture queue buffers.
To ensure proper matching by address u32 type is used for the DMA
addresses. This should work on all related SoCs, since the MFC DMA
address width is not larger than 32-bit.
Changes done in this patch are minimal as there is a larger patch series
pending refactoring the whole driver.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 36f5d9ef26 ]
The driver only registers one input device, which uses the screen
parameters from the first T9 instance. The first T63 instance also uses
those parameters.
It is incorrect to send input reports from the second instances of these
objects if they are enabled: the input scaling will be wrong and the
positions will be mashed together.
This also causes problems on Android if the number of slots exceeds 32.
In the future, this could be handled by looking for enabled touch object
instances and creating an input device for each one.
Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk>
Acked-by: Benson Leung <bleung@chromium.org>
Acked-by: Yufeng Shen <miletus@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08193d1a89 ]
The function dcb_app_lookup walks the list of specified DCB APP entries,
looking for one that matches a given criteria: ifindex, selector,
protocol ID and optionally also priority. The "don't care" value for
priority is set to 0, because that priority has not been allowed under
CEE regime, which predates the IEEE standardization.
Under IEEE, 0 is a valid priority number. But because dcb_app_lookup
considers zero a wild card, attempts to add an APP entry with priority 0
fail when other entries exist for a given ifindex / selector / PID
triplet.
Fix by changing the wild-card value to -1.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bb853aac2c ]
Locking the root adapter for __i2c_transfer will deadlock if the
device sits behind a mux-locked I2C mux. Switch to the finer-grained
i2c_lock_bus with the I2C_LOCK_SEGMENT flag. If the device does not
sit behind a mux-locked mux, the two locking variants are equivalent.
Signed-off-by: Peter Rosin <peda@axentia.se>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1a339b658d ]
An SPI TPM device managed directly on an embedded board using
the SPI bus and some GPIO or similar line as IRQ handler will
pass the IRQn from the TPM device associated with the SPI
device. This is already handled by the SPI core, so make sure
to pass this down to the core as well.
(The TPM core habit of using -1 to signal no IRQ is dubious
(as IRQ 0 is NO_IRQ) but I do not want to mess with that
semantic in this patch.)
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b270a8cc5 ]
In synchronous scenario, like in checkpoint(), we are going to flush
dirty node pages to device synchronously, we can easily failed
writebacking node page due to trylock_page() failure, especially in
condition of intensive lock competition, which can cause long latency
of checkpoint(). So let's use lock_page() in synchronous scenario to
avoid this issue.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8466baf788 ]
It is incorrect to enable TX/RX queues (call by mvneta_port_up()) for
port without link. Indeed MTU change for interface without link causes TX
queues to stuck.
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP
network unit")
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory.clement: adding Fixes tags and rewording commit log]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4bf4eed44b ]
If ioh_gpio_probe() fails on devm_irq_alloc_descs() then chip may point
to any element of chip_save array, so reverse iteration from pointer chip
may become chip_save[-1] and gpiochip_remove() will operate with wrong
memory.
The patch fix the error path of ioh_gpio_probe() to correctly bypass
chip_save array.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b3cadaa485 ]
This fixes two issues with setting hid->name information.
CC net/bluetooth/hidp/core.o
In function ‘hidp_setup_hid’,
inlined from ‘hidp_session_dev_init’ at net/bluetooth/hidp/core.c:815:9,
inlined from ‘hidp_session_new’ at net/bluetooth/hidp/core.c:953:8,
inlined from ‘hidp_connection_add’ at net/bluetooth/hidp/core.c:1366:8:
net/bluetooth/hidp/core.c:778:2: warning: ‘strncpy’ output may be truncated copying 127 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(hid->name, req->name, sizeof(req->name) - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC net/bluetooth/hidp/core.o
net/bluetooth/hidp/core.c: In function ‘hidp_setup_hid’:
net/bluetooth/hidp/core.c:778:38: warning: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Wsizeof-pointer-memaccess]
strncpy(hid->name, req->name, sizeof(req->name));
^
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 673bc519c5 ]
The tx completion of multiple mgmt frames can be bundled
in a single event and sent by the firmware to host, if this
capability is not disabled explicitly by the host. If the host
cannot handle the bundled mgmt tx completion, this capability
support needs to be disabled in the wmi init cmd, sent to the firmware.
Add the host capability indication flag in the wmi ready command,
to let firmware know the features supported by the host driver.
This field is ignored if it is not supported by firmware.
Set the host capability indication flag(i.e. host_capab) to zero,
for disabling the support of bundle mgmt tx completion. This will
indicate the firmware to send completion event for every mgmt tx
completion, instead of bundling them together and sending in a single
event.
Tested HW: WCN3990
Tested FW: WLAN.HL.2.0-01188-QCAHLSWMTPLZ-1
Signed-off-by: Surabhi Vishnoi <svishnoi@codeaurora.org>
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4dc98c1995 ]
tw_probe() returns 0 in case of fail of tw_initialize_device_extension(),
pci_resource_start() or tw_reset_sequence() and releases resources.
twl_probe() returns 0 in case of fail of twl_initialize_device_extension(),
pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of
fail of tw_initialize_device_extension(), ioremap() and
twa_reset_sequence().
The patch adds retval initialization for these cases.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2dbb3ec29a ]
We have seen that on some platforms, SATA device never show any DEVSLP
residency. This prevent power gating of SATA IP, which prevent system
to transition to low power mode in systems with SLP_S0 aka modern
standby systems. The PHY logic is off only in DEVSLP not in slumber.
Reference:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets
/332995-skylake-i-o-platform-datasheet-volume-1.pdf
Section 28.7.6.1
Here driver is trying to do read-modify-write the devslp register. But
not resetting the bits for which this driver will modify values (DITO,
MDAT and DETO). So simply reset those bits before updating to new values.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 37a634f60f ]
When receiving a beacon or probe response, we should update the
boottime_ns field which is the timestamp the frame was received at.
(cf mac80211.h)
This fixes a scanning issue with Android since it relies on this
timestamp to determine when the AP has been seen for the last time
(via the nl80211 BSS_LAST_SEEN_BOOTTIME parameter).
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f25911158 ]
The QCA4019 hw1.0 firmware 10.4-3.2.1-00050 and 10.4-3.5.3-00053 (and most
likely all other) seem to ignore the WMI_CHAN_FLAG_DFS flag during the
scan. This results in transmission (probe requests) on channels which are
not "available" for transmissions.
Since the firmware is closed source and nothing can be done from our side
to fix the problem in it, the driver has to work around this problem. The
WMI_CHAN_FLAG_PASSIVE seems to be interpreted by the firmware to not
scan actively on a channel unless an AP was detected on it. Simple probe
requests will then be transmitted by the STA on the channel.
ath10k must therefore also use this flag when it queues a radar channel for
scanning. This should reduce the chance of an active scan when the channel
might be "unusable" for transmissions.
Fixes: e8a50f8ba4 ("ath10k: introduce DFS implementation")
Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 461d8a6bb9 ]
The tx power applied by set_txpower is limited by the CTL (conformance
test limit) entries in the EEPROM. These can change based on the user
configured regulatory domain.
Depending on the EEPROM data this can cause the tx power to become too
limited, if the original regdomain CTLs impose lower limits than the CTLs
of the user configured regdomain.
To fix this issue, set the initial channel limits without any CTL
restrictions and only apply the CTL at run time when setting the channel
and the real tx power.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 21b8732eb4 ]
After update of kernel, the perf tool doesn't run anymore on my 32MB RAM
powerpc board, but still runs on a 128MB RAM board:
~# strace perf
execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
Segmentation fault
objdump -x shows that .bss section has a huge size of 24Mbytes:
27 .bss 016baca8 101cebb8 101cebb8 001cd988 2**3
With especially the following objects having quite big size:
10205f80 l O .bss 00140000 runtime_cycles_stats
10345f80 l O .bss 00140000 runtime_stalled_cycles_front_stats
10485f80 l O .bss 00140000 runtime_stalled_cycles_back_stats
105c5f80 l O .bss 00140000 runtime_branches_stats
10705f80 l O .bss 00140000 runtime_cacherefs_stats
10845f80 l O .bss 00140000 runtime_l1_dcache_stats
10985f80 l O .bss 00140000 runtime_l1_icache_stats
10ac5f80 l O .bss 00140000 runtime_ll_cache_stats
10c05f80 l O .bss 00140000 runtime_itlb_cache_stats
10d45f80 l O .bss 00140000 runtime_dtlb_cache_stats
10e85f80 l O .bss 00140000 runtime_cycles_in_tx_stats
10fc5f80 l O .bss 00140000 runtime_transaction_stats
11105f80 l O .bss 00140000 runtime_elision_stats
11245f80 l O .bss 00140000 runtime_topdown_total_slots
11385f80 l O .bss 00140000 runtime_topdown_slots_retired
114c5f80 l O .bss 00140000 runtime_topdown_slots_issued
11605f80 l O .bss 00140000 runtime_topdown_fetch_bubbles
11745f80 l O .bss 00140000 runtime_topdown_recovery_bubbles
This is due to commit 4d255766d2 ("perf: Bump max number of cpus
to 1024"), because many tables are sized with MAX_NR_CPUS
This patch gives the opportunity to redefine MAX_NR_CPUS via
$ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170922112043.8349468C57@po15668-vm-win7.idsi0.si.c-s.fr
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3611ce9911 ]
For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
example, if segment 1 is just used and allocate new segment 2, and the
blocks of segment 1 is invalidated, at this time, the previous code will
use __set_test_and_free to free the free_secmap and free_sections++,
this is not correct since it is still a current section, so fix it.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0419056ec8 ]
If number of isa and pci boards exceed NUM_BOARDS on the path
rp_init()->init_PCI()->register_PCI() then buffer overwrite occurs
in register_PCI() on assign rcktpt_io_addr[i].
The patch adds check on upper bound for index of registered
board in register_PCI.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f019f07ecf ]
The uio_unregister_device() function assumes that if "info->uio_dev" is
non-NULL that means "info" is fully allocated. Setting info->uio_de
has to be the last thing in the function.
In the current code, if request_threaded_irq() fails then we return with
info->uio_dev set to non-NULL but info is not fully allocated and it can
lead to double frees.
Fixes: beafc54c4e ("UIO: Add the User IO core code")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 363e934d88 ]
timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.
The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:
CPU0 CPU1
run_timer_softirq mod_timer
base = lock_timer_base(timer);
base->must_forward_clk = false
if (base->must_forward_clk)
forward(base); -> skipped
enqueue_timer(base, timer, idx);
-> idx is calculated high due to
stale base
unlock_timer_base(timer);
base = lock_timer_base(timer);
forward(base);
The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.
Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: linux-arm-msm@vger.kernel.org
Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d63e2fc804 ]
During raid5 replacement, the stripes can be marked with R5_NeedReplace
flag. Data can be read from being-replaced devices and written to
replacing spares without reading all other devices. (It's 'replace'
mode. s.replacing = 1) If a being-replaced device is dropped, the
replacement progress will be interrupted and resumed with pure recovery
mode. However, existing stripes before being interrupted cannot read
from the dropped device anymore. It prints lots of WARN_ON messages.
And it results in data corruption because existing stripes write
problematic data into its replacement device and update the progress.
\# Erase disks (1MB + 2GB)
dd if=/dev/zero of=/dev/sda bs=1MB count=2049
dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
\# Ensure array stores non-zero data
dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
\# Start replacement
mdadm /dev/md0 -a /dev/sdd
mdadm /dev/md0 --replace /dev/sda
Then, Hot-plug out /dev/sda during recovery, and wait for recovery done.
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.
Soon after you hot-plug out /dev/sda, you will see many WARN_ON
messages. The replacement recovery will be interrupted shortly. After
the recovery finishes, it will result in data corruption.
Actually, it's just an unhandled case of replacement. In commit
<f94c0b6658c7> (md/raid5: fix interaction of 'replace' and 'recovery'.),
if a NeedReplace device is not UPTODATE then that is an error, the
commit just simply print WARN_ON but also mark these corrupted stripes
with R5_WantReplace. (it means it's ready for writes.)
To fix this case, we can leverage 'sync and replace' mode mentioned in
commit <9a3e1101b827> (md/raid5: detect and handle replacements during
recovery.). We can add logics to detect and use 'sync and replace' mode
for these stripes.
Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6a64f6e159 ]
When __transport_register_session is called from transport_register_session
irqs will already have been disabled, so we do not want the unlock irq call
to enable them until the higher level has done the final
spin_unlock_irqrestore/ spin_unlock_irq.
This has __transport_register_session use the save/restore call.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 77fefa93bf ]
Modify the register offsets in the Broadcom iProc mdio mux to start
from the top of the register address space.
Earlier, the base address pointed to the end of the block's register
space. The base address will now point to the start of the mdio's
address space. The offsets have been fixed to match this.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 40b25bce0a ]
There is a bug in regards to deferred probing within the drivers core
that causes GPIO-driver to suspend after its users. The bug appears if
GPIO-driver probe is getting deferred, which happens after introducing
dependency on PINCTRL-driver for the GPIO-driver by defining "gpio-ranges"
property in device-tree. The bug in the drivers core is old (more than 4
years now) and is well known, unfortunately there is no easy fix for it.
The good news is that we can workaround the deferred probe issue by
changing GPIO / PINCTRL drivers registration order and hence by moving
PINCTRL driver registration to the arch_init level and GPIO to the
subsys_init.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c3711ec64 ]
This driver was recently updated to use serdev, so add the appropriate
dependency. Without this one can get compiler warnings like this if
CONFIG_SERIAL_DEV_BUS is not enabled:
CC [M] drivers/bluetooth/hci_h5.o
drivers/bluetooth/hci_h5.c:934:36: warning: ‘h5_serdev_driver’ defined but not used [-Wunused-variable]
static struct serdev_device_driver h5_serdev_driver = {
^~~~~~~~~~~~~~~~
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d89d415561 ]
Android's header sanitization tool chokes on static inline functions having a
trailing semicolon, leading to an incorrectly parsed header file. While the
tool should obviously be fixed, also fix the header files for the two affected
functions: ethtool_get_flow_spec_ring() and ethtool_get_flow_spec_ring_vf().
Fixes: 8cf6f497de ("ethtool: Add helper routines to pass vf to rx_flow_spec")
Reporetd-by: Blair Prescott <blair.prescott@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a39284ae9d ]
There are only 2 callers of scif_get_new_port() and both appear to get
the error handling wrong. Both treat zero returns as error, but it
actually returns negative error codes and >= 0 on success.
Fixes: e9089f43c9 ("misc: mic: SCIF open close bind and listen APIs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2f83143f1 upstream.
Hillf Danton pointed out that since commit 1d82de618d ("mm, vmscan:
make kswapd reclaim in terms of nodes") that PGDAT_WRITEBACK is no
longer cleared.
It was not noticed as triggering it requires pages under writeback to
cycle twice through the LRU and before kswapd gets stalled.
Historically, such issues tended to occur on small machines writing
heavily to slow storage such as a USB stick.
Once kswapd stalls, direct reclaim stalls may be higher but due to the
fact that memory pressure is required, it would not be very noticable.
Michal Hocko suggested removing the flag entirely but the conservative
fix is to restore the intended PGDAT_WRITEBACK behaviour and clear the
flag when a suitable zone is balanced.
Fixes: 1d82de618d ("mm, vmscan: make kswapd reclaim in terms of nodes")
Link: http://lkml.kernel.org/r/20170203203222.gq7hk66yc36lpgtb@suse.de
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50972fe78f upstream.
Fix ordering of link creation between node->prev and prev->next in
osq_lock(). A case in which the status of optimistic spin queue is
CPU6->CPU2 in which CPU6 has acquired the lock.
tail
v
,-. <- ,-.
|6| |2|
`-' -> `-'
At this point if CPU0 comes in to acquire osq_lock, it will update the
tail count.
CPU2 CPU0
----------------------------------
tail
v
,-. <- ,-. ,-.
|6| |2| |0|
`-' -> `-' `-'
After tail count update if CPU2 starts to unqueue itself from
optimistic spin queue, it will find an updated tail count with CPU0 and
update CPU2 node->next to NULL in osq_wait_next().
unqueue-A
tail
v
,-. <- ,-. ,-.
|6| |2| |0|
`-' `-' `-'
unqueue-B
->tail != curr && !node->next
If reordering of following stores happen then prev->next where prev
being CPU2 would be updated to point to CPU0 node:
tail
v
,-. <- ,-. ,-.
|6| |2| |0|
`-' `-' -> `-'
osq_wait_next()
node->next <- 0
xchg(node->next, NULL)
tail
v
,-. <- ,-. ,-.
|6| |2| |0|
`-' `-' `-'
unqueue-C
At this point if next instruction
WRITE_ONCE(next->prev, prev);
in CPU2 path is committed before the update of CPU0 node->prev = prev then
CPU0 node->prev will point to CPU6 node.
tail
v----------. v
,-. <- ,-. ,-.
|6| |2| |0|
`-' `-' `-'
`----------^
At this point if CPU0 path's node->prev = prev is committed resulting
in change of CPU0 prev back to CPU2 node. CPU2 node->next is NULL
currently,
tail
v
,-. <- ,-. <- ,-.
|6| |2| |0|
`-' `-' `-'
`----------^
so if CPU0 gets into unqueue path of osq_lock it will keep spinning
in infinite loop as condition prev->next == node will never be true.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
[ Added pictures, rewrote comments. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1500040076-27626-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 476accbe2f upstream.
There is a strange __GFP_NOMEMALLOC usage pattern in SELinux,
specifically GFP_ATOMIC | __GFP_NOMEMALLOC which doesn't make much
sense. GFP_ATOMIC on its own allows to access memory reserves while
__GFP_NOMEMALLOC dictates we cannot use memory reserves. Replace this
with the much more sane GFP_NOWAIT in the AVC code as we can tolerate
memory allocation failures in that code.
Signed-off-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c29c31830 upstream.
If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed:
spinning writer up_write caller
--------------- -----------------------
[S] osq_unlock() [L] osq
spin_lock(wait_lock)
sem->count=0xFFFFFFFF00000001
+0xFFFFFFFF00000000
count=sem->count
MB
sem->count=0xFFFFFFFE00000001
-0xFFFFFFFF00000001
spin_trylock(wait_lock)
return
rwsem_try_write_lock(count)
spin_unlock(wait_lock)
schedule()
Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause missing of
wakeup in up_write() context. In spinning writer, sem->count
and local variable count is 0XFFFFFFFE00000001. It would result
in rwsem_try_write_lock() failing to acquire rwsem and spinning
writer going to sleep in rwsem_down_write_failed().
The smp_rmb() will make sure that the spinner state is
consulted after sem->count is updated in up_write context.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: longman@redhat.com
Cc: parri.andrea@gmail.com
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1504794658-15397-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 265698d7e6 upstream.
If TX rates are specified during mesh join, the channel must
also be specified. Check the channel pointer to avoid a null
pointer dereference if it isn't.
Reported-by: Jouni Malinen <j@w1.fi>
Fixes: 8564e38206 ("cfg80211: add checks for beacon rate, extend to mesh")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e00f4f4d0f upstream.
blkcg allocates some per-cgroup data structures with GFP_NOWAIT and
when that fails falls back to operations which aren't specific to the
cgroup. Occassional failures are expected under pressure and falling
back to non-cgroup operation is the right thing to do.
Unfortunately, I forgot to add __GFP_NOWARN to these allocations and
these expected failures end up creating a lot of noise. Add
__GFP_NOWARN.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marc MERLIN <marc@merlins.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c93496f18 upstream.
This fixes a over-read condition detected by FORTIFY_SOURCE for this
line:
memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(skb->cb));
The error was:
In file included from ./include/linux/bitmap.h:8:0,
from ./include/linux/cpumask.h:11,
from ./include/linux/mm_types_task.h:13,
from ./include/linux/mm_types.h:4,
from ./include/linux/kmemcheck.h:4,
from ./include/linux/skbuff.h:18,
from drivers/infiniband/sw/rxe/rxe_resp.c:34:
In function 'memcpy',
inlined from 'send_atomic_ack.constprop' at drivers/infiniband/sw/rxe/rxe_resp.c:998:2,
inlined from 'acknowledge' at drivers/infiniband/sw/rxe/rxe_resp.c:1026:3,
inlined from 'rxe_responder' at drivers/infiniband/sw/rxe/rxe_resp.c:1286:10:
./include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
__read_overflow2();
Daniel Micay noted that struct rxe_pkt_info is 32 bytes on 32-bit
architectures, but skb->cb is still 64. The memcpy() over-reads 32
bytes. This fixes it by zeroing the unused bytes in skb->cb.
Link: http://lkml.kernel.org/r/1497903987-21002-5-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 498c4b4e9c upstream.
The driver may sleep under a spin lock, and the function call path is:
rtsx_exclusive_enter_ss (acquire the lock by spin_lock)
rtsx_enter_ss
rtsx_power_off_card
xd_cleanup_work
xd_delay_write
xd_finish_write
xd_copy_page
wait_timeout
schedule_timeout --> may sleep
To fix it, "wait_timeout" is replaced with mdelay in xd_copy_page.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0f5a8f32e upstream.
This fixes a regression in commit 4d6501dce0 where I didn't notice
that MIPS and OpenRISC were reinitialising p->{set,clear}_child_tid to
NULL after our initialisation in copy_process().
We can simply get rid of the arch-specific initialisation here since it
is now always done in copy_process() before hitting copy_thread{,_tls}().
Review notes:
- As far as I can tell, copy_process() is the only user of
copy_thread_tls(), which is the only caller of copy_thread() for
architectures that don't implement copy_thread_tls().
- After this patch, there is no arch-specific code touching
p->set_child_tid or p->clear_child_tid whatsoever.
- It may look like MIPS/OpenRISC wanted to always have these fields be
NULL, but that's not true, as copy_process() would unconditionally
set them again _after_ calling copy_thread_tls() before commit
4d6501dce0.
Fixes: 4d6501dce0 ("kthread: Fix use-after-free if kthread fork fails")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net> # MIPS only
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d6501dce0 upstream.
If a kthread forks (e.g. usermodehelper since commit 1da5c46fa9) but
fails in copy_process() between calling dup_task_struct() and setting
p->set_child_tid, then the value of p->set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().
kthread()
- worker_thread()
- process_one_work()
| - call_usermodehelper_exec_work()
| - kernel_thread()
| - _do_fork()
| - copy_process()
| - dup_task_struct()
| - arch_dup_task_struct()
| - tsk->set_child_tid = current->set_child_tid // implied
| - ...
| - goto bad_fork_*
| - ...
| - free_task(tsk)
| - free_kthread_struct(tsk)
| - kfree(tsk->set_child_tid)
- ...
- schedule()
- __schedule()
- wq_worker_sleeping()
- kthread_data(task)->flags // UAF
The problem started showing up with commit 1da5c46fa9 since it reused
->set_child_tid for the kthread worker data.
A better long-term solution might be to get rid of the ->set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.
Debugged-by: Jamie Iles <jamie.iles@oracle.com>
Fixes: 1da5c46fa9 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3193bc0dc upstream.
In below scenario blkio cgroup does not work as per their assigned
weights :-
1. When the underlying device is nonrotational with a single HW queue
with depth of >= CFQ_HW_QUEUE_MIN
2. When the use case is forming two blkio cgroups cg1(weight 1000) &
cg2(wight 100) and two processes(file1 and file2) doing sync IO in
their respective blkio cgroups.
For above usecase result of fio (without this patch):-
file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan 1 19:41:49 1970
write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
<...>
file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan 1 19:41:49 1970
write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
<...>
// both the process BW is equal even though they belong to diff.
cgroups with weight of 1000(cg1) and 100(cg2)
In above case (for non rotational NCQ devices),
as soon as the request from cg1 is completed and even
though it is provided with higher set_slice=10, because of CFQ
algorithm when the driver tries to fetch the request, CFQ expires
this group without providing any idle time nor weight priority
and schedules another cfq group (in this case cg2).
And thus both cfq groups(cg1 & cg2) keep alternating to get the
disk time and hence loses the cgroup weight based scheduling.
Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer)
to arm the slice timer in case group_idle is enabled.
In case if group_idle is also not required (including for nonrotational
NCQ drives), we need to explicitly set group_idle = 0 from sysfs for
such cases.
With this patch result of fio(for above usecase) :-
file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan 1 00:06:08 1970
write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
<..>
file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan 1 00:06:08 1970
write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
<..>
// In this processes BW is as per their respective cgroups weight.
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1603764396 upstream.
On AMD/ATI controllers, the HD-audio controller driver allows a bus
reset upon the error recovery, and its procedure includes the
cancellation of pending jack polling work as found in
snd_hda_bus_codec_reset(). This works usually fine, but it becomes a
problem when the reset happens from the jack poll work itself; then
calling cancel_work_sync() from the work being processed tries to wait
the finish endlessly.
As a workaround, this patch adds the check of current_work() and
applies the cancel_work_sync() only when it's not from the
jackpoll_work.
This doesn't fix the root cause of the reported error below, but at
least, it eases the unexpected stall of the whole system.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200937
Cc: <stable@vger.kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae7304c3ea upstream.
Disable interrupts while configuring the transfer and enable them back.
We have below as the programming sequence
1. start and slave address
2. byte count and stop
In some customer platform there was a lot of interrupts between 1 and 2
and after slave address (around 7 clock cyles) if 2 is not executed
then the transaction is nacked.
To fix this case make the 2 writes atomic.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
[wsa: added a newline for better readability]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c4a39dd5f upstream.
If there is a mismatch in the I/D min line size, we must
always use the system wide safe value both in applications
and in the kernel, while performing cache operations. However,
we have been checking more bits than just the min line sizes,
which triggers false negatives. We may need to trap the user
accesses in such cases, but not necessarily patch the kernel.
This patch fixes the check to do the right thing as advertised.
A new capability will be added to check mismatches in other
fields and ensure we trap the CTR accesses.
Fixes: be68a8aaf9 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d814a49198 upstream.
We use customized, nodesize batch value to update dirty_metadata_bytes.
We should also use batch version of compare function or we will easily
goto fast path and get false result from percpu_counter_compare().
Fixes: e2d845211e ("Btrfs: use percpu counter for dirty metadata count")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
nb: Rebased on 4.4.y ]
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5eda25b102 upstream.
The memove, memset, memcpy, __memset16, __memset32 and __memset64
function have an additional indirect return branch in form of a
"bzr" instruction. These need to use expolines as well.
Cc: <stable@vger.kernel.org> # v4.17+
Fixes: 97489e0663 ("s390/lib: use expoline for indirect branches")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc365dcf0e upstream.
>From the pci power documentation:
"The driver itself should not call pm_runtime_allow(), though. Instead,
it should let user space or some platform-specific code do that (user space
can do it via sysfs as stated above)..."
However, the S0ix residency cannot be reached without MEI device getting
into low power state. Hence, for mei devices that support D0i3, it's better
to make runtime power management mandatory and not rely on the system
integration such as udev rules.
This policy cannot be applied globally as some older platforms
were found to have broken power management.
Cc: <stable@vger.kernel.org> v4.13+
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2aa6d036b7 ("mm: numa: avoid waiting on freed migrated pages")
was an incomplete backport of the upstream commit. It is necessary to
always reset page_nid before attempting any early exit.
The original commit conflicted due to lack of commit 82b0f8c39a
("mm: join struct fault_env and vm_fault") in 4.9 so it wasn't a clean
application, and the change must have just gotten lost in the noise.
Signed-off-by: Chas Williams <chas3@att.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb5c656886 upstream.
In commit ab123fe071 ("enic: handle mtu change for vf properly")
ASSERT_RTNL() is added to _enic_change_mtu() to prevent it from being
called without rtnl held. enic_probe() calls enic_change_mtu()
without rtnl held. At this point netdev is not registered yet.
Remove call to enic_change_mtu and assign the mtu to netdev->mtu.
Fixes: ab123fe071 ("enic: handle mtu change for vf properly")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The irda_setsockopt() function conditionally allocates memory for a new
self->ias_object or, in some cases, reuses the existing
self->ias_object. Existing objects were incorrectly reinserted into the
LM_IAS database which corrupted the doubly linked list used for the
hashbin implementation of the LM_IAS database. When combined with a
memory leak in irda_bind(), this issue could be leveraged to create a
use-after-free vulnerability in the hashbin list. This patch fixes the
issue by only inserting newly allocated objects into the database.
CVE-2018-6555
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The irda_bind() function allocates memory for self->ias_obj without
checking to see if the socket is already bound. A userspace process
could repeatedly bind the socket, have each new object added into the
LM-IAS database, and lose the reference to the old object assigned to
the socket to exhaust memory resources. This patch errors out of the
bind operation when self->ias_obj is already assigned.
CVE-2018-6554
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 914b087ff9 upstream.
When $DEPMOD is not found, only print a warning instead of exiting
with an error message and error status:
Warning: 'make modules_install' requires /sbin/depmod. Please install it.
This is probably in the kmod package.
Change the Error to a Warning because "not all build hosts for cross
compiling Linux are Linux systems and are able to provide a working
port of depmod, especially at the file patch /sbin/depmod."
I.e., "make modules_install" may be used to copy/install the
loadable modules files to a target directory on a build system and
then transferred to an embedded device where /sbin/depmod is run
instead of it being run on the build system.
Fixes: 934193a654 ("kbuild: verify that $DEPMOD is installed")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: stable@vger.kernel.org
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Maxim Zhukov <mussitantesmortem@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2d7a075a1 upstream.
Using only 32-bit writes for the pte will result in an intermediate
L1TF vulnerable PTE. When running as a Xen PV guest this will at once
switch the guest to shadow mode resulting in a loss of performance.
Use arch_atomic64_xchg() instead which will perform the requested
operation atomically with all 64 bits.
Some performance considerations according to:
https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf
The main number should be the latency, as there is no tight loop around
native_ptep_get_and_clear().
"lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a
memory operand) isn't mentioned in that document. "lock xadd" (with xadd
having 3 cycles less latency than xchg) has a latency of 11, so we can
assume a latency of 14 for "lock xchg".
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
[ Atomic operations gained an arch_ prefix in 8bf705d130
("locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h") so
s/arch_atomic64_xchg/atomic64_xchg/ for backport.]
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc91a3c4c2 upstream.
While debugging an issue debugobject tracking warned about an annotation
issue of an object on stack. It turned out that the issue was due to the
object in concern being on a different stack which was due to another
issue.
Thomas suggested to print the pointers and the location of the stack for
the currently running task. This helped to figure out that the object was
on the wrong stack.
As this is general useful information for debugging similar issues, make
the error message more informative by printing the pointers.
[ tglx: Massaged changelog ]
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: kernel-team@android.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: astrachan@google.com
Link: https://lkml.kernel.org/r/20180723212531.202328-1-joel@joelfernandes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d1558dfd9f ]
A number of the Rockchip-specific drivers (IOMMU, display controllers)
are now assuming that CONFIG_PM is set, and may completely misbehave
if that's not the case.
Since there is hardly any reason for this configuration option not
to be selected anyway, let's require it (in the same way Tegra already
does).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7db7a8f563 ]
A number of the Rockchip-specific drivers (IOMMU, display controllers)
are now assuming that CONFIG_PM is set, and may completely misbehave
if that's not the case.
Since there is hardly any reason for this configuration option not
to be selected anyway, let's require it (in the same way Tegra already
does).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4379444654 ]
[BUG]
Under certain KVM load and LTP tests, it is possible to hit the
following calltrace if quota is enabled:
BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000
RIP: 0010:blk_status_to_errno+0x1a/0x30
Call Trace:
submit_extent_page+0x191/0x270 [btrfs]
? btrfs_create_repair_bio+0x130/0x130 [btrfs]
__do_readpage+0x2d2/0x810 [btrfs]
? btrfs_create_repair_bio+0x130/0x130 [btrfs]
? run_one_async_done+0xc0/0xc0 [btrfs]
__extent_read_full_page+0xe7/0x100 [btrfs]
? run_one_async_done+0xc0/0xc0 [btrfs]
read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
? run_one_async_done+0xc0/0xc0 [btrfs]
btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
read_tree_block+0x31/0x60 [btrfs]
read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
btrfs_search_slot+0x46b/0xa00 [btrfs]
? kmem_cache_alloc+0x1a8/0x510
? btrfs_get_token_32+0x5b/0x120 [btrfs]
find_parent_nodes+0x11d/0xeb0 [btrfs]
? leaf_space_used+0xb8/0xd0 [btrfs]
? btrfs_leaf_free_space+0x49/0x90 [btrfs]
? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
btrfs_find_all_roots+0x45/0x60 [btrfs]
btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
? pick_next_task_fair+0x2cd/0x530
? __switch_to+0x92/0x4b0
btrfs_worker_helper+0x81/0x300 [btrfs]
process_one_work+0x1da/0x3f0
worker_thread+0x2b/0x3f0
? process_one_work+0x3f0/0x3f0
kthread+0x11a/0x130
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x35/0x40
BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure
BTRFS info (device vda2): forced readonly
BTRFS error (device vda2): pending csums is 2887680
[CAUSE]
It's caused by race with block group auto removal:
- There is a meta block group X, which has only one tree block
The tree block belongs to fs tree 257.
- In current transaction, some operation modified fs tree 257
The tree block gets COWed, so the block group X is empty, and marked
as unused, queued to be deleted.
- Some workload (like fsync) wakes up cleaner_kthread()
Which will call btrfs_delete_unused_bgs() to remove unused block
groups.
So block group X along its chunk map get removed.
- Some delalloc work finished for fs tree 257
Quota needs to get the original reference of the extent, which will
read tree blocks of commit root of 257.
Then since the chunk map gets removed, the above warning gets
triggered.
[FIX]
Just let btrfs_delete_unused_bgs() skip block group which still has
pinned bytes.
However there is a minor side effect: currently we only queue empty
blocks at update_block_group(), and such empty block group with pinned
bytes won't go through update_block_group() again, such block group
won't be removed, until it gets new extent allocated and removed.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 389305b2aa ]
Invalid reloc tree can cause kernel NULL pointer dereference when btrfs
does some cleanup of the reloc roots.
It turns out that fs_info::reloc_ctl can be NULL in
btrfs_recover_relocation() as we allocate relocation control after all
reloc roots have been verified.
So when we hit: note, we haven't called set_reloc_control() thus
fs_info::reloc_ctl is still NULL.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e7e1f9e3a ]
on-disk devs stats value is updated in btrfs_run_dev_stats(),
which is called during commit transaction, if device->dev_stats_ccnt
is not zero.
Since current replace operation does not touch dev_stats_ccnt,
on-disk dev stats value is not updated. Therefore "btrfs device stats"
may return old device's value after umount/mount
(Example: See "btrfs ins dump-t -t DEV $DEV" after btrfs/100 finish).
Fix this by just incrementing dev_stats_ccnt in
btrfs_dev_replace_finishing() when replace is succeeded and this will
update the values.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 640332d1a0 ]
PWM2 is commonly used to control voltage of PWM regulator of VDD_LOG in
RK3399. On the Firefly-RK3399 board, PWM2 outputs 40 KHz square wave
from power on and the VDD_LOG is about 0.9V. When the kernel boots
normally into the system, the PWM2 keeps outputing PWM signal.
But the kernel hangs randomly after "Starting kernel ..." line on that
board. When it happens, PWM2 outputs high level which causes VDD_LOG
drops to 0.4V below the normal operating voltage.
By adding "pclk_rkpwm_pmu" to the rk3399_pmucru_critical_clocks array,
PWM clock is ensured to be prepared at startup and the PWM2 output is
normal. After repeated tests, the early boot hang is gone.
This patch works on both Firefly-RK3399 and ROC-RK3399-PC boards.
Signed-off-by: Levin Du <djw@t-chip.com.cn>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 74e96bf44f ]
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.
Fixes: d368514c30 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 289131e1f1 ]
For SMB2/SMB3 the number of requests sent was not displayed
in /proc/fs/cifs/Stats unless CONFIG_CIFS_STATS2 was
enabled (only number of failed requests displayed). As
with earlier dialects, we should be displaying these
counters if CONFIG_CIFS_STATS is enabled. They
are important for debugging.
e.g. when you cat /proc/fs/cifs/Stats (before the patch)
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 690 maximum at one time: 2
1) \\localhost\test
SMBs: 975
Negotiates: 0 sent 0 failed
SessionSetups: 0 sent 0 failed
Logoffs: 0 sent 0 failed
TreeConnects: 0 sent 0 failed
TreeDisconnects: 0 sent 0 failed
Creates: 0 sent 2 failed
Closes: 0 sent 0 failed
Flushes: 0 sent 0 failed
Reads: 0 sent 0 failed
Writes: 0 sent 0 failed
Locks: 0 sent 0 failed
IOCTLs: 0 sent 1 failed
Cancels: 0 sent 0 failed
Echos: 0 sent 0 failed
QueryDirectories: 0 sent 63 failed
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c281bc0c74 ]
echo 0 > /proc/fs/cifs/Stats is supposed to reset the stats
but there were four (see example below) that were not reset
(bytes read and witten, total vfs ops and max ops
at one time).
...
0 session 0 share reconnects
Total vfs operations: 100 maximum at one time: 2
1) \\localhost\test
SMBs: 0
Bytes read: 502092 Bytes written: 31457286
TreeConnects: 0 total 0 failed
TreeDisconnects: 0 total 0 failed
...
This patch fixes cifs_stats_proc_write to properly reset
those four.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7c27a26e1e ]
There are some powerpc selftests, as tm/tm-unavailable, that run for a long
period (>120 seconds), and if it is interrupted, as pressing CRTL-C
(SIGINT), the foreground process (harness) dies but the child process and
threads continue to execute (with PPID = 1 now) in background.
In this case, you'd think the whole test exited, but there are remaining
threads and processes being executed in background. Sometimes these
zombies processes are doing annoying things, as consuming the whole CPU or
dumping things to STDOUT.
This patch fixes this problem by attaching an empty signal handler to
SIGINT in the harness process. This handler will interrupt (EINTR) the
parent process waitpid() call, letting the code to follow through the
normal flow, which will kill all the processes in the child process group.
This patch also fixes a typo.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e083926b3e ]
The PFI subdevice flags indicate that the subdevice is readable and
writeable, but that is only true for the supported "M-series" boards,
not the older "E-series" boards. Only set the SDF_READABLE and
SDF_WRITABLE subdevice flags for the M-series boards. These two flags
are mainly for informational purposes.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dfd0309fd7 ]
pcie->realio.end should be the address of last byte of the area,
therefore using resource_size() of another resource is not correct, we
must substract 1 to get the address of the last byte.
Fixes: 11be65472a ("PCI: mvebu: Adapt to the new device tree layout")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5971b0c159 ]
Since commit 63347db0af "ACPI / scan: Use acpi_bus_get_status() to
initialize ACPI_TYPE_DEVICE devs" the status field of normal acpi_devices
gets set to 0 by acpi_bus_type_and_status() and filled with its actual
value later when acpi_add_single_object() calls acpi_bus_get_status().
This means that any acpi_match_device_ids() calls in between will always
fail with -ENOENT.
We already have a workaround for this, which temporary forces status to
ACPI_STA_DEFAULT in drivers/acpi/x86/utils.c: acpi_device_always_present()
and the next commit in this series adds another acpi_match_device_ids()
call between status being initialized as 0 and the acpi_bus_get_status()
call.
Rather then adding another workaround, this commit makes
acpi_bus_type_and_status() initialize status to ACPI_STA_DEFAULT, this is
safe to do as the only code looking at status between the initialization
and the acpi_bus_get_status() call is those acpi_match_device_ids() calls.
Note this does mean that we need to (re)set status to 0 in case the
acpi_bus_get_status() call fails.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7c6553d4db ]
Fix a panic that occurs for a device that got an error in
dasd_eckd_check_characteristics() during online processing.
For example the read configuration data command may have failed.
If this error occurs the device is not being set online and the earlier
invoked steps during online processing are rolled back. Therefore
dasd_eckd_uncheck_device() is called which needs a valid private
structure. But this pointer is not valid if
dasd_eckd_check_characteristics() has failed.
Check for a valid device->private pointer to prevent a panic.
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d6c02a9beb ]
In commit ed996a52c8 ("block: simplify and cleanup bvec pool
handling"), the value of the slab index is incremented by one in
bvec_alloc() after the allocation is done to indicate an index value of
0 does not need to be later freed.
bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
value. Decrement idx before performing the lookup.
Fixes: ed996a52c8 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 354b064b8e ]
In some cases, a symbol may have multiple aliases. Attempting to add an
entry probe for such symbols results in a probe being added at an
incorrect location while it fails altogether for return probes. This is
only applicable for binaries with debug information.
During the arch-dependent post-processing, the offset from the start of
the symbol at which the probe is to be attached is determined and added
to the start address of the symbol to get the probe's location. In case
there are multiple aliases, this offset gets added multiple times for
each alias of the symbol and we end up with an incorrect probe location.
This can be verified on a powerpc64le system as shown below.
$ nm /lib/modules/$(uname -r)/build/vmlinux | grep "sys_open$"
...
c000000000414290 T __se_sys_open
c000000000414290 T sys_open
$ objdump -d /lib/modules/$(uname -r)/build/vmlinux | grep -A 10 "<__se_sys_open>:"
c000000000414290 <__se_sys_open>:
c000000000414290: 19 01 4c 3c addis r2,r12,281
c000000000414294: 70 c4 42 38 addi r2,r2,-15248
c000000000414298: a6 02 08 7c mflr r0
c00000000041429c: e8 ff a1 fb std r29,-24(r1)
c0000000004142a0: f0 ff c1 fb std r30,-16(r1)
c0000000004142a4: f8 ff e1 fb std r31,-8(r1)
c0000000004142a8: 10 00 01 f8 std r0,16(r1)
c0000000004142ac: c1 ff 21 f8 stdu r1,-64(r1)
c0000000004142b0: 78 23 9f 7c mr r31,r4
c0000000004142b4: 78 1b 7e 7c mr r30,r3
For both the entry probe and the return probe, the probe location
should be _text+4276888 (0xc000000000414298). Since another alias
exists for 'sys_open', the post-processing code will end up adding
the offset (8 for powerpc64le) twice and perf will attempt to add
the probe at _text+4276896 (0xc0000000004142a0) instead.
Before:
# perf probe -v -a sys_open
probe-definition(0): sys_open
symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Writing event: p:probe/sys_open _text+4276896
Added new event:
probe:sys_open (on sys_open)
...
# perf probe -v -a sys_open%return $retval
probe-definition(0): sys_open%return
symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/README write=0
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Parsing probe_events: p:probe/sys_open _text+4276896
Group:probe Event:sys_open probe:p
Writing event: r:probe/sys_open__return _text+4276896
Failed to write event: Invalid argument
Error: Failed to add events. Reason: Invalid argument (Code: -22)
After:
# perf probe -v -a sys_open
probe-definition(0): sys_open
symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Writing event: p:probe/sys_open _text+4276888
Added new event:
probe:sys_open (on sys_open)
...
# perf probe -v -a sys_open%return $retval
probe-definition(0): sys_open%return
symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/README write=0
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Parsing probe_events: p:probe/sys_open _text+4276888
Group:probe Event:sys_open probe:p
Writing event: r:probe/sys_open__return _text+4276888
Added new event:
probe:sys_open__return (on sys_open%return)
...
Reported-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Fixes: 99e608b595 ("perf probe ppc64le: Fix probe location when using DWARF")
Link: http://lkml.kernel.org/r/20180809161929.35058-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0702bc4d2f ]
When compiling bmips with SMP disabled, the build fails with:
drivers/irqchip/irq-bcm7038-l1.o: In function `bcm7038_l1_cpu_offline':
drivers/irqchip/irq-bcm7038-l1.c:242: undefined reference to `irq_set_affinity_locked'
make[5]: *** [vmlinux] Error 1
Fix this by adding and setting bcm7038_l1_cpu_offline only when actually
compiling for SMP. It wouldn't have been used anyway, as it requires
CPU_HOTPLUG, which in turn requires SMP.
Fixes: 34c535793b ("irqchip/bcm7038-l1: Implement irq_cpu_offline() callback")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a1ceeca679 ]
hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno,
fix that by making a proper conversion.
Fixes: a598c6f4c5 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 880b29ac10 ]
Add entry to WMI keymap for lid flip event on Asus UX360.
On Asus Zenbook ux360 flipping lid from/to tablet mode triggers
keyscan code 0xfa which cannot be handled and results in kernel
log message "Unknown key fa pressed".
Signed-off-by: Aleh Filipovich<aleh@appnexus.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a53b42c118 ]
We came across infinite loop in ipvs when using ipvs in docker
env.
When ipvs receives new packets and cannot find an ipvs connection,
it will create a new connection, then if the dest is unavailable
(i.e. IP_VS_DEST_F_AVAILABLE), the packet will be dropped sliently.
But if the dropped packet is the first packet of this connection,
the connection control timer never has a chance to start and the
ipvs connection cannot be released. This will lead to memory leak, or
infinite loop in cleanup_net() when net namespace is released like
this:
ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
__ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
ops_exit_list at ffffffff81567a49
cleanup_net at ffffffff81568b40
process_one_work at ffffffff810a851b
worker_thread at ffffffff810a9356
kthread at ffffffff810b0b6f
ret_from_fork at ffffffff81697a18
race condition:
CPU1 CPU2
ip_vs_in()
ip_vs_conn_new()
ip_vs_del_dest()
__ip_vs_unlink_dest()
~IP_VS_DEST_F_AVAILABLE
cp->dest && !IP_VS_DEST_F_AVAILABLE
__ip_vs_conn_put
...
cleanup_net ---> infinite looping
Fix this by checking whether the timer already started.
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d2e7075b8 ]
The vmcoreinfo of a crashed system is potentially fragmented. Thus the
crash kernel has an intermediate step where the vmcoreinfo is copied into a
temporary, continuous buffer in the crash kernel memory. This temporary
buffer is never freed. Free it now to prevent the memleak.
While at it replace all occurrences of "VMCOREINFO" by its corresponding
macro to prevent potential renaming issues.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6cd00a01f0 ]
Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes
are initialized at __d_alloc(), we can't copy the whole size
unconditionally.
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
636f6e66696766732e746d70000000000010000000000000020000000188ffff
i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
^
RIP: 0010:take_dentry_name_snapshot+0x28/0x50
RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
take_dentry_name_snapshot+0x28/0x50
vfs_rename+0x128/0x870
SyS_rename+0x3b2/0x3d0
entry_SYSCALL_64_fastpath+0x1a/0xa4
0xffffffffffffffff
Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e6c47dd0da ]
Some SMB2/3 servers, Win2016 but possibly others too, adds padding
not only between PDUs in a compound but also to the final PDU.
This padding extends the PDU to a multiple of 8 bytes.
Check if the unexpected length looks like this might be the case
and avoid triggering the log messages for :
"SMB2 server sent bad RFC1001 len %d not %d\n"
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ffe57da29 ]
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!
Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.
And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.
Fixes: 817e9f2c5c ("act_ife: acquire ife_mod_lock before reading ifeoplist")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4e407ff5cd ]
The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().
This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Vlad Buslov <vladbu@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b93c1b5ac8 ]
Registering another device with same MAC address (such as TAP, VPN or
DPDK KNI) will confuse the VF autobinding logic. Restrict the search
to only run if the device is known to be a PCI attached VF.
Fixes: e8ff40d4bf ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d66f997f0 ]
We don't wakeup the virtqueue if the first byte of pending iova range
is the last byte of the range we just got updated. This will lead a
virtqueue to wait for IOTLB updating forever. Fixing by correct the
check and wake up the virtqueue in this case.
Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bab1be79a5 ]
As Marcelo noticed, in sctp_transport_get_next, it is iterating over
transports but then also accessing the association directly, without
checking any refcnts before that, which can cause an use-after-free
Read.
So fix it by holding transport before accessing the association. With
that, sctp_transport_hold calls can be removed in the later places.
Fixes: 626d16f50f ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f28954614 ]
Before the commit d6990976af ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.
In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.
This issue can be reproduced with the following LTP test script:
udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 63cc357f7b ]
RFC 1337 says:
''Ignore RST segments in TIME-WAIT state.
If the 2 minute MSL is enforced, this fix avoids all three hazards.''
So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.
However, Linux will also re-start the TIME-WAIT timer.
This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).
packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46
// Send one ACK
0.310 > . 1:1(0) ack 1001
// read 1000 byte
0.310 read(4, ..., 1000) = 1000
// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001
// ACK
0.500 < . 1001:1001(0) ack 101 win 257
// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244
// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244
// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244
// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244
// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244
// Peer sends RST for this ACK. Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244
// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244
// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`
// third syn after one minute. Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.
This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .
For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9fd0e09a4e ]
This card identifies itself as:
Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06)
Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468]
Adding a new entry to rtl8169_pci_tbl makes the card work.
Link: http://launchpad.net/bugs/1788730
Signed-off-by: Anthony Wong <anthony.wong@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6750c87074 ]
qlge_fix_features() is not supposed to modify hardware or
driver state, rather it is supposed to only fix requested
fetures bits. Currently qlge_fix_features() also goes for
interface down and up unnecessarily if there is not even
any change in features set.
This patch changes/fixes following -
1) Move reload of interface or device re-config from
qlge_fix_features() to qlge_set_features().
2) Reload of interface in qlge_set_features() only if
relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
3) Get rid of qlge_fix_features() since driver is not really
required to fix any features bit.
Signed-off-by: Manish <manish.chopra@cavium.com>
Reviewed-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c3c397c1f1 ]
When using the fixed PHY with GENET (e.g. MOCA) the PHY link
status can be determined from the internal link status captured
by the MAC. This allows the PHY state machine to use the correct
link state with the fixed PHY even if MAC link event interrupts
are missed when the net device is opened.
Fixes: 8d88c6ebb3 ("net: bcmgenet: enable MoCA link state change detection")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 431280eebe ]
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.
1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.
Geoff Alexander reported this could be used to build off-path attacks.
These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.
We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Geoff Alexander <alexandg@cs.unm.edu>
Tested-by: Geoff Alexander <alexandg@cs.unm.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d784f1625 ]
Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().
Fix this by moving module_put() after ops->release().
Fixes: ef6980b6be ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e14d7dfb41 upstream.
Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
pte_val reps. shift left would get truncated. Fix this up by using proper
types.
[Just one chunk, again, needed here. Thanks to Ben and Guenter for
finding and fixing this. - gregkh]
Fixes: 6b28baca9b ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0522236d4f upstream.
This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
implementations. The problem is that the blkcipher_* functions should
not be called in atomic context.
The bugs can be reproduced via the AF_ALG interface by trying to
encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
trigger BUG in crypto_yield():
[ 891.863680] BUG: sleeping function called from invalid context at include/crypto/algapi.h:424
[ 891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
[ 891.864739] 1 lock held by kcapi-enc/12347:
[ 891.864811] #0: 00000000f5d42c46 (sk_lock-AF_ALG){+.+.}, at: skcipher_recvmsg+0x50/0x530
[ 891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
[ 891.865251] Call Trace:
[ 891.865340] [c0000003387578c0] [c000000000d67ea4] dump_stack+0xe8/0x164 (unreliable)
[ 891.865511] [c000000338757910] [c000000000172a58] ___might_sleep+0x2f8/0x310
[ 891.865679] [c000000338757990] [c0000000006bff74] blkcipher_walk_done+0x374/0x4a0
[ 891.865825] [c0000003387579e0] [d000000007e73e70] p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
[ 891.865993] [c000000338757ad0] [c0000000006c0ee0] skcipher_encrypt_blkcipher+0x60/0x80
[ 891.866128] [c000000338757b10] [c0000000006ec504] skcipher_recvmsg+0x424/0x530
[ 891.866283] [c000000338757bd0] [c000000000b00654] sock_recvmsg+0x74/0xa0
[ 891.866403] [c000000338757c10] [c000000000b00f64] ___sys_recvmsg+0xf4/0x2f0
[ 891.866515] [c000000338757d90] [c000000000b02bb8] __sys_recvmsg+0x68/0xe0
[ 891.866631] [c000000338757e30] [c00000000000bbe4] system_call+0x5c/0x70
Fixes: 8c755ace35 ("crypto: vmx - Adding CBC routines for VMX module")
Fixes: c07f5d3da6 ("crypto: vmx - Adding support for XTS")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3943b040f1 upstream.
The writeback thread would exit with a lock held when the cache device
is detached via sysfs interface, fix it by releasing the held lock
before exiting the while-loop.
Fixes: fadd94e05c (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Coly Li <colyli@suse.de>
Tested-by: Shenghui Wang <shhuiw@foxmail.com>
Cc: stable@vger.kernel.org #4.17+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1c392c9e2 upstream.
I hit the following splat in my tests:
------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
do_idle+0x33/0x202
cpu_startup_entry+0x61/0x63
start_secondary+0x18e/0x1ed
startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---
After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
do_idle {
[interrupts enabled]
<interrupt> [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
test if pt_regs say return to interrupts enabled [yes]
TRACE_IRQS_ON [lockdep says irqs are on]
<nmi>
nmi_enter() {
printk_nmi_enter() [traced by ftrace]
[ hit ftrace breakpoint ]
<breakpoint exception>
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
test if pt_regs say interrupts enabled [no]
[iret back to interrupt]
[iret back to code]
tick_nohz_idle_enter() {
lockdep_assert_irqs_enabled() [lockdep say no!]
Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.
Cc: stable@vger.kernel.org
Fixes: 42a0bb3f71 ("printk/nmi: generic solution for safe printk in NMI")
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 286e877181 upstream.
Commit efda1b5d87 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Introduced additional hardening for ambiguity in the ACPI spec for
ars_status output sizing. However, it had a couple of cases mixed up.
Where it should have been checking for (and returning) "out_field[1] -
4" it was using "out_field[1] - 8" and vice versa.
This caused a four byte discrepancy in the buffer size passed on to
the command handler, and in some cases, this caused memory corruption
like:
./daxdev-errors.sh: line 76: 24104 Aborted (core dumped) ./daxdev-errors $busdev $region
malloc(): memory corruption
Program received signal SIGABRT, Aborted.
[...]
#5 0x00007ffff7865a2e in calloc () from /lib64/libc.so.6
#6 0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136
#7 0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144
#8 test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>)
at daxdev-errors.c:332
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Lukasz Dorau <lukasz.dorau@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Fixes: efda1b5d87 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-of-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82c9a927bc upstream.
When running in a container with a user namespace, if you call getxattr
with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
silently skips the user namespace fixup that it normally does resulting in
un-fixed-up data being returned.
This is caused by posix_acl_fix_xattr_to_user() being passed the total
buffer size and not the actual size of the xattr as returned by
vfs_getxattr().
This commit passes the actual length of the xattr as returned by
vfs_getxattr() down.
A reproducer for the issue is:
touch acl_posix
setfacl -m user:0:rwx acl_posix
and the compile:
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <attr/xattr.h>
/* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
int main(int argc, void **argv)
{
ssize_t ret1, ret2;
char buf1[128], buf2[132];
int fret = EXIT_SUCCESS;
char *file;
if (argc < 2) {
fprintf(stderr,
"Please specify a file with "
"\"system.posix_acl_access\" permissions set\n");
_exit(EXIT_FAILURE);
}
file = argv[1];
ret1 = getxattr(file, "system.posix_acl_access",
buf1, sizeof(buf1));
if (ret1 < 0) {
fprintf(stderr, "%s - Failed to retrieve "
"\"system.posix_acl_access\" "
"from \"%s\"\n", strerror(errno), file);
_exit(EXIT_FAILURE);
}
ret2 = getxattr(file, "system.posix_acl_access",
buf2, sizeof(buf2));
if (ret2 < 0) {
fprintf(stderr, "%s - Failed to retrieve "
"\"system.posix_acl_access\" "
"from \"%s\"\n", strerror(errno), file);
_exit(EXIT_FAILURE);
}
if (ret1 != ret2) {
fprintf(stderr, "The value of \"system.posix_acl_"
"access\" for file \"%s\" changed "
"between two successive calls\n", file);
_exit(EXIT_FAILURE);
}
for (ssize_t i = 0; i < ret2; i++) {
if (buf1[i] == buf2[i])
continue;
fprintf(stderr,
"Unexpected different in byte %zd: "
"%02x != %02x\n", i, buf1[i], buf2[i]);
fret = EXIT_FAILURE;
}
if (fret == EXIT_SUCCESS)
fprintf(stderr, "Test passed\n");
else
fprintf(stderr, "Test failed\n");
_exit(fret);
}
and run:
./tester acl_posix
On a non-fixed up kernel this should return something like:
root@c1:/# ./t
Unexpected different in byte 16: ffffffa0 != 00
Unexpected different in byte 17: ffffff86 != 00
Unexpected different in byte 18: 01 != 00
and on a fixed kernel:
root@c1:~# ./t
Test passed
Cc: stable@vger.kernel.org
Fixes: 2f6f0654ab ("userns: Convert vfs posix_acl support to use kuids and kgids")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945
Reported-by: Colin Watson <cjwatson@ubuntu.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb24153a3f upstream.
The default delay 5 jiffies is too much when the kernel is compiled with
HZ=100 - it results in jumpy cursor in Xwindow.
In order to find out the optimal delay, I benchmarked the driver on
1280x720x30fps video. I found out that with HZ=1000, 10ms is acceptable,
but with HZ=250 or HZ=300, we need 4ms, so that the video is played
without any frame skips.
This patch changes the delay to this value.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c5b044299 upstream.
I have a USB display adapter using the udlfb driver and I use it on an ARM
board that doesn't have any graphics card. When I plug the adapter in, the
console is properly displayed, however when I unplug and re-plug the
adapter, the console is not displayed and I can't access it until I reboot
the board.
The reason is this:
When the adapter is unplugged, dlfb_usb_disconnect calls
unlink_framebuffer, then it waits until the reference count drops to zero
and then it deallocates the framebuffer. However, the console that is
attached to the framebuffer device keeps the reference count non-zero, so
the framebuffer device is never destroyed. When the USB adapter is plugged
again, it creates a new device /dev/fb1 and the console is not attached to
it.
This patch fixes the bug by unbinding the console from unlink_framebuffer.
The code to unbind the console is moved from do_unregister_framebuffer to
a function unbind_console. When the console is unbound, the reference
count drops to zero and the udlfb driver frees the framebuffer. When the
adapter is plugged back, a new framebuffer is created and the console is
attached to it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: stable@vger.kernel.org
[b.zolnierkie: preserve old behavior for do_unregister_framebuffer()]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38dabd91ff upstream.
pwm-tiehrpwm driver disables PWM output by putting it in low output
state via active AQCSFRC register in ehrpwm_pwm_disable(). But, the
AQCSFRC shadow register is not updated. Therefore, when shadow AQCSFRC
register is re-enabled in ehrpwm_pwm_enable() (say to enable second PWM
output), previous settings are lost as shadow register value is loaded
into active register. This results in things like PWMA getting enabled
automatically, when PWMB is enabled and vice versa. Fix this by
updating AQCSFRC shadow register as well during ehrpwm_pwm_disable().
Fixes: 19891b20e7 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Cc: stable@vger.kernel.org
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5996559320 upstream.
In ubifs_jnl_update() we sync parent and child inodes to the flash,
in case of xattrs, the parent inode (AKA host inode) has a non-zero
data_len. Therefore we need to adjust synced_i_size too.
This issue was reported by ubifs self tests unter a xattr related work
load.
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: ui_size is 4, synced_i_size is 0, but inode is clean
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: i_ino 65, i_mode 0x81a4, i_size 4
Cc: <stable@vger.kernel.org>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5820f140ed upstream.
The old code would hold the userns_state_mutex indefinitely if
memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
moving the memdup_user_nul in front of the mutex_lock().
Note: This changes the error precedence of invalid buf/count/*ppos vs
map already written / capabilities missing.
Fixes: 22d917d80e ("userns: Rework the user_namespace adding uid/gid...")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42a0cc3478 upstream.
Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f725561e1 upstream.
When SRIOV VF device IOTLB is invalidated, we need to provide
the PF source ID such that IOMMU hardware can gauge the depth
of invalidation queue which is shared among VFs. This is needed
when device invalidation throttle (DIT) capability is supported.
This patch adds bit definitions for checking and tracking PFSID.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e1811900b upstream.
On all versions of Tegra30 Cardhu, the reset signal to the NXP PCA9546
I2C mux is connected to the Tegra GPIO BB0. Currently, this pin on the
Tegra is not configured as a GPIO but as a special-function IO (SFIO)
that is multiplexing the pin to an I2S controller. On exiting system
suspend, I2C commands sent to the PCA9546 are failing because there is
no ACK. Although it is not possible to see exactly what is happening
to the reset during suspend, by ensuring it is configured as a GPIO
and driven high, to de-assert the reset, the failures are no longer
seen.
Please note that this GPIO is also used to drive the reset signal
going to the camera connector on the board. However, given that there
is no camera support currently for Cardhu, this should not have any
impact.
Fixes: 40431d16ff ("ARM: tegra: enable PCA9546 on Cardhu")
Cc: stable@vger.kernel.org
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f90be132c upstream.
After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events. On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.
The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure. After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.
The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210 ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0914bb965e upstream.
"dev->nr_children" is the number of children which were parsed
successfully in bl_parse_stripe(). It could be all of them and then, in
that case, it is equal to v->stripe.volumes_count. Either way, the >
should be >= so that we don't go beyond the end of what we're supposed
to.
Fixes: 5c83746a0c ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fec3259c9f upstream.
Cache invalidation macros use cache line size to iterate over
invalidated cache lines, assuming that all cache ways are invalidated by
single instruction, but xtensa ISA recommends to not assume that for
future compatibility:
In some implementations all ways at index Addry-1..z are invalidated
regardless of the specified way, but for future compatibility this
behavior should not be assumed.
Iterate over all cache ways in ___invalidate_icache_all and
___invalidate_dcache_all.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be75de2525 upstream.
When building kernel for xtensa cores with big cache lines (e.g. 128
bytes or more) __loop_cache_all and __loop_cache_page may generate
assembly instructions with immediate fields that are too big. This
results in the following build errors:
arch/xtensa/mm/misc.S: Assembler messages:
arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '256'
arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '384'
arch/xtensa/kernel/head.S: Assembler messages:
arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '256'
arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '384'
arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '256'
arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '384'
arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '256'
arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '384'
Add parameter max_immed to these macros and use it to limit values of
immediate operands. Extract common code of these macros into the new
macro __loop_cache_unroll.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0027ff2a75 upstream.
Two bug fixes:
1) missing entries in the l1d_param array; this can cause a host crash
if an access attempts to reach the missing entry. Future-proof the get
function against any overflows as well. However, the two entries
VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
not be accepted by the parse function, so disable them there.
2) invalid values must be rejected even if the CPU does not have the
bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
... and a small refactoring, since the .cmd field is redundant with
the index in the array.
Reported-by: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a7b9020b06
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3df6f61fff upstream.
Commit ea0212f40c (power: auto select CONFIG_SRCU) made the code in
drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
select CONFIG_SRCU in Kconfig, which leads to the following build
error if CONFIG_SRCU is not selected somewhere else:
drivers/built-in.o: In function `wakeup_source_remove':
(.text+0x3c6fc): undefined reference to `synchronize_srcu'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c7a8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c84c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d1d8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d228): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d24c): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d29c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'
Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.
Fixes: ea0212f40c (power: auto select CONFIG_SRCU)
Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[ rjw: Minor subject/changelog fixups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6afebb70ee upstream.
Fixes https://bugs.linaro.org/show_bug.cgi?id=3903
LTP Functional tests have caused a bad paging request when triggering
the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading
regmap/f8000000.pmic/registers file during read_all test):
Unable to handle kernel paging request at virtual address ffff0
[ffff00000984e000] pgd=0000000077ffe803, pud=0000000077ffd803,0
Internal error: Oops: 96000007 [#1] SMP
...
Hardware name: HiKey Development Board (DT)
...
Call trace:
regmap_mmio_read8+0x24/0x40
regmap_mmio_read+0x48/0x70
_regmap_bus_reg_read+0x38/0x48
_regmap_read+0x68/0x170
regmap_read+0x50/0x78
regmap_read_debugfs+0x1a0/0x308
regmap_map_read_file+0x48/0x58
full_proxy_read+0x68/0x98
__vfs_read+0x48/0x80
vfs_read+0x94/0x150
SyS_read+0x6c/0xd8
el0_svc_naked+0x30/0x34
Code: aa1e03e0 d503201f f9400280 8b334000 (39400000)
Investigations have showed that, when triggered by debugfs read()
handler, the mmio regmap logic was reading a bigger (16k) register area
than the one mapped by devm_ioremap_resource() during hi655x-pmic probe
time (4k).
This commit changes hi655x's max register, according to HW specs, to be
the same as the one declared in the pmic device in hi6220's dts, fixing
the issue.
Cc: <stable@vger.kernel.org> #v4.9 #v4.14 #v4.16 #v4.17
Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 016f8ffc48 upstream.
While debugging another bug, I was looking at all the synchronize*()
functions being used in kernel/trace, and noticed that trace_uprobes was
using synchronize_sched(), with a comment to synchronize with
{u,ret}_probe_trace_func(). When looking at those functions, the data is
protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
is using the wrong synchronize_*() function.
Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 70ed91c6ec ("tracing/uprobes: Support ftrace_event_file base multibuffer")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 757d914007 upstream.
Masami Hiramatsu reported:
Current trace-enable attribute in sysfs returns an error
if user writes the same setting value as current one,
e.g.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
bash: echo: write error: Invalid argument
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
bash: echo: write error: Device or resource busy
But this is not a preferred behavior, it should ignore
if new setting is same as current one. This fixes the
problem as below.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: cd649b8bb8 ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f143641bfe upstream.
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 2df8f8a6a8 ("tracing: Fix regression with irqsoff tracer and tracing_on file")
Reported-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3cc1b0fc2 upstream.
Currently, when all modules, including VMCI and VMware balloon are built
into the kernel, the initialization of the balloon happens before the
VMCI is probed. As a result, the balloon fails to initialize the VMCI
doorbell, which it uses to get asynchronous requests for balloon size
changes.
The problem can be seen in the logs, in the form of the following
message:
"vmw_balloon: failed to initialize vmci doorbell"
The driver would work correctly but slightly less efficiently, probing
for requests periodically. This patch changes the balloon to be
initialized using late_initcall() instead of module_init() to address
this issue. It does not address a situation in which VMCI is built as a
module and the balloon is built into the kernel.
Fixes: 48e3d668b7 ("VMware balloon: Enable notification via VMCI")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce664331b2 upstream.
When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for
some reason it does not consider the status and looks at the result.
However, the hypervisor does not update the result - it updates the
status. This might cause VMCI doorbell not to be enabled, resulting in
degraded performance.
Fixes: 48e3d668b7 ("VMware balloon: Enable notification via VMCI")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5081efd112 upstream.
If the hypervisor sets 2MB batching is on, while batching is cleared,
the balloon code breaks. In this case the legacy mechanism is used with
2MB page. The VM would report a 2MB page is ballooned, and the
hypervisor would only take the first 4KB.
While the hypervisor should not report such settings, make the code more
robust by not enabling 2MB support without batching.
Fixes: 365bd7ef7e ("VMware balloon: Support 2m page ballooning.")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09755690c6 upstream.
When balloon batching is not supported by the hypervisor, the guest
frame number (GFN) must fit in 32-bit. However, due to a bug, this check
was mistakenly ignored. In practice, when total RAM is greater than
16TB, the balloon does not work currently, making this bug unlikely to
happen.
Fixes: ef0f8f1129 ("VMware balloon: partially inline vmballoon_reserve_page.")
Cc: stable@vger.kernel.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a5094ca29 upstream.
A sysfs write callback function needs to either return the number of
consumed characters or an error.
The ad952x_store() function currently returns 0 if the input value was "0",
this will signal that no characters have been consumed and the function
will be called repeatedly in a loop indefinitely. Fix this by returning
number of supplied characters to indicate that the whole input string has
been consumed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes: cd1678f963 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a4e33c1c5 upstream.
Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes: cd1678f963 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd2fa95416 upstream.
policy_hint_size starts as 0 during __write_initial_superblock(). It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size). But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.
The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache. This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).
Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75294442d8 upstream.
Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space. If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.
precondition:
* error_if_no_space = false (aka "queue_if_no_space")
* pool is in Out-of-Data-Space (OODS) mode
* no_space_timeout worker has been queued
CPU 0: CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
pool->pf.error_if_no_space = \
pt->requested_pf.error_if_no_space
// timeout, pool is still in OODS mode
do_no_space_timeout
// "queue_if_no_space" config is lost
pool->pf.error_if_no_space = true
pool->pf.mode = new_mode
Fix it by stopping no_space_timeout worker when switching to write mode.
Fixes: bcc696fac1 ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3111784bee upstream.
In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.
p9_client_clunk (in 9p)
p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
v9fs_clunk (in qemu)
put_fid
free_fid
v9fs_xattr_fid_clunk
v9fs_co_lsetxattr
s->ops->lsetxattr
ext4_xattr_user_set (in host ext4 filesystem)
Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef6cb5f1a0 upstream.
Function atomic_inc_unless_negative() returns a bool to indicate
success/failure. However cxl_adapter_context_get() wrongly compares
the return value against '>=0' which will always be true. The patch
fixes this comparison to '==0' there by also fixing this compile time
warning:
drivers/misc/cxl/main.c:290 cxl_adapter_context_get()
warn: 'atomic_inc_unless_negative(&adapter->contexts_num)' is unsigned
Fixes: 70b565bbdb ("cxl: Prevent adapter reset if an active context exists")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db2173198b upstream.
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.
This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.
There is work going on to fix that properly in the PCI core but it
will take some time.
x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.
This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7506dc7989 upstream.
Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly. No functional change intended.
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[only take the pci.h portion of this patch, to make backporting stuff
easier over time - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd813e1cd7 upstream.
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.
Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.
Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
This patch fixes this issue.
Fixes: a08a53ea4c ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bd6a1c4b8 upstream.
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e ("memblock: Add array resizing support").
On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:
task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22004484 XER: 20000000
CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
...
NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
LR [c0000000000f9e58] resched_curr+0x138/0x160
Call Trace:
resched_curr+0x138/0x160 (unreliable)
check_preempt_curr+0xc8/0xf0
ttwu_do_wakeup+0x38/0x150
try_to_wake_up+0x224/0x4d0
__wake_up_common+0x94/0x100
ep_poll_callback+0xac/0x1c0
__wake_up_common+0x94/0x100
__wake_up_sync_key+0x70/0xa0
sock_def_readable+0x58/0xa0
unix_stream_sendmsg+0x2dc/0x4c0
sock_sendmsg+0x68/0xa0
___sys_sendmsg+0x2cc/0x2e0
__sys_sendmsg+0x5c/0xc0
SyS_socketcall+0x36c/0x3f0
system_call+0x3c/0x100
as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.
Fixes: 2df173d9e8 ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3512a18cbd upstream.
There is a potential execution path in which function
platform_get_resource() returns NULL. If this happens,
we will end up having a NULL pointer dereference.
Fix this by replacing devm_ioremap with devm_ioremap_resource,
which has the NULL check and the memory region request.
This code was detected with the help of Coccinelle.
Cc: stable@vger.kernel.org
Fixes: f700e84f41 ("mailbox: Add support for APM X-Gene platform mailbox driver")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7444a80929 upstream.
Prior to commit 573185cc7e ("mmc: core: Invoke sdio func driver's PM
callbacks from the sdio bus"), the MMC core used to call into the power
management functions of SDIO clients itself and removed the card if the
return code was non-zero. IOW, the mmc handled errors gracefully and didn't
upchain them to the pm core.
Since this change, the mmc core relies on generic power management
functions which treat all errors as a reason to cancel the suspend
immediately. This causes suspend attempts to fail when the libertas
driver is loaded.
To fix this, power down the card explicitly in if_sdio_suspend() when we
know we're about to lose power and return success. Also set a flag in these
cases, and power up the card again in if_sdio_resume().
Fixes: 573185cc7e ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus")
Cc: <stable@vger.kernel.org>
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Chris Ball <chris@printf.net>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8ffee2f55 upstream.
Registers of DSPI should not be accessed before enabling its clock. On
Toradex Colibri VF50 on Iris carrier board this could be seen during
bootup as imprecise abort:
Unhandled fault: imprecise external abort (0x1c06) at 0x00000000
Internal error: : 1c06 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.39-dirty #97
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
Backtrace:
[<804166a8>] (regmap_write) from [<80466b5c>] (dspi_probe+0x1f0/0x8dc)
[<8046696c>] (dspi_probe) from [<8040107c>] (platform_drv_probe+0x54/0xb8)
[<80401028>] (platform_drv_probe) from [<803ff53c>] (driver_probe_device+0x280/0x2f8)
[<803ff2bc>] (driver_probe_device) from [<803ff674>] (__driver_attach+0xc0/0xc4)
[<803ff5b4>] (__driver_attach) from [<803fd818>] (bus_for_each_dev+0x70/0xa4)
[<803fd7a8>] (bus_for_each_dev) from [<803fee74>] (driver_attach+0x24/0x28)
[<803fee50>] (driver_attach) from [<803fe980>] (bus_add_driver+0x1a0/0x218)
[<803fe7e0>] (bus_add_driver) from [<803fffe8>] (driver_register+0x80/0x100)
[<803fff68>] (driver_register) from [<80400fdc>] (__platform_driver_register+0x48/0x50)
[<80400f94>] (__platform_driver_register) from [<8091cf7c>] (fsl_dspi_driver_init+0x1c/0x20)
[<8091cf60>] (fsl_dspi_driver_init) from [<8010195c>] (do_one_initcall+0x4c/0x174)
[<80101910>] (do_one_initcall) from [<80900e8c>] (kernel_init_freeable+0x144/0x1d8)
[<80900d48>] (kernel_init_freeable) from [<805ff6a8>] (kernel_init+0x10/0x114)
[<805ff698>] (kernel_init) from [<80107be8>] (ret_from_fork+0x14/0x2c)
Cc: <stable@vger.kernel.org>
Fixes: 5ee67b587a ("spi: dspi: clear SPI_SR before enable interrupt")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d28c756cae upstream.
The zero-copy optimization when reading or writing large chunks of data
is quite useful. However, the 9p messages created through the zero-copy
write path have an incorrect message size: it should be the size of the
header + size of the data being written but instead it's just the size
of the header.
This only works if the server ignores the size field of the message and
otherwise breaks the framing of the protocol. Fix this by re-writing the
message size field with the correct value.
Tested by running `dd if=/dev/zero of=out bs=4k count=1` inside a
virtio-9p mount.
Link: http://lkml.kernel.org/r/20180717003529.114368-1-chirantan@chromium.org
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Cc: Dylan Reid <dgreid@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is against 4.9. It does not apply to master due to a large
rework of ion in 4.12 which removed the affected functions altogther.
4c23cbff07 ("staging: android: ion: Remove import interface")
Userspace can cause the kref to handles to increment
arbitrarily high. Ensure it does not overflow.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f3fafc9c2 upstream.
Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()"
There is another cast from unsigned long to int which causes
a bounds check to fail with specially crafted input. The value is
then used as an index in the slot array in cdrom_slot_status().
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a427503eda upstream.
If an iio channel defines a basic property, there are duplicate entries
in /sys/class/power/*/uevent.
So add a check to avoid duplicates. Since all channels may be duplicates,
we have to modify the related error check.
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: stable@vger.kernel.org
Fixes: e60fea794e ("power: battery: Generic battery driver using IIO")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 932d47448c upstream.
We did have sporadic problems in the pinctrl framework during boot
where a pin group name unexpectedly became NULL leading to a NULL
dereference in strcmp.
Detailled analysis of the failing cases did reveal that there were
two devm allocated objects close to each other. The second one was
the affected group_desc in pinmux and the first one was the
psy_desc->properties buffer of the gab driver.
Review of the gab code showed that the address calculation for
one memcpy() is wrong. It does
properties + sizeof(type) * index
but C is defined to do the index multiplication already for
pointer + integer additions. Hence the factor was applied twice
and the memcpy() does write outside of the properties buffer.
Sometimes it happened to be the pinctrl and triggered the strcmp(NULL).
Anyways, it is overkill to use a memcpy() here instead of a simple
assignment, which is easier to read and has less risk for wrong
address calculations. So we change code to a simple assignment.
If we initialize the index to the first free location, we can even
remove the local variable 'properties'.
This bug seems to exist right from the beginning in 3.7-rc1 in
commit e60fea794e ("power: battery: Generic battery driver using IIO")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: stable@vger.kernel.org
Fixes: e60fea794e ("power: battery: Generic battery driver using IIO")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26abc916a8 upstream.
The problem is that iscsi_login_zero_tsih_s1 sets conn->sess early in
iscsi_login_set_conn_values. If the function fails later like when we
alloc the idr it does kfree(sess) and leaves the conn->sess pointer set.
iscsi_login_zero_tsih_s1 then returns -Exyz and we then call
iscsi_target_login_sess_out and access the freed memory.
This patch has iscsi_login_zero_tsih_s1 either completely setup the
session or completely tear it down, so later in
iscsi_target_login_sess_out we can just check for it being set to the
connection.
Cc: stable@vger.kernel.org
Fixes: 0957627a99 ("iscsi-target: Fix sess allocation leak in...")
Signed-off-by: Mike Christie <mchristi@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ee223b2e1 upstream.
A long time ago the unfortunate decision was taken to add a self-deletion
attribute to the sysfs SCSI device directory. That decision was unfortunate
because self-deletion is really tricky. We can't drop that attribute
because widely used user space software depends on it, namely the
rescan-scsi-bus.sh script. Hence this patch that avoids that writing into
that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH]
add sysfs attributes to scan and delete scsi_devices").
This patch avoids that self-removal triggers the following deadlock:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc2-dbg+ #5 Not tainted
------------------------------------------------------
modprobe/6539 is trying to acquire lock:
000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90
but task is already holding lock:
00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&shost->scan_mutex){+.+.}:
__mutex_lock+0xfe/0xc70
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x26/0x40 [scsi_mod]
sdev_store_delete+0x27/0x30 [scsi_mod]
dev_attr_store+0x3e/0x50
sysfs_kf_write+0x87/0xa0
kernfs_fop_write+0x190/0x230
__vfs_write+0xd2/0x3b0
vfs_write+0x101/0x270
ksys_write+0xab/0x120
__x64_sys_write+0x43/0x50
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#202){++++}:
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&shost->scan_mutex);
lock(kn->count#202);
lock(&shost->scan_mutex);
lock(kn->count#202);
*** DEADLOCK ***
2 locks held by modprobe/6539:
#0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360
#1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
stack backtrace:
CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xa4/0xf5
print_circular_bug.isra.34+0x213/0x221
__lock_acquire+0x1a7e/0x1b50
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html.
Fixes: ac0ece9174 ("scsi: use device_remove_file_self() instead of device_schedule_callback()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
commit 690d9163bf upstream.
Some versions of GCC suboptimally generate calls to the __multi3()
intrinsic for MIPS64r6 builds, resulting in link failures due to the
missing function:
LD vmlinux.o
MODPOST vmlinux.o
kernel/bpf/verifier.o: In function `kmalloc_array':
include/linux/slab.h:631: undefined reference to `__multi3'
fs/select.o: In function `kmalloc_array':
include/linux/slab.h:631: undefined reference to `__multi3'
...
We already have a workaround for this in which we provide the
instrinsic, but we do so selectively for GCC 7 only. Unfortunately the
issue occurs with older GCC versions too - it has been observed with
both GCC 5.4.0 & GCC 6.4.0.
MIPSr6 support was introduced in GCC 5, so all major GCC versions prior
to GCC 8 are affected and we extend our workaround accordingly to all
MIPS64r6 builds using GCC versions older than GCC 8.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Fixes: ebabcf17bc ("MIPS: Implement __multi3 for GCC7 MIPS64r6 builds")
Patchwork: https://patchwork.linux-mips.org/patch/20297/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 866f3576a7 upstream.
During interrupt setup we allocate interrupt vectors, walk the list of msi
descriptors, and fill in the message data. Requesting more interrupts than
supported on s390 can lead to an out of bounds access.
When we restrict the number of interrupts we should also stop walking the
msi list after all supported interrupts are handled.
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb7d7518b0 upstream.
The numa_init_early initcall sets the node_to_cpumask_map[0] to the
full cpu_possible_mask. Unfortunately this early_initcall is too late,
the NUMA setup for numa=emu is done even earlier. The order of calls
is numa_setup() -> emu_update_cpu_topology(), then the early_initcalls(),
followed by sched_init_domains().
Starting with git commit 051f3ca02e
"sched/topology: Introduce NUMA identity node sched domain"
the incorrect node_to_cpumask_map[0] really screws up the domain
setup and the kernel panics with the follow oops:
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64e03ff726 upstream.
When allocating a new AOB fails, handle_outbound() is still capable of
transmitting the selected buffer (just without async completion).
But if a previous transfer on this queue slot used async completion, its
sbal_state flags field is still set to QDIO_OUTBUF_STATE_FLAG_PENDING.
So when the upper layer driver sees this stale flag, it expects an async
completion that never happens.
Fix this by unconditionally clearing the flags field.
Fixes: 104ea556ee ("qdio: support asynchronous delivery of storage blocks")
Cc: <stable@vger.kernel.org> #v3.2+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26f843848b upstream.
For machines without the exrl instruction the BFP jit generates
code that uses an "br %r1" instruction located in the lowcore page.
Unfortunately there is a cut & paste error that puts an additional
"larl %r1,.+14" instruction in the code that clobbers the branch
target address in %r1. Remove the larl instruction.
Cc: <stable@vger.kernel.org> # v4.17+
Fixes: de5cb6eb51 ("s390: use expoline thunks in the BPF JIT")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f12d11c5c1 upstream.
Reset the KASAN shadow state of the task stack before rewinding RSP.
Without this, a kernel oops will leave parts of the stack poisoned, and
code running under do_exit() can trip over such poisoned regions and cause
nonsensical false-positive KASAN reports about stack-out-of-bounds bugs.
This does not wipe the exception stacks; if an oops happens on an exception
stack, it might result in random KASAN false-positives from other tasks
afterwards. This is probably relatively uninteresting, since if the kernel
oopses on an exception stack, there are most likely bigger things to worry
about. It'd be more interesting if vmapped stacks and KASAN were
compatible, since then handle_stack_overflow() would oops from exception
stack context.
Fixes: 2deb4be280 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc51e5428e upstream.
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.
Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.
But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.
Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.
Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae1c696a48 upstream.
There is a potential execution path in which function
platform_get_resource() returns NULL. If this happens,
we will end up having a NULL pointer dereference.
Fix this by replacing devm_ioremap with devm_ioremap_resource,
which has the NULL check and the memory region request.
This code was detected with the help of Coccinelle.
Cc: stable@vger.kernel.org
Fixes: 2bd8d1d5cf ("ASoC: sirf: Add audio usp interface driver")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4febced15a upstream.
When merging codec formats, dpcm_runtime_base_format() should skip
the codecs which are not supporting the current stream direction.
At the moment, if a BE link has more than one codec, and only one
of these codecs has no capture DAI, it becomes impossible to start
a capture stream because the merged format would be 0.
Skipping invalid codec DAI solves the problem.
Fixes: b073ed4e21 ("ASoC: soc-pcm: DPCM cares BE format")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 542bb9788a upstream.
Allocations larger than PAGE_ALLOC_COSTLY_ORDER are unreliable and they
may fail anytime. This patch fixes the udl kms driver so that when a large
alloactions fails, it tries to do multiple smaller allocations.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8456b99c16 upstream.
If we leave urbs around, it causes not only leak, but also memory
corruption. This patch fixes the function udl_free_urb_list, so that it
always waits for all urbs that are in progress.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8f95e5d13 upstream.
fuse_abort_conn() does not guarantee that all async requests have actually
finished aborting (i.e. their ->end() function is called). This could
actually result in still used inodes after umount.
Add a helper to wait until all requests are fully done. This is done by
looking at the "num_waiting" counter. When this counter drops to zero, we
can be sure that no more requests are outstanding.
Fixes: 0d8e84b043 ("fuse: simplify request abort")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45ff350bbd upstream.
fuse_dev_release() assumes that it's the only one referencing the
fpq->processing list, but that's not true, since fuse_abort_conn() can be
doing the same without any serialization between the two.
Fixes: c3696046be ("fuse: separate pqueue for clones")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 87114373ea upstream.
Refcounting of request is broken when fuse_abort_conn() is called and
request is on the fpq->io list:
- ref is taken too late
- then it is not dropped
Fixes: 0d8e84b043 ("fuse: simplify request abort")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2477b0e67 upstream.
fuse_dev_splice_write() reads pipe->buffers to determine the size of
'bufs' array before taking the pipe_lock(). This is not safe as
another thread might change the 'pipe->buffers' between the allocation
and taking the pipe_lock(). So we end up with too small 'bufs' array.
Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this.
Fixes: dd3bb14f44 ("fuse: support splice() writing to fuse device")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 024d83cadc upstream.
Mikhail reported the following lockdep splat:
WARNING: possible irq lock inversion dependency detected
CPU 0/KVM/10284 just changed the state of lock:
000000000d538a88 (&st->lock){+...}, at:
speculative_store_bypass_update+0x10b/0x170
but this lock was taken by another, HARDIRQ-safe lock
in the past:
(&(&sighand->siglock)->rlock){-.-.}
and interrupts could create inverse lock ordering between them.
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&st->lock);
local_irq_disable();
lock(&(&sighand->siglock)->rlock);
lock(&st->lock);
<Interrupt>
lock(&(&sighand->siglock)->rlock);
*** DEADLOCK ***
The code path which connects those locks is:
speculative_store_bypass_update()
ssb_prctl_set()
do_seccomp()
do_syscall_64()
In svm_vcpu_run() speculative_store_bypass_update() is called with
interupts enabled via x86_virt_spec_ctrl_set_guest/host().
This is actually a false positive, because GIF=0 so interrupts are
disabled even if IF=1; however, we can easily move the invocations of
x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to
cure it, and it's a good idea to keep the GIF=0/IF=1 area as small
and self-contained as possible.
Fixes: 1f50ddb4f4 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0a182f875 upstream.
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective. In
fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to
holes in the e820 map, the main region is almost 500MB over the 32GB limit:
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable
Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the
500MB revealed, that there's an off-by-one error in the check in
l1tf_select_mitigation().
l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range
check in the mitigation path does not take this into account.
Instead of amending the range check, make l1tf_pfn_limit() return the first
PFN which is over the limit which is less error prone. Adjust the other
users accordingly.
[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9df9516940 upstream.
On 32bit PAE kernels on 64bit hardware with enough physical bits,
l1tf_pfn_limit() will overflow unsigned long. This in turn affects
max_swapfile_size() and can lead to swapon returning -EINVAL. This has been
observed in a 32bit guest with 42 bits physical address size, where
max_swapfile_size() overflows exactly to 1 << 32, thus zero, and produces
the following warning to dmesg:
[ 6.396845] Truncating oversized swap area, only using 0k out of 2047996k
Fix this by using unsigned long long instead.
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Reported-by: Dominique Leuenberger <dimstar@suse.de>
Reported-by: Adrian Schroeter <adrian@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180820095835.5298-1-vbabka@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dc77533f1 upstream.
When building the kernel for Sparc using gcc 7.x, the build fails
with:
arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cmd |= PCI_COMMAND_IO;
^~
The simplified code looks like this:
unsigned int cmd;
[...]
pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
[...]
cmd |= PCI_COMMAND_IO;
I.e, the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.
As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86658b819c upstream.
Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the tlbs.
This problem is more likely when -
* there are large number of vcpus
* the mapping is large block mapping
such as when using PMD hugepages (512MB) with 64k pages.
Fix this by skipping the page table update if there is no change in
the entry being updated.
Cc: stable@vger.kernel.org
Fixes: ad361f093c ("KVM: ARM: Support hugetlbfs backed huge pages")
Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 976d34e2da upstream.
When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.
Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.
Cc: stable@vger.kernel.org
Fixes: d5d8184d35 ("KVM: ARM: Memory virtualization setup")
Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is 4.9.y only. Kernels 4.12 and later are unaffected, since
all the underlying ion_handle infrastructure has been ripped out.
The ION_IOC_{MAP,SHARE} ioctls drop and reacquire client->lock several
times while operating on one of the client's ion_handles. This creates
windows where userspace can call ION_IOC_FREE on the same client with
the same handle, and effectively make the kernel drop its own reference.
For example:
- thread A: ION_IOC_ALLOC creates an ion_handle with refcount 1
- thread A: starts ION_IOC_MAP and increments the refcount to 2
- thread B: ION_IOC_FREE decrements the refcount to 1
- thread B: ION_IOC_FREE decrements the refcount to 0 and frees the
handle
- thread A: continues ION_IOC_MAP with a dangling ion_handle * to
freed memory
Fix this by holding client->lock for the duration of
ION_IOC_{MAP,SHARE}, preventing the concurrent ION_IOC_FREE. Also
remove ion_handle_get_by_id(), since there's literally no way to use it
safely.
Cc: stable@vger.kernel.org # v4.11-
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4d2aadca1 upstream.
While working on extended rand for last_error/first_error timestamps,
I noticed that the endianess is wrong; we access the little-endian
fields in struct ext4_super_block as native-endian when we print them.
This adds a special case in ext4_attr_show() and ext4_attr_store()
to byteswap the superblock fields if needed.
In older kernels, this code was part of super.c, it got moved to
sysfs.c in linux-4.4.
Cc: stable@vger.kernel.org
Fixes: 52c198c682 ("ext4: add sysfs entry showing whether the fs contains errors")
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d95178c77 upstream.
Extended attribute names are defined to be NUL-terminated, so the name
must not contain a NUL character. This is important because there are
places when remove extended attribute, the code uses strlen to
determine the length of the entry. That should probably be fixed at
some point, but code is currently really messy, so the simplest fix
for now is to simply validate that the extended attributes are sane.
https://bugzilla.kernel.org/show_bug.cgi?id=200401
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 306d6c49ac upstream.
When the oom killer kills a userspace process in the page fault handler
while in guest context, the fault handler fails to release the mm_sem
if the FAULT_FLAG_RETRY_NOWAIT option is set. This leads to a deadlock
when tearing down the mm when the process terminates. This bug can only
happen when pfault is enabled, so only KVM clients are affected.
The problem arises in the rare cases in which handle_mm_fault does not
release the mm_sem. This patch fixes the issue by manually releasing
the mm_sem when needed.
Fixes: 24eb3a824c ("KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault")
Cc: <stable@vger.kernel.org> # 3.15+
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ad356eabc upstream.
ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input
before seeing if the PFN is valid. This leads to false positives when
some of the upper bits are set, but the lower bits match a valid PFN.
For example, the following userspace code looks up a bogus entry in
/proc/kpageflags:
int pagemap = open("/proc/self/pagemap", O_RDONLY);
int pageflags = open("/proc/kpageflags", O_RDONLY);
uint64_t pfn, val;
lseek64(pagemap, [...], SEEK_SET);
read(pagemap, &pfn, sizeof(pfn));
if (pfn & (1UL << 63)) { /* valid PFN */
pfn &= ((1UL << 55) - 1); /* clear flag bits */
pfn |= (1UL << 55);
lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
read(pageflags, &val, sizeof(val));
}
On ARM64 this causes the userspace process to crash with SIGSEGV rather
than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as
valid, and stable_page_flags() will try to access an address between the
user and kernel address ranges.
Fixes: c1cc155261 ("arm64: MMU initialisation")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd09b7d3b3 upstream.
An earlier commit had a typo which prevented the
optimization from working:
commit 18dd8e1a65 ("Do not send SMB3 SET_INFO request if nothing is changing")
Thank you to Metze for noticing this. Also clear a
reserved field in the FILE_BASIC_INFO struct we send
that should be zero (all the other fields in that
struct were set or cleared explicitly already in
cifs_set_file_info).
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.9.x+
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e02789a53d upstream.
When enumerating snapshots, the last few bytes of the final
snapshot could be left off since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY)
See MS-SMB2 section 2.2.32.2. In addition fixup the length used
to allow smaller buffer to be passed in, in order to allow
returning the size of the whole snapshot array more easily.
Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.
~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...
size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102
Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 950132afd5 upstream.
/proc/fs/cifs/DebugData displays the features (Kconfig options)
used to build cifs.ko but it was missing some, and needed comma
separator. These can be useful in debugging certain problems
so we know which optional features were enabled in the user's build.
Also clarify them, by making them more closely match the
corresponding CONFIG_CIFS_* parm.
Old format:
Features: dfs fscache posix spnego xattr acl
New format:
Features: DFS,FSCACHE,SMB_DIRECT,STATS,DEBUG2,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40413955ee upstream.
in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop.
Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue
Signed-off-by: yujuan.qi <yujuan.qi@mediatek.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e95153b64d ]
Commands that are reset are returned with status
SAM_STAT_COMMAND_TERMINATED. PVSCSI currently returns DID_OK |
SAM_STAT_COMMAND_TERMINATED which fails the command. Instead, set hostbyte
to DID_RESET to allow upper layers to retry.
Tested by copying a large file between two pvscsi disks on same adapter
while performing a bus reset at 1-second intervals. Before fix, commands
sometimes fail with DID_OK. After fix, commands observed to fail with
DID_RESET.
Signed-off-by: Jim Gill <jgill@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1550ec458e ]
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag
before starting the rport delete routine.
As the started flag was not cleared, we're not deleting the rport but
waiting for a restart and thus are keeping the reference count of the rdata
object at 1.
This leads to the following kmemleak report:
unreferenced object 0xffff88006542aa00 (size 512):
comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s)
hex dump (first 32 bytes):
68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............
01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$....
backtrace:
[<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe]
[<(____ptrval____)>] process_one_work+0x7ff/0x1420
[<(____ptrval____)>] worker_thread+0x87/0xef0
[<(____ptrval____)>] kthread+0x2db/0x390
[<(____ptrval____)>] ret_from_fork+0x35/0x40
[<(____ptrval____)>] 0xffffffffffffffff
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: ard <ard@kwaak.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit afb41bb039 ]
Current value for a target abort error is 0x010, however, this value
should in fact be 0x002. As it stands, the range of error is 0..7 so
it is currently never being detected. This bug has been in the driver
since the early 2.6.12 days (or before).
Detected by CoverityScan, CID#744290 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3f94cb99a ]
Previously in squashfs_readpage() when copying data into the page
cache, it used the length of the datablock read from the filesystem
(after decompression). However, if the filesystem has been corrupted
this data block may be short, which will leave pages unfilled.
The fix for this is to compute the expected number of bytes to copy
from the inode size, and use this to detect if the block is short.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cdbb65c4c7 ]
Anatoly continues to find issues with fuzzed squashfs images.
This time, corrupt, missing, or undersized data for the page filling
wasn't checked for, because the squashfs_{copy,read}_cache() functions
did the squashfs_copy_data() call without checking the resulting data
size.
Which could result in the page cache pages being incompletely filled in,
and no error indication to the user space reading garbage data.
So make a helper function for the "fill in pages" case, because the
exact same incomplete sequence existed in two places.
[ I should have made a squashfs branch for these things, but I didn't
intend to start doing them in the first place.
My historical connection through cramfs is why I got into looking at
these issues at all, and every time I (continue to) think it's a
one-off.
Because _this_ time is always the last time. Right? - Linus ]
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ec837d620c ]
Fix type warnings in arch/arc/mm/cache.c.
../arch/arc/mm/cache.c: In function 'flush_anon_page':
../arch/arc/mm/cache.c:1062:55: warning: passing argument 2 of '__flush_dcache_page' makes integer from pointer without a cast [-Wint-conversion]
__flush_dcache_page((phys_addr_t)page_address(page), page_address(page));
^~~~~~~~~~~~~~~~~~
../arch/arc/mm/cache.c:1013:59: note: expected 'long unsigned int' but argument is of type 'void *'
void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
~~~~~~~~~~~~~~^~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b1f32ce1c3 ]
Add <linux/types.h> to fix build errors.
Both ctop.h and <soc/nps/common.h> use u32 types and cause many
errors.
Examples:
../include/soc/nps/common.h:71:4: error: unknown type name 'u32'
u32 __reserved:20, cluster:4, core:4, thread:4;
../include/soc/nps/common.h:76:3: error: unknown type name 'u32'
u32 value;
../include/soc/nps/common.h:124:4: error: unknown type name 'u32'
u32 base:8, cl_x:4, cl_y:4,
../include/soc/nps/common.h:127:3: error: unknown type name 'u32'
u32 value;
../arch/arc/plat-eznps/include/plat/ctop.h:83:4: error: unknown type name 'u32'
u32 gen:1, gdis:1, clk_gate_dis:1, asb:1,
../arch/arc/plat-eznps/include/plat/ctop.h:86:3: error: unknown type name 'u32'
u32 value;
../arch/arc/plat-eznps/include/plat/ctop.h:93:4: error: unknown type name 'u32'
u32 csa:22, dmsid:6, __reserved:3, cs:1;
../arch/arc/plat-eznps/include/plat/ctop.h:95:3: error: unknown type name 'u32'
u32 value;
Cc: linux-snps-arc@lists.infradead.org
Cc: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab123fe071 ]
When driver gets notification for mtu change, driver does not handle it for
all RQs. It handles only RQ[0].
Fix is to use enic_change_mtu() interface to change mtu for vf.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d5ea019f8a ]
This reverts commit 2a027b47db ("MIPS: BCM47XX: Enable 74K Core
ExternalSync for PCIe erratum").
Enabling ExternalSync caused a regression for BCM4718A1 (used e.g. in
Netgear E3000 and ASUS RT-N16): it simply hangs during PCIe
initialization. It's likely that BCM4717A1 is also affected.
I didn't notice that earlier as the only BCM47XX devices with PCIe I
own are:
1) BCM4706 with 2 x 14e4:4331
2) BCM4706 with 14e4:4360 and 14e4:4331
it appears that BCM4706 is unaffected.
While BCM5300X-ES300-RDS.pdf seems to document that erratum and its
workarounds (according to quotes provided by Tokunori) it seems not even
Broadcom follows them.
According to the provided info Broadcom should define CONF7_ES in their
SDK's mipsinc.h and implement workaround in the si_mips_init(). Checking
both didn't reveal such code. It *could* mean Broadcom also had some
problems with the given workaround.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Michael Marley <michael@michaelmarley.com>
Patchwork: https://patchwork.linux-mips.org/patch/20032/
URL: https://bugs.openwrt.org/index.php?do=details&task_id=1688
Cc: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16e536ef47 ]
/sys/../zswap/stored_pages keeps rising in a zswap test with
"zswap.max_pool_percent=0" parameter. But it should not compress or
store pages any more since there is no space in the compressed pool.
Reproduce steps:
1. Boot kernel with "zswap.enabled=1"
2. Set the max_pool_percent to 0
# echo 0 > /sys/module/zswap/parameters/max_pool_percent
3. Do memory stress test to see if some pages have been compressed
# stress --vm 1 --vm-bytes $mem_available"M" --timeout 60s
4. Watching the 'stored_pages' number increasing or not
The root cause is:
When zswap_max_pool_percent is set to 0 via kernel parameter,
zswap_is_full() will always return true due to zswap_shrink(). But if
the shinking is able to reclain a page successfully the code then
proceeds to compressing/storing another page, so the value of
stored_pages will keep changing.
To solve the issue, this patch adds a zswap_is_full() check again after
zswap_shrink() to make sure it's now under the max_pool_percent, and to
not compress/store if we reached the limit.
Link: http://lkml.kernel.org/r/20180530103936.17812-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2412ac45a ]
If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
the active object tree, we have been emitting a BUG after logging
information about it and the new object.
Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
on the old object (or return an error). The ACTIVE flag should be cleared
after it has been removed from the active object tree. A timeout of 60s is
used in the wait, so we shouldn't be able to get stuck there.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 934140ab02 ]
cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview. However, it has no ref on that monitor object or on the
associated operation.
What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object. However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.
If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.
Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock. The op can then be enqueued after the lock is
dropped.
The bug can manifest in one of a couple of ways. The first manifestation
looks like:
FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 6 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:494!
RIP: 0010:fscache_put_operation+0x1e3/0x1f0
...
fscache_op_work_func+0x26/0x50
process_one_work+0x131/0x290
worker_thread+0x45/0x360
kthread+0xf8/0x130
? create_worker+0x190/0x190
? kthread_cancel_work_sync+0x10/0x10
ret_from_fork+0x1f/0x30
This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().
The bug can also manifest like the following:
kernel BUG at fs/fscache/operation.c:69!
...
[exception RIP: fscache_enqueue_operation+246]
...
#7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
#8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
#9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <carmark.dlut@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
Reported-by: NeilBrown <neilb@suse.com>
Reported-by: Daniel Axtens <dja@axtens.net>
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d0eb06afe7 ]
Alter the state-check assertion in fscache_enqueue_operation() to allow
cancelled operations to be given processing time so they can be cleaned up.
Also fix a debugging statement that was requiring such operations to have
an object assigned.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 92a4728608 ]
Dirk Gouders reported that two consecutive "make" invocations on an
already compiled tree will show alternating behaviors:
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
LD arch/x86/boot/compressed/vmlinux
ZOFFSET arch/x86/boot/zoffset.h
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
OBJCOPY arch/x86/boot/vmlinux.bin
BUILD arch/x86/boot/bzImage
Setup is 15644 bytes (padded to 15872 bytes).
System is 6663 kB
CRC 3eb90f40
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
He bisected it back to:
commit 98f7852537 ("x86/boot: Refuse to build with data relocations")
The root cause was the use of the "if_changed" kbuild function multiple
times for the same target. It was designed to only be used once per
target, otherwise it will effectively always trigger, flipping back and
forth between the two commands getting recorded by "if_changed". Instead,
this patch merges the two commands into a single function to get stable
build artifacts (i.e. .vmlinux.cmd), and a single build behavior.
Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net>
Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae2dcb28c2 ]
Rx hash/filter table configuration uses rss_conf_obj to configure filters
in the hardware. This object is initialized only when the interface is
brought up.
This patch adds driver changes to configure rss params only when the device
is in opened state. In port disabled case, the config will be cached in the
driver structure which will be applied in the successive load path.
Please consider applying it to 'net' branch.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0894da849f ]
Including asm/cacheflush.h first results in the following build error
when trying to build sparc32:allmodconfig, because 'struct page' has not
been declared, and the function declaration ends up creating a separate
(private) declaration of struct page (as a result of function arguments
being in the scope of the function declaration and definition, not in
global scope).
The C scoping rules do not just affect variable visibility, they also
affect type declaration visibility.
The end result is that when the actual call site is seen in
<linux/highmem.h>, the 'struct page' type in the caller is not the same
'struct page' that the function was declared with, resulting in:
In file included from arch/sparc/include/asm/page.h:10:0,
...
from drivers/staging/media/omap4iss/iss_video.c:15:
include/linux/highmem.h: In function 'clear_user_highpage':
include/linux/highmem.h:137:31: error:
passing argument 1 of 'sparc_flush_page_to_ram' from incompatible
pointer type
Include generic includes files first to fix the problem.
Fixes: fc96d58c10 ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[ Added explanation of C scope rules - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5c1a23b17 ]
of_iomap() can return NULL so that return needs to be checked and NULL
treated as failure. While at it also take care of the missing
of_node_put() in the error path.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: commit afa17a500a ("net/can: add driver for mscan family & mpc52xx_mscan")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c9ce1fa1c2 ]
Prevent drivers from building on PPC32 if they use isa_bus_to_virt(),
isa_virt_to_bus(), or isa_page_to_bus(), which are not available and
thus cause build errors.
../drivers/net/ethernet/3com/3c515.c: In function 'corkscrew_open':
../drivers/net/ethernet/3com/3c515.c:824:9: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/lance.c: In function 'lance_rx':
../drivers/net/ethernet/amd/lance.c:1203:23: error: implicit declaration of function 'isa_bus_to_virt'; did you mean 'bus_to_virt'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/ni65.c: In function 'ni65_init_lance':
../drivers/net/ethernet/amd/ni65.c:585:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open':
../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 58874c7b24 ]
There's a possible race where driver can read link status in mid-transition
and see that virtual-link is up yet speed is 0. Since in this
mid-transition we're guaranteed to see a mailbox from MFW soon, we can
afford to treat this as link down.
Fixes: cc875c2e ("qed: Add link support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b9c1e60e7b ]
None of the JITs is allowed to implement exit paths from the BPF
insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
in eBPF to cBPF translation to retain old existing behavior where
exceptions may occur; they are also tightly controlled by the
verifier where it disallows some of the features such as BPF to
BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
program. During recent review of all BPF_XADD JIT implementations
I noticed that the ppc64 one is buggy in that it contains two
jumps to exit paths. This is problematic as this can bypass verifier
expectations e.g. pointed out in commit f6b1b3bf0d ("bpf: fix
subprog verifier bypass by div/mod by 0 exception"). The first
exit path is obsoleted by the fix in ca36960211 ("bpf: allow xadd
only on aligned memory") anyway, and for the second one we need to
do a fetch, add and store loop if the reservation from lwarx/ldarx
was lost in the meantime.
Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eec24f2a0d ]
The list [1] of commits doing endianness fixes in USB subsystem is long
due to below quote from USB spec Revision 2.0 from April 27, 2000:
------------
8.1 Byte/Bit Ordering
Multiple byte fields in standard descriptors, requests, and responses
are interpreted as and moved over the bus in little-endian order, i.e.
LSB to MSB.
------------
This commit belongs to the same family.
[1] Example of endianness fixes in USB subsystem:
commit 14e1d56cbe ("usb: gadget: f_uac2: endianness fixes.")
commit 42370b8211 ("usb: gadget: f_uac1: endianness fixes.")
commit 63afd5cc78 ("USB: chaoskey: fix Alea quirk on big-endian hosts")
commit 74098c4ac7 ("usb: gadget: acm: fix endianness in notifications")
commit cdd7928df0 ("ACM gadget: fix endianness in notifications")
commit 323ece54e0 ("cdc-wdm: fix endianness bug in debug statements")
commit e102609f10 ("usb: gadget: uvc: Fix endianness mismatches")
list goes on
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a2b22dddc7 ]
The tools/usb/ffs-test.c file defines cpu_to_le16/32 by using the C
library htole16/32 function calls. However, cpu_to_le16/32 are used when
initializing structures, i.e in a context where a function call is not
allowed.
It works fine on little endian systems because htole16/32 are defined by
the C library as no-ops. But on big-endian systems, they are actually
doing something, which might involve calling a function, causing build
failures, such as:
ffs-test.c:48:25: error: initializer element is not constant
#define cpu_to_le32(x) htole32(x)
^~~~~~~
ffs-test.c:128:12: note: in expansion of macro ‘cpu_to_le32’
.magic = cpu_to_le32(FUNCTIONFS_DESCRIPTORS_MAGIC_V2),
^~~~~~~~~~~
To solve this, we code cpu_to_le16/32 in a way that allows them to be
used when initializing structures. This fix was imported from
meta-openembedded/android-tools/fix-big-endian-build.patch written by
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a39ba90a1c ]
Fix build errors when built for PPC64:
These variables are only used on PPC32 so they don't need to be
initialized for PPC64.
../drivers/usb/phy/phy-fsl-usb.c: In function 'usb_otg_start':
../drivers/usb/phy/phy-fsl-usb.c:865:3: error: '_fsl_readl' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:865:16: error: '_fsl_readl_be' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:866:3: error: '_fsl_writel' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:866:17: error: '_fsl_writel_be' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:868:16: error: '_fsl_readl_le' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_le;
../drivers/usb/phy/phy-fsl-usb.c:869:17: error: '_fsl_writel_le' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_le;
and the sysfs "show" function return type should be ssize_t, not int:
../drivers/usb/phy/phy-fsl-usb.c:1042:49: error: initialization of 'ssize_t (*)(struct device *, struct device_attribute *, char *)' {aka 'long int (*)(struct device *, struct device_attribute *, char *)'} from incompatible pointer type 'int (*)(struct device *, struct device_attribute *, char *)' [-Werror=incompatible-pointer-types]
static DEVICE_ATTR(fsl_usb2_otg_state, S_IRUGO, show_fsl_usb2_otg_state, NULL);
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: linux-usb@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f36b507c14 ]
The driver may sleep in an interrupt handler.
The function call path (from bottom to top) in Linux-4.16.7 is:
[FUNC] r8a66597_queue(GFP_KERNEL)
drivers/usb/gadget/udc/r8a66597-udc.c, 1193:
r8a66597_queue in get_status
drivers/usb/gadget/udc/r8a66597-udc.c, 1301:
get_status in setup_packet
drivers/usb/gadget/udc/r8a66597-udc.c, 1381:
setup_packet in irq_control_stage
drivers/usb/gadget/udc/r8a66597-udc.c, 1508:
irq_control_stage in r8a66597_irq (interrupt handler)
To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool (DSAC-2) and checked by
my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0602088b10 ]
The driver may sleep with holding a spinlock.
The function call paths (from bottom to top) in Linux-4.16.7 are:
[FUNC] msleep
drivers/usb/gadget/udc/r8a66597-udc.c, 839:
msleep in init_controller
drivers/usb/gadget/udc/r8a66597-udc.c, 96:
init_controller in r8a66597_usb_disconnect
drivers/usb/gadget/udc/r8a66597-udc.c, 93:
spin_lock in r8a66597_usb_disconnect
[FUNC] msleep
drivers/usb/gadget/udc/r8a66597-udc.c, 835:
msleep in init_controller
drivers/usb/gadget/udc/r8a66597-udc.c, 96:
init_controller in r8a66597_usb_disconnect
drivers/usb/gadget/udc/r8a66597-udc.c, 93:
spin_lock in r8a66597_usb_disconnect
To fix these bugs, msleep() is replaced with mdelay().
This bug is found by my static analysis tool (DSAC-2) and checked by
my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b58262396f ]
The LVDS signal integrity is only guaranteed when the correct enable
sequence (first IPU DI, then LDB) is used. If the LDB display output was
active before the imx-drm driver is loaded (like when a bootsplash was
active) the DI will be disabled by the full IPU reset we do when loading
the driver. The LDB control registers are not part of the IPU range and
thus will remain unchanged.
This leads to the LDB still being active when the DI is getting enabled,
effectively reversing the required enable sequence. Fix this by also
disabling the LDB on driver bind.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a17037e7d5 ]
In iscsi_check_tmf_restrictions() task->hdr is dereferenced to print the
opcode, it is possible that task->hdr is NULL.
There are two cases based on opcode argument:
1. ISCSI_OP_SCSI_CMD - In this case alloc_pdu() is called
after iscsi_check_tmf_restrictions()
iscsi_prep_scsi_cmd_pdu() -> iscsi_check_tmf_restrictions() -> alloc_pdu().
Transport drivers allocate memory for iSCSI hdr in alloc_pdu() and assign
it to task->hdr. In case of TMF task->hdr will be NULL resulting in NULL
pointer dereference.
2. ISCSI_OP_SCSI_DATA_OUT - In this case transport driver can free the
memory for iSCSI hdr after transmitting the pdu so task->hdr can be NULL or
invalid.
This patch fixes this issue by removing task->hdr->opcode from the printk
statement.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5cf3006cc8 ]
I was looking at usually suppressed gcc warnings,
[-Wimplicit-fallthrough=] in this case:
The code definitely looks like a break is missing here.
However I am not able to test the NL80211_IFTYPE_MESH_POINT,
nor do I actually know what might be :)
So please use this patch with caution and only if you are
able to do some testing.
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
[johannes: looks obvious enough to apply as is, interesting
though that it never seems to have been a problem]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 19103a4bfb ]
As part of hw reconfig, only stations linked to AP interfaces are added
back to the driver ignoring those which are tied to AP_VLAN interfaces.
It is true that there could be stations tied to the AP_VLAN interface while
serving 4addr clients or when using AP_VLAN for VLAN operations; we should
be adding these stations back to the driver as part of hw reconfig, failing
to do so can cause functional issues.
In the case of ath10k driver, the following errors were observed.
ath10k_pci : failed to install key for non-existent peer XX:XX:XX:XX:XX:XX
Workqueue: events_freezable ieee80211_restart_work [mac80211]
(unwind_backtrace) from (show_stack+0x10/0x14)
(show_stack) (dump_stack+0x80/0xa0)
(dump_stack) (warn_slowpath_common+0x68/0x8c)
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (ieee80211_enable_keys+0x88/0x154 [mac80211])
(ieee80211_enable_keys) (ieee80211_reconfig+0xc90/0x19c8 [mac80211])
(ieee80211_reconfig]) (ieee80211_restart_work+0x8c/0xa0 [mac80211])
(ieee80211_restart_work) (process_one_work+0x284/0x488)
(process_one_work) (worker_thread+0x228/0x360)
(worker_thread) (kthread+0xd8/0xec)
(kthread) (ret_from_fork+0x14/0x24)
Also while bringing down the AP VAP, WARN_ONs and errors related to peer
removal were observed.
ath10k_pci : failed to clear all peer wep keys for vdev 0: -2
ath10k_pci : failed to disassociate station: 8c:fd:f0:0a:8c:f5 vdev 0: -2
(unwind_backtrace) (show_stack+0x10/0x14)
(show_stack) (dump_stack+0x80/0xa0)
(dump_stack) (warn_slowpath_common+0x68/0x8c)
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (sta_set_sinfo+0xb98/0xc9c [mac80211])
(sta_set_sinfo [mac80211]) (__sta_info_flush+0xf0/0x134 [mac80211])
(__sta_info_flush [mac80211]) (ieee80211_stop_ap+0xe8/0x390 [mac80211])
(ieee80211_stop_ap [mac80211]) (__cfg80211_stop_ap+0xe0/0x3dc [cfg80211])
(__cfg80211_stop_ap [cfg80211]) (cfg80211_stop_ap+0x30/0x44 [cfg80211])
(cfg80211_stop_ap [cfg80211]) (genl_rcv_msg+0x274/0x30c)
(genl_rcv_msg) (netlink_rcv_skb+0x58/0xac)
(netlink_rcv_skb) (genl_rcv+0x20/0x34)
(genl_rcv) (netlink_unicast+0x11c/0x204)
(netlink_unicast) (netlink_sendmsg+0x30c/0x370)
(netlink_sendmsg) (sock_sendmsg+0x70/0x84)
(sock_sendmsg) (___sys_sendmsg.part.3+0x188/0x228)
(___sys_sendmsg.part.3) (__sys_sendmsg+0x4c/0x70)
(__sys_sendmsg) (ret_fast_syscall+0x0/0x44)
These issues got fixed by adding the stations which are
tied to AP_VLANs back to the driver.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8cc8877385 ]
Fix missing dst_release() when local broadcast or multicast traffic is
xfrm policy blocked.
For IPv4 this results to dst leak: ip_route_output_flow() allocates
dst_entry via __ip_route_output_key() and passes it to
xfrm_lookup_route(). xfrm_lookup returns ERR_PTR(-EPERM) that is
propagated. The dst that was allocated is never released.
IPv4 local broadcast testcase:
ping -b 192.168.1.255 &
sleep 1
ip xfrm policy add src 0.0.0.0/0 dst 192.168.1.255/32 dir out action block
IPv4 multicast testcase:
ping 224.0.0.1 &
sleep 1
ip xfrm policy add src 0.0.0.0/0 dst 224.0.0.1/32 dir out action block
For IPv6 the missing dst_release() causes trouble e.g. when used in netns:
ip netns add TEST
ip netns exec TEST ip link set lo up
ip link add dummy0 type dummy
ip link set dev dummy0 netns TEST
ip netns exec TEST ip addr add fd00::1111 dev dummy0
ip netns exec TEST ip link set dummy0 up
ip netns exec TEST ping -6 -c 5 ff02::1%dummy0 &
sleep 1
ip netns exec TEST ip xfrm policy add src ::/0 dst ff02::1 dir out action block
wait
ip netns del TEST
After netns deletion we see:
[ 258.239097] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 268.279061] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 278.367018] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 288.375259] unregister_netdevice: waiting for lo to become free. Usage count = 2
Fixes: ac37e2515c ("xfrm: release dst_orig in case of error in xfrm_lookup()")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d6990976af ]
When setting the skb->dst before doing the MTU check, the route PMTU
caching and reporting is done on the new dst which is about to be
released.
Instead, PMTU handling should be done using the original dst.
This is aligned with IPv4 VTI.
Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a13f085d11 upstream.
This fixes the following issues:
- When a buffer size is supplied to reiserfs_listxattr() such that each
individual name fits, but the concatenation of all names doesn't fit,
reiserfs_listxattr() overflows the supplied buffer. This leads to a
kernel heap overflow (verified using KASAN) followed by an out-of-bounds
usercopy and is therefore a security bug.
- When a buffer size is supplied to reiserfs_listxattr() such that a
name doesn't fit, -ERANGE should be returned. But reiserfs instead just
truncates the list of names; I have verified that if the only xattr on a
file has a longer name than the supplied buffer length, listxattr()
incorrectly returns zero.
With my patch applied, -ERANGE is returned in both cases and the memory
corruption doesn't happen anymore.
Credit for making me clean this code up a bit goes to Al Viro, who pointed
out that the ->actor calling convention is suboptimal and should be
changed.
Link: http://lkml.kernel.org/r/20180802151539.5373-1-jannh@google.com
Fixes: 48b32a3553 ("reiserfs: use generic xattr handlers")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bed4ff1ed4 upstream.
This fixes a race condition, where the DMAEN bit ends up being set after
I2C slave has transmitted a byte following the dummy read. When that
happens, an interrupt is generated instead, and no DMA request is generated
to kickstart the DMA read, and a timeout happens after DMA_TIMEOUT (1 sec).
Fixed by setting the DMAEN bit before the dummy read.
Signed-off-by: Esben Haabendal <eha@deif.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1204e35bed upstream.
Commit b440bde74f ("PCI: Add pci_ignore_hotplug() to ignore hotplug
events for a device") iterates over the devices on a hotplug port's
subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem.
It is thus possible for a user to cause a crash by concurrently
manipulating the device list, e.g. by disabling slot power via sysfs
on a different CPU or by initiating a remove/rescan via sysfs.
This can't be fixed by acquiring pci_bus_sem because it may sleep.
The simplest fix is to avoid the list iteration altogether and just
check the ignore_hotplug flag on the port itself. This works because
pci_ignore_hotplug() sets the flag both on the device as well as on its
parent bridge.
We do lose the ability to print the name of the device blocking hotplug
in the debug message, but that's probably bearable.
Fixes: b440bde74f ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 281e878eab upstream.
When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the
hotplug_slot struct is deregistered and thus freed before freeing the
IRQ. The IRQ handler and the work items it schedules print the slot
name referenced from the freed structure in various informational and
debug log messages, each time resulting in a quadruple dereference of
freed pointers (hotplug_slot -> pci_slot -> kobject -> name).
At best the slot name is logged as "(null)", at worst kernel memory is
exposed in logs or the driver crashes:
pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present
An attacker may provoke the bug by unplugging multiple devices on a
Thunderbolt daisy chain at once. Unplugging can also be simulated by
powering down slots via sysfs. The bug is particularly easy to trigger
in poll mode.
It has been present since the driver's introduction in 2004:
https://git.kernel.org/tglx/history/c/c16b4b14d980
Fix by rearranging teardown such that the IRQ is freed first. Run the
work items queued by the IRQ handler to completion before freeing the
hotplug_slot struct by draining the work queue from the ->release_slot
callback which is invoked by pci_hp_deregister().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.6.4
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3dbe97efe8 upstream.
PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs. Just prior to the table it states:
"PF and VF functionality is defined in Section 7.5.3.4 except where
noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting
applies to the VF."
All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's. Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5e6 ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ce6435820 upstream.
If addition of sysfs files fails on registration of a hotplug slot, the
struct pci_slot as well as the entry in the slot_list is leaked. The
issue has been present since the hotplug core was introduced in 2002:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Perhaps the idea was that even though sysfs addition fails, the slot
should still be usable. But that's not how drivers use the interface,
they abort probe if a non-zero value is returned.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.4.15+
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b885ac1dc upstream.
Now that mb() is an instruction barrier, it will slow performance if we issue
unnecessary barriers.
The spinlock defines have a number of unnecessary barriers. The __ldcw()
define is both a hardware and compiler barrier. The mb() barriers in the
routines using __ldcw() serve no purpose.
The only barrier needed is the one in arch_spin_unlock(). We need to ensure
all accesses are complete prior to releasing the lock.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4576cd469d upstream.
TPACKET_V3 stores variable length frames in fixed length blocks.
Blocks must be able to store a block header, optional private space
and at least one minimum sized frame.
Frames, even for a zero snaplen packet, store metadata headers and
optional reserved space.
In the block size bounds check, ensure that the frame of the
chosen configuration fits. This includes sockaddr_ll and optional
tp_reserve.
Syzbot was able to construct a ring with insuffient room for the
sockaddr_ll in the header of a zero-length frame, triggering an
out-of-bounds write in dev_parse_header.
Convert the comparison to less than, as zero is a valid snap len.
This matches the test for minimum tp_frame_size immediately below.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Fixes: eb73190f4f ("net/packet: refine check for priv area size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6613b6173d upstream.
When first DCCP packet is SYNC or SYNCACK, we insert a new conntrack
that has an un-initialized timeout value, i.e. such entry could be
reaped at any time.
Mark them as INVALID and only ignore SYNC/SYNCACK when connection had
an old state.
Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7797167ffd upstream.
Now that we use a sync prior to releasing the locks in syscall.S, we don't need
the PA 2.0 ordered stores used to release some locks. Using an ordered store,
potentially slows the release and subsequent code.
There are a number of other ordered stores and loads that serve no purpose. I
have converted these to normal stores.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a5d5e5d51 upstream.
'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the
derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to
index arrays which makes it a potential spectre gadget. Fix this by
sanitizing the value assigned to 'ac->ac2_order'. This covers the
following accesses found with the help of smatch:
* fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential
spectre issue 'grp->bb_counters' [w] (local cap)
* fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap)
* fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap)
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9432a31757 upstream.
A comment warning against this bug is there, but the code is not doing what
the comment says. Therefore it is possible that an EPOLLHUP races against
irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown,
and if that runs soon enough, you get a use-after-free.
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e56b8ce363 ]
Attempt to make cryptic TCP seq number error messages clearer by
(1) identifying the source of the message as "TCP", (2) identifying the
errors as "seq # bug", and (3) grouping the field identifiers and values
by separating them with commas.
E.g., the following message is changed from:
recvmsg bug 2: copied 73BCB6CD seq 70F17CBE rcvnxt 73BCB9AA fl 0
WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:1881 tcp_recvmsg+0x649/0xb90
to:
TCP recvmsg seq # bug 2: copied 73BCB6CD, seq 70F17CBE, rcvnxt 73BCB9AA, fl 0
WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:2011 tcp_recvmsg+0x694/0xba0
Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 711c62dfa6 ]
In case the SPI thread is not running, a simple reset of sync
state won't fix the transmit timeout. We also need to wake up the kernel
thread.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: ed7d42e24e ("net: qca_spi: fix transmit queue timeout handling")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b2bab426dc ]
As long as the synchronization with the QCA7000 isn't finished, we
cannot accept packets from the upper layers. So let the SPI thread
enable the TX queue after sync and avoid unwanted packet drop.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3a9b045506 ]
This driver can spam the kernel log with multiple messages of:
net eth0: eth0: allmulti set
Usually 4 or 8 at a time (probably because of using ConnMan).
This message doesn't seem useful, so let's demote it from dev_info()
to dev_dbg().
Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c133459765 ]
CC [M] drivers/net/ethernet/freescale/fman/fman.o
In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
^~~~~~~~~~~~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Madalin Bucur <madalin.bucur@nxp.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c29e9da56b ]
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap_nocache(e1, res->start, e2);
Fixes: cc4fa83f66 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 993675a310 ]
If variable length link layer headers result in a packet shorter
than dev->hard_header_len, reset the network header offset. Else
skb->mac_len may exceed skb->len after skb_mac_reset_len.
packet_sendmsg_spkt already has similar logic.
Fixes: b84bbaf7a6 ("packet: in packet_snd start writing at link layer allocation")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d14c780c11 ]
This change makes it so that we are much more explicit about the ordering
of updates to the receive address register (RAR) table. Prior to this patch
I believe we may have been updating the table while entries were still
active, or possibly allowing for reordering of things since we weren't
explicitly flushing writes to either the lower or upper portion of the
register prior to accessing the other half.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 923847413f ]
The AM3517 has a different OTG controller location than the OMAP3,
which is included from omap3.dtsi. This results in a hwmod error.
Since the AM3517 has a different OTG controller address, this patch
disabes one that is isn't available.
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f8b5b2183 ]
Call secure services to enable ACTLR[0] (Enable invalidates of BTB with
ICIALLU) when branch hardening is enabled for kernel.
On GP devices OMAP5/DRA7, there is no possibility to update secure
side since "secure world" is ROM and there are no override mechanisms
possible. On HS devices, appropriate PPA should do the workarounds as
well.
However, the configuration is only done for secondary core, since it is
expected that firmware/bootloader will have enabled the required
configuration for the primary boot core (note: bootloaders typically
will NOT enable secondary processors, since it has no need to do so).
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b4c7e2bd2e ]
Dynamic ftrace requires modifying the code segments that are usually
set to read-only. To do this, a per arch function is called both before
and after the ftrace modifications are performed. The "before" function
will set kernel code text to read-write to allow for ftrace to make the
modifications, and the "after" function will set the kernel code text
back to "read-only" to keep the kernel code text protected.
The issue happens when dynamic ftrace is tested at boot up. The test is
done before the kernel code text has been set to read-only. But the
"before" and "after" calls are still performed. The "after" call will
change the kernel code text to read-only prematurely, and other boot
code that expects this code to be read-write will fail.
The solution is to add a variable that is set when the kernel code text
is expected to be converted to read-only, and make the ftrace "before"
and "after" calls do nothing if that variable is not yet set. This is
similar to the x86 solution from commit 1623963097 ("ftrace, x86:
make kernel text writable only for conversions").
Link: http://lkml.kernel.org/r/20180620212906.24b7b66e@vmware.local.home
Reported-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee6581ceba ]
Incremental patch to fix the unchecked dereference in acpi_nfit_ctl.
Reported by Dan Carpenter:
"acpi/nfit: fix cmd_rc for acpi_nfit_ctl to
always return a value" from Jun 28, 2018, leads to the following
Smatch complaint:
drivers/acpi/nfit/core.c:578 acpi_nfit_ctl()
warn: variable dereferenced before check 'cmd_rc' (see line 411)
drivers/acpi/nfit/core.c
410
411 *cmd_rc = -EINVAL;
^^^^^^^^^^^^^^^^^^
Patch adds unchecked dereference.
Fixes: c1985cefd8 ("acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78f058a4aa ]
The current code returns -ENOMEM and does not bother to set the output
parameters to 0 when no rings are available. Some callers, such as
bnxt_get_channels() will display garbage ring numbers when that happens.
Fix it by always setting the output parameters.
Fixes: 6e6c5a57fb ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8708786d4 ]
This is used in configs lacking hardware atomics to emulate atomic r-m-w
for user space, implemented by disabling preemption in kernel.
However there are issues in current implementation:
1. Process not terminated if invalid user pointer passed:
i.e. __get_user() failed.
2. The reason for this patch was __put_user() failure not being handled
either, specifically for the COW break scenario.
The zero page is initially wired up and read from __get_user()
succeeds. A subsequent write by __put_user() induces a
Protection Violation, but COW can't finish as Linux page fault
handler is disabled due to preempt disable.
And what's worse is we silently return the stale value to user space.
Fix this specific case by re-enabling preemption and explicitly
fixing up the fault and retrying the whole sequence over.
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: rewrote the changelog]
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2045cdfa1b ]
Loading the nf_conntrack module with doubled hashsize parameter, i.e.
modprobe nf_conntrack hashsize=12345 hashsize=12345
causes NULL-ptr deref.
If 'hashsize' specified twice, the nf_conntrack_set_hashsize() function
will be called also twice.
The first nf_conntrack_set_hashsize() call will set the
'nf_conntrack_htable_size' variable:
nf_conntrack_set_hashsize()
...
/* On boot, we can set this without any fancy locking. */
if (!nf_conntrack_htable_size)
return param_set_uint(val, kp);
But on the second invocation, the nf_conntrack_htable_size is already set,
so the nf_conntrack_set_hashsize() will take a different path and call
the nf_conntrack_hash_resize() function. Which will crash on the attempt
to dereference 'nf_conntrack_hash' pointer:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack]
Call Trace:
nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack]
parse_args+0x1f9/0x5a0
load_module+0x1281/0x1a50
__se_sys_finit_module+0xbe/0xf0
do_syscall_64+0x7c/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fix this, by checking !nf_conntrack_hash instead of
!nf_conntrack_htable_size. nf_conntrack_hash will be initialized only
after the module loaded, so the second invocation of the
nf_conntrack_set_hashsize() won't crash, it will just reinitialize
nf_conntrack_htable_size again.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a81388ec2 ]
Instead of having the function name hard-coded (it might change and we
forgot to update them in the debug output) we can use __func__ instead
and also shorter the line so we do not need to break it. Also fix an
extra blank line while being here.
Found by checkpatch.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c1049dcb4 ]
PXA3xx platforms have 56 interrupts that are stored in two ICMR
registers. The code in pxa_irq_suspend() and pxa_irq_resume() however
does a simple division by 32 which only leads to one register being
saved at suspend and restored at resume time. The NAND interrupt
setting, for instance, is lost.
Fix this by using DIV_ROUND_UP() instead.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 05925e52a7 ]
The change fixes sleep in atomic context bug, which is encountered
every time when link settings are changed by ethtool.
Since commit 35b5f6b1a8 ("PHYLIB: Locking fixes for PHY I/O
potentially sleeping") phy_start_aneg() function utilizes a mutex
to serialize changes to phy state, however that helper function is
called in atomic context under a grabbed spinlock, because
phy_start_aneg() is called by phy_ethtool_ksettings_set() and by
replaced phy_ethtool_sset() helpers from phylib.
Now duplex mode setting is enforced in ravb_adjust_link() only, also
now RX/TX is disabled when link is put down or modifications to E-MAC
registers ECMR and GECMR are expected for both cases of checked and
ignored link status pin state from E-MAC interrupt handler.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0973a4dd79 ]
Since commit 35b5f6b1a8 ("PHYLIB: Locking fixes for PHY I/O
potentially sleeping") phy_start_aneg() function utilizes a mutex
to serialize changes to phy state, however the helper function is
called in atomic context.
The bug can be reproduced by running "ethtool -r" command, the bug
is reported if CONFIG_DEBUG_ATOMIC_SLEEP build option is enabled.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5cb3f52a11 ]
The change fixes sleep in atomic context bug, which is encountered
every time when link settings are changed by ethtool.
Since commit 35b5f6b1a8 ("PHYLIB: Locking fixes for PHY I/O
potentially sleeping") phy_start_aneg() function utilizes a mutex
to serialize changes to phy state, however that helper function is
called in atomic context under a grabbed spinlock, because
phy_start_aneg() is called by phy_ethtool_ksettings_set() and by
replaced phy_ethtool_sset() helpers from phylib.
Now duplex mode setting is enforced in sh_eth_adjust_link() only,
also now RX/TX is disabled when link is put down or modifications
to E-MAC registers ECMR and GECMR are expected for both cases of
checked and ignored link status pin state from E-MAC interrupt handler.
For reference the change is a partial rework of commit 1e1b812bbe
("sh_eth: fix handling of no LINK signal").
Fixes: dc19e4e5e0 ("sh: sh_eth: Add support ethtool")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 53a710b504 ]
Since commit 35b5f6b1a8 ("PHYLIB: Locking fixes for PHY I/O
potentially sleeping") phy_start_aneg() function utilizes a mutex
to serialize changes to phy state, however the helper function is
called in atomic context.
The bug can be reproduced by running "ethtool -r" command, the bug
is reported if CONFIG_DEBUG_ATOMIC_SLEEP build option is enabled.
Fixes: dc19e4e5e0 ("sh: sh_eth: Add support ethtool")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a9ba23d48d ]
At present the ipv6_renew_options_kern() function ends up calling into
access_ok() which is problematic if done from inside an interrupt as
access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
(x86-64 is affected). Example warning/backtrace is shown below:
WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
...
Call Trace:
<IRQ>
ipv6_renew_option+0xb2/0xf0
ipv6_renew_options+0x26a/0x340
ipv6_renew_options_kern+0x2c/0x40
calipso_req_setattr+0x72/0xe0
netlbl_req_setattr+0x126/0x1b0
selinux_netlbl_inet_conn_request+0x80/0x100
selinux_inet_conn_request+0x6d/0xb0
security_inet_conn_request+0x32/0x50
tcp_conn_request+0x35f/0xe00
? __lock_acquire+0x250/0x16c0
? selinux_socket_sock_rcv_skb+0x1ae/0x210
? tcp_rcv_state_process+0x289/0x106b
tcp_rcv_state_process+0x289/0x106b
? tcp_v6_do_rcv+0x1a7/0x3c0
tcp_v6_do_rcv+0x1a7/0x3c0
tcp_v6_rcv+0xc82/0xcf0
ip6_input_finish+0x10d/0x690
ip6_input+0x45/0x1e0
? ip6_rcv_finish+0x1d0/0x1d0
ipv6_rcv+0x32b/0x880
? ip6_make_skb+0x1e0/0x1e0
__netif_receive_skb_core+0x6f2/0xdf0
? process_backlog+0x85/0x250
? process_backlog+0x85/0x250
? process_backlog+0xec/0x250
process_backlog+0xec/0x250
net_rx_action+0x153/0x480
__do_softirq+0xd9/0x4f7
do_softirq_own_stack+0x2a/0x40
</IRQ>
...
While not present in the backtrace, ipv6_renew_option() ends up calling
access_ok() via the following chain:
access_ok()
_copy_from_user()
copy_from_user()
ipv6_renew_option()
The fix presented in this patch is to perform the userspace copy
earlier in the call chain such that it is only called when the option
data is actually coming from userspace; that place is
do_ipv6_setsockopt(). Not only does this solve the problem seen in
the backtrace above, it also allows us to simplify the code quite a
bit by removing ipv6_renew_options_kern() completely. We also take
this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option()
a small amount as well.
This patch is heavily based on a rough patch by Al Viro. I've taken
his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
to a memdup_user() call, made better use of the e_inval jump target in
the same function, and cleaned up the use ipv6_renew_option() by
ipv6_renew_options().
CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d376bef9c2 ]
nft_compat relies on xt_request_find_match to increment
refcount of the module that provides the match/target.
The (builtin) icmp matches did't set the module owner so it
was possible to rmmod ip(6)tables while icmp extensions were still in use.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d5d33a085 ]
This fixes build error regarding redefinition:
CLANG-bpf samples/bpf/parse_varlen.o
samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
struct vlan_hdr {
^
./include/linux/if_vlan.h:38:8: note: previous definition is here
So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d461e3da90 ]
In certain conditions, the device may not be able to link in gigabit mode. This software workaround ensures that the device will not enter the failure state.
Fixes: d0cad87170 ("SMSC75XX USB 2.0 Gigabit Ethernet Devices")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e8e18f694 ]
There is a special case that the size is "(N << KASAN_SHADOW_SCALE_SHIFT)
Pages plus X", the value of X is [1, KASAN_SHADOW_SCALE_SIZE-1]. The
operation "size >> KASAN_SHADOW_SCALE_SHIFT" will drop X, and the
roundup operation can not retrieve the missed one page. For example:
size=0x28006, PAGE_SIZE=0x1000, KASAN_SHADOW_SCALE_SHIFT=3, we will get
shadow_size=0x5000, but actually we need 6 pages.
shadow_size = round_up(size >> KASAN_SHADOW_SCALE_SHIFT, PAGE_SIZE);
This can lead to a kernel crash when kasan is enabled and the value of
mod->core_layout.size or mod->init_layout.size is like above. Because
the shadow memory of X has not been allocated and mapped.
move_module:
ptr = module_alloc(mod->core_layout.size);
...
memset(ptr, 0, mod->core_layout.size); //crashed
Unable to handle kernel paging request at virtual address ffff0fffff97b000
......
Call trace:
__asan_storeN+0x174/0x1a8
memset+0x24/0x48
layout_and_allocate+0xcd8/0x1800
load_module+0x190/0x23e8
SyS_finit_module+0x148/0x180
Link: http://lkml.kernel.org/r/1529659626-12660-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Dmitriy Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Libin <huawei.libin@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 26b68dd2f4 ]
Silence warnings (triggered at W=1) by adding relevant __printf attributes.
CC kernel/trace/trace.o
kernel/trace/trace.c: In function ‘__trace_array_vprintk’:
kernel/trace/trace.c:2979:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
len = vscnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args);
^~~
AR kernel/trace/built-in.o
Link: http://lkml.kernel.org/r/20180308205843.27447-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3b8d573586 ]
The touch sensors on the 2nd-gen Intuos tablets don't use a 4096x4096
sensor like other similar tablets (3rd-gen Bamboo, Intuos5, etc.).
The incorrect maximum XY values don't normally affect userspace since
touch input from these devices is typically relative rather than
absolute. It does, however, cause problems when absolute distances
need to be measured, e.g. for gesture recognition. Since the resolution
of the touch sensor on these devices is 10 units / mm (versus 100 for
the pen sensor), the proper maximum values can be calculated by simply
dividing by 10.
Fixes: b5fd2a3e92 ("Input: wacom - add support for three new Intuos devices")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5dc2d3996a ]
After we change the ipvlan mode from l3 to l2, or vice versa, we only
reset IFF_NOARP flag, but don't flush the ARP table cache, which will
cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
Then the message will not come out of host.
Here is the reproducer on local host:
ip link set eth1 up
ip addr add 192.168.1.1/24 dev eth1
ip link add link eth1 ipvlan1 type ipvlan mode l3
ip netns add net1
ip link set ipvlan1 netns net1
ip netns exec net1 ip link set ipvlan1 up
ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1
ip route add 192.168.2.0/24 via 192.168.1.2
ping 192.168.2.2 -c 2
ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
ping 192.168.2.2 -c 2
Add the same configuration on remote host. After we set the mode to l2,
we could find that the src/dst MAC addresses are the same on eth1:
21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64
Fix this by calling dev_change_flags(), which will call netdevice notifier
with flag change info.
v2:
a) As pointed out by Wang Cong, check return value for dev_change_flags() when
change dev flags.
b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08b393d01c ]
Since the following commit:
cd77849a69 ("objtool: Fix GCC 8 cold subfunction detection for aliased functions")
... if the kernel is built with EXTRA_CFLAGS='-fno-reorder-functions',
objtool can get stuck in an infinite loop.
That flag causes the new GCC 8 cold subfunctions to be placed in .text
instead of .text.unlikely. But it also has an unfortunate quirk: in the
symbol table, the subfunction (e.g., nmi_panic.cold.7) is nested inside
the parent (nmi_panic).
That function overlap confuses objtool, and causes it to get into an
infinite loop in next_insn_same_func(). Here's Allan's description of
the loop:
"Objtool iterates through the instructions in nmi_panic using
next_insn_same_func. Once it reaches the end of nmi_panic at 0x534 it
jumps to 0x528 as that's the start of nmi_panic.cold.7. However, since
the instructions starting at 0x528 are still associated with nmi_panic
objtool will get stuck in a loop, continually jumping back to 0x528
after reaching 0x534."
Fix it by shortening the length of the parent function so that the
functions no longer overlap.
Reported-and-analyzed-by: Allan Xavier <allan.x.xavier@oracle.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Allan Xavier <allan.x.xavier@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9e704c52bee651129b036be14feda317ae5606ae.1530136978.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ecd60532e0 ]
Booting a ColdFire m68k core with MMU enabled causes a "bad page state"
oops since commit 1d40a5ea01 ("mm: mark pages in use for page tables"):
BUG: Bad page state in process sh pfn:01ce2
page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0
flags: 0x0()
raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000
raw: 039c4000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13
Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path,
so that the PG_table flag is cleared before we free the pte page.
Note that I had to change the type of pte_free() to be static from
extern. Otherwise you get a lot of warnings like this:
./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static
pgtable_page_dtor(page);
^
And making it static is consistent with our use of this in the other
m68k pgalloc definitions of pte_free().
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
CC: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c1985cefd8 ]
cmd_rc is passed in by reference to the acpi_nfit_ctl() function and the
caller expects a value returned. However, when the package is pass through
via the ND_CMD_CALL command, cmd_rc is not touched. Make sure cmd_rc is
always set.
Fixes: aef2533822 ("libnvdimm, nfit: centralize command status translation")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 484c016d93 ]
Driver performs the internal reload when it receives tx-timeout event from
the OS. Internal reload might fail in some scenarios e.g., fatal HW issues.
In such cases OS still see the link, which would result in undesirable
functionalities such as re-generation of tx-timeouts.
The patch addresses this issue by indicating the link-down to OS when
tx-timeout is detected, and keeping the link in down state till the
internal reload is successful.
Please consider applying it to 'net' branch.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 342639d996 ]
The call to of_get_next_child() returns a node pointer with
refcount incremented thus it must be explicitly decremented
here after the last usage.
Fixes: ab597d35ef ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe Host Controller")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c3f9bd851 ]
The call to of_get_next_child() returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.
Fixes: 8961def568 ("PCI: xilinx: Add Xilinx AXI PCIe Host Bridge IP driver")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
[lorenzo.pieralisi@arm.com: reworked commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f605ce5eb2 ]
If we would ever fail in the bpf_jit_prog() pass that writes the
actual insns to the image after we got header via bpf_jit_binary_alloc()
then we also need to make sure to free it through bpf_jit_binary_free()
again when bailing out. Given we had prior bpf_jit_prog() passes to
initially probe for clobbered registers, program size and to fill in
addrs arrray for jump targets, this is more of a theoretical one,
but at least make sure this doesn't break with future changes.
Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dd209ef809 ]
Fix following issues related to planar YUV pixel format configuration:
- NV16/61 modes were incorrectly programmed as NV12/21,
- YVU420 was programmed as YUV420 on source,
- YVU420 and YUV422 were programmed as YUV420 on output.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 188f60ab8e ]
Commit 9757235f45, "nl80211: correct checks for
NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT
operation field in meshconf, while also adding checks requiring
the non-greenfield and non-ht-sta bits to be set in certain
circumstances. The latter bit is actually reserved for mesh BSSes
according to Table 9-168 in 802.11-2016, so in fact it should not
be set.
wpa_supplicant sets these bits because the mesh and AP code share
the same implementation, but authsae does not. As a result, some
meshconf updates from authsae which set only the NONHT_MIXED
protection bits were being rejected.
In order to avoid breaking userspace by changing the rules again,
simply accept the values with or without the bits set, and mask
off the reserved bit to match the spec.
While in here, update the 802.11-2012 reference to 802.11-2016.
Fixes: 9757235f45 ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Cc: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Bob Copeland <bobcopeland@fb.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bda3153998 ]
During assemble, the spare marked for replacement is not checked.
conf->fullsync cannot be updated to be 1. As a result, recovery will
treat it as a clean array. All recovering sectors are skipped. Original
device is replaced with the not-recovered spare.
mdadm -C /dev/md0 -l10 -n4 -pn2 /dev/loop[0123]
mdadm /dev/md0 -a /dev/loop4
mdadm /dev/md0 --replace /dev/loop0
mdadm -S /dev/md0 # stop array during recovery
mdadm -A /dev/md0 /dev/loop[01234]
After reassemble, you can see recovery go on, but it completes
immediately. In fact, recovery is not actually processed.
To solve this problem, we just add the missing logics for replacment
spares. (In raid1.c or raid5.c, they have already been checked.)
Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e3f329c600 ]
The reported residue is already calculated in BURST unit granularity, so
advertise this capability properly to other devices in the system.
Fixes: aee4d1fac8 ("dmaengine: pl330: improve pl330_tx_status() function")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3eb1b955cd ]
The intc #interrupt-cells is equal to 1. Currently gpio
node has 2 cells per IRQ which is wrong. Remove the additional
cell for each of the interrupts.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Fixes: 2e38b946dc ("ARM: davinci: da850: add GPIO DT node")
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dffd22aed2 ]
When proc_dostring() is called with a non-zero offset in strict mode, it
doesn't just write to the ->data buffer, it also reads. Make sure it
doesn't read uninitialized data.
Fixes: c6ac37d8d8 ("netfilter: nf_log: fix error on write NONE to [...]")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 983107072b ]
Currently we can hit following assert when running numa bench:
$ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1
perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed.
The assertion is correct, because we hit the SIGFPE in following line:
Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffd28c6700 (LWP 11750)]
0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257
1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9;
We don't check if the runtime is actually bigger than 1 second,
and thus this might end up with zero division within FPU.
Adding the check to prevent this.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 143c99f6ac ]
For some cases, the callchain provided by the kernel may be empty. So,
the callchain ip filtering code will cause a crash if we do not check
whether the struct ip_callchain pointer is NULL before accessing any
members.
This can be observed on a powerpc64le system running Fedora 27 as shown
below.
# perf record -b -e cycles:u ls
Before:
# perf report --branch-history
perf: Segmentation fault
-------- backtrace --------
perf[0x1027615c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8]
perf(arch_skip_callchain_idx+0x44)[0x10257c58]
perf[0x1017f2e4]
perf(thread__resolve_callchain+0x124)[0x1017ff5c]
perf(sample__resolve_callchain+0xf0)[0x10172788]
...
After:
# perf report --branch-history
Samples: 25 of event 'cycles:u', Event count (approx.): 2306870
Overhead Source:Line Symbol Shared Object
+ 11.60% _init+35736 [.] _init ls
+ 9.84% strcoll_l.c:137 [.] __strcoll_l libc-2.26.so
+ 9.16% memcpy.S:175 [.] __memcpy_power7 libc-2.26.so
+ 9.01% gconv_charset.h:54 [.] _nl_find_locale libc-2.26.so
+ 8.87% dl-addr.c:52 [.] _dl_addr libc-2.26.so
+ 8.83% _init+236 [.] _init ls
...
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20180611104049.11048-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b930e62ecd ]
On s390 this test case fails because the socket identifiction numbers
assigned to the CPU are higher than the CPU identification numbers.
F/ix this by adding the platform architecture into the perf data header
flag information. This helps identifiing the test platform and handles
s390 specifics in process_cpu_topology().
Before:
[root@p23lp27 perf]# perf test -vvvvv -F 39
39: Session topology :
--- start ---
templ file: /tmp/perf-test-iUv755
socket_id number is too big.You may need to upgrade the perf tool.
---- end ----
Session topology: Skip
[root@p23lp27 perf]#
After:
[root@p23lp27 perf]# perf test -vvvvv -F 39
39: Session topology :
--- start ---
templ file: /tmp/perf-test-8X8VTs
CPU 0, core 0, socket 6
CPU 1, core 1, socket 3
---- end ----
Session topology: Ok
[root@p23lp27 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes: c84974ed9f ("perf test: Add entry to test cpu topology")
Link: http://lkml.kernel.org/r/20180611073153.15592-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 36eb93509c ]
Initialize the 'err' variate to remove the build warning,
the warning is shown as below:
drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized in this function [-Wuninitialized]
drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
Fixes: e84fce0f88 ("usb: xhci: Add NVIDIA Tegra XUSB controller driver")
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c9a4c63888 ]
The kernel may spew a WARNING with UBSAN undefined behavior at
handling ALSA sequencer ioctl SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT:
UBSAN: Undefined behaviour in sound/core/seq/seq_clientmgr.c:2007:14
signed integer overflow:
2147483647 + 1 cannot be represented in type 'int'
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x122/0x1c8 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x86 lib/ubsan.c:159
handle_overflow+0x1c2/0x21f lib/ubsan.c:190
__ubsan_handle_add_overflow+0x2a/0x31 lib/ubsan.c:198
snd_seq_ioctl_query_next_client+0x1ac/0x1d0 sound/core/seq/seq_clientmgr.c:2007
snd_seq_ioctl+0x264/0x3d0 sound/core/seq/seq_clientmgr.c:2144
....
It happens only when INT_MAX is passed there, as we're incrementing it
unconditionally. So the fix is trivial, check the value with
INT_MAX. Although the bug itself is fairly harmless, it's better to
fix it so that fuzzers won't hit this again later.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200211
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 13399ff25f ]
According to IIO ABI relative humidity reading should be
returned in milli percent.
This patch addresses that by applying proper scaling and
returning integer instead of fractional format type specifier.
Note that the fixes tag is before the driver was heavily refactored
to introduce spi support, so the patch won't apply that far back.
Signed-off-by: Tomasz Duszynski <tduszyns@gmail.com>
Fixes: 14beaa8f5a ("iio: pressure: bmp280: add humidity support")
Acked-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9713cb0cf1 ]
A reference for the best gateway is taken when the list of gateways in the
mesh is sent via netlink. This is necessary to check whether the currently
dumped entry is the currently selected gateway or not. This information is
then transferred as flag BATADV_ATTR_FLAG_BEST.
After the comparison of the current entry is done,
batadv_v_gw_dump_entry() has to decrease the reference counter again.
Otherwise the reference will be held and thus prevents a proper shutdown of
the batman-adv interfaces (and some of the interfaces enslaved in it).
Fixes: b71bb6f924 ("batman-adv: add B.A.T.M.A.N. V bat_gw_dump implementations")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5685d2687 ]
A reference for the best gateway is taken when the list of gateways in the
mesh is sent via netlink. This is necessary to check whether the currently
dumped entry is the currently selected gateway or not. This information is
then transferred as flag BATADV_ATTR_FLAG_BEST.
After the comparison of the current entry is done,
batadv_iv_gw_dump_entry() has to decrease the reference counter again.
Otherwise the reference will be held and thus prevents a proper shutdown of
the batman-adv interfaces (and some of the interfaces enslaved in it).
Fixes: efb766af06 ("batman-adv: add B.A.T.M.A.N. IV bat_gw_dump implementations")
Reported-by: Andreas Ziegler <dev@andreas-ziegler.de>
Tested-by: Andreas Ziegler <dev@andreas-ziegler.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b4e88434c ]
Smack: Mark inode instant in smack_task_to_inode
/proc clean-up in commit 1bbc55131e
resulted in smack_task_to_inode() being called before smack_d_instantiate.
This resulted in the smk_inode value being ignored, even while present
for files in /proc/self. Marking the inode as instant here fixes that.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c6da92808 ]
After recieving MLD querys, we update idev->mc_maxdelay with max_delay
from query header. This make the later unsolicited reports have the same
interval with mc_maxdelay, which means we may send unsolicited reports with
long interval time instead of default configured interval time.
Also as we will not call ipv6_mc_reset() after device up. This issue will
be there even after leave the group and join other groups.
Fixes: fc4eba58b4 ("ipv6: make unsolicited report intervals configurable for mld")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ba56bc3a07 ]
When booting a 64 KB pages kernel on a ACPI GICv3 system that
implements support for v2 emulation, the following warning is
produced
GICV size 0x2000 not a multiple of page size 0x10000
and support for v2 emulation is disabled, preventing GICv2 VMs
from being able to run on such hosts.
The reason is that vgic_v3_probe() performs a sanity check on the
size of the window (it should be a multiple of the page size),
while the ACPI MADT parsing code hardcodes the size of the window
to 8 KB. This makes sense, considering that ACPI does not bother
to describe the size in the first place, under the assumption that
platforms implementing ACPI will follow the architecture and not
put anything else in the same 64 KB window.
So let's just drop the sanity check altogether, and assume that
the window is at least 64 KB in size.
Fixes: 9097773245 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ea0820bb77 ]
Device tree based systems without of_dev_auxdata will have the mdio
device named differently than "davinci_mdio(.0)". In this case use the
device's parent's compatible string for matching
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f24ef7413 ]
machine_desc->init_per_cpu() hook is supposed to be per cpu
initialization and would seem to apply equally to UP and/or SMP.
Infact the comment in header file seems to suggest it works for
UP too, which was not the case and this patch.
This enables !CONFIG_SMP build for platforms such as hsdk.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: trimmeed changelog]
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d68a90e148 ]
Controllers that are not yet enabled should not really enforce keep alive
timeouts, but we still want to track a timeout and cleanup in case a host
died before it enabled the controller. Hence, simply reset the keep
alive timer when the controller is enabled.
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bc8a2d9bcb ]
The Stratix10 platform has an additional reset line, OCP(Open Core Protocol),
that also needs to get deasserted for the stmmac ethernet controller to work.
Thus we need to update the Kconfig to include ARCH_STRATIX10 in order to build
dwmac-socfpga.
Also, remove the redundant check for the reset controller pointer. The
reset driver already checks for the pointer and returns 0 if the pointer
is NULL.
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4e8439aa34 ]
The array bpq_eth_addr is only used to get the size of an
address, whereas the bcast_addr is used to set the broadcast
address. This leads to a warning when using clang:
drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
needed and will not be emitted [-Wunneeded-internal-declaration]
static char bpq_eth_addr[6];
^
Remove both variables and use the common eth_broadcast_addr
to set the broadcast address.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3256d29fc7 ]
lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
can be called when interface is down.
Move enic_rfs_flw_tbl_init to enic_probe.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
register_lock_class+0x550/0x560
? __handle_mm_fault+0xa8b/0x1100
__lock_acquire+0x81/0x670
lock_acquire+0xb9/0x1e0
? enic_get_rxnfc+0x139/0x2b0 [enic]
_raw_spin_lock_bh+0x38/0x80
? enic_get_rxnfc+0x139/0x2b0 [enic]
enic_get_rxnfc+0x139/0x2b0 [enic]
ethtool_get_rxnfc+0x8d/0x1c0
dev_ethtool+0x16c8/0x2400
? __mutex_lock+0x64d/0xa00
? dev_load+0x6a/0x150
dev_ioctl+0x253/0x4b0
sock_do_ioctl+0x9a/0x130
sock_ioctl+0x1af/0x350
do_vfs_ioctl+0x8e/0x670
? syscall_trace_enter+0x1e2/0x380
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5a/0x170
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b154886f78 ]
We can't call function trace hook before setup percpu offset.
When entering secondary_start_kernel(), percpu offset has not
been initialized. So this lead hotplug malfunction.
Here is the flow to reproduce this bug:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo function > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/devices/system/cpu/cpu1/online
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70c3c8cb83 ]
If isoc split in transfer with no data (the length of DATA0
packet is zero), we can't simply return immediately. Because
the DATA0 can be the first transaction or the second transaction
for the isoc split in transaction. If the DATA0 packet with no
data is in the first transaction, we can return immediately.
But if the DATA0 packet with no data is in the second transaction
of isoc split in transaction sequence, we need to increase the
qtd->isoc_frame_index and giveback urb to device driver if needed,
otherwise, the MDATA packet will be lost.
A typical test case is that connect the dwc2 controller with an
usb hs Hub (GL852G-12), and plug an usb fs audio device (Plantronics
headset) into the downstream port of Hub. Then use the usb mic
to record, we can find noise when playback.
In the case, the isoc split in transaction sequence like this:
- SSPLIT IN transaction
- CSPLIT IN transaction
- MDATA packet (176 bytes)
- CSPLIT IN transaction
- DATA0 packet (0 byte)
This patch use both the length of DATA0 and qtd->isoc_split_offset
to check if the DATA0 is in the second transaction.
Tested-by: Gevorg Sahakyan <sahakyan@synopsys.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Minas Harutyunyan hminas@synopsys.com>
Signed-off-by: William Wu <william.wu@rock-chips.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fae2a63737 ]
Currently smatch warns of possible Spectre-V1 issue in ahci_led_store():
drivers/ata/libahci.c:1150 ahci_led_store() warn: potential spectre issue 'pp->em_priv' (local cap)
Userspace controls @pmp from following callchain:
em_message->store()
->ata_scsi_em_message_store()
-->ap->ops->em_store()
--->ahci_led_store()
After the mask+shift @pmp is effectively an 8b value, which is used to
index into an array of length 8, so sanitize the array index.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 375dc53d03 ]
Run the completer task to post a work completion after processing
a memory registration or invalidate work request. This covers the
case where the memory registration or invalidate was the last work
request posted to the qp.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 89610dc2c2 ]
In the situation that DE and SE aren’t shared the same interrupt number,
the Global SE interrupts mask bit MASK_IRQ_EN in MASKIRQ must be set, or
else other mask bits will not work and no SE interrupt will occur. This
patch enables MASK_IRQ_EN for SE to fix this problem.
Signed-off-by: Alison Wang <alison.wang@nxp.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 403fde6448 ]
The interrupts for the PCIe controllers should all be of type
IRQ_TYPE_LEVEL_HIGH instead of IRQ_TYPE_NONE.
Fixes: d71eb94120 ("ARM: dts: NSP: Add MSI support on PCI")
Fixes: 522199029f ("ARM: dts: NSP: Fix PCIE DT issue")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d6a3e55131 ]
Unless the software synchronization objects (CONFIG_SW_SYNC) is enabled,
the sync test will be skipped:
TAP version 13
1..0 # Skipped: Sync framework not supported by kernel
Add a config fragment file to be able to run "make kselftest-merge" to
enable relevant configuration required in order to run the sync test.
Signed-off-by: Fathi Boudra <fathi.boudra@linaro.org>
Link: https://lkml.org/lkml/2017/5/5/14
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 685814466b ]
When zram test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as
a fail by the Kselftest framework. This leads to false negative result
even when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7d5311d4a ]
When user test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as
a fail by the Kselftest framework. This leads to false negative result
even when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run. Add an explicit check
for module presence and return skip code if module isn't present.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8781578087 ]
When static_keys test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as a fail
by the Kselftest framework. This leads to false negative result even when
the test could not be run.
Change it to return kselftest skip code when a test gets skipped to clearly
report that the test could not be run.
Added an explicit searches for test_static_key_base and test_static_keys
modules and return skip code if they aren't found to differentiate between
the failure to load the module condition and module not found condition.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 856e7c4b61 ]
When pstore_post_reboot test gets skipped because of unmet dependencies
and/or unsupported configuration, it returns 0 which is treated as a pass
by the Kselftest framework. This leads to false positive result even when
the test could not be run.
Change it to return kselftest skip code when a test gets skipped to clearly
report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ce7bc036a ]
It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.
Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.
This saves 192 bytes of memory per netns.
Fixes: c038a767cd ("ipv6: add a new namespace for nf_conntrack_reasm")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 896e518883 ]
The clocks have already been explicitly disabled and put as part of
remove() so the runtime suspend callback must not be run when balancing
the runtime PM usage count before returning.
Fixes: 16adc674d0 ("usb: dwc3: add generic OF glue layer")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 74c11e300c ]
GCC built for arc*-*-linux has "-mmedium-calls" implicitly enabled by default
thus we don't see any problems during Linux kernel compilation.
----------------------------->8------------------------
arc-linux-gcc -mcpu=arc700 -Q --help=target | grep calls
-mlong-calls [disabled]
-mmedium-calls [enabled]
----------------------------->8------------------------
But if we try to use so-called Elf32 toolchain with GCC configured for
arc*-*-elf* then we'd see the following failure:
----------------------------->8------------------------
init/do_mounts.o: In function 'init_rootfs':
do_mounts.c:(.init.text+0x108): relocation truncated to fit: R_ARC_S21W_PCREL
against symbol 'unregister_filesystem' defined in .text section in fs/filesystems.o
arc-elf32-ld: final link failed: Symbol needs debug section which does not exist
make: *** [vmlinux] Error 1
----------------------------->8------------------------
That happens because neither "-mmedium-calls" nor "-mlong-calls" are enabled in
Elf32 GCC:
----------------------------->8------------------------
arc-elf32-gcc -mcpu=arc700 -Q --help=target | grep calls
-mlong-calls [disabled]
-mmedium-calls [disabled]
----------------------------->8------------------------
Now to make it possible to use Elf32 toolchain for building Linux kernel
we're explicitly add "-mmedium-calls" to CFLAGS.
And since we add "-mmedium-calls" to the global CFLAGS there's no point in
having per-file copies thus removing them.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3681dd548 upstream.
error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[ Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed. ]
[ Note to stable maintainers: this should probably get applied to all
kernels. If you're nervous about that, a more conservative fix to
add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
also fix the problem. ]
Reported-and-tested-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Fixes: 3ac6d8c787 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sarah Newman <srn@prgmr.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfcab6ba57 upstream.
dw8250_set_termios() doesn't set baud rate if the arg "old ktermios" is
NULL. This happens during resume.
Call Trace:
...
[ 54.928108] dw8250_set_termios+0x162/0x170
[ 54.928114] serial8250_set_termios+0x17/0x20
[ 54.928117] uart_change_speed+0x64/0x160
[ 54.928119] uart_resume_port
...
So the baud rate is not restored after S3 and breaks the apps who use
UART, for example, console and bluetooth etc.
We address this issue by setting the baud rate irrespective of arg
"old", just like the drivers for other 8250 IPs. This is tested with
Intel Broxton platform.
Signed-off-by: Chen Hu <hu1.chen@intel.com>
Fixes: 4e26b134bd ("serial: 8250_dw: clock rate handling for all ACPI platforms")
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47ac76662c upstream.
Revert commit ecb988a3b7: tty: serial:
8250: 8250_core: NXP SC16C2552 workaround
The above commit causes userland application to no longer write
correctly its first write to a dumb terminal connected to /dev/ttyS0.
This commit seems to be the culprit. It's as though the TX FIFO is being
reset during that write. What should be displayed is:
PSW 80000000 INST 00000000 HALT
//
What is displayed is some variation of:
T 00000000 HAL//
Reverting this commit via this patch fixes my problem.
Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Fixes: ecb988a3b7 ("tty: serial: 8250: 8250_core: NXP SC16C2552 workaround")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 231f941500 upstream.
Every time I tried to upgrade my laptop from 3.10.x to 4.x I faced an
issue by which the fan would run at full speed upon resume. Bisecting
it showed me the issue was introduced in 3.17 by commit 821d6f0359
(ACPI / sleep: Do not save NVS for new machines to accelerate S3). This
code only affects machines built starting as of 2012, but this Asus
1025C laptop was made in 2012 and apparently needs the NVS data to be
saved, otherwise the CPU's thermal state is not properly reported on
resume and the fan runs at full speed upon resume.
Here's a very simple way to check if such a machine is affected :
# cat /sys/class/thermal/thermal_zone0/temp
55000
( now suspend, wait one second and resume )
# cat /sys/class/thermal/thermal_zone0/temp
0
(and after ~15 seconds the fan starts to spin)
Let's apply the same quirk as commit cbc00c13 (ACPI: save NVS memory
for Lenovo G50-45) and reuse the function it provides. Note that this
commit was already backported to 4.9.x but not 4.4.x.
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+: requires cbc00c13
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e60870012e upstream.
The portdata spinlock can be taken in interrupt context (via
sierra_outdat_callback()).
Disable interrupts when taking the portdata spinlock when discarding
deferred URBs during close to prevent a possible deadlock.
Fixes: 014333f77c ("USB: sierra: fix urb and memory leak on disconnect")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ johan: amend commit message and add fixes and stable tags ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e22002aa8 ]
It was possible to directly leak the kernel address where the isdn_dev
structure pointer was stored. This is a kernel ASLR bypass for anyone
with access to the ioctl. The code had been present since the beginning
of git history, though this shouldn't ever be needed for normal operation,
therefore remove it.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3acd3e3bab upstream.
The endian conversions used in vxp_dma_read() and vxp_dma_write() are
superfluous and even wrong on big-endian machines, as inw() and outw()
already do conversions. Kill them.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfef01e150 upstream.
snd_dma_alloc_pages_fallback() tries to allocate pages again when the
allocation fails with reduced size. But the first try actually
*increases* the size to power-of-two, which may give back a larger
chunk than the requested size. This confuses the callers, e.g. sgbuf
assumes that the size is equal or less, and it may result in a bad
loop due to the underflow and eventually lead to Oops.
The code of this function seems incorrectly assuming the usage of
get_order(). We need to decrease at first, then align to
power-of-two.
Reported-and-tested-by: he, bo <bo.he@intel.com>
Reported-by: zhang jun <jun.zhang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69756930f2 upstream.
One place in cs5535audio_build_dma_packets() does an extra conversion
via cpu_to_le32(); namely jmpprd_addr is passed to setup_prd() ops,
which writes the value via cs_writel(). That is, the callback does
the conversion by itself, and we don't need to convert beforehand.
This patch fixes that bogus conversion.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50e9ffb199 upstream.
The virmidi output trigger tries to parse the all available bytes and
process sequencer events as much as possible. In a normal situation,
this is supposed to be relatively short, but a program may give a huge
buffer and it'll take a long time in a single spin lock, which may
eventually lead to a soft lockup.
This patch simply adds a workaround, a cond_resched() call in the loop
if applicable. A better solution would be to move the event processor
into a work, but let's put a duct-tape quickly at first.
Reported-and-tested-by: Dae R. Jeong <threeearcat@gmail.com>
Reported-by: syzbot+619d9f40141d826b097e@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fff71a4c05 upstream.
The endian conversions used in vx2_dma_read() and vx2_dma_write() are
superfluous and even wrong on big-endian machines, as inl() and outl()
already do conversions. Kill them.
Spotted by sparse, a warning like:
sound/pci/vx222/vx222_ops.c:278:30: warning: incorrect type in argument 1 (different base types)
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d77a4b4a5b upstream.
As an equivalent codec with CX20724,
CX8200 is also subject to the reboot bug.
Late 2017 and 2018 LG Gram and some HP Spectre laptops are known victims
to this issue, causing extremely loud noises upon reboot.
Now that we know that this bug is subject to multiple codecs,
fix the comment as well.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f59cf9a055 upstream.
On rare occasions, we are still noticing that the internal speaker
spitting out spurious noises even after adding the problematic codec
to the list.
Adding a 10ms artificial delay before rebooting fixes the issue entirely.
Patch for Realtek codecs also adds the same amount of delay after
entering D3.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 82a40777de ]
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.
While at it, change to use max() instead of if statement.
Fixes: c9fefa0819 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 455f05ecd2 ]
syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():
ODEBUG: init active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
Cc: Andy king <acking@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0dcb82254d ]
llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.
Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.
Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 61ef4b07fc ]
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].
In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.
When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...
Fixes: 113ced1f52 ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f19f5c49bb upstream.
It turns out that we should *not* invert all not-present mappings,
because the all zeroes case is obviously special.
clear_page() does not undergo the XOR logic to invert the address bits,
i.e. PTE, PMD and PUD entries that have not been individually written
will have val=0 and so will trigger __pte_needs_invert(). As a result,
{pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
(adjusted by the max PFN mask) instead of zero. A zeroed entry is ok
because the page at physical address 0 is reserved early in boot
specifically to mitigate L1TF, so explicitly exempt them from the
inversion when reading the PFN.
Manifested as an unexpected mprotect(..., PROT_NONE) failure when called
on a VMA that has VM_PFNMAP and was mmap'd to as something other than
PROT_NONE but never used. mprotect() sends the PROT_NONE request down
prot_none_walk(), which walks the PTEs to check the PFNs.
prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
-EACCES because it thinks mprotect() is trying to adjust a high MMIO
address.
[ This is a very modified version of Sean's original patch, but all
credit goes to Sean for doing this and also pointing out that
sometimes the __pte_needs_invert() function only gets the protection
bits, not the full eventual pte. But zero remains special even in
just protection bits, so that's ok. - Linus ]
Fixes: f22cc87f6c ("x86/speculation/l1tf: Invert all not present mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e0fb5df2e upstream.
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: cpandya@codeaurora.org
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Joerg Roedel <joro@8bytes.org>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bbda5a386 upstream.
If the ts3a227e audio accessory detection hardware is present and its
driver probed, the jack needs to be created before enabling jack
detection in the ts3a227e driver. With this patch, the jack is
instantiated in the max98090 headset init function if the ts3a227e is
present. This fixes a null pointer dereference as the jack detection
enabling function in the ts3a driver was called before the jack is
created.
[minor correction to keep error handling on jack creation the same
as before by Pierre Bossart]
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 318abdfbe7 upstream.
Like the skcipher_walk and blkcipher_walk cases:
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
ablkcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing ablkcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
Reported-by: Liu Chao <liuchao741@huawei.com>
Fixes: bf06099db1 ("crypto: skcipher - Add ablkcipher_walk interfaces")
Cc: <stable@vger.kernel.org> # v2.6.35+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0868def3e4 upstream.
Like the skcipher_walk case:
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
blkcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing blkcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "ecb(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741@huawei.com>
Fixes: 5cde0af2a9 ("[CRYPTO] cipher: Added block cipher type")
Cc: <stable@vger.kernel.org> # v2.6.19+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb29648102 upstream.
syzbot reported a crash in vmac_final() when multiple threads
concurrently use the same "vmac(aes)" transform through AF_ALG. The bug
is pretty fundamental: the VMAC template doesn't separate per-request
state from per-tfm (per-key) state like the other hash algorithms do,
but rather stores it all in the tfm context. That's wrong.
Also, vmac_final() incorrectly zeroes most of the state including the
derived keys and cached pseudorandom pad. Therefore, only the first
VMAC invocation with a given key calculates the correct digest.
Fix these bugs by splitting the per-tfm state from the per-request state
and using the proper init/update/final sequencing for requests.
Reproducer for the crash:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int fd;
struct sockaddr_alg addr = {
.salg_type = "hash",
.salg_name = "vmac(aes)",
};
char buf[256] = { 0 };
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16);
fork();
fd = accept(fd, NULL, NULL);
for (;;)
write(fd, buf, 256);
}
The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds
VMAC_NHBYTES, causing vmac_final() to memset() a negative length.
Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com
Fixes: f1939f7c56 ("crypto: vmac - New hash algorithm for intel_txt support")
Cc: <stable@vger.kernel.org> # v2.6.32+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73bf20ef3d upstream.
The VMAC template assumes the block cipher has a 128-bit block size, but
it failed to check for that. Thus it was possible to instantiate it
using a 64-bit block size cipher, e.g. "vmac(cast5)", causing
uninitialized memory to be used.
Add the needed check when instantiating the template.
Fixes: f1939f7c56 ("crypto: vmac - New hash algorithm for intel_txt support")
Cc: <stable@vger.kernel.org> # v2.6.32+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a957467c5 upstream.
i8259.h uses inb/outb and thus needs to include asm/io.h to avoid the
following build error, as seen with x86_64:defconfig and CONFIG_SMP=n.
In file included from drivers/rtc/rtc-cmos.c:45:0:
arch/x86/include/asm/i8259.h: In function 'inb_pic':
arch/x86/include/asm/i8259.h:32:24: error:
implicit declaration of function 'inb'
arch/x86/include/asm/i8259.h: In function 'outb_pic':
arch/x86/include/asm/i8259.h:45:2: error:
implicit declaration of function 'outb'
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Suggested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Fixes: 447ae31667 ("x86: Don't include linux/irq.h from asm/hardirq.h")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b89b41d0b8 upstream.
Current cpu_core_id fixup causes downcored F17h configurations to be
incorrect:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 0
NODE: 1
processor 6 core id : 2
processor 7 core id : 3
processor 8 core id : 4
processor 9 core id : 0
processor 10 core id : 1
processor 11 core id : 2
Code that relies on the cpu_core_id, like match_smt(), for example,
which builds the thread siblings masks used by the scheduler, is
mislead.
So, limit the fixup to pre-F17h machines. The new value for cpu_core_id
for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId],
which is guaranteed to be unique for each core within a socket.
This way we have:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 6
NODE: 1
processor 6 core id : 8
processor 7 core id : 9
processor 8 core id : 10
processor 9 core id : 12
processor 10 core id : 13
processor 11 core id : 14
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[ Heavily massaged. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c26fcd2ab upstream.
pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
!__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
ia64) and breaks build:
include/asm-generic/pgtable.h: Assembler messages:
include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
Move those two static inlines into the !__ASSEMBLY__ section so that they
don't confuse the asm build pass.
Fixes: 42e4089c78 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[groeck: Context changes]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 792adb90fa upstream.
The introduction of generic_max_swapfile_size and arch-specific versions has
broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
'generic_max_swapfile_size'. Fix it by compiling the x86-specific
max_swapfile_size() only with CONFIG_SWAP=y.
Reported-by: Tomas Pruzina <pruzinat@gmail.com>
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 269777aa53 upstream.
Commit 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
breaks non-SMP builds.
[ I suspect the 'bool' fields should just be made to be bitfields and be
exposed regardless of configuration, but that's a separate cleanup
that I'll leave to the owners of this file for later. - Linus ]
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0055f351e upstream.
The function has an inline "return false;" definition with CONFIG_SMP=n
but the "real" definition is also visible leading to "redefinition of
‘apic_id_is_primary_thread’" compiler error.
Guard it with #ifdef CONFIG_SMP
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07d981ad4c upstream
The kernel unnecessarily prevents late microcode loading when SMT is
disabled. It should be safe to allow it if all the primary threads are
online.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1063711b57 upstream
The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
without inverting the address bits, which makes the PTE entry vulnerable
for L1TF.
Make it use the right low level macros to actually invert the address bits
to protect against L1TF.
In principle this could be avoided because MMIO tracing is not likely to be
enabled on production machines, but the fix is straigt forward and for
consistency sake it's better to get rid of the open coded PTE manipulation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 958f79b9ee upstream
set_memory_np() is used to mark kernel mappings not present, but it has
it's own open coded mechanism which does not have the L1TF protection of
inverting the address bits.
Replace the open coded PTE manipulation with the L1TF protecting low level
PTE routines.
Passes the CPA self test.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0768f91530 upstream
Some cases in THP like:
- MADV_FREE
- mprotect
- split
mark the PMD non present for temporarily to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness sake.
Use the proper low level functions for pmd/pud_mknotpresent() to address
this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f22cc87f6c upstream
For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
mapping, but the inversion logic explicitely checks for !PRESENT and
PROT_NONE.
Remove the PROT_NONE check and make the inversion unconditional for all not
present mappings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc2d8d262c upstream
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.
Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.
SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.
Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b76a3cff0 upstream
When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry. Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.
Add the relevant Documentation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e0b2b9166 upstream
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed. Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea156d192f upstream
Three changes to the content of the sysfs file:
- If EPT is disabled, L1TF cannot be exploited even across threads on the
same core, and SMT is irrelevant.
- If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
instead of "vulnerable, SMT vulnerable"
- Reorder the two parts so that the main vulnerability state comes first
and the detail on SMT is second.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd28325249 upstream
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 518e7b9481 upstream
Linux (among the others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.
By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.
Suggested-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1d93fa90f upstream
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked. This patch will add support to allow a guest to
properly make this determination.
Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features. If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see. Support is also added to write (hypervisor only)
and read the MSR value for the guest. A write by the guest will result in
a #GP. A read by the guest will return the value as set by the host. In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 801e459a6f upstream
Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18b57ce2eb upstream
For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.
It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.
The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.
Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ffcba43ff6 upstream
The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.
Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 447ae31667 upstream
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().
Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/topology.h
linux/smp.h
asm/smp.h
or
linux/gfp.h
linux/mmzone.h
asm/mmzone.h
asm/mmzone_64.h
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/irqdesc.h
linux/kobject.h
linux/sysfs.h
linux/kernfs.h
linux/idr.h
linux/gfp.h
and others.
This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.
A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.
However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.
Remove the linux/irq.h #include from x86' asm/hardirq.h.
Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.
Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[dwmw2: More fixes for EFI and Xen in 4.9]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45b575c00d upstream
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.
L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.
If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.
The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
L1d flush based on its value and reset that flag again.
Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus, the vmx code is unable to tell if
any such has happened.
As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.
A later patch will make interrupt handlers set it.
For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.
Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate.
Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9aee5f8a7e upstream
An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.
In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.
Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b6ccc6c3b upstream
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.
This test is unncessary for the "always flush L1D" mode.
Move the check to vmx_l1d_flush()'s conditional mode code path.
Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
extra function call.
- This inverts the (static) branch prediction, but there hadn't been any
explicit likely()/unlikely() annotations before and so it stays as is.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 427362a142 upstream
The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".
The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.
Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.
There is no change in functionality.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 379fd0c7e6 upstream
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73d5e2b472 upstream
If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
vulnerabilities file shows SMT as vulnerable.
Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
case. Unfortunately the detection can only be done after bringing all
the CPUs online, so we have to overwrite any previous writes to the
variable.
Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Fixes: f048c399e0 ("x86/topology: Provide topology_smt_supported()")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 288d152c23 upstream
The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
to evict the L1d cache.
However, these pages are never cleared and, in theory, their data could be
leaked.
More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.
Fix this by initializing the individual vmx_l1d_flush_pages with a
different pattern each.
Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
"flush_pages" to reflect this change.
Fixes: a47dd5f067 ("x86/KVM/VMX: Add L1D flush algorithm")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d90a7a0ec8 upstream
Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.
The possible values are:
full
Provides all available mitigations for the L1TF vulnerability. Disables
SMT and enables all mitigations in the hypervisors. SMT control via
/sys/devices/system/cpu/smt/control is still possible after boot.
Hypervisors will issue a warning when the first VM is started in
a potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.
full,force
Same as 'full', but disables SMT control. Implies the 'nosmt=force'
command line option. sysfs control of SMT and the hypervisor flush
control is disabled.
flush
Leaves SMT enabled and enables the conditional hypervisor mitigation.
Hypervisors will issue a warning when the first VM is started in a
potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.
flush,nosmt
Disables SMT and enables the conditional hypervisor mitigation. SMT
control via /sys/devices/system/cpu/smt/control is still possible
after boot. If SMT is reenabled or flushing disabled at runtime
hypervisors will issue a warning.
flush,nowarn
Same as 'flush', but hypervisors will not warn when
a VM is started in a potentially insecure configuration.
off
Disables hypervisor mitigations and doesn't emit any warnings.
Default is 'flush'.
Let KVM adhere to these semantics, which means:
- 'lt1f=full,force' : Performe L1D flushes. No runtime control
possible.
- 'l1tf=full'
- 'l1tf-flush'
- 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if
SMT has been runtime enabled or L1D flushing
has been run-time enabled
- 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted.
- 'l1tf=off' : L1D flushes are not performed and no warnings
are emitted.
KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.
This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.
Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fee0aede6f upstream
The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.
That was fine so far as this was only required to make the output of the
control file correct and to prevent writes in that case.
With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7db92e165a upstream
In preparation of allowing run time control for L1D flushing, move the
setup code to the module parameter handler.
In case of pre module init parsing, just store the value and let vmx_init()
do the actual setup after running kvm_init() so that enable_ept is having
the correct state.
During run-time invoke it directly from the parameter setter to prepare for
run-time control.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f055947ae upstream
The VMX module parameter to control the L1D flush should become
writeable.
The MSR list is set up at VM init per guest VCPU, but the run time
switching is based on a static key which is global. Toggling the MSR list
at run time might be feasible, but for now drop this optimization and use
the regular MSR write to make run-time switching possible.
The default mitigation is the conditional flush anyway, so for extra
paranoid setups this will add some small overhead, but the extra code
executed is in the noise compared to the flush itself.
Aside of that the EPT disabled case is not handled correctly at the moment
and the MSR list magic is in the way for fixing that as well.
If it's really providing a significant advantage, then this needs to be
revisited after the code is correct and the control is writable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 215af5499d upstream
Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
siblings. Writing 'on' merily enables the abilify to online them, but does
not online them automatically.
Make 'on' more useful by onlining all offline siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 390d975e0c upstream
If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD
MSR is available, optimize the VMENTER code with the MSR save list.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 989e3992d2 upstream
The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend
add_atomic_switch_msr() with an entry_only parameter to allow storing the
MSR only in the guest (ENTRY) MSR array.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33966dd6b2 upstream
There is no semantic change but this change allows an unbalanced amount of
MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or
restore on VMEXIT or VMENTER may be different.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 83bafef1a1 upstream
When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit
MSR load lists, the change should affect the dormant VMCS as well as the
current VMCS. Moreover, the vmcs02 MSR-load addresses should be
initialized.
[ dwmw2: Pulled in to 4.9 backports for L1TF ]
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c595ceee45 upstream
Add the logic for flushing L1D on VMENTER. The flush depends on the static
key being enabled and the new l1tf_flush_l1d flag being set.
The flags is set:
- Always, if the flush module parameter is 'always'
- Conditionally at:
- Entry to vcpu_run(), i.e. after executing user space
- From the sched_in notifier, i.e. when switching to a vCPU thread.
- From vmexit handlers which are considered unsafe, i.e. where
sensitive data can be brought into L1D:
- The emulator, which could be a good target for other speculative
execution-based threats,
- The MMU, which can bring host page tables in the L1 cache.
- External interrupts
- Nested operations that require the MMU (see above). That is
vmptrld, vmptrst, vmclear,vmwrite,vmread.
- When handling invept,invvpid
[ tglx: Split out from combo patch and reduced to a single flag ]
[ dwmw2: Backported to 4.9, set l1tf_flush_l1d in svm/vmx code ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fa045be4c upstream
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
MSRs defined in the document.
The semantics of this MSR is to allow "finer granularity invalidation of
caching structures than existing mechanisms like WBINVD. It will writeback
and invalidate the L1 data cache, including all cachelines brought in by
preceding instructions, without invalidating all caches (eg. L2 or
LLC). Some processors may also invalidate the first level level instruction
cache on a L1D_FLUSH command. The L1 data and instruction caches may be
shared across the logical processors of a core."
Use it instead of the loop based L1 flush algorithm.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Avoid allocating pages when the MSR is available ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a47dd5f067 upstream
To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D
on VMENTER to prevent rogue guests from snooping host memory.
CPUs will have a new control MSR via a microcode update to flush L1D with a
single MSR write, but in the absence of microcode a fallback to a software
based flush algorithm is required.
Add a software flush loop which is based on code from Intel.
[ tglx: Split out from combo patch ]
[ bpetkov: Polish the asm code ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a399477e52 upstream
Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:
- "always" L1D cache flush on every VMENTER.
- "cond" Conditional L1D cache flush, explained below
- "never" Disable the L1D cache flush mitigation
"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.
[ tglx: Split out from a larger patch ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26acfb666a upstream
If the L1TF CPU bug is present we allow the KVM module to be loaded as the
major of users that use Linux and KVM have trusted guests and do not want a
broken setup.
Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
such they are the ones that should set nosmt to one.
Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4ac1 ("cpu/hotplug: Provide knobs to control SMT").
Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cc3cd2165 upstream
Due to the way Machine Check Exceptions work on X86 hyperthreads it's
required to boot up _all_ logical cores at least once in order to set the
CR4.MCE bit.
So instead of ignoring the sibling threads right away, let them boot up
once so they can configure themselves. After they came out of the initial
boot stage check whether its a "secondary" sibling and cancel the operation
which puts the CPU back into offline state.
[dwmw2: Backport to 4.9]
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 506a66f374 upstream
Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.
The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:
Because the logical processors within a physical package are tightly
coupled with respect to shared hardware resources, both logical
processors are notified of machine check errors that occur within a
given physical processor. If machine-check exceptions are enabled when
a fatal error is reported, all the logical processors within a physical
package are dispatched to the machine-check exception handler. If
machine-check exceptions are disabled, the logical processors enter the
shutdown state and assert the IERR# signal. When enabling machine-check
exceptions, the MCE flag in control register CR4 should be set for each
logical processor.
Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.
This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:
MCE is enabled on the boot CPU:
[ 0.244017] mce: CPU supports 22 MCE banks
The corresponding sibling #72 boots:
[ 1.008005] .... node #0, CPUs: #72
That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.
It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.
Adjust the nosmt kernel parameter documentation as well.
Reverts: 2207def700 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e14d7dfb41 upstream
Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
pte_val reps. shift left would get truncated. Fix this up by using proper
types.
[dwmw2: Backport to 4.9]
Fixes: 6b28baca9b ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d0f624905 upstream
The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
offset. The lower word is zeroed, thus systems with less than 4GB memory are
safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
to L1TF; with even more memory, also the swap offfset influences the address.
This might be a problem with 32bit PAE guests running on large 64bit hosts.
By continuing to keep the whole swap entry in either high or low 32bit word of
PTE we would limit the swap size too much. Thus this patch uses the whole PAE
PTE with the same layout as the 64bit version does. The macros just become a
bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce2f0393e upstream
The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.
Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().
[dwmw2: Backport to 4.9]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11e34e64e4 upstream
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.
This new MSR "gives software a way to invalidate structures with finer
granularity than other architectual methods like WBINVD."
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a7ed1ba4b upstream
The previous patch has limited swap file size so that large offsets cannot
clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.
It assumed that offsets are encoded starting with bit 12, same as pfn. But
on x86_64, offsets are encoded starting with bit 9.
Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
and 256TB with 46bit MAX_PA.
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2207def700 upstream
nosmt on the kernel command line merely prevents the onlining of the
secondary SMT siblings.
nosmt=force makes the APIC detection code ignore the secondary SMT siblings
completely, so they even do not show up as possible CPUs. That reduces the
amount of memory allocations for per cpu variables and saves other
resources from being allocated too large.
This is not fully equivalent to disabling SMT in the BIOS because the low
level SMT enabling in the BIOS can result in partitioning of resources
between the siblings, which is not undone by just ignoring them. Some CPUs
can use the full resources when their sibling is not onlined, but this is
depending on the CPU family and model and it's not well documented whether
this applies to all partitioned resources. That means depending on the
workload disabling SMT in the BIOS might result in better performance.
Linus analysis of the Intel manual:
The intel optimization manual is not very clear on what the partitioning
rules are.
I find:
"In general, the buffers for staging instructions between major pipe
stages are partitioned. These buffers include µop queues after the
execution trace cache, the queues after the register rename stage, the
reorder buffer which stages instructions for retirement, and the load
and store buffers.
In the case of load and store buffers, partitioning also provided an
easier implementation to maintain memory ordering for each logical
processor and detect memory ordering violations"
but some of that partitioning may be relaxed if the HT thread is "not
active":
"In Intel microarchitecture code name Sandy Bridge, the micro-op queue
is statically partitioned to provide 28 entries for each logical
processor, irrespective of software executing in single thread or
multiple threads. If one logical processor is not active in Intel
microarchitecture code name Ivy Bridge, then a single thread executing
on that processor core can use the 56 entries in the micro-op queue"
but I do not know what "not active" means, and how dynamic it is. Some of
that partitioning may be entirely static and depend on the early BIOS
disabling of HT, and even if we park the cores, the resources will just be
wasted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e1d7e25fd upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().
[dwmw2: Backport to 4.9]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 119bff8a9c upstream
Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU. The three
functions called there now do not depend on that leaf anymore so the check
can go.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1910ad5624 upstream
Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95f3d39ccf upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545401f444 upstream
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55e6d279ab upstream
The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.
Remove it completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05736e4ac1 upstream
Provide a command line and a sysfs knob to control SMT.
The command line options are:
'nosmt': Enumerate secondary threads, but do not online them
'nosmt=force': Ignore secondary threads completely during enumeration
via MP table and ACPI/MADT.
The sysfs control file has the following states (read/write):
'on': SMT is enabled. Secondary threads can be freely onlined
'off': SMT is disabled. Secondary threads, even if enumerated
cannot be onlined
'forceoff': SMT is permanentely disabled. Writes to the control
file are rejected.
'notsupported': SMT is not supported by the CPU
The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.
The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.
When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.
When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.
When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.
When the control status is 'notsupported' then writes to the control file
are rejected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4de65696d upstream
The asymmetry caused a warning to trigger if the bootup was stopped in state
CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
now be invoked on already or still parked threads. But there is still no
reason to have this be asymmetric.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a4d2657e0 upstream
If the CPU is supporting SMT then the primary thread can be found by
checking the lower APIC ID bits for zero. smp_num_siblings is used to build
the mask for the APIC ID bits which need to be taken into account.
This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
than the initial APIC ID in CPUID. But according to AMD the lower bits have
to be consistent. Intel gave a tentative confirmation as well.
Preparatory patch to support disabling SMT at boot/runtime.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56563f53d3 upstream
The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".
Move the function to be past SSBD and also define the pr_fmt.
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 377eeaa8e1 upstream
For the L1TF workaround its necessary to limit the swap file size to below
MAX_PA/2, so that the higher bits of the swap offset inverted never point
to valid memory.
Add a mechanism for the architecture to override the swap file size check
in swapfile.c and add a x86 specific max swapfile check function that
enforces that limit.
The check is only enabled if the CPU is vulnerable to L1TF.
In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system
with 46bit PA it is 32TB. The limit is only per individual swap file, so
it's always possible to exceed these limits with multiple swap files or
partitions.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42e4089c78 upstream
For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure to point to not point an unmapped entry to valid cached memory.
Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such an high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.
To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
Valid page mappings are allowed because the system is then unsafe anyways.
It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
the check for root.
For mmaps this is straight forward and can be handled in vm_insert_pfn and
in remap_pfn_range().
For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is a uncommon case use a separate early page talk
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.
[dwmw2: Backport to 4.9]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17dbca1193 upstream
L1TF core kernel workarounds are cheap and normally always enabled, However
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.
- Extend the existing checks for Meltdowns to determine if the system is
vulnerable. All CPUs which are not vulnerable to Meltdown are also not
vulnerable to L1TF
- Check for 32bit non PAE and emit a warning as there is no practical way
for mitigation due to the limited physical address bits
- If the system has more than MAX_PA/2 physical memory the invert page
workarounds don't protect the system against the L1TF attack anymore,
because an inverted physical address will also point to valid
memory. Print a warning in this case and report that the system is
vulnerable.
Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.
[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
[ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10a70416e1 upstream
The L1TF workaround doesn't make any attempt to mitigate speculate accesses
to the first physical page for zeroed PTEs. Normally it only contains some
data from the early real mode BIOS.
It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g. with the standard reservations)
it's likely a nop.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b28baca9b upstream
When PTEs are set to PROT_NONE the kernel just clears the Present bit and
preserves the PFN, which creates attack surface for L1TF speculation
speculation attacks.
This is important inside guests, because L1TF speculation bypasses physical
page remapping. While the host has its own migitations preventing leaking
data from other VMs into the guest, this would still risk leaking the wrong
page inside the current guest.
This uses the same technique as Linus' swap entry patch: while an entry is
is in PROTNONE state invert the complete PFN part part of it. This ensures
that the the highest bit will point to non existing memory.
The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
pte/pmd/pud_pfn undo it.
This assume that no code path touches the PFN part of a PTE directly
without using these primitives.
This doesn't handle the case that MMIO is on the top of the CPU physical
memory. If such an MMIO region was exposed by an unpriviledged driver for
mmap it would be possible to attack some real memory. However this
situation is all rather unlikely.
For 32bit non PAE the inversion is not done because there are really not
enough bits to protect anything.
Q: Why does the guest need to be protected when the HyperVisor already has
L1TF mitigations?
A: Here's an example:
Physical pages 1 2 get mapped into a guest as
GPA 1 -> PA 2
GPA 2 -> PA 1
through EPT.
The L1TF speculation ignores the EPT remapping.
Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
they belong to different users and should be isolated.
A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
gets read access to the underlying physical page. Which in this case
points to PA 2, so it can read process B's data, if it happened to be in
L1, so isolation inside the guest is broken.
There's nothing the hypervisor can do about this. This mitigation has to
be done in the guest itself.
[ tglx: Massaged changelog ]
[ dwmw2: backported to 4.9 ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f22b4cd45 upstream
With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting
side effects allow to read the memory the PTE is pointing too, if its
values are still in the L1 cache.
For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
them.
To protect against L1TF it must be ensured that the swap entry is not
pointing to valid memory, which requires setting higher bits (between bit
36 and bit 45) that are inside the CPUs physical address space, but outside
any real memory.
To do this invert the offset to make sure the higher bits are always set,
as long as the swap file is not too big.
Note there is no workaround for 32bit !PAE, or on systems which have more
than MAX_PA/2 worth of memory. The later case is very unlikely to happen on
real systems.
[AK: updated description and minor tweaks by. Split out from the original
patch ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcd11afa7a upstream
If pages are swapped out, the swap entry is stored in the corresponding
PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
on PTE entries which have the present bit set and would treat the swap
entry as phsyical address (PFN). To mitigate that the upper bits of the PTE
must be set so the PTE points to non existent memory.
The swap entry stores the type and the offset of a swapped out page in the
PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware
ignores the bits beyond the phsyical address space limit, so to make the
mitigation effective its required to start 'offset' at the lowest possible
bit so that even large swap offsets do not reach into the physical address
space limit bits.
Move offset to bit 9-58 and type to bit 59-63 which are the bits that
hardware generally doesn't care about.
That, in turn, means that if you on desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, there needs to be
30 bits of offset actually *in use* until bit 39 ends up being set, which
means when inverted it will again point into existing memory.
So that's 4 terabyte of swap space (because the offset is counted in pages,
so 30 bits of offset is 42 bits of actual coverage). With bigger physical
addressing, that obviously grows further, until the limit of the offset is
hit (at 50 bits of offset - 62 bits of actual swap file coverage).
This is a preparatory change for the actual swap entry inversion to protect
against L1TF.
[ AK: Updated description and minor tweaks. Split into two parts ]
[ tglx: Massaged changelog ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50896e180c upstream
L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
speculates on PTE entries which do not have the PRESENT bit set, if the
content of the resulting physical address is available in the L1D cache.
The OS side mitigation makes sure that a !PRESENT PTE entry points to a
physical address outside the actually existing and cachable memory
space. This is achieved by inverting the upper bits of the PTE. Due to the
address space limitations this only works for 64bit and 32bit PAE kernels,
but not for 32bit non PAE.
This mitigation applies to both host and guest kernels, but in case of a
64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
the PAE address space (44bit) is not enough if the host has more than 43
bits of populated memory address space, because the speculation treats the
PTE content as a physical host address bypassing EPT.
The host (hypervisor) protects itself against the guest by flushing L1D as
needed, but pages inside the guest are not protected against attacks from
other processes inside the same guest.
For the guest the inverted PTE mask has to match the host to provide the
full protection for all pages the host could possibly map into the
guest. The hosts populated address space is not known to the guest, so the
mask must cover the possible maximal host address space, i.e. 52 bit.
On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
the mask to be below what the host could possible use for physical pages.
The L1TF PROT_NONE protection code uses the PTE masks to determine which
bits to invert to make sure the higher bits are set for unmapped entries to
prevent L1TF speculation attacks against EPT inside guests.
In order to invert all bits that could be used by the host, increase
__PHYSICAL_PAGE_SHIFT to 52 to match 64bit.
The real limit for a 32bit PAE kernel is still 44 bits because all Linux
PTEs are created from unsigned long PFNs, so they cannot be higher than 44
bits on a 32bit kernel. So these extra PFN bits should be never set. The
only users of this macro are using it to look at PTEs, so it's safe.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5800dc5c19 upstream.
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.
The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.
Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.
Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.
PRE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
POST:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
NATIVE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bcfe05640 upstream.
Use the correct IRQ line for the MSI controller in the PCIe host
controller. Apparently a different IRQ line is used compared to other
i.MX6 variants. Without this change MSI IRQs aren't properly propagated
to the upstream interrupt controller.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Fixes: b1d17f68e5 ("ARM: dts: imx: add initial imx6sx device tree source")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 062d0f22a3 upstream.
In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is
written at index 'count-1' where 'count' is the size of the write, so
potentially 0.
This patch filters odd values for the write size/position to avoid this
type of problem.
Signed-off-by: Michael Mera <dev@michaelmera.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8f9cc328c upstream.
To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.
As a result, any attempt to register a read-only MR over read-only
virtual memory failed.
To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).
However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.
If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.
Cc: <stable@vger.kernel.org>
Fixes: 9376932d0c ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2fd1d2c4ce upstream.
Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
> BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc]
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
> CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
> Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
> Workqueue: events proc_cleanup_work
> Call Trace:
> dump_stack+0x63/0x86
> __warn+0xcb/0xf0
> warn_slowpath_null+0x1d/0x20
> umount_check+0x6e/0x80
> d_walk+0xc6/0x270
> ? dentry_free+0x80/0x80
> do_one_tree+0x26/0x40
> shrink_dcache_for_umount+0x2d/0x90
> generic_shutdown_super+0x1f/0xf0
> kill_anon_super+0x12/0x20
> proc_kill_sb+0x40/0x50
> deactivate_locked_super+0x43/0x70
> deactivate_super+0x5a/0x60
> cleanup_mnt+0x3f/0x90
> mntput_no_expire+0x13b/0x190
> kern_unmount+0x3e/0x50
> pid_ns_release_proc+0x15/0x20
> proc_cleanup_work+0x15/0x20
> process_one_work+0x197/0x450
> worker_thread+0x4e/0x4a0
> kthread+0x109/0x140
> ? process_one_work+0x450/0x450
> ? kthread_park+0x90/0x90
> ret_from_fork+0x2c/0x40
> ---[ end trace e1c109611e5d0b41 ]---
> VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: _raw_spin_lock+0xc/0x30
> PGD 0
Fix this by taking a reference to the super block in proc_sys_prune_dcache.
The superblock reference is the core of the fix however the sysctl_inodes
list is converted to a hlist so that hlist_del_init_rcu may be used. This
allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
not causing problems for proc_sys_evict_inode when if it later choses to
remove the inode from the sysctl_inodes list. Removing inodes from the
sysctl_inodes list allows proc_sys_prune_dcache to have a progress
guarantee, while still being able to drop all locks. The fact that
head->unregistering is set in start_unregistering ensures that no more
inodes will be added to the the sysctl_inodes list.
Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered to ensure the inode remained on
the sysctl_inodes list until the next entry was walked to. The structure
of the loop in this patch does not need that so is much easier to
understand and maintain.
Cc: stable@vger.kernel.org
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@openvz.org>
Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ace0c791e6 upstream.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882] (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.
Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after. Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there. Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.
I agree with Al Viro's analsysis of the situtation.
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Tested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6cffbbe9a upstream.
Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.
For example, command:
# for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
creates a millions of stale denties around sysctls of loopback interface:
# sysctl fs.dentry-state
fs.dentry-state = 25812579 24724135 45 0 0 0
All of them have matching names thus lookup have to scan though whole
hash chain and call d_compare (proc_sys_compare) which checks them
under system-wide spinlock (sysctl_lock).
# time sysctl -a > /dev/null
real 1m12.806s
user 0m0.016s
sys 1m12.400s
Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.
This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 10.02.2017 10:47, Al Viro wrote:
>> how about >> the matching stats *after* that patch?
>
> dcache size doesn't grow endlessly, so stats are fine
>
> # sysctl fs.dentry-state
> fs.dentry-state = 92712 58376 45 0 0 0
>
> # time sysctl -a &>/dev/null
>
> real 0m0.013s
> user 0m0.004s
> sys 0m0.008s
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 119e1ef80e upstream.
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ea0a46ca2 upstream.
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c0d7cd5c8 upstream.
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90bad5e05b upstream.
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.
It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5b1404d08 upstream.
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1214fd7b49 upstream.
Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.
This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:
INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdcb613d49 upstream.
The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
of private registers at offset 0x800, the current lpss_device_desc for
them already sets the LPSS_SAVE_CTX flag to have these saved/restored
over device-suspend, but the current lpss_device_desc was not setting
the prv_offset field, leading to the regular device registers getting
saved/restored instead.
This is causing the PWM controller to no longer work, resulting in a black
screen, after a suspend/resume on systems where the firmware clears the
APB clock and reset bits at offset 0x804.
This commit fixes this by properly setting prv_offset to 0x800 for
the PWM devices.
Cc: stable@vger.kernel.org
Fixes: e1c7481797 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
Fixes: 1bfbd8eb8a ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Rafael J . Wysocki <rjw@rjwysocki.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c53776e29 upstream.
Way back in 4.9, we committed 4cd13c21b2 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it. For
example, we've had:
1ff688209e ("watchdog: core: make sure the watchdog_worker is not deferred")
8d5755b3f7 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")
all of which worked around some of the effects of that commit.
The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic. This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.
Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:
"I'm facing kernel panic issue while running raid 5 on sata disks
connected to Macchiatobin (Marvell community board with Armada-8040
SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
uses a tasklet to clean the done descriptors from the queue"
The latency problem causes a panic:
mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc). This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.
Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fedb8da963 upstream.
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.
This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66509a276c upstream.
Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ab2011ea3 upstream.
There is a race condition in tpm_common_write function allowing
two threads on the same /dev/tpm<N>, or two different applications
on the same /dev/tpmrm<N> to overwrite each other commands/responses.
Fixed this by taking the priv->buffer_mutex early in the function.
Also converted the priv->data_pending from atomic to a regular size_t
type. There is no need for it to be atomic since it is only touched
under the protection of the priv->buffer_mutex.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5012284700 upstream.
Commit 8844618d8a: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:
mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc
Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.
This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.
Fixes: 8844618d8a ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 18:14:41 +02:00
1351 changed files with 13515 additions and 6567 deletions
* Allow the asm code to perform the finalization if there is no
* partial data and the input is a round multiple of the block size.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.