commit d72e91efca upstream.
Remove the devm_kfree() of the memory where the VLAN entry to RVU PF
mapping info is saved. This memory will be freed anyway at driver exit.
Keeping the explicit devm_kfree() could result in a warning from
devm_kfree() if the memory was never allocated due to errors in
rvu_nix_block_init() before nix_setup_txvlan().
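As an illustration of the devres behaviour relied upon here, a minimal sketch with hypothetical names (not the actual rvu code):
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/slab.h>

static u16 *vlan_map;  /* hypothetical VLAN entry -> RVU PF mapping table */

static int example_init(struct device *dev, unsigned int nentries)
{
        vlan_map = devm_kcalloc(dev, nentries, sizeof(*vlan_map), GFP_KERNEL);
        if (!vlan_map)
                return -ENOMEM;
        return 0;
}

static void example_teardown(struct device *dev)
{
        /*
         * No devm_kfree(dev, vlan_map) needed: devres frees the buffer at
         * driver detach, and calling devm_kfree() on memory that was never
         * allocated (e.g. after an earlier init error) can trigger a warning.
         */
}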
Fixes: 9a946def26 ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c07d5c9226 upstream.
Commit c47a5599ed ("perf tools: Fix pattern matching for same
substring in different PMU type") may have fixed some alias matching,
but has broken some others.
Firstly, it cannot handle the simple scenario of a PMU name in the form
pmu_name{digits} - it can only handle pmu_name_{digits}.
Secondly, it cannot handle more complex matching in the case where we
have multiple tokens. In this scenario, the code failed to realise that
we may examine multiple substrings in the PMU name.
Fix in two ways:
- Change perf_pmu__valid_suffix() to accept a PMU name without '_' in the
suffix
- Only pay attention to perf_pmu__valid_suffix() for the final token
Also add const qualifiers as necessary to avoid casting.
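As a rough, hedged sketch of the relaxed suffix rule (not the exact perf_pmu__valid_suffix() implementation; it assumes pmu_name is already known to begin with tok):
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool valid_suffix(const char *pmu_name, const char *tok)
{
        const char *p = pmu_name + strlen(tok);

        if (*p == '\0')
                return true;            /* exact match, no suffix */
        if (*p == '_')
                p++;                    /* '_' before the digits is optional */
        if (*p == '\0')
                return false;           /* trailing '_' with no digits */
        for (; *p; p++) {
                if (!isdigit((unsigned char)*p))
                        return false;
        }
        return true;
}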
Fixes: c47a5599ed ("perf tools: Fix pattern matching for same substring in different PMU type")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1626793819-79090-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c71437dd5 upstream.
j1939_session_deactivate() decrements the session ref-count and can
potentially free() the session. This would cause a use-after-free
situation.
However, the code calling j1939_session_deactivate() always holds
another reference to the session, so it will not be free()ed in
this code path.
This patch adds a comment to make this clear and a WARN_ON to ensure
that future changes will not violate this requirement. Further, this
patch avoids dereferencing the session pointer as a precaution against
use-after-free if the session is actually free()ed.
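A minimal sketch of the invariant being documented, using generic kref helpers rather than the actual j1939 session code:
#include <linux/bug.h>
#include <linux/kref.h>
#include <linux/types.h>

struct session {
        struct kref kref;
        bool active;
};

static void session_release(struct kref *kref)
{
        /* would free the session in real code */
}

static bool session_deactivate_sketch(struct session *s)
{
        bool active = s->active;

        /*
         * Callers must hold their own reference, so the put below can never
         * be the final one and the session cannot be free()ed under us.
         */
        WARN_ON_ONCE(kref_read(&s->kref) < 2);

        if (active) {
                s->active = false;
                kref_put(&s->kref, session_release);
        }
        /* do not dereference s after the put -- return the cached value */
        return active;
}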
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/r/20210714111602.24021-1-o.rempel@pengutronix.de
Reported-by: Xiaochen Zou <xzou017@ucr.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 110aa25c3c upstream.
We use a bit to manage if we need to add the shared task_work, but
a list + lock for the pending work. Before aborting a current run
of the task_work we check if the list is empty, but we do so without
grabbing the lock that protects it. This can lead to races where
we think we have nothing left to run, where in practice we could be
racing with a task adding new work to the list. If we do hit that
race condition, we could be left with work items that need processing,
but the shared task_work is not active.
Ensure that we grab the lock before checking if the list is empty,
so we know if it's safe to exit the run or not.
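Sketched generically (not the actual io_uring code), the safe pattern looks like this:
#include <linux/list.h>
#include <linux/spinlock.h>

/* Check the pending list only under the lock that protects it, so a
 * concurrent list_add() cannot be missed when deciding to stop the run. */
static bool work_remains(struct list_head *list, spinlock_t *lock)
{
        bool empty;

        spin_lock_irq(lock);
        empty = list_empty(list);
        spin_unlock_irq(lock);

        return !empty;
}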
Link: https://lore.kernel.org/io-uring/c6bd5987-e9ae-cd02-49d0-1b3ac1ef65b1@tnonline.net/
Cc: stable@vger.kernel.org # 5.11+
Reported-by: Forza <forza@tnonline.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 333cf50746 upstream.
Since commit c9f3401313 ("powerpc: Always enable queued spinlocks for
64s, disable for others"), CONFIG_PPC_QUEUED_SPINLOCKS is always
enabled on ppc64le, and external modules that use spinlock APIs are
failing to build:
ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor'
Before the above commit, modules were able to build without any
issues. Also, this problem is not seen on other architectures. The
problem can be worked around if CONFIG_UNINLINE_SPIN_UNLOCK is enabled
in the config. However, CONFIG_UNINLINE_SPIN_UNLOCK is not enabled by
default and is only enabled under certain conditions, e.g. when
CONFIG_DEBUG_SPINLOCK is set in the kernel config.
#include <linux/module.h>
#include <linux/init.h>
#include <linux/spinlock.h>

spinlock_t spLock;

static int __init spinlock_test_init(void)
{
        spin_lock_init(&spLock);
        spin_lock(&spLock);
        spin_unlock(&spLock);
        return 0;
}

static void __exit spinlock_test_exit(void)
{
        printk("spinlock_test unloaded\n");
}

module_init(spinlock_test_init);
module_exit(spinlock_test_exit);
MODULE_DESCRIPTION("spinlock_test");
MODULE_LICENSE("non-GPL");
MODULE_AUTHOR("Srikar Dronamraju");
Given that spin locks are one of the basic facilities for module code,
this effectively makes it impossible to build/load almost any non-GPL
modules on ppc64le.
This was first reported at https://github.com/openzfs/zfs/issues/11172
Currently shared_processor is exported as a GPL-only symbol.
Fix this for parity with other architectures by exposing
shared_processor to non-GPL modules too.
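The change boils down to the export type of the symbol; roughly (assuming shared_processor stays the static key defined in the pseries setup code):
#include <linux/export.h>
#include <linux/jump_label.h>

DEFINE_STATIC_KEY_FALSE(shared_processor);
EXPORT_SYMBOL(shared_processor);        /* was EXPORT_SYMBOL_GPL(shared_processor) */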
Fixes: 14c73bd344 ("powerpc/vcpu: Assume dedicated processors as non-preempt")
Cc: stable@vger.kernel.org # v5.5+
Reported-by: marc.c.dionne@gmail.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729060449.292780-1-srikar@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a88603f4b9 upstream.
The Go runtime uses r30 for some special value called 'g'. It assumes
that value will remain unchanged even when calling VDSO functions.
Although r30 is non-volatile across function calls, the callee is free
to use it, as long as the callee saves the value and restores it before
returning.
It used to be true by accident that the VDSO didn't use r30, because the
VDSO was hand-written asm. When we switched to building the VDSO from C
the compiler started using r30, at least in some builds, leading to
crashes in Go. eg:
~/go/src$ ./all.bash
Building Go cmd/dist using /usr/lib/go-1.16. (go1.16.2 linux/ppc64le)
Building Go toolchain1 using /usr/lib/go-1.16.
go build os/exec: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault
go build reflect: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault
go tool dist: FAILED: /usr/lib/go-1.16/bin/go install -gcflags=-l -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 1
There are patches in flight to fix Go[1], but until they are released
and widely deployed we can work around it in the VDSO by avoiding use of
r30.
Note this only works with GCC, clang does not support -ffixed-rN.
1: https://go-review.googlesource.com/c/go/+/328110
Fixes: ab037dd87a ("powerpc/vdso: Switch VDSO to generic C implementation.")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729131244.2595519-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2a26a3cff upstream.
readpage was calculating the offset of the page incorrectly
for the case of large swapcaches.
loff_t offset = (loff_t)page->index << PAGE_SHIFT;
As pointed out by Matthew Wilcox, this needs to use
page_file_offset() to calculate the offset instead.
Pages coming from the swap cache have page->index set
to their index within the swapcache, not within the backing
file. For a sufficiently large swapcache, we could have
overlapping values of page->index within the same backing file.
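A minimal sketch of the change, with a hypothetical helper name:
#include <linux/pagemap.h>

static loff_t readpage_file_offset(struct page *page)
{
        /* before the fix: (loff_t)page->index << PAGE_SHIFT */
        return page_file_offset(page); /* correct for swapcache pages too */
}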
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e042aa532c upstream.
In 7fedb63a83 ("bpf: Tighten speculative pointer arithmetic mask") we
narrowed the offset mask for unprivileged pointer arithmetic in order to
mitigate a corner case where in the speculative domain it is possible to
advance, for example, the map value pointer by up to value_size-1 out-of-
bounds in order to leak kernel memory via side-channel to user space.
The verifier's state pruning for scalars leaves one corner case open
where in the first verification path R_x holds an unknown scalar with an
aux->alu_limit of e.g. 7, and in a second verification path that same
register R_x, here denoted as R_x', holds an unknown scalar which has
tighter bounds and would thus satisfy range_within(R_x, R_x') as well as
tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3:
Given the second path fits the register constraints for pruning, the final
generated mask from aux->alu_limit will remain at 7. While technically
not wrong for the non-speculative domain, it would however be possible
to craft similar cases where the mask would be too wide as in 7fedb63a83.
One way to fix it is to detect the presence of unknown scalar map pointer
arithmetic and force a deeper search on unknown scalars to ensure that
we do not run into a masking mismatch.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59089a189e upstream.
Follow-up to fe9a5ca7e3 ("bpf: Do not mark insn as seen under speculative
path verification"). The sanitize_insn_aux_data() helper does not serve a
particular purpose in today's code. The original intention for the helper
was that if function-by-function verification fails, a given program would
be cleared from temporary insn_aux_data[], and then its verification would
be re-attempted in the context of the main program a second time.
However, a failure in do_check_subprogs() will skip do_check_main() and
propagate the error to the user instead, thus such situation can never occur.
Given its interaction is not compatible to the Spectre v1 mitigation (due to
comparing aux->seen with env->pass_cnt), just remove sanitize_insn_aux_data()
to avoid future bugs in this area.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2039f26f3a ]
Spectre v4 gadgets make use of memory disambiguation, which is a set of
techniques that execute memory access instructions, that is, loads and
stores, out of program order; Intel's optimization manual, section 2.4.4.5:
A load instruction micro-op may depend on a preceding store. Many
microarchitectures block loads until all preceding store addresses are
known. The memory disambiguator predicts which loads will not depend on
any previous stores. When the disambiguator predicts that a load does
not have such a dependency, the load takes its data from the L1 data
cache. Eventually, the prediction is verified. If an actual conflict is
detected, the load and all succeeding instructions are re-executed.
af86ca4e30 ("bpf: Prevent memory disambiguation attack") tried to mitigate
this attack by sanitizing the memory locations through preemptive "fast"
(low latency) stores of zero prior to the actual "slow" (high latency) store
of a pointer value such that upon dependency misprediction the CPU then
speculatively executes the load of the pointer value and retrieves the zero
value instead of the attacker controlled scalar value previously stored at
that location, meaning, subsequent access in the speculative domain is then
redirected to the "zero page".
The sanitized preemptive store of zero prior to the actual "slow" store is
done through a simple ST instruction based on r10 (frame pointer) with
relative offset to the stack location that the verifier has been tracking
on the original used register for STX, which does not have to be r10. Thus,
there are no memory dependencies for this store, since it's only using r10
and immediate constant of zero; hence af86ca4e30 /assumed/ a low latency
operation.
However, a recent attack demonstrated that this mitigation is not sufficient
since the preemptive store of zero could also be turned into a "slow" store
and is thus bypassed as well:
[...]
// r2 = oob address (e.g. scalar)
// r7 = pointer to map value
31: (7b) *(u64 *)(r10 -16) = r2
// r9 will remain "fast" register, r10 will become "slow" register below
32: (bf) r9 = r10
// JIT maps BPF reg to x86 reg:
// r9 -> r15 (callee saved)
// r10 -> rbp
// train store forward prediction to break dependency link between both r9
// and r10 by evicting them from the predictor's LRU table.
33: (61) r0 = *(u32 *)(r7 +24576)
34: (63) *(u32 *)(r7 +29696) = r0
35: (61) r0 = *(u32 *)(r7 +24580)
36: (63) *(u32 *)(r7 +29700) = r0
37: (61) r0 = *(u32 *)(r7 +24584)
38: (63) *(u32 *)(r7 +29704) = r0
39: (61) r0 = *(u32 *)(r7 +24588)
40: (63) *(u32 *)(r7 +29708) = r0
[...]
543: (61) r0 = *(u32 *)(r7 +25596)
544: (63) *(u32 *)(r7 +30716) = r0
// prepare call to bpf_ringbuf_output() helper. the latter will cause rbp
// to spill to stack memory while r13/r14/r15 (all callee saved regs) remain
// in hardware registers. rbp becomes slow due to push/pop latency. below is
// disasm of bpf_ringbuf_output() helper for better visual context:
//
// ffffffff8117ee20: 41 54 push r12
// ffffffff8117ee22: 55 push rbp
// ffffffff8117ee23: 53 push rbx
// ffffffff8117ee24: 48 f7 c1 fc ff ff ff test rcx,0xfffffffffffffffc
// ffffffff8117ee2b: 0f 85 af 00 00 00 jne ffffffff8117eee0 <-- jump taken
// [...]
// ffffffff8117eee0: 49 c7 c4 ea ff ff ff mov r12,0xffffffffffffffea
// ffffffff8117eee7: 5b pop rbx
// ffffffff8117eee8: 5d pop rbp
// ffffffff8117eee9: 4c 89 e0 mov rax,r12
// ffffffff8117eeec: 41 5c pop r12
// ffffffff8117eeee: c3 ret
545: (18) r1 = map[id:4]
547: (bf) r2 = r7
548: (b7) r3 = 0
549: (b7) r4 = 4
550: (85) call bpf_ringbuf_output#194288
// instruction 551 inserted by verifier \
551: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here
// storing map value pointer r7 at fp-16 | since value of r10 is "slow".
552: (7b) *(u64 *)(r10 -16) = r7 /
// following "fast" read to the same memory location, but due to dependency
// misprediction it will speculatively execute before insn 551/552 completes.
553: (79) r2 = *(u64 *)(r9 -16)
// in speculative domain contains attacker controlled r2. in non-speculative
// domain this contains r7, and thus accesses r7 +0 below.
554: (71) r3 = *(u8 *)(r2 +0)
// leak r3
As can be seen, the current speculative store bypass mitigation which the
verifier inserts at line 551 is insufficient since /both/, the write of
the zero sanitation as well as the map value pointer are a high latency
instruction due to prior memory access via push/pop of r10 (rbp) in contrast
to the low latency read in line 553 as r9 (r15) which stays in hardware
registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally,
fp-16 can still be r2.
Initial thoughts to address this issue was to track spilled pointer loads
from stack and enforce their load via LDX through r10 as well so that /both/
the preemptive store of zero /as well as/ the load use the /same/ register
such that a dependency is created between the store and load. However, this
option is not sufficient either since it can be bypassed as well under
speculation. An updated attack with pointer spill/fills now _all_ based on
r10 would look as follows:
[...]
// r2 = oob address (e.g. scalar)
// r7 = pointer to map value
[...]
// longer store forward prediction training sequence than before.
2062: (61) r0 = *(u32 *)(r7 +25588)
2063: (63) *(u32 *)(r7 +30708) = r0
2064: (61) r0 = *(u32 *)(r7 +25592)
2065: (63) *(u32 *)(r7 +30712) = r0
2066: (61) r0 = *(u32 *)(r7 +25596)
2067: (63) *(u32 *)(r7 +30716) = r0
// store the speculative load address (scalar) this time after the store
// forward prediction training.
2068: (7b) *(u64 *)(r10 -16) = r2
// preoccupy the CPU store port by running sequence of dummy stores.
2069: (63) *(u32 *)(r7 +29696) = r0
2070: (63) *(u32 *)(r7 +29700) = r0
2071: (63) *(u32 *)(r7 +29704) = r0
2072: (63) *(u32 *)(r7 +29708) = r0
2073: (63) *(u32 *)(r7 +29712) = r0
2074: (63) *(u32 *)(r7 +29716) = r0
2075: (63) *(u32 *)(r7 +29720) = r0
2076: (63) *(u32 *)(r7 +29724) = r0
2077: (63) *(u32 *)(r7 +29728) = r0
2078: (63) *(u32 *)(r7 +29732) = r0
2079: (63) *(u32 *)(r7 +29736) = r0
2080: (63) *(u32 *)(r7 +29740) = r0
2081: (63) *(u32 *)(r7 +29744) = r0
2082: (63) *(u32 *)(r7 +29748) = r0
2083: (63) *(u32 *)(r7 +29752) = r0
2084: (63) *(u32 *)(r7 +29756) = r0
2085: (63) *(u32 *)(r7 +29760) = r0
2086: (63) *(u32 *)(r7 +29764) = r0
2087: (63) *(u32 *)(r7 +29768) = r0
2088: (63) *(u32 *)(r7 +29772) = r0
2089: (63) *(u32 *)(r7 +29776) = r0
2090: (63) *(u32 *)(r7 +29780) = r0
2091: (63) *(u32 *)(r7 +29784) = r0
2092: (63) *(u32 *)(r7 +29788) = r0
2093: (63) *(u32 *)(r7 +29792) = r0
2094: (63) *(u32 *)(r7 +29796) = r0
2095: (63) *(u32 *)(r7 +29800) = r0
2096: (63) *(u32 *)(r7 +29804) = r0
2097: (63) *(u32 *)(r7 +29808) = r0
2098: (63) *(u32 *)(r7 +29812) = r0
// overwrite scalar with dummy pointer; same as before, also including the
// sanitation store with 0 from the current mitigation by the verifier.
2099: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here
2100: (7b) *(u64 *)(r10 -16) = r7 | since store unit is still busy.
// load from stack intended to bypass stores.
2101: (79) r2 = *(u64 *)(r10 -16)
2102: (71) r3 = *(u8 *)(r2 +0)
// leak r3
[...]
Looking at the CPU microarchitecture, the scheduler might issue loads (such
as seen in line 2101) before stores (line 2099,2100) because the load execution
units become available while the store execution unit is still busy with the
sequence of dummy stores (line 2069-2098). And so the load may use the prior
stored scalar from r2 at address r10 -16 for speculation. The updated attack
may work less reliably on CPU microarchitectures where loads and stores share
execution resources.
This concludes that the sanitizing with zero stores from af86ca4e30 ("bpf:
Prevent memory disambiguation attack") is insufficient. Moreover, the detection
of stack reuse from af86ca4e30 where previously data (STACK_MISC) has been
written to a given stack slot where a pointer value is now to be stored does
not have sufficient coverage as precondition for the mitigation either; for
several reasons outlined as follows:
1) Stack content from prior program runs could still be preserved and is
therefore not "random", best example is to split a speculative store
bypass attack between tail calls, program A would prepare and store the
oob address at a given stack slot and then tail call into program B which
does the "slow" store of a pointer to the stack with subsequent "fast"
read. From program B PoV such stack slot type is STACK_INVALID, and
therefore also must be subject to mitigation.
2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr)
condition, for example, the previous content of that memory location could
also be a pointer to map or map value. Without the fix, a speculative
store bypass is not mitigated in such precondition and can then lead to
a type confusion in the speculative domain leaking kernel memory near
these pointer types.
While brainstorming on various alternative mitigation possibilities, we also
stumbled upon a retrospective from Chrome developers [0]:
[...] For variant 4, we implemented a mitigation to zero the unused memory
of the heap prior to allocation, which cost about 1% when done concurrently
and 4% for scavenging. Variant 4 defeats everything we could think of. We
explored more mitigations for variant 4 but the threat proved to be more
pervasive and dangerous than we anticipated. For example, stack slots used
by the register allocator in the optimizing compiler could be subject to
type confusion, leading to pointer crafting. Mitigating type confusion for
stack slots alone would have required a complete redesign of the backend of
the optimizing compiler, perhaps man years of work, without a guarantee of
completeness. [...]
From BPF side, the problem space is reduced, however, options are rather
limited. One idea that has been explored was to xor-obfuscate pointer spills
to the BPF stack:
[...]
// preoccupy the CPU store port by running sequence of dummy stores.
[...]
2106: (63) *(u32 *)(r7 +29796) = r0
2107: (63) *(u32 *)(r7 +29800) = r0
2108: (63) *(u32 *)(r7 +29804) = r0
2109: (63) *(u32 *)(r7 +29808) = r0
2110: (63) *(u32 *)(r7 +29812) = r0
// overwrite scalar with dummy pointer; xored with random 'secret' value
// of 943576462 before store ...
2111: (b4) w11 = 943576462
2112: (af) r11 ^= r7
2113: (7b) *(u64 *)(r10 -16) = r11
2114: (79) r11 = *(u64 *)(r10 -16)
2115: (b4) w2 = 943576462
2116: (af) r2 ^= r11
// ... and restored with the same 'secret' value with the help of AX reg.
2117: (71) r3 = *(u8 *)(r2 +0)
[...]
While the above would not prevent speculation, it would make data leakage
infeasible by directing it to random locations. In order to be effective
and prevent type confusion under speculation, such random secret would have
to be regenerated for each store. The additional complexity involved for a
tracking mechanism that prevents jumps such that restoring spilled pointers
would not get corrupted is not worth the gain for unprivileged. Hence, the
fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC
instruction which the x86 JIT translates into a lfence opcode. Inserting the
latter in between the store and load instruction is one of the mitigations
options [1]. The x86 instruction manual notes:
[...] An LFENCE that follows an instruction that stores to memory might
complete before the data being stored have become globally visible. [...]
The latter meaning that the preceding store instruction finished execution
and the store is at minimum guaranteed to be in the CPU's store queue, but
it's not guaranteed to be in that CPU's L1 cache at that point (globally
visible). The latter would only be guaranteed via sfence. So the load which
is guaranteed to execute after the lfence for that local CPU would have to
rely on store-to-load forwarding. [2], in section 2.3 on store buffers says:
[...] For every store operation that is added to the ROB, an entry is
allocated in the store buffer. This entry requires both the virtual and
physical address of the target. Only if there is no free entry in the store
buffer, the frontend stalls until there is an empty slot available in the
store buffer again. Otherwise, the CPU can immediately continue adding
subsequent instructions to the ROB and execute them out of order. On Intel
CPUs, the store buffer has up to 56 entries. [...]
One small upside on the fix is that it lifts constraints from af86ca4e30
where the sanitize_stack_off relative to r10 must be the same when coming
from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX
or BPF_ST instruction. This happens either when we store a pointer or data
value to the BPF stack for the first time, or upon later pointer spills.
The former needs to be enforced since otherwise stale stack data could be
leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST |
BPF_NOSPEC mapping is currently optimized away, but others could emit a
speculation barrier as well if necessary. For real-world unprivileged
programs e.g. generated by LLVM, pointer spill/fill is only generated upon
register pressure and LLVM only tries to do that for pointers which are not
used often. The main impact on programs will be the initial BPF_ST | BPF_NOSPEC
sanitation for the STACK_INVALID case when the first write to a stack slot
occurs e.g. upon map lookup. In future we might refine ways to mitigate
the latter cost.
[0] https://arxiv.org/pdf/1902.05178.pdf
[1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/
[2] https://arxiv.org/pdf/1905.05725.pdf
Fixes: af86ca4e30 ("bpf: Prevent memory disambiguation attack")
Fixes: f7cf25b202 ("bpf: track spill/fill of constants")
Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5e81d1117 ]
In case of JITs, each of the JIT backends compiles the BPF nospec instruction
/either/ to a machine instruction which emits a speculation barrier /or/ to
/no/ machine instruction in case the underlying architecture is not affected
by Speculative Store Bypass or has different mitigations in place already.
This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
instruction for mitigation. In case of arm64, we rely on the firmware mitigation
as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled,
it works for all of the kernel code with no need to provide any additional
instructions here (hence only comment in arm64 JIT). Other archs can follow
as needed. The BPF nospec instruction is specifically targeting Spectre v4
since i) we don't use a serialization barrier for the Spectre v1 case, and
ii) mitigation instructions for v1 and v4 might be different on some archs.
The BPF nospec is required for a future commit, where the BPF verifier does
annotate intermediate BPF programs with speculation barriers.
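As a rough sketch (not the kernel's actual JIT code), an x86-64 backend can lower the nospec instruction by emitting the three opcode bytes of lfence into the image, while unaffected architectures emit nothing at all:
#include <stddef.h>
#include <stdint.h>

static size_t emit_nospec_x86(uint8_t *image, size_t pos)
{
        image[pos++] = 0x0F;    /* lfence: 0F AE E8 */
        image[pos++] = 0xAE;
        image[pos++] = 0xE8;
        return pos;
}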
Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89fb62fde3 ]
Replace pci_enable_device() with pcim_enable_device() so that
pci_disable_device() and pci_release_regions() are called
automatically on release.
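A minimal probe sketch of the managed pattern, with hypothetical driver names:
#include <linux/pci.h>

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pcim_enable_device(pdev);         /* was pci_enable_device() */
        if (err)
                return err;

        err = pci_request_regions(pdev, "example");
        if (err)
                return err;     /* devres releases enable/regions for us */

        return 0;
}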
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76a16be07b ]
Replace pci_enable_device() with pcim_enable_device() so that
pci_disable_device() and pci_release_regions() are called
automatically on release.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 340e845738 ]
blkdev_get_no_open acquires a reference to the block_device through
the block device inode and then tries to acquire a device model
reference to the gendisk. But at this point the disk might already
be freed (although the race window is small). Fix this by only freeing
the gendisk from the whole device bdev's ->free_inode callback as well.
Fixes: 22ae8ce8b8 ("block: simplify bdev/disk lookup in blkdev_get")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210722075402.983367-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1c2f6312c ]
The result of __dev_get_by_index() is not checked for NULL and then gets
dereferenced immediately.
Also, __dev_get_by_index() must be called while holding either RTNL lock
or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or
its callers. This makes the underlying hlist_for_each_entry() loop not
safe, and can have adverse effects in itself.
Fix by using dev_get_by_index() and handling nullptr return value when
ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to
check for possible PTR_ERR() result.
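Roughly, the lookup pattern the fix switches to (not the exact mlx5e code):
#include <linux/err.h>
#include <linux/netdevice.h>

static struct net_device *lookup_peer_dev(struct net *net, int ifindex)
{
        /* dev_get_by_index() is RTNL-free and takes its own reference */
        struct net_device *dev = dev_get_by_index(net, ifindex);

        if (!dev)
                return ERR_PTR(-ENODEV);
        return dev;             /* caller must dev_put() when done */
}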
Fixes: 77ab67b7f0 ("net/mlx5e: Basic setup of hairpin object")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f331bf0f0 ]
When the fw_fatal reporter reports an error, the firmware is not responding.
Unload the device to ensure that the driver closes all its resources,
even if recovery is not due (user disabled auto-recovery or reporter is
in grace period). On successful recovery the device is loaded back up.
Fixes: b3bd076f75 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 678b1ae1af ]
Set the correct pci-device pointer in the ptp-RQ. This allows access to
dma_mask and avoids allocation requests with the wrong pci-device.
Fixes: a099da8ffc ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9841d58f35 ]
If a feature flag is only present in features, but not in hw_features,
the user can't reset it. Although hw_features may contain NETIF_F_HW_TC
by the point where the driver checks whether HTB offload is supported,
this flag is controlled by another condition that may not hold. Set it
explicitly to make sure the user can disable it.
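A minimal sketch of the difference, with a hypothetical helper name:
#include <linux/netdevice.h>

static void enable_tc_offload(struct net_device *netdev)
{
        netdev->hw_features |= NETIF_F_HW_TC;   /* user can toggle via ethtool */
        netdev->features    |= NETIF_F_HW_TC;   /* enabled by default */
}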
Fixes: 214baf2287 ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2351e5170 ]
When HW aggregates packets for an LRO session, it writes the payload
of two consecutive packets of a flow contiguously, so that they usually
share a cacheline.
The first byte of a packet's payload is written immediately after
the last byte of the preceding packet.
In this flow, there are two consecutive write requests to the shared
cacheline:
1. Regular write for the earlier packet.
2. Read-modify-write for the following packet.
In case of relaxed-ordering on, these two writes might be re-ordered.
Using the end padding optimization (to avoid partial write for the last
cacheline of a packet) becomes problematic if the two writes occur
out-of-order, as the padding would overwrite payload that belongs to
the following packet, causing data corruption.
Avoid this by disabling the end padding optimization when both
LRO and relaxed-ordering are enabled.
Fixes: 17347d5430 ("net/mlx5e: Add support for PCI relaxed ordering")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd3fddb827 ]
This is the same check LAG mode does when deciding whether to enable lag.
This will fix adding peer miss rules if lag is not supported,
and even incorrect rules in socket direct mode.
Also fix the incorrect comment on mlx5_get_next_phys_dev(), as flow #1
doesn't exist.
Fixes: ac004b8321 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c671972534 ]
The destination vport vhca id valid flag is set only when merged eswitch
is supported. Change the destination vport vhca id value to also be set
only when merged eswitch is supported.
Fixes: e4ad91f23f ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b54874ef1 ]
Fix a bug when a flow table is created in a priority that already
has other flow tables, as shown in the diagram below.
If the new flow table (FT-B) has the lowest level in the priority,
we need to connect the flow tables from the previous priority (p0)
to this new table. In addition when this flow table is destroyed
(FT-B), we need to connect the flow tables from the previous
priority (p0) to the next level flow table (FT-C) in the same
priority of the destroyed table (if exists).
                       ---------
                       |root_ns|
                       ---------
                           |
         ---------------------------------------
         |                  |                  |
   -----------         ----------          ---------
   |p(prio)-x|         |  p-y   |          |  p-n  |
   -----------         ----------          ---------
         |                  |
 ----------------   ------------------
 |ns(e.g bypass)|   |ns(e.g. kernel) |
 ----------------   ------------------
         |               |         |
      -------         ------      ----
      | p0  |         | p1 |      |p2|
      -------         ------      ----
         |                |          \
      --------         -------      ------
      | FT-A |         |FT-B |      |FT-C|
      --------         -------      ------
Fixes: f90edfd279 ("net/mlx5_core: Connect flow tables")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 343597d558 ]
We don't want strparser to run and pass skbs into skmsg handlers when
the psock is null. We just sk_drop them in this case. When removing
a live socket from the map, this means extra drops that we do not need
to incur. Move the zap below the strparser close to avoid this condition.
This way we stop the stream parser first, stopping it from processing
packets, and then delete the psock.
Fixes: a136678c0b ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-2-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7591c532b8 ]
The DP cable should always be connected to the DPU during the entire PHY
compliance testing run. Since the DP PHY compliance test is executed in
irq_hpd event context, dp_ctrl_off_link_stream() should be used instead of
dp_ctrl_off(). dp_ctrl_off() is used for the unplug event, which is
triggered when the DP cable is disconnected.
Changes in V2:
-- add fixes statement
Fixes: f21c8a276c ("drm/msm/dp: handle irq_hpd with sink_count = 0 correctly")
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1626191647-13901-2-git-send-email-khsieh@codeaurora.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7c9d2102c ]
Syzbot reported skb_over_panic() in llc_pdu_init_as_xid_cmd(). The
problem was in wrong LLC header manipulations.
Syzbot's reproducer tries to send an XID packet. llc_ui_sendmsg() is
doing the following steps:
1. skb allocation with size = len + header size
   (len is passed from userspace and header size
   is 3 since addr->sllc_xid is set)
2. skb_reserve() for header_len = 3
3. filling all other space with memcpy_from_msg()
Ok, at this moment we have a fully loaded skb, only the headers need to be
filled.
Then the code comes to llc_sap_action_send_xid_c(). This function pushes 3
bytes for the LLC PDU header and initializes it. Then comes
llc_pdu_init_as_xid_cmd(). It initializes the next 3 bytes *AFTER* the LLC PDU
header and calls skb_push(skb, 3). This looks wrong for 2 reasons:
1. Bytes right after the LLC header are user data, so this function
   was overwriting payload.
2. The skb_push(skb, 3) call can cause skb_over_panic() since
   all free space was filled in llc_ui_sendmsg(). (This can
   happen if the user passed len 686: 686 + 14 (eth header) + 3 (LLC
   header) = 703. SKB_DATA_ALIGN(703) = 704)
So, in this patch I added 2 new private constants: LLC_PDU_TYPE_U_XID
and LLC_PDU_LEN_U_XID. LLC_PDU_LEN_U_XID is used to correctly reserve the
header size to handle the LLC + XID case. LLC_PDU_TYPE_U_XID is used by
the llc_pdu_header_init() function to push 6 bytes instead of 3. And finally
I removed the skb_push() call from llc_pdu_init_as_xid_cmd().
These changes should not affect other parts of LLC, since after
all the steps we just transmit the buffer.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+5e5a981ad7cc54c4b2b4@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e4960b3d6 ]
The error code is missing in this code scenario; add the error code
'-EINVAL' to the return value 'err'.
This eliminates the following smatch warning:
drivers/net/ethernet/mellanox/mlx4/main.c:3538 mlx4_load_one() warn:
missing error code 'err'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 7ae0e400cd ("net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c85e57575 ]
Avoid configuring backpressure for LBK links as they
don't support it, and enable lmacs before configuring
pause frames.
Fixes: 75f3627099 ("octeontx2-pf: Support to enable/disable pause frames via ethtool")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69f0aeb13b ]
In the existing code, while changing the number of TX/RX
queues using ethtool, the PF/VF interface resources are
freed and reallocated (otx2_stop and otx2_open are called)
if the device is in the running state. If any resource allocation
fails in otx2_open, the driver frees the already allocated resources
and returns. But again, when the number of queues changes,
as the device state is still running, otx2_stop is called,
in which we try to free already freed resources, leading
to a driver crash.
This patch fixes the issue by setting the INTF_DOWN flag on
error and freeing the resources in otx2_stop only if the flag is
not set.
Fixes: 50fe6c02e5 ("octeontx2-pf: Register and handle link notifications")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cf4375a09 ]
An skb's skb_shinfo frags are not writable, and they can be shared with
other skbs, e.g. by pskb_copy(). Writing to the frags may corrupt other
skbs' data.
So before doing en/decryption, skb_cow_data() should always be called for
a cloned or nonlinear skb if the req dst is using the same sg as the req src.
a cloned or nonlinear skb if req dst is using the same sg as req src.
While at it, the likely branch can be removed, as it will be covered
by skb_cow_data().
Note that esp_input() has the same issue, and I will fix it in another
patch. tipc_aead_encrypt() doesn't have this issue, as it only processes
linear data in the unlikely branch.
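The rule sketched generically (not the tipc crypto code itself):
#include <linux/skbuff.h>

static int make_writable_for_crypto(struct sk_buff *skb)
{
        struct sk_buff *trailer;
        int nsg;

        if (!skb_cloned(skb) && !skb_is_nonlinear(skb))
                return 0;

        /* skb_cow_data() returns the number of segments, or < 0 on error */
        nsg = skb_cow_data(skb, 0, &trailer);
        return nsg < 0 ? nsg : 0;
}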
Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef68a71796 ]
In case an error occurred in the IRQ handler, the chip status is
dumped via devcoredump and all IRQs are disabled, but the chip stays
powered for further analysis.
The chip is in an undefined state and will not receive any CAN frames,
so shut down the timestamping worker, which reads the TBC register
regularly, too. This avoids any CRC read error messages if there is a
communication problem with the chip.
Fixes: efd8d98dfb ("can: mcp251xfd: add HW timestamp infrastructure")
Link: https://lore.kernel.org/r/20210724155131.471303-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f07f9815b7 ]
Be sure to count the csum_none cases when csum offload is
enabled.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76ed8a4a00 ]
We need to count the correct Tx and/or Rx packets for dynamic
interrupt moderation, depending on which we're processing on
the queue interrupt.
Fixes: 04a834592b ("ionic: dynamic interrupt moderation")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6ff85e0a2 ]
Move the interrupt coalesce value update out of the napi
thread and into the dim_work thread and set it only when it
has actually changed.
Fixes: 04a834592b ("ionic: dynamic interrupt moderation")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f79eef711e ]
If PTP configuration is attempted on ports that don't support
it, such as VF ports, the driver will return an error status
-95 (EOPNOTSUPP) and print an error message:
enp98s0: hwstamp set failed: -95
Because some daemons can retry every few seconds, this can end
up filling the dmesg log and pushing out other more useful
messages.
We can catch this issue earlier in our handling and return
the error without a log message.
Fixes: 829600ce5e ("ionic: add ts_config replay")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6840e17b8e ]
Move the bulk of the code from ionic_set_rx_mode(), which
can be called from atomic context, into ionic_lif_rx_mode()
which is a safe context.
A call from the stack will get pushed off into a work thread,
but it is also possible to simultaneously have a call driven
by a queue reconfig request from an ethtool command or fw
recovery event. We add a mutex around the rx_mode work to be
sure they don't collide.
Fixes: 81dbc24147 ("ionic: change set_rx_mode from_ndo to can_sleep")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52f3456a96 ]
Syzbot reported a memory leak in qrtr. The problem was an unputted
struct sock reference. The qrtr_local_enqueue() function calls qrtr_port_lookup(),
which takes a sock reference if the port was found. Then there is the following
check:
if (!ipc || &ipc->sk == skb->sk) {
...
return -ENODEV;
}
Since we should drop the reference before returning from this function and
ipc can be non-NULL inside this if, we should add qrtr_port_put() inside
this if.
The similar corner case is in qrtr_endpoint_post() as Manivannan
reported. In case of sock_queue_rcv_skb() failure we need to put
port reference to avoid leaking struct sock pointer.
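A generic sketch of the reference rule being enforced (hypothetical names, not the qrtr code):
#include <linux/errno.h>
#include <linux/kref.h>
#include <linux/types.h>

struct port {
        struct kref ref;
};

static void port_release(struct kref *ref)
{
        /* would free the port in real code */
}

static int enqueue_sketch(struct port *ipc, bool sending_to_self)
{
        if (!ipc || sending_to_self) {
                if (ipc)
                        kref_put(&ipc->ref, port_release); /* was leaked before the fix */
                return -ENODEV;
        }
        /* ... deliver, then drop the lookup reference ... */
        kref_put(&ipc->ref, port_release);
        return 0;
}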
Fixes: e04df98adf ("net: qrtr: Remove receive worker")
Fixes: bdabad3e36 ("net: Add Qualcomm IPC router")
Reported-and-tested-by: syzbot+35a511c72ea7356cdcf3@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ce6e1f662 ]
Commit 6cc8e74308 ("loop: scale loop device by introducing per
device lock") re-opened a race window for NULL pointer dereference at
loop_validate_file() which commit 310ca162d7 ("block/loop: Use
global lock for ioctl() operation.") had closed.
Although we need to guarantee that other loop devices will not change
during traversal, we can't take remote "struct loop_device"->lo_mutex
inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore,
introduce a global lock dedicated for loop_validate_file() which is
conditionally taken before local "struct loop_device"->lo_mutex is taken.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cc8e74308 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c92c74131a ]
The blamed commit modified the driver to accept the addition of VID 0
without doing anything, but deleting that VID still fails:
[ 32.080780] mv88e6085 d0032004.mdio-mii:10 lan8: failed to kill vid 0081/0
Modify mv88e6xxx_port_vlan_leave() to do the same thing as the addition.
Fixes: b8b79c414e ("net: dsa: mv88e6xxx: Fix adding vlan 0")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 227adfb2b1 ]
In cases where the header straight after the tunnel header was
another ethernet header (TEB), instead of the network header,
the ECN decapsulation code would treat the ethernet header as if
it was an IP header, resulting in mishandling and possible
wrong drops or corruption of the IP header.
In this case, ECT(1) is sent, so IP_ECN_decapsulate tries to copy it to the
inner IPv4 header, and correct its checksum.
The offset of the ECT bits in an IPv4 header corresponds to the
lower 2 bits of the second octet of the destination MAC address
in the ethernet header.
The IPv4 checksum corresponds to the end of the source address.
In order to reproduce:
$ ip netns add A
$ ip netns add B
$ ip -n A link add _v0 type veth peer name _v1 netns B
$ ip -n A link set _v0 up
$ ip -n A addr add dev _v0 10.254.3.1/24
$ ip -n A route add default dev _v0 scope global
$ ip -n B link set _v1 up
$ ip -n B addr add dev _v1 10.254.1.6/24
$ ip -n B route add default dev _v1 scope global
$ ip -n B link add gre1 type gretap local 10.254.1.6 remote 10.254.3.1 key 0x49000000
$ ip -n B link set gre1 up
# Now send an IPv4/GRE/Eth/IPv4 frame where the outer header has ECT(1),
# and the inner header has no ECT bits set:
$ cat send_pkt.py
#!/usr/bin/env python3
from scapy.all import *
pkt = IP(b'E\x01\x00\xa7\x00\x00\x00\x00@/`%\n\xfe\x03\x01\n\xfe\x01\x06 \x00eXI\x00'
b'\x00\x00\x18\xbe\x92\xa0\xee&\x18\xb0\x92\xa0l&\x08\x00E\x00\x00}\x8b\x85'
b'@\x00\x01\x01\xe4\xf2\x82\x82\x82\x01\x82\x82\x82\x02\x08\x00d\x11\xa6\xeb'
b'3\x1e\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12'
b'\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234'
b'56789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ')
send(pkt)
$ sudo ip netns exec B tcpdump -neqlllvi gre1 icmp & ; sleep 1
$ sudo ip netns exec A python3 send_pkt.py
In the original packet, the source/destination MAC addresses are
dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26
In the received packet, they are
dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27
Thanks to Lahav Schlesinger <lschlesinger@drivenets.com> and Isaac Garzon <isaac@speed.io>
for helping me pinpoint the origin.
Fixes: b723748750 ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d237a7f117 ]
release_sock() is a blocking function; it can change the socket state
after sleeping. In order to evaluate the stated condition outside
the socket lock context, switch to using wait_woken() instead.
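A generic sketch of the wait_woken() pattern (not the tipc code itself):
#include <linux/errno.h>
#include <linux/sched/signal.h>
#include <linux/wait.h>

static int wait_for_condition(struct wait_queue_head *wq,
                              bool (*done)(void *), void *arg, long timeo)
{
        DEFINE_WAIT_FUNC(wait, woken_wake_function);
        int err = 0;

        add_wait_queue(wq, &wait);
        while (!done(arg)) {
                /* re-evaluate the condition after every wake-up */
                timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
                if (!timeo) {
                        err = -ETIMEDOUT;
                        break;
                }
                if (signal_pending(current)) {
                        err = -ERESTARTSYS;
                        break;
                }
        }
        remove_wait_queue(wq, &wait);
        return err;
}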
Fixes: 6398e23cdb ("tipc: standardize accept routine")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8dd60de19 ]
For implicit-connect, when it's either SYN- or SYN+, an ACK should
be sent back to the client immediately. It's not appropriate for
the client to enter established state only after receiving data
from the server.
On client side, after the SYN is sent out, tipc_wait_for_connect()
should be called to wait for the ACK if timeout is set.
This patch also restricts __tipc_sendstream() to call __sendmsg()
only when it's in TIPC_OPEN state, so that the client can program
in a single loop doing both connecting and data sending like:
for (...)
sendmsg(dest, buf);
This makes the implicit-connect more implicit.
Fixes: b97bf3fd8f ("[TIPC] Initial merge")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89ec1f0886 ]
In SW DCB mode the packets sent receive incorrect UP tags. They are
constructed correctly and put into tx_ring, but UP is later remapped by
HW on the basis of TCTUPR register contents according to Tx queue
selected, and BW used is consistent with the new UP values. This is
caused by Tx queue selection in kernel not taking into account DCB
configuration. This patch fixes the issue by implementing the
ndo_select_queue NDO callback.
Fixes: fd0a05ce74 ("i40e: transmit, receive, and NAPI")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71d6fdba4b ]
Make warning meaningful for the user.
Previously the trace:
"Starting FW LLDP agent failed: error: I40E_ERR_ADMIN_QUEUE_ERROR, I40E_AQ_RC_EAGAIN"
was produced when user tried to start Firmware LLDP agent,
just after it was stopped with sequence:
ethtool --set-priv-flags <dev> disable-fw-lldp on
ethtool --set-priv-flags <dev> disable-fw-lldp off
(without any delay between the commands)
At that point the firmware is still processing stop command, the behavior
is expected.
Fixes: c1041d0704 ("i40e: Missing response checks in driver when starting/stopping FW LLDP")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65662a8dcd ]
Correct the message flow between driver and firmware when disabling
queues.
Previously in case of PF reset (due to required reinit after reconfig),
the error like: "VSI seid 397 Tx ring 60 disable timeout" could show up
occasionally. The error was not a real issue of hardware or firmware,
it was caused by wrong sequence of messages invoked by the driver.
Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30a56a2b88 ]
In case the entry is evicted via garbage collection there is a
delay between the timeout value and the eviction event.
This adjusts the stop value based on how much time has passed.
Fixes: b87a2f9199 ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6371c76e2 ]
We got the following UBSAN report on one of our testing machines:
================================================================================
UBSAN: array-index-out-of-bounds in kernel/bpf/syscall.c:2389:24
index 6 is out of range for type 'char *[6]'
CPU: 43 PID: 930921 Comm: systemd-coredum Tainted: G O 5.10.48-cloudflare-kasan-2021.7.0 #1
Hardware name: <snip>
Call Trace:
dump_stack+0x7d/0xa3
ubsan_epilogue+0x5/0x40
__ubsan_handle_out_of_bounds.cold+0x43/0x48
? seq_printf+0x17d/0x250
bpf_link_show_fdinfo+0x329/0x380
? bpf_map_value_size+0xe0/0xe0
? put_files_struct+0x20/0x2d0
? __kasan_kmalloc.constprop.0+0xc2/0xd0
seq_show+0x3f7/0x540
seq_read_iter+0x3f8/0x1040
seq_read+0x329/0x500
? seq_read_iter+0x1040/0x1040
? __fsnotify_parent+0x80/0x820
? __fsnotify_update_child_dentry_flags+0x380/0x380
vfs_read+0x123/0x460
ksys_read+0xed/0x1c0
? __x64_sys_pwrite64+0x1f0/0x1f0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
<snip>
================================================================================
================================================================================
UBSAN: object-size-mismatch in kernel/bpf/syscall.c:2384:2
From the report, we can infer that some array access in bpf_link_show_fdinfo at index 6
is out of bounds. The obvious candidate is bpf_link_type_strs[BPF_LINK_TYPE_XDP] with
BPF_LINK_TYPE_XDP == 6. It turns out that BPF_LINK_TYPE_XDP is missing from bpf_types.h
and therefore doesn't have an entry in bpf_link_type_strs:
pos: 0
flags: 02000000
mnt_id: 13
link_type: (null)
link_id: 4
prog_tag: bcf7977d3b93787c
prog_id: 4
ifindex: 1
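The missing piece is an entry for XDP links in the BPF_LINK_TYPE() table in bpf_types.h, roughly of the following form (exact placement and config guard hedged):
#ifdef CONFIG_NET
BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#endif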
Fixes: aa8d3a716b ("bpf, xdp: Add bpf_link-based XDP attachment API")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210719085134.43325-2-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfbe3650dd ]
In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not
free the adp variable.
Fix this by adding nf_tables_commit_audit_free which frees
the linked list with the head node adl.
backtrace:
kmalloc include/linux/slab.h:591 [inline]
kzalloc include/linux/slab.h:721 [inline]
nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline]
nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508
nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:702 [inline]
sock_sendmsg+0x56/0x80 net/socket.c:722
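A sketch of what such a free helper looks like, with a hypothetical element type (the real code walks the audit log list headed by adl):
#include <linux/list.h>
#include <linux/slab.h>

struct audit_entry {            /* hypothetical element type */
        struct list_head list;
};

static void commit_audit_free_sketch(struct list_head *adl)
{
        struct audit_entry *adp, *adn;

        /* free every entry allocated so far when the batch is aborted */
        list_for_each_entry_safe(adp, adn, adl, list) {
                list_del(&adp->list);
                kfree(adp);
        }
}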
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: c520292f29 ("audit: log nftables configuration change events once per table")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c06d35dfe ]
It was recently understood that the current mechanism available in the
driver to get SMU firmware info works only on internal SMU builds, and
there is a separate way to get all the SMU logging counters (addressed
in the next patch). Hence remove all the SMU info shown via debugfs as it
is no longer useful.
Fixes: 156ec4731c ("platform/x86: amd-pmc: Add AMD platform support for S2Idle")
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210629084803.248498-3-Shyam-sundar.S-k@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95e1b60f8d ]
The protocol to submit a job request to the SMU is to wait for
AMD_PMC_REGISTER_RESPONSE to return 1, meaning the SMU is ready to take
requests. The PMC driver has to make sure that the response code is always
AMD_PMC_RESULT_OK before making any command submissions.
When we submit a message to the SMU, we have to wait until it processes
the request. Add a read_poll_timeout() check, as this was missing in
the existing code.
Also, add a mutex to protect amd_pmc_send_cmd() calls to SMU.
Fixes: 156ec4731c ("platform/x86: amd-pmc: Add AMD platform support for S2Idle")
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Raul E Rangel <rrangel@chromium.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210629084803.248498-2-Shyam-sundar.S-k@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f9a5c358c8 upstream.
When we exceed the limit of BSS entries, this function will free the
new entry; however, at this point it is the last place that still references
the input ies, so these ies become unreferenced objects and cause a memory
leak.
Therefore we should free its ies before deallocating the new entry, besides
dropping it from the hidden_list.
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20210628132334.851095-1-phind.uet@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef04688871 upstream.
Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
completion path. Normally we expect this to happen inline as part
of submission, but apparently SCSI has a weird corner case where it
can happen as part of normal completions.
This should be solved by having the -EAGAIN bubble back up the stack
as part of submission, but previous attempts at this failed and we're
just not quite there yet. Instead we currently use REQ_F_REISSUE to
handle this case.
For now, catch it in io_rw_should_reissue() and prevent a reissue
from a bogus path.
Cc: stable@vger.kernel.org
Reported-by: Fabian Ebner <f.ebner@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e7b30d24a upstream.
There is a use after free memory corruption during module exit:
- nfcsim_exit()
- nfcsim_device_free(dev0)
- nfc_digital_unregister_device()
This iterates over command queue and frees all commands,
- dev->up = false
- nfcsim_link_shutdown()
- nfcsim_link_recv_wake()
This wakes the sleeping thread nfcsim_link_recv_skb().
- nfcsim_link_recv_skb()
Wakes from wait_event_interruptible_timeout() and
directly calls the deb->cb callback even though (dev->up == false),
- digital_send_cmd_complete()
Dereference of "struct digital_cmd" cmd which was freed earlier by
nfc_digital_unregister_device().
This causes memory corruption shortly after (with unrelated stack
trace):
nfc nfc0: NFC: nfcsim_recv_wq: Device is down
llcp: nfc_llcp_recv: err -19
nfc nfc1: NFC: nfcsim_recv_wq: Device is down
BUG: unable to handle page fault for address: ffffffffffffffed
Call Trace:
fsnotify+0x54b/0x5c0
__fsnotify_parent+0x1fe/0x300
? vfs_write+0x27c/0x390
vfs_write+0x27c/0x390
ksys_write+0x63/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
KASAN report:
BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50
Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71
Workqueue: events nfcsim_recv_wq [nfcsim]
Call Trace:
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x21/0x140
? digital_send_cmd_complete+0x16/0x50
? digital_send_cmd_complete+0x16/0x50
kasan_report.cold+0x7f/0x11b
? digital_send_cmd_complete+0x16/0x50
? digital_dep_link_down+0x60/0x60
digital_send_cmd_complete+0x16/0x50
nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]
? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]
? lock_is_held_type+0x98/0x110
? finish_wait+0x110/0x110
? rcu_read_lock_sched_held+0x9c/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x12e/0x1f0
This flow of calling digital_send_cmd_complete() callback on driver exit
is specific to nfcsim which implements reading and sending work queues.
Since the NFC digital device was unregistered, the callback should not
be called.
Fixes: 204bddcb50 ("NFC: nfcsim: Make use of the Digital layer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ab189cf3a upstream.
iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it
wants the wq_entry to be always removed whether it ended up waking the
task or not. finish_wait() tests whether wq_entry needs removal without
grabbing the wait_queue lock and expects the waker to use
list_del_init_careful() after all waking operations are complete, which
iocg_wake_fn() didn't do. The operation order was wrong and the regular
list_del_init() was used.
The result is that if a waiter wakes up racing the waker, it can pop the
wq_entry off the stack and free it while the waker is still looking at it,
which can lead to a backtrace like the following.
[7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP
...
[7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0
...
[7312084.858314] Call Trace:
[7312084.863548] _raw_spin_lock_irqsave+0x22/0x30
[7312084.872605] try_to_wake_up+0x4c/0x4f0
[7312084.880444] iocg_wake_fn+0x71/0x80
[7312084.887763] __wake_up_common+0x71/0x140
[7312084.895951] iocg_kick_waitq+0xe8/0x2b0
[7312084.903964] ioc_rqos_throttle+0x275/0x650
[7312084.922423] __rq_qos_throttle+0x20/0x30
[7312084.930608] blk_mq_make_request+0x120/0x650
[7312084.939490] generic_make_request+0xca/0x310
[7312084.957600] submit_bio+0x173/0x200
[7312084.981806] swap_readpage+0x15c/0x240
[7312084.989646] read_swap_cache_async+0x58/0x60
[7312084.998527] swap_cluster_readahead+0x201/0x320
[7312085.023432] swapin_readahead+0x2df/0x450
[7312085.040672] do_swap_page+0x52f/0x820
[7312085.058259] handle_mm_fault+0xa16/0x1420
[7312085.066620] do_page_fault+0x2c6/0x5c0
[7312085.074459] page_fault+0x2f/0x40
Fix it by switching to list_del_init_careful() and putting it at the end.
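A trimmed, hedged sketch of the corrected ordering inside the custom wake function (the iocost budget accounting is elided; see blk-iocost.c for the real code):

static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
                        int flags, void *key)
{
    /* ... charge the waiter's cost against the budget passed in @key ... */

    /*
     * Wake first, detach last: finish_wait() checks the entry without
     * taking the waitqueue lock, so the on-stack wq_entry may only be
     * removed once the waker is completely done with it.
     */
    default_wake_function(wq_entry, mode, flags, key);
    list_del_init_careful(&wq_entry->entry);
    return 0;
}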
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rik van Riel <riel@surriel.com>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d47255d3f8 upstream.
This reverts commit 4192f7b576.
It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.
What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.
Fixes: 4192f7b576 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6aade587d3 upstream.
In case when psp_init_asd_microcode() fails to load ASD microcode file,
psp_v12_0_init_microcode() tries to print the firmware filename that
failed to load before bailing out.
This is wrong because:
- the firmware filename it would want to print is an incorrect one, as
psp_init_asd_microcode() and psp_v12_0_init_microcode() are loading
different filenames
- it tries to print fw_name, but that's not yet been initialized by that
time, so it prints random stack contents, e.g.
amdgpu 0000:04:00.0: Direct firmware load for amdgpu/renoir_asd.bin failed with error -2
amdgpu 0000:04:00.0: amdgpu: fail to initialize asd microcode
amdgpu 0000:04:00.0: amdgpu: psp v12.0: Failed to load firmware "\xfeTO\x8e\xff\xff"
Fix that by bailing out immediately, instead of printing the bogus error
message.
Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b53e041d8e upstream.
[Why]
We don't check DENTIST_DISPCLK_CHG_DONE to ensure the dentist
display clock is updated to the target value. In some scenarios with a large
display clock margin, it will deliver an unfinished display clock and cause
issues like a black screen.
[How]
Check DENTIST_DISPCLK_CHG_DONE to ensure the display clock
has been updated to the target value before the driver does other
clock-related actions.
Reviewed-by: Cyr Aric <aric.cyr@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Dale Zhao <dale.zhao@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ca2350e11 upstream.
Commit 670e90924b ("HID: wacom: support named keys on older devices")
added support for sending named events from the soft buttons on the
24HDT and 27QHDT. In the process, however, it inadvertantly disabled the
touchscreen of the 24HDT and 27QHDT by default. The
`wacom_set_shared_values` function would normally enable touch by default
but because it checks the state of the non-shared `has_mute_touch_switch`
flag and `wacom_setup_touch_input_capabilities` sets the state of the
/shared/ version, touch ends up being disabled by default.
This patch sets the non-shared flag, letting `wacom_set_shared_values`
take care of copying the value over to the shared version and setting
the default touch state to "on".
Fixes: 670e90924b ("HID: wacom: support named keys on older devices")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 640b7ea5f8 upstream.
The memory reserved by console/PALcode or non-volatile memory is not added
to memblock.memory.
Since commit fa3354e4ea (mm: free_area_init: use maximal zone PFNs rather
than zone sizes) the initialization of the memory map relies on the
accuracy of memblock.memory to properly calculate zone sizes. The holes in
memblock.memory caused by absent regions reserved by the firmware cause
incorrect initialization of struct pages which leads to BUG() during the
initial page freeing:
BUG: Bad page state in process swapper pfn:2ffc53
page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
flags: 0x0()
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26
fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0
fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0
0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80
fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80
fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
Trace:
[<fffffc00011cd148>] bad_page+0x168/0x1b0
[<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
[<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc000101001c>] _stext+0x1c/0x20
Fix this by registering the reserved ranges in memblock.memory.
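A minimal sketch of the pattern, assuming the usual memblock API (the exact call site in the alpha setup code may differ):

/* Make the firmware-reserved range part of memblock.memory so the memory
 * map covers it, while still keeping it off the free lists. */
memblock_add(base, size);
memblock_reserve(base, size);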
Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge
Fixes: fa3354e4ea ("mm: free_area_init: use maximal zone PFNs rather than zone sizes")
Reported-by: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 928150fad4 upstream.
In esd_usb2_setup_rx_urbs() MAX_RX_URBS coherent buffers are allocated
and there is nothing that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see esd_usb2_setup_rx_urbs) and this flag cannot be used
with coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to the mcba_usb driver and it works nicely with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
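A hedged sketch of the cleanup pattern, with illustrative bookkeeping arrays (the same idea applies to the ems_usb and usb_8dev patches that follow):

#include <linux/usb.h>

/* Free the coherent RX buffers that were saved at allocation time;
 * URB_FREE_BUFFER cannot be used for buffers from usb_alloc_coherent(). */
static void example_free_rx_buffers(struct usb_device *udev, size_t buf_size,
                                    void *bufs[], dma_addr_t dmas[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (bufs[i])
            usb_free_coherent(udev, buf_size, bufs[i], dmas[i]);
}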
Fixes: 96d8e90382 ("can: Add driver for esd CAN-USB/2 device")
Link: https://lore.kernel.org/r/b31b096926dcb35998ad0271aac4b51770ca7cc8.1627404470.git.paskripkin@gmail.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9969e3c5f4 upstream.
In ems_usb_start() MAX_RX_URBS coherent buffers are allocated and
there is nothing that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see ems_usb_start) and this flag cannot be used with
coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to the mcba_usb driver and it works nicely with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
Fixes: 702171adee ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
Link: https://lore.kernel.org/r/59aa9fbc9a8cbf9af2bbd2f61a659c480b415800.1627404470.git.paskripkin@gmail.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e865f0c31 upstream.
In usb_8dev_start() MAX_RX_URBS coherent buffers are allocated and
there is nothing that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see usb_8dev_start) and this flag cannot be used with
coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to the mcba_usb driver and it works nicely with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
Fixes: 0024d8ad16 ("can: usb_8dev: Add support for USB2CAN interface from 8 devices")
Link: https://lore.kernel.org/r/d39b458cd425a1cf7f512f340224e6e9563b07bd.1627404470.git.paskripkin@gmail.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc43fb69a7 upstream.
Yasushi reported, that his Microchip CAN Analyzer stopped working
since commit 91c0255717 ("can: mcba_usb: fix memory leak in
mcba_usb"). The problem was in missing urb->transfer_dma
initialization.
In my previous patch to this driver I refactored the mcba_usb_start() code
to avoid leaking usb coherent buffers. To achieve that, I passed a local
stack variable to usb_alloc_coherent() and then saved it to a private
array to correctly free all coherent buffers on the ->close() call. But I
forgot to initialize urb->transfer_dma with the variable passed to
usb_alloc_coherent().
All of this was causing the device to not work, since dma addr 0 is not
valid, and the following log can be found on the bug report page, which
points exactly to the problem described above.
| DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
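A hedged sketch of one RX urb setup with the missing assignment in place (endpoint number, sizes and field names are illustrative, not the exact mcba_usb ones):

#include <linux/usb.h>

static int example_setup_rx_urb(struct usb_device *udev, struct urb *urb,
                                size_t size, usb_complete_t complete_fn,
                                void *ctx)
{
    dma_addr_t buf_dma;
    void *buf;

    buf = usb_alloc_coherent(udev, size, GFP_KERNEL, &buf_dma);
    if (!buf)
        return -ENOMEM;

    usb_fill_bulk_urb(urb, udev, usb_rcvbulkpipe(udev, 1), buf, size,
                      complete_fn, ctx);
    urb->transfer_dma = buf_dma;                    /* the assignment that was missing */
    urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; /* use the coherent mapping */
    return 0;
}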
Fixes: 91c0255717 ("can: mcba_usb: fix memory leak in mcba_usb")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990850
Link: https://lore.kernel.org/r/20210725103630.23864-1-paskripkin@gmail.com
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Tested-by: Yasushi SHOJI <yashi@spacecubics.com>
[mkl: fixed typos in commit message - thanks Yasushi SHOJI]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 121dffe20b upstream.
When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
the following dump occurs.
BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Oops: 0000 [#1] SMP
[...]
Workqueue: events kfree_rcu_work
RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
[...]
Call Trace:
kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
kfree_bulk include/linux/slab.h:413 [inline]
kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
process_one_work+0x207/0x530 kernel/workqueue.c:2276
worker_thread+0x320/0x610 kernel/workqueue.c:2422
kthread+0x13d/0x160 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
When kmalloc_node() allocates a large chunk of memory, a page is allocated,
not a slab object, so when freeing that memory via kfree_rcu(), it should not
be handled by memcg_slab_free_hook(), because memcg_slab_free_hook() is only
meant for slab objects.
Use page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
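A hedged, diff-style sketch of the change inside memcg_slab_free_hook()'s per-object loop (simplified):

/* before: assumes every page carries a slab obj_cgroup vector */
objcgs = page_objcgs(page);

/* after: page_objcgs_check() returns NULL for non-slab pages, such as
 * large kmalloc allocations that are backed directly by pages */
objcgs = page_objcgs_check(page);
if (!objcgs)
    continue;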
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a7146 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30def93565 upstream.
Dan Carpenter reports:
The patch 2d146aa3aa: "mm: memcontrol: switch to rstat" from Apr
29, 2021, leads to the following static checker warning:
kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
warn: sleeping in atomic context
mm/memcontrol.c
3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
3573 {
3574 unsigned long val;
3575
3576 if (mem_cgroup_is_root(memcg)) {
3577 cgroup_rstat_flush(memcg->css.cgroup);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is from static analysis and potentially a false positive. The
problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
can sleep.
3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
3579 memcg_page_state(memcg, NR_ANON_MAPPED);
3580 if (swap)
3581 val += memcg_page_state(memcg, MEMCG_SWAP);
3582 } else {
3583 if (!swap)
3584 val = page_counter_read(&memcg->memory);
3585 else
3586 val = page_counter_read(&memcg->memsw);
3587 }
3588 return val;
3589 }
__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.
Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
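A hedged sketch of the resulting change in mem_cgroup_usage(); only the flush call differs from the listing above:

if (mem_cgroup_is_root(memcg)) {
    /* safe to call under rcu_read_lock() / with irqs disabled */
    cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
    val = memcg_page_state(memcg, NR_FILE_PAGES) +
          memcg_page_state(memcg, NR_ANON_MAPPED);
    if (swap)
        val += memcg_page_state(memcg, MEMCG_SWAP);
}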
Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
Fixes: 2d146aa3aa ("mm: memcontrol: switch to rstat")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9449ad33be upstream.
For punching holes in EOF blocks, fallocate used a buffer write to zero the
EOF blocks in the last cluster. But since ->writepage will ignore EOF
pages, those zeros will not be flushed.
This "looks" ok, as commit 6bba4471f0 ("ocfs2: fix data corruption by
fallocate") will zero the EOF blocks when extending the file size, but it
isn't. The problem happened on those EOF pages: before writeback, those
pages had the DIRTY flag set and all buffer_heads in them also had the DIRTY
flag set; when writeback was run by write_cache_pages(), the DIRTY flag on
the page was cleared, but the DIRTY flag on the buffer_heads was not.
When the next write happened to those EOF pages, since the buffer_heads
already had the DIRTY flag set, it would not mark the pages DIRTY again.
That made writeback ignore them forever, which will cause data corruption.
Even direct I/O writes can't work, because they fail when trying to drop the
page cache before the direct I/O, as they find the buffer_heads for those
pages still have the DIRTY flag set, and then fall back to buffer I/O mode.
To summarize the issue: as writeback ignores EOF pages, once any
EOF page is generated, any write to it will only go to the page cache; it
will never be flushed to disk, even after the file size extends and that page
is no longer an EOF page. The fix is to avoid zeroing EOF blocks with buffer
writes.
The following code snippet from qemu-img could trigger the corruption.
656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
...
660 fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
660 fallocate(11, 0, 2275868672, 327680) = 0
658 pwrite64(11, "
Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8750f9bbda upstream.
The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
therefore it needs a compat ioctl implementation. Otherwise,
32-bit userspace fails to invoke it on 64-bit kernels; for x86
it might work fine by chance if the padding is zero, but not
on big-endian architectures.
Reported-by: Thomas Sattler
Cc: stable@vger.kernel.org
Fixes: 2a31b9db15 ("kvm: introduce manual dirty log reprotect")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76b4f357d0 upstream.
KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
of vcpu-ids. Fix arrays indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
elements.
Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
an odd number, resulting in always enough padding being available at
the end of those arrays.
Nevertheless this should be fixed in order to avoid rare problems in
case someone is using an even number for KVM_MAX_VCPU_ID.
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20210701154105.23215-2-jgross@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41a8457f3f upstream.
The current assumption that methods to read PCH FIVR attributes will
return an integer is not correct. There is no good way to return an integer,
as negative numbers are also valid.
These read methods return a package of integers. The first integer returns
status, which is 0 on success and any other value for failure. When the
returned status is zero, then the second integer returns the actual value.
This change fixes this issue by replacing acpi_evaluate_integer() with
acpi_evaluate_object() and using acpi_extract_package() to extract the results.
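A hedged sketch of the idea; for simplicity it reads the returned package elements directly instead of going through acpi_extract_package(), and the method/handle plumbing is assumed rather than taken from the driver:

#include <linux/acpi.h>
#include <linux/slab.h>

static int example_read_fivr_value(acpi_handle handle, char *method, u64 *value)
{
    struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
    union acpi_object *pkg;
    int ret = -EFAULT;

    if (ACPI_FAILURE(acpi_evaluate_object(handle, method, NULL, &buf)))
        return -ENODEV;

    pkg = buf.pointer;
    /* expect a package of two integers: {status, value}, status 0 == ok */
    if (pkg && pkg->type == ACPI_TYPE_PACKAGE && pkg->package.count >= 2 &&
        pkg->package.elements[0].type == ACPI_TYPE_INTEGER &&
        pkg->package.elements[1].type == ACPI_TYPE_INTEGER &&
        pkg->package.elements[0].integer.value == 0) {
        *value = pkg->package.elements[1].integer.value;
        ret = 0;
    }

    kfree(buf.pointer);
    return ret;
}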
Fixes: 2ce6324ead ("ACPI: DPTF: Add PCH FIVR participant driver")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 240246f6b9 upstream.
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be a previous bio which may
have failed, and that failure is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2a6166768 upstream.
When removing a writeable device in __btrfs_free_extra_devids, the rw
device count should be decremented.
This error was caught by Syzbot which reported a warning in
close_fs_devices:
WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
Modules linked in:
CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
FS: 00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
btrfs_fill_super fs/btrfs/super.c:1382 [inline]
btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
fc_mount fs/namespace.c:993 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
This happened because fs_devices->rw_devices was not 0 after
closing all devices. Here is the call trace that was observed:
btrfs_mount_root():
btrfs_scan_one_device():
device_list_add(); <---------------- device added
btrfs_open_devices():
open_fs_devices():
btrfs_open_one_device(); <-------- writable device opened,
rw device count ++
btrfs_fill_super():
open_ctree():
btrfs_free_extra_devids():
__btrfs_free_extra_devids(); <--- writable device removed,
rw device count not decremented
fail_tree_roots:
btrfs_close_devices():
close_fs_devices(); <------- rw device count off by 1
As a note, prior to commit cf89af146b ("btrfs: dev-replace: fail
mount if we don't have replace item with target device"), rw_devices
was decremented on removing a writable device in
__btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
was not set for the device. However, this check does not need to be
reinstated as it is now redundant and incorrect.
In __btrfs_free_extra_devids, we skip removing the device if it is the
target for replacement. This is done by checking whether device->devid
== BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
and so it's redundant to test for that bit.
Additionally, following commit 82372bc816 ("Btrfs: make
the logic of source device removing more clear"), rw_devices is
incremented whenever a writeable device is added to the alloc
list (including the target device in btrfs_dev_replace_finishing), so
all removals of writable devices from the alloc list should also be
accompanied by a decrement to rw_devices.
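A hedged sketch of the decrement in __btrfs_free_extra_devids() when a writeable device is dropped from the alloc list (simplified; the surrounding loop and locking are omitted):

if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
    list_del_init(&device->dev_alloc_list);
    clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
    fs_devices->rw_devices--;    /* keep the counter in sync */
}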
Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Fixes: cf89af146b ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
CC: stable@vger.kernel.org # 5.10+
Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ecc64fab7d upstream.
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
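A hedged sketch of the corrected check in btrfs_log_new_name(), relying on inode_logged() rather than comparing logged_trans against the current transaction id (simplified):

/* If neither the inode nor the old parent directory was logged in the
 * current transaction, there is no new name to record in the log. */
if (!inode_logged(trans, inode) &&
    (!old_dir || !inode_logged(trans, old_dir)))
    return;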
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 728d392f8a upstream.
Commit 782b76d7ab ("fs/ext2: Replace
kmap() with kmap_local_page()") replaced the kmap/kunmap calls in
ext2_get_page/ext2_put_page with kmap_local_page/kunmap_local for
efficiency reasons. As a necessary side change, the commit also
made ext2_get_page (and ext2_find_entry and ext2_dotdot) return
the mapping address along with the page itself, as it is required
for kunmap_local, and converted uses of page_address on such pages
to use the newly returned address instead. However, uses of
page_address on such pages were missed in ext2_check_page and
ext2_delete_entry, which triggers oopses if kmap_local_page happens
to return an address from high memory. Fix this now by converting
the remaining uses of page_address to use the right address, as
returned by kmap_local_page.
Link: https://lore.kernel.org/r/20210714185448.8707ac239e9f12b3a7f5b9f9@urjc.es
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Javier Pello <javier.pello@urjc.es>
Fixes: 782b76d7ab ("fs/ext2: Replace kmap() with kmap_local_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a34b13a88 upstream.
Since commit 1b6b26ae70 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.
In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already. Doing
extraneous wakeups will only cause potential thundering herd problems.
However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" to be "any new write will
trigger it". Even if there was no edge in sight.
Quoting Sandeep Patil:
"The commit 1b6b26ae70 ('pipe: fix and clarify pipe write wakeup
logic') changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that
relied upon the older behavior for notification scheme similar to
what's described in [1]
One such library 'realm-core'[2] is used by numerous Android
applications. The library uses a similar notification mechanism as GNU
Make but it never drains the pipe until it is full. When Android moved
to v5.10 kernel, all applications using this library stopped working.
The library has since been fixed[3] but it will be a while before all
applications incorporate the updated library"
Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong. Also note the original report [4] by
Michael Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.
So add the extraneous wakeup, to approximate the old behavior.
[ I say "approximate", because the exact old behavior was to do a wakeup
not for each write(), but for each pipe buffer chunk that was filled
in. The behavior introduced by this change is not that - this is just
"every write will cause a wakeup, whether necessary or not", which
seems to be sufficient for the broken library use. ]
It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.
See commit f467a6a664 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".
Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
Reported-by: Sandeep Patil <sspatil@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When backporting 0db282ba2c ("selftest: use mmap instead of
posix_memalign to allocate memory") to this stable branch, I forgot a '{',
breaking the build.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 82a1c67554 ]
Once the new schema interrupt-controller/arm,vic.yaml is added, we get
the below warnings:
arch/arm/boot/dts/versatile-ab.dt.yaml:
intc@10140000: $nodename:0: 'intc@10140000' does not match
'^interrupt-controller(@[0-9a-f,]+)*$'
arch/arm/boot/dts/versatile-ab.dt.yaml:
intc@10140000: 'clear-mask' does not match any of the regexes
Fix the node names for the interrupt controller to conform
to the standard node name interrupt-controller@.. Also drop invalid
clear-mask property.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210701132118.759454-1-sudeep.holla@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49694d14ff ]
The length variable is rather pointless given that it can be trivially
deduced from offset and size. Also the initial calculation can lead
to KASAN warnings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ac1d42651 ]
The length variable is rather pointless given that it can be trivially
deduced from offset and size. Also the initial calculation can lead
to KASAN warnings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9c9c6815f ]
Because out-of-range assignments to bit fields
are compiler-dependent, the fields could end up with
wrong values.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdb8742dc6 ]
SCMI message headers carry a sequence number and such field is sized to
allow for MSG_TOKEN_MAX distinct numbers; moreover zero is not really an
acceptable maximum number of pending in-flight messages.
Fix the checks performed on the value exported by transports
in scmi_desc.max_msg accordingly.
Link: https://lore.kernel.org/r/20210712141833.6628-3-cristian.marussi@arm.com
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
[sudeep.holla: updated the patch title and error message]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a691f16cc ]
The scmi_linux_errmap buffer access index is supposed to depend on the
array size to prevent out-of-bounds element access. It uses SCMI_ERR_MAX
to check bounds, but that can mismatch with the array size. It also
changes success into -EIO; though scmi_linux_errmap is never used in
case of success, it is expected to work for the success case too.
The code is slightly confusing, as the negative of the error code
is used as an index into the buffer. Fix it by negating it at the start and
making it more readable.
Link: https://lore.kernel.org/r/20210707135028.1869642-1-sudeep.holla@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4b9852a0f ]
Below two paths could overlap each other if we power off a drive quickly
after powering it on. There are multiple races in nvme_setup_io_queues()
because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit.
nvme_reset_work() nvme_remove()
nvme_setup_io_queues() nvme_dev_disable()
... ...
A1 clear NVMEQ_ENABLED bit for admin queue lock
retry: B1 nvme_suspend_io_queues()
A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue
A3 pci_free_irq_vectors() nvme_pci_disable()
A4 nvme_setup_irqs(); B3 pci_free_irq_vectors()
... unlock
A5 queue_request_irq() for admin queue
set NVMEQ_ENABLED bit
...
nvme_create_io_queues()
A6 result = queue_request_irq();
set NVMEQ_ENABLED bit
...
fail to allocate enough IO queues:
A7 nvme_suspend_io_queues()
goto retry
If B3 runs in between A1 and A2, it will crash if the irqaction hasn't
been freed by A2. B2 is supposed to free the admin queue IRQ but it simply
can't fulfill the job as A1 has cleared the NVMEQ_ENABLED bit.
Fix: combine A1 and A2 so the IRQ gets freed as soon as the NVMEQ_ENABLED bit
gets cleared.
After solving #1, A2 could race with B3 if A2 is freeing the IRQ while B3
is checking the irqaction. A3 could also race with B2 if B2 is freeing the
IRQ while A3 is checking the irqaction.
Fix: A2 and A3 take lock for mutual exclusion.
A3 could race with B3 since they could run free_msi_irqs() in parallel.
Fix: A3 takes lock for mutual exclusion.
A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
interrupted by B3.
Fix: A4 takes lock for mutual exclusion.
If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL.
They are just allocated by A5/A6.
Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit.
A7 could get chance to pci_free_irq() for certain IO queue while B3 is
checking irqaction.
Fix: A7 takes lock.
nvme_dev->online_queues needs to be protected by shutdown_lock. Since it
is not atomic, both paths could modify it using their own copy.
Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9ab9cce93 ]
Invoking trc_del_holdout() from within trc_wait_for_one_reader() is
only a performance optimization because the RCU Tasks Trace grace-period
kthread will eventually do this within check_all_holdout_tasks_trace().
But it is not a particularly important performance optimization because
it only applies to the grace-period kthread, of which there is but one.
This commit therefore removes this invocation of trc_del_holdout() in
favor of the one in check_all_holdout_tasks_trace() in the grace-period
kthread.
Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d10bf55d8 ]
As Yanfei pointed out, although invoking trc_del_holdout() is safe
from the viewpoint of the integrity of the holdout list itself,
the put_task_struct() invoked by trc_del_holdout() can result in
use-after-free errors due to later accesses to this task_struct structure
by the RCU Tasks Trace grace-period kthread.
This commit therefore removes this call to trc_del_holdout() from
trc_inspect_reader() in favor of the grace-period thread's existing call
to trc_del_holdout(), thus eliminating that particular class of
use-after-free errors.
Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d11fa231c ]
The doc draft-stewart-tsvwg-sctp-ipv4-00 that restricts 198 addresses
was never published. These addresses, being private addresses, should be
allowed to be used in SCTP.
As Michael Tuexen suggested, this patch is to move 198 addresses from
unusable to private scope.
Reported-by: Sérgio <surkamp@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dbffbb533 ]
sk_ll_usec is read locklessly from sk_can_busy_loop()
while another thread can change its value in sock_setsockopt().
This is correct but needs annotations.
BUG: KCSAN: data-race in __skb_try_recv_datagram / sock_setsockopt
write to 0xffff88814eb5f904 of 4 bytes by task 14011 on cpu 0:
sock_setsockopt+0x1287/0x2090 net/core/sock.c:1175
__sys_setsockopt+0x14f/0x200 net/socket.c:2100
__do_sys_setsockopt net/socket.c:2115 [inline]
__se_sys_setsockopt net/socket.c:2112 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eb5f904 of 4 bytes by task 14001 on cpu 1:
sk_can_busy_loop include/net/busy_poll.h:41 [inline]
__skb_try_recv_datagram+0x14f/0x320 net/core/datagram.c:273
unix_dgram_recvmsg+0x14c/0x870 net/unix/af_unix.c:2101
unix_seqpacket_recvmsg+0x5a/0x70 net/unix/af_unix.c:2067
____sys_recvmsg+0x15d/0x310 include/linux/uio.h:244
___sys_recvmsg net/socket.c:2598 [inline]
do_recvmmsg+0x35c/0x9f0 net/socket.c:2692
__sys_recvmmsg net/socket.c:2771 [inline]
__do_sys_recvmmsg net/socket.c:2794 [inline]
__se_sys_recvmmsg net/socket.c:2787 [inline]
__x64_sys_recvmmsg+0xcf/0x150 net/socket.c:2787
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x00000101
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14001 Comm: syz-executor.3 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
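A hedged sketch of the annotation pattern for this kind of lockless read/write pairing (function names here are illustrative; the real call sites are sock_setsockopt() and sk_can_busy_loop()):

#include <linux/sched/signal.h>
#include <net/sock.h>

/* writer side: runs under lock_sock() but races lockless readers */
static void example_set_busy_poll(struct sock *sk, unsigned int usecs)
{
    WRITE_ONCE(sk->sk_ll_usec, usecs);
}

/* lockless reader side */
static bool example_busy_loop_allowed(const struct sock *sk)
{
    return READ_ONCE(sk->sk_ll_usec) && !signal_pending(current);
}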
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e7107c5ef upstream.
Richard reported sporadic (roughly one in 10 or so) null dereferences and
other strange behaviour for a set of automated LTP tests. Things like:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
RIP: 0010:kernfs_sop_show_path+0x1b/0x60
...or these others:
RIP: 0010:do_mkdirat+0x6a/0xf0
RIP: 0010:d_alloc_parallel+0x98/0x510
RIP: 0010:do_readlinkat+0x86/0x120
There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference. I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.
In commit 71d883c37e ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:
--------------
struct cgroup_subsys *ss;
- struct dentry *dentry;
[...]
- dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
- CGROUP_SUPER_MAGIC, ns);
[...]
- if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
- struct super_block *sb = dentry->d_sb;
- dput(dentry);
+ ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
+ if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+ struct super_block *sb = fc->root->d_sb;
+ dput(fc->root);
deactivate_locked_super(sb);
msleep(10);
return restart_syscall();
}
--------------
In changing from the local "*dentry" variable to using fc->root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case. With LTP doing a crazy amount of back to
back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
becomes slightly likely and then bad things happen.
A fix would be to not leave the stale reference in fc->root as follows:
--------------
dput(fc->root);
+ fc->root = NULL;
deactivate_locked_super(sb);
--------------
...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.
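A hedged sketch of the final shape of the retry branch, with fc_drop_locked() replacing the open-coded dput()/deactivate_locked_super() pair shown above:

if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
    fc_drop_locked(fc);    /* drops fc->root and deactivates the superblock */
    msleep(10);
    return restart_syscall();
}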
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # v5.1+
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Fixes: 71d883c37e ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbcf01128d upstream.
unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is
held. Except for MSG_PEEK this is guaranteed by modifying inflight
count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file
count is not), yet it needs to be serialized with garbage collection.
Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible
to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.
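A hedged sketch of the barrier in step 2); unix_gc_lock is the spinlock guarding garbage collection in net/unix/garbage.c:

/* Taking and immediately releasing unix_gc_lock guarantees that any GC
 * run which started before the file count was incremented has finished
 * before the fd is installed. */
static void example_unix_gc_barrier(void)
{
    spin_lock(&unix_gc_lock);
    spin_unlock(&unix_gc_lock);
}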
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9615fe36b3 upstream.
We will fail to build with CONFIG_SKB_EXTENSIONS disabled after
8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.
Fixes: Fixes: 8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used skbs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 788bc000d4 ]
Commit 99ba0ea616 ("sfc: adjust efx->xdp_tx_queue_count with the real
number of initialized queues") intended to fix a problem caused by a
round up when calculating the number of XDP channels and queues.
However, this was not the real problem. The real problem was that the
number of XDP TX queues had been reduced to half in
commit e26ca4b535 ("sfc: reduce the number of requested xdp ev queues"),
but the variable xdp_tx_queue_count had remained the same.
Once the correct number of XDP TX queues is created again in the
previous patch of this series, this also can be reverted since the error
doesn't actually exist.
Only in the case that there is a bug in the code can we have different
values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this,
and per Edward Cree's suggestion, I instead add a WARN_ON to catch it if it
happens again in the future.
Note that the number of allocated queues can be higher than the number
of used ones due to the round up, as explained in the existing comment
in the code. That's why we also have to stop increasing xdp_queue_number
beyond efx->xdp_tx_queue_count.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0e85ee8978 upstream.
Fix the division-by-zero warning below:
- The division by zero happens because the dummy bus width is zero,
but if the number of dummy bytes is zero, it indicates that there is no data transfer,
so we can just return zero without doing any calculations.
[ 0.795337] Division by zero in kernel.
:
[ 0.834051] [<807fd40c>] (__div0) from [<804e1acc>] (Ldiv0+0x8/0x10)
[ 0.839097] [<805f0710>] (cqspi_exec_mem_op) from [<805edb4c>] (spi_mem_exec_op+0x3b0/0x3f8)
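A hedged sketch of the early return described above, using the spi-mem op fields named in the trace (the real function computes more than this):

#include <linux/spi/spi-mem.h>

static unsigned int example_dummy_cycles(const struct spi_mem_op *op)
{
    /* No dummy bytes means no dummy phase at all: bail out before the
     * division by op->dummy.buswidth, which may be zero here. */
    if (!op->dummy.nbytes)
        return 0;

    return (op->dummy.nbytes * 8) / op->dummy.buswidth;
}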
Fixes: 7512eaf541 ("spi: cadence-quadspi: Fix dummy cycle calculation when buswidth > 1")
Signed-off-by: Yoshitaka Ikeda <ikeda@nskint.co.jp>
Reviewed-by: Pratyush Yadav <p.yadav@ti.com>
Link: https://lore.kernel.org/r/92eea403-9b21-2488-9cc1-664bee760c5e@nskint.co.jp
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c90b4503cc upstream.
d3_entered flag is used to mark for vgpu_reset a previous power
transition from D3->D0, typically for VM resume from S3, so that gvt
could skip PPGTT invalidation in current vgpu_reset during resuming.
In the case of S0ix exit, although there is a D3->D0 transition, the guest
driver continues to use the vgpu as normal, with d3_entered set, until the
next shutdown/reboot or power transition.
If a reboot follows an S0ix exit, the device power state transitions as:
D0->D3->D0->D0(reboot), while the system power state transitions as:
S0->S0 (reboot). There is no vgpu_reset until D0(reboot), thus
d3_entered won't be cleared and the vgpu_reset will skip PPGTT invalidation,
even though those PPGTT entries are no longer valid. Errors appear like:
gvt: vgpu 2: vfio_pin_pages failed for gfn 0xxxxx, ret -22
gvt: vgpu 2: fail: spt xxxx guest entry 0xxxxx type 2
gvt: vgpu 2: fail: shadow page xxxx guest entry 0xxxxx type 2.
Give gvt a chance to clear d3_entered on elsp cmd submission so that the
states before & after S0ix enter/exit are consistent.
Fixes: ba25d97757 ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210707004531.4873-1-colin.xu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab37a7a890 upstream.
The usage of the usb-nop-xceiv PHY on Raspberry Pi boards with BCM283x has
been a "regression source" a lot of times. The latest case is the breakage of
USB mass storage boot caused by commit e590474768 ("driver core: Set
fw_devlink=on by default") for multi_v7_defconfig. As long as
NOP_USB_XCEIV is configured as a module, the dwc2 USB driver defers probing
endlessly and prevents booting from a USB mass storage device. So make
the driver built-in, as in bcm2835_defconfig and arm64/defconfig.
Fixes: e590474768 ("driver core: Set fw_devlink=on by default")
Reported-by: Ojaswin Mujoo <ojaswin98@gmail.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/1625915095-23077-1-git-send-email-stefan.wahren@i2se.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8550ff8d8c upstream.
When multiple SKBs are merged to a new skb under napi GRO,
or SKB is re-used by napi, if nfct was set for them in the
driver, it will not be released while freeing their stolen
head state or on re-use.
Release nfct on napi's stolen or re-used SKBs, and
in gro_list_prepare, check conntrack metadata diff.
Fixes: 5c6b946047 ("net/mlx5e: CT: Handle misses after executing CT action")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4512c63b1 upstream.
Dan Carpenter reported an issue introduced in
commit fde56eea01 ("mptcp: refine mptcp_cleanup_rbuf") where a new
boolean (ack_pending) is masked with 0x9.
It was not the intention to ignore values by using a boolean. This
variable should not have a 'bool' type: we should keep the 'u8' to allow
this comparison.
Fixes: fde56eea01 ("mptcp: refine mptcp_cleanup_rbuf")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b69874f74 upstream.
The commit 9a5605505d (" bonding: Add struct bond_ipesc to manage SA") is causing
the following build error when XFRM is not selected in the kernel config.
lld: error: undefined symbol: xfrm_dev_state_flush
>>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453)
>>> net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a
Fixes: 9a5605505d (" bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Taehee Yoo <ap420073@gmail.com>
CC: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3abab27c32 upstream.
drm: Return -ENOTTY for non-drm ioctls
Return -ENOTTY from drm_ioctl() when userspace passes in a cmd number
which doesn't relate to the drm subsystem.
Glibc uses the TCGETS ioctl to implement isatty(), and without this
change isatty() incorrectly returns true for drm devices.
To test run this command:
$ if [ -t 0 ]; then echo is a tty; fi < /dev/dri/card0
which shows "is a tty" without this patch.
This may also modify memory which the userspace application is not
expecting.
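The same thing can be observed from a tiny userspace C program equivalent to the shell test above (/dev/dri/card0 is just an example node):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/dri/card0", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* Without the fix, the bogus TCGETS handling makes this print 1. */
		printf("isatty() = %d\n", isatty(fd));
		close(fd);
		return 0;
	}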
Signed-off-by: Charles Baylis <cb-kernel@fishzet.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/YPG3IBlzaMhfPqCr@stando.fishzet.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c36748ac54 upstream.
We need to append the device id even if the eeprom has a label property set, as some
platforms can have multiple eeproms with the same label and we cannot register
each of those with the same label. Failing to register those eeproms triggers
cascading failures on such platforms (the system is no longer working).
This fixes a regression on such platforms introduced with 4e302c3b56
Reported-by: Alexander Fomichev <fomichev.ru@gmail.com>
Fixes: 4e302c3b56 ("misc: eeprom: at24: fix NVMEM name with custom AT24 device name")
Cc: stable@vger.kernel.org
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8798d070d4 upstream.
Skipping the "lock has been released" notification if the lock owner
is not what we expect based on owner_cid can lead to I/O hangs.
One example is our own notifications: because owner_cid is cleared
in rbd_unlock(), when we get our own notification it is processed as
unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer
that requested the lock then doesn't go through with acquiring it,
I/O requests that came in while the lock was being quiesced would
be stalled until another I/O request is submitted and kicks acquire
from rbd_img_exclusive_lock().
This makes the comment in rbd_release_lock() actually true: prior to
this change the canceled work was being requeued in response to the
"lock has been acquired" notification from rbd_handle_acquired_lock().
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed9eb71085 upstream.
Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking
on releasing_wait completion. On the I/O completion side, each image
request also needs to take lock_rwsem for read. Because rw_semaphore
implementation doesn't allow new readers after a writer has indicated
interest in the lock, this can result in a deadlock if something that
needs to take lock_rwsem for write gets involved. For example:
1. watch error occurs
2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
releases lock_rwsem
3. after reestablishing the watch, rbd_reregister_watch() takes
lock_rwsem for write and calls rbd_reacquire_lock()
4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on
releasing_wait until running_list becomes empty
5. another watch error occurs
6. rbd_watch_errcb() blocks trying to take lock_rwsem for write
7. no in-flight image request can complete and delete itself from
running_list because lock_rwsem won't be granted anymore
A similar scenario can occur with "lock has been acquired" and "lock
has been released" notification handers which also take lock_rwsem for
write to update owner_cid.
We don't actually get anything useful from sitting on lock_rwsem in
rbd_quiesce_lock() -- owner_cid updates certainly don't need to be
synchronized with. In fact the whole owner_cid tracking logic could
probably be removed from the kernel client because we don't support
proxied maintenance operations.
Cc: stable@vger.kernel.org # 5.3+
URL: https://tracker.ceph.com/issues/42757
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79e482e9c3 upstream.
Commit b10d6bca87 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") didn't take into account that when there is
movable_node parameter in the kernel command line, for_each_mem_range()
would skip ranges marked with MEMBLOCK_HOTPLUG.
The page table setup code in POWER uses for_each_mem_range() to create
the linear mapping of the physical memory and since the regions marked
as MEMORY_HOTPLUG are skipped, they never make it to the linear map.
A later access to the memory in those ranges will fail:
BUG: Unable to handle kernel data access on write at 0xc000000400000000
Faulting instruction address: 0xc00000000008a3c0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000
CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
NIP clear_user_page+0x50/0x80
LR __handle_mm_fault+0xc88/0x1910
Call Trace:
__handle_mm_fault+0xc44/0x1910 (unreliable)
handle_mm_fault+0x130/0x2a0
__get_user_pages+0x248/0x610
__get_user_pages_remote+0x12c/0x3e0
get_arg_page+0x54/0xf0
copy_string_kernel+0x11c/0x210
kernel_execve+0x16c/0x220
call_usermodehelper_exec_async+0x1b0/0x2f0
ret_from_kernel_thread+0x5c/0x70
Instruction dump:
79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
---[ end trace 490b8c67e6075e09 ]---
Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the
traversal fixes this issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100
Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dad53a11f upstream.
memcpy_to_page and memzero_page can write to arbitrary pages, which
could be in the page cache or in high memory, so call
flush_kernel_dcache_page() to flush the dcache.
This is a problem when using these helpers on dcache-challenged
architectures. Right now there are just a few users, so chances are no one
has used the PC floppy driver, the aha1542 driver for an ISA SCSI HBA, or the
few advanced and optional btrfs and ext4 features on those platforms
since the conversion.
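For illustration, the shape such a helper takes after the change is roughly the following sketch (not the exact upstream highmem.h code; kmap_local_page(), kunmap_local() and flush_kernel_dcache_page() are the existing kernel primitives):

	static inline void memcpy_to_page(struct page *page, size_t offset,
					  const char *from, size_t len)
	{
		char *to = kmap_local_page(page);

		memcpy(to + offset, from, len);
		flush_kernel_dcache_page(page);	/* keep the dcache coherent */
		kunmap_local(to);
	}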
Link: https://lkml.kernel.org/r/20210713055231.137602-2-hch@lst.de
Fixes: bb90d4bc7b ("mm/highmem: Lift memcpy_[to|from]_page to core")
Fixes: 28961998f8 ("iov_iter: lift memzero_page() to highmem.h")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e71e2ace57 upstream.
Patch series "userfaultfd: do not untag user pointers", v5.
If a user program uses userfaultfd on ranges of heap memory, it may end
up passing a tagged pointer to the kernel in the range.start field of
the UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg. However, from the application's perspective, the tagged
address *is* the memory address, so if the application is unaware of
memory tags, it may get confused by receiving an address that is, from
its point of view, outside of the bounds of the allocation. We observed
this behavior in the kselftest for userfaultfd [2] but other
applications could have the same problem.
Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail. Also change the kselftest to use
mmap so that it doesn't encounter this problem.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
This patch (of 2):
Do not untag pointers passed to the userfaultfd ioctls. Instead, let
the system call fail. This will provide an early indication of problems
with tag-unaware userspace code instead of letting the code get confused
later, and is consistent with how we decided to handle brk/mmap/mremap
in commit dcde237319 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()"), as well as being consistent with the existing
tagged address ABI documentation relating to how ioctl arguments are
handled.
The code change is a revert of commit 7d0325749a ("userfaultfd: untag
user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.
Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
Fixes: 63f0c60379 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Delva <adelva@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mitch Phillips <mitchp@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William McVicker <willmcvicker@google.com>
Cc: <stable@vger.kernel.org> [5.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46fee9ab02 upstream.
__io_queue_proc() can enqueue both poll entries and still fail
afterwards, so the callers trying to cancel it should also try to remove
the second poll entry (if any).
For example, it may leave the request alive, referencing an io_uring
context but not accessible for cancellation:
[ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004
[ 282.609927][ T1620] Call Trace:
[ 282.613711][ T1620] __schedule+0x93a/0x26f0
[ 282.634647][ T1620] schedule+0xd3/0x270
[ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890
[ 282.660346][ T1620] io_sq_thread+0xaac/0x1250
[ 282.696394][ T1620] ret_from_fork+0x1f/0x30
Cc: stable@vger.kernel.org
Fixes: 18bceab101 ("io_uring: allow POLL_ADD with double poll_wait() users")
Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a3402d93c upstream.
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:
CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)
handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)
tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A
Fix this by simply doing the process wide cputime counting start and the
timer arming within the same locked block.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Fixes: 60f2ceaa81 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09cfae9f13 upstream.
When receiving a packet with multiple fragments, hardware may still
touch the first fragment until the entire packet has been received. The
driver therefore keeps the first fragment mapped for DMA until end of
packet has been asserted, and delays its dma_sync call until then.
The driver tries to fit multiple receive buffers on one page. When using
3K receive buffers (e.g. using Jumbo frames and legacy-rx is turned
off/build_skb is being used) on an architecture with 4K pages, the
driver allocates an order 1 compound page and uses one page per receive
buffer. To determine the correct offset for a delayed DMA sync of the
first fragment of a multi-fragment packet, the driver then cannot just
use PAGE_MASK on the DMA address but has to construct a mask based on
the actual size of the backing page.
Using PAGE_MASK in the 3K RX buffer/4K page architecture configuration
will always sync the first page of a compound page. With the SWIOTLB
enabled this can lead to corrupted packets (zeroed out first fragment,
re-used garbage from another packet) and various consequences, such as
slow/stalling data transfers and connection resets. For example, testing
on a link with MTU exceeding 3058 bytes on a host with SWIOTLB enabled
(e.g. "iommu=soft swiotlb=262144,force") TCP transfers quickly fizzle
out without this patch.
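A standalone arithmetic demo of the masking problem (plain C with made-up example addresses): PAGE_MASK only recovers the start of the 4K page containing the address, while a mask built from the real backing size recovers the start of the order-1 page.

	#include <stdio.h>

	#define PAGE_SIZE 4096UL
	#define PAGE_MASK (~(PAGE_SIZE - 1))

	int main(void)
	{
		unsigned long pg_start = 0x10000;		/* order-1 (8K) page start */
		unsigned long dma = pg_start + 0x1200;		/* buffer in the 2nd half  */
		unsigned long truesize = PAGE_SIZE << 1;	/* real backing page size  */

		printf("dma & PAGE_MASK       = %#lx\n", dma & PAGE_MASK);	  /* 0x11000 */
		printf("dma & ~(truesize - 1) = %#lx\n", dma & ~(truesize - 1)); /* 0x10000 */
		return 0;
	}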
Cc: stable@vger.kernel.org
Fixes: 0c5661ecc5 ("ixgbe: fix crash in build_skb Rx code path")
Signed-off-by: Markus Boehme <markubo@amazon.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d4abca95e upstream.
Fix an 11-year old bug in ngene_command_config_free_buf() while
addressing the following warnings caught with -Warray-bounds:
arch/alpha/include/asm/string.h:22:16: warning: '__builtin_memcpy' offset [12, 16] from the object at 'com' is out of the bounds of referenced subobject 'config' with type 'unsigned char' at offset 10 [-Warray-bounds]
arch/x86/include/asm/string_32.h:182:25: warning: '__builtin_memcpy' offset [12, 16] from the object at 'com' is out of the bounds of referenced subobject 'config' with type 'unsigned char' at offset 10 [-Warray-bounds]
The problem is that the original code is trying to copy 6 bytes of
data into a one-byte sized member _config_ of the wrong structure
FW_CONFIGURE_BUFFERS, in a single call to memcpy(). This causes a
legitimate compiler warning because memcpy() overruns the length
of &com.cmd.ConfigureBuffers.config. It seems that the right
structure is FW_CONFIGURE_FREE_BUFFERS, instead, because it contains
6 more members apart from the header _hdr_. Also, the name of
the function ngene_command_config_free_buf() suggests that the actual
intention is to ConfigureFreeBuffers, instead of ConfigureBuffers
(which takes place in the function ngene_command_config_buf(), above).
Fix this by enclosing those 6 members of struct FW_CONFIGURE_FREE_BUFFERS
into new struct config, and use &com.cmd.ConfigureFreeBuffers.config as
the destination address, instead of &com.cmd.ConfigureBuffers.config,
when calling memcpy().
This also helps with the ongoing efforts to globally enable
-Warray-bounds and get us closer to being able to tighten the
FORTIFY_SOURCE routines on memcpy().
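A sketch of the structural idea (the member names below are placeholders, not the real ngene definitions): grouping the six destination bytes into a sub-struct gives the memcpy() a correctly sized target.

	struct FW_CONFIGURE_FREE_BUFFERS {
		struct FW_HEADER hdr;
		struct {
			u8 len1;	/* placeholder names for the six   */
			u8 len2;	/* per-stream buffer length fields */
			u8 len3;
			u8 len4;
			u8 len5;
			u8 len6;
		} __attribute__((packed)) config;
	} __attribute__((packed));

	/* memcpy(&com.cmd.ConfigureFreeBuffers.config, config, 6) now stays in bounds. */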
Link: https://github.com/KSPP/linux/issues/109
Fixes: dae52d009f ("V4L/DVB: ngene: Initial check-in")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/linux-hardening/20210420001631.GA45456@embeddedor/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8949b9a114 upstream.
At btrfs_qgroup_trace_extent_post() we call btrfs_find_all_roots() with a
NULL value as the transaction handle argument, which makes that function
take the commit_root_sem semaphore, which is necessary when we don't hold
a transaction handle or any other mechanism to prevent a transaction
commit from wiping out commit roots.
However btrfs_qgroup_trace_extent_post() can be called in a context where
we are holding a write lock on an extent buffer from a subvolume tree,
namely from btrfs_truncate_inode_items(), called either during truncate
or unlink operations. In this case we end up with a lock inversion problem
because the commit_root_sem is a higher level lock, always supposed to be
acquired before locking any extent buffer.
Lockdep detects this lock inversion problem since we switched the extent
buffer locks from custom locks to semaphores, and when running btrfs/158
from fstests, it reported the following trace:
[ 9057.626435] ======================================================
[ 9057.627541] WARNING: possible circular locking dependency detected
[ 9057.628334] 5.14.0-rc2-btrfs-next-93 #1 Not tainted
[ 9057.628961] ------------------------------------------------------
[ 9057.629867] kworker/u16:4/30781 is trying to acquire lock:
[ 9057.630824] ffff8e2590f58760 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.632542]
but task is already holding lock:
[ 9057.633551] ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs]
[ 9057.635255]
which lock already depends on the new lock.
[ 9057.636292]
the existing dependency chain (in reverse order) is:
[ 9057.637240]
-> #1 (&fs_info->commit_root_sem){++++}-{3:3}:
[ 9057.638138] down_read+0x46/0x140
[ 9057.638648] btrfs_find_all_roots+0x41/0x80 [btrfs]
[ 9057.639398] btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
[ 9057.640283] btrfs_add_delayed_data_ref+0x418/0x490 [btrfs]
[ 9057.641114] btrfs_free_extent+0x35/0xb0 [btrfs]
[ 9057.641819] btrfs_truncate_inode_items+0x424/0xf70 [btrfs]
[ 9057.642643] btrfs_evict_inode+0x454/0x4f0 [btrfs]
[ 9057.643418] evict+0xcf/0x1d0
[ 9057.643895] do_unlinkat+0x1e9/0x300
[ 9057.644525] do_syscall_64+0x3b/0xc0
[ 9057.645110] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 9057.645835]
-> #0 (btrfs-tree-00){++++}-{3:3}:
[ 9057.646600] __lock_acquire+0x130e/0x2210
[ 9057.647248] lock_acquire+0xd7/0x310
[ 9057.647773] down_read_nested+0x4b/0x140
[ 9057.648350] __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.649175] btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 9057.650010] btrfs_search_slot+0x537/0xc00 [btrfs]
[ 9057.650849] scrub_print_warning_inode+0x89/0x370 [btrfs]
[ 9057.651733] iterate_extent_inodes+0x1e3/0x280 [btrfs]
[ 9057.652501] scrub_print_warning+0x15d/0x2f0 [btrfs]
[ 9057.653264] scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs]
[ 9057.654295] scrub_bio_end_io_worker+0x101/0x2e0 [btrfs]
[ 9057.655111] btrfs_work_helper+0xf8/0x400 [btrfs]
[ 9057.655831] process_one_work+0x247/0x5a0
[ 9057.656425] worker_thread+0x55/0x3c0
[ 9057.656993] kthread+0x155/0x180
[ 9057.657494] ret_from_fork+0x22/0x30
[ 9057.658030]
other info that might help us debug this:
[ 9057.659064] Possible unsafe locking scenario:
[ 9057.659824] CPU0 CPU1
[ 9057.660402] ---- ----
[ 9057.660988] lock(&fs_info->commit_root_sem);
[ 9057.661581] lock(btrfs-tree-00);
[ 9057.662348] lock(&fs_info->commit_root_sem);
[ 9057.663254] lock(btrfs-tree-00);
[ 9057.663690]
*** DEADLOCK ***
[ 9057.664437] 4 locks held by kworker/u16:4/30781:
[ 9057.665023] #0: ffff8e25922a1148 ((wq_completion)btrfs-scrub){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0
[ 9057.666260] #1: ffffabb3451ffe70 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0
[ 9057.667639] #2: ffff8e25922da198 (&ret->mutex){+.+.}-{3:3}, at: scrub_handle_errored_block.isra.0+0x5d2/0x1640 [btrfs]
[ 9057.669017] #3: ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs]
[ 9057.670408]
stack backtrace:
[ 9057.670976] CPU: 7 PID: 30781 Comm: kworker/u16:4 Not tainted 5.14.0-rc2-btrfs-next-93 #1
[ 9057.672030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 9057.673492] Workqueue: btrfs-scrub btrfs_work_helper [btrfs]
[ 9057.674258] Call Trace:
[ 9057.674588] dump_stack_lvl+0x57/0x72
[ 9057.675083] check_noncircular+0xf3/0x110
[ 9057.675611] __lock_acquire+0x130e/0x2210
[ 9057.676132] lock_acquire+0xd7/0x310
[ 9057.676605] ? __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.677313] ? lock_is_held_type+0xe8/0x140
[ 9057.677849] down_read_nested+0x4b/0x140
[ 9057.678349] ? __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.679068] __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.679760] btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 9057.680458] btrfs_search_slot+0x537/0xc00 [btrfs]
[ 9057.681083] ? _raw_spin_unlock+0x29/0x40
[ 9057.681594] ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs]
[ 9057.682336] scrub_print_warning_inode+0x89/0x370 [btrfs]
[ 9057.683058] ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs]
[ 9057.683834] ? scrub_write_block_to_dev_replace+0xb0/0xb0 [btrfs]
[ 9057.684632] iterate_extent_inodes+0x1e3/0x280 [btrfs]
[ 9057.685316] scrub_print_warning+0x15d/0x2f0 [btrfs]
[ 9057.685977] ? ___ratelimit+0xa4/0x110
[ 9057.686460] scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs]
[ 9057.687316] scrub_bio_end_io_worker+0x101/0x2e0 [btrfs]
[ 9057.688021] btrfs_work_helper+0xf8/0x400 [btrfs]
[ 9057.688649] ? lock_is_held_type+0xe8/0x140
[ 9057.689180] process_one_work+0x247/0x5a0
[ 9057.689696] worker_thread+0x55/0x3c0
[ 9057.690175] ? process_one_work+0x5a0/0x5a0
[ 9057.690731] kthread+0x155/0x180
[ 9057.691158] ? set_kthread_struct+0x40/0x40
[ 9057.691697] ret_from_fork+0x22/0x30
Fix this by making btrfs_find_all_roots() never attempt to lock the
commit_root_sem when it is called from btrfs_qgroup_trace_extent_post().
We can't just pass a non-NULL transaction handle to btrfs_find_all_roots()
from btrfs_qgroup_trace_extent_post(), because that would make backref
lookup not use commit roots and acquire read locks on extent buffers, and
therefore could deadlock when btrfs_qgroup_trace_extent_post() is called
from the btrfs_truncate_inode_items() code path which has acquired a write
lock on an extent buffer of the subvolume btree.
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9acc8103ab upstream.
If we have an inode that does not have the full sync flag set, was changed
in the current transaction, then it is logged while logging some other
inode (like its parent directory for example), its i_size is increased by
a truncate operation, the log is synced through an fsync of some other
inode and then finally we explicitly call fsync on our inode, the new
i_size is not persisted.
The following example shows how to trigger it, with comments explaining
how and why the issue happens:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ touch /mnt/foo
$ xfs_io -f -c "pwrite -S 0xab 0 1M" /mnt/bar
$ sync
# Fsync bar, this will be a noop since the file has not yet been
# modified in the current transaction. The goal here is to clear
# BTRFS_INODE_NEEDS_FULL_SYNC from the inode's runtime flags.
$ xfs_io -c "fsync" /mnt/bar
# Now rename both files, without changing their parent directory.
$ mv /mnt/bar /mnt/bar2
$ mv /mnt/foo /mnt/foo2
# Increase the size of bar2 with a truncate operation.
$ xfs_io -c "truncate 2M" /mnt/bar2
# Now fsync foo2, this results in logging its parent inode (the root
# directory), and logging the parent results in logging the inode of
# file bar2 (its inode item and the new name). The inode of file bar2
# is logged with an i_size of 0 bytes since it's logged in
# LOG_INODE_EXISTS mode, meaning we are only logging its names (and
# xattrs if it had any) and the i_size of the inode will not be changed
# when the log is replayed.
$ xfs_io -c "fsync" /mnt/foo2
# Now explicitly fsync bar2. This resulted in doing nothing, not
# logging the inode with the new i_size of 2M and the hole from file
# offset 1M to 2M. Because the inode did not have the flag
# BTRFS_INODE_NEEDS_FULL_SYNC set, when it was logged through the
# fsync of file foo2, its last_log_commit field was updated,
# resulting in this explicit fsync of file bar2 not doing anything.
$ xfs_io -c "fsync" /mnt/bar2
# File bar2 content and size before a power failure.
$ od -A d -t x1 /mnt/bar2
0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
*
1048576 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
2097152
<power failure>
# Mount the filesystem to replay the log.
$ mount /dev/sdc /mnt
# Read the file again, should have the same content and size as before
# the power failure happened, but it doesn't, i_size is still at 1M.
$ od -A d -t x1 /mnt/bar2
0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
*
1048576
This started to happen after commit 209ecbb858 ("btrfs: remove stale
comment and logic from btrfs_inode_in_log()"), since btrfs_inode_in_log()
no longer checks if the inode's list of modified extents is not empty.
However, checking that list is not the right way to address this case
and the check was added a long time ago in commit 125c4cf9f3
("Btrfs: set inode's logged_trans/last_log_commit after ranged fsync")
for a different purpose, to address consecutive ranged fsyncs.
The reason that checking for the list emptiness makes this test pass is
because during an expanding truncate we create an extent map to represent
a hole from the old i_size to the new i_size, and add that extent map to
the list of modified extents in the inode. However if we are low on
available memory and we can not allocate a new extent map, then we don't
treat it as an error and just set the full sync flag on the inode, so that
the next fsync does not rely on the list of modified extents - so checking
for the emptiness of the list to decide if the inode needs to be logged is
not reliable, and results in not logging the inode if it was not possible
to allocate the extent map for the hole.
Fix this by ensuring that if we are only logging that an inode exists
(inode item, names/references and xattrs), we don't update the inode's
last_log_commit even if it does not have the full sync runtime flag set.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 5.13+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67f0d6d988 upstream.
The "rb_per_cpu_empty()" misinterpret the condition (as not-empty) when
"head_page" and "commit_page" of "struct ring_buffer_per_cpu" points to
the same buffer page, whose "buffer_data_page" is empty and "read" field
is non-zero.
An error scenario could be constructed as followed (kernel perspective):
1. All pages in the buffer has been accessed by reader(s) so that all of
them will have non-zero "read" field.
2. Read and clear all buffer pages so that "rb_num_of_entries()" will
return 0 rendering there's no more data to read. It is also required
that the "read_page", "commit_page" and "tail_page" points to the same
page, while "head_page" is the next page of them.
3. Invoke "ring_buffer_lock_reserve()" with large enough "length"
so that it shot pass the end of current tail buffer page. Now the
"head_page", "commit_page" and "tail_page" points to the same page.
4. Discard current event with "ring_buffer_discard_commit()", so that
"head_page", "commit_page" and "tail_page" points to a page whose buffer
data page is now empty.
When the error scenario has been constructed, "tracing_read_pipe" will
be trapped inside an endless loop: "trace_empty()" returns 0 since
"rb_per_cpu_empty()" returns 0 when it hits the CPU containing such a
constructed ring buffer. Then "trace_find_next_entry_inc()" always
returns NULL since "rb_num_of_entries()" reports there's no more entry
to read. Finally "trace_seq_to_user()" returns "-EBUSY", sending
"tracing_read_pipe" back to the start of the "waitagain" loop.
I've also written a proof-of-concept script to construct the scenario
and trigger the bug automatically; you can use it to trace and validate
my reasoning above:
https://github.com/aegistudio/RingBufferDetonator.git
Tests have been carried out on Linux kernel 5.14-rc2
(2734d6c1b1), my fixed version of the kernel (to test whether my update
fixes the bug) and some older kernels (to determine the range of affected
kernels). Test results are also attached to the proof-of-concept repository.
Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/
Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio
Cc: stable@vger.kernel.org
Fixes: bf41a158ca ("ring-buffer: make reentrant")
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Haoran Luo <www@aegistudio.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e3bac71c5 upstream.
Currently the histogram logic allows the user to write "cpu" in as an
event field, and it will record the CPU that the event happened on.
The problem with this is that there's a lot of events that have "cpu"
as a real field, and using "cpu" as the CPU it ran on, makes it
impossible to run histograms on the "cpu" field of events.
For example, if I want to have a histogram on the count of the
workqueue_queue_work event on its cpu field, running:
># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
Gives a misleading and wrong result.
Change the command to "common_cpu" as no event should have "common_*"
fields as that's a reserved name for fields used by all events. And
this makes sense here as common_cpu would be a field used by all events.
Now we can even do:
># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
># cat events/workqueue/workqueue_queue_work/hist
# event histogram
#
# trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
#
{ common_cpu: 0, cpu: 2 } hitcount: 1
{ common_cpu: 0, cpu: 4 } hitcount: 1
{ common_cpu: 7, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 1 } hitcount: 1
{ common_cpu: 0, cpu: 6 } hitcount: 2
{ common_cpu: 0, cpu: 5 } hitcount: 2
{ common_cpu: 1, cpu: 1 } hitcount: 4
{ common_cpu: 6, cpu: 6 } hitcount: 4
{ common_cpu: 5, cpu: 5 } hitcount: 14
{ common_cpu: 4, cpu: 4 } hitcount: 26
{ common_cpu: 0, cpu: 0 } hitcount: 39
{ common_cpu: 2, cpu: 2 } hitcount: 184
Now for backward compatibility, I added a trick. If "cpu" is used, and
the field is not found, it will fall back to "common_cpu" and work as
it did before. This way, it will still work for old programs that use
"cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
will get that event's "cpu" field, which is probably what it wants
anyway.
I updated the tracefs/README to include documentation about both the
common_timestamp and the common_cpu. This way, if that text is present in
the README, then an application can know that common_cpu is supported over
just plain "cpu".
Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 8b7622bf94 ("tracing: Add cpu field for hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 352384d5c8 upstream.
Because of the significant overhead that retpolines pose on indirect
calls, the tracepoint code was updated to use the new "static_calls" that
can modify the running code to directly call a function instead of using
an indirect caller, and this function can be changed at runtime.
In the tracepoint code that calls all the registered callbacks that are
attached to a tracepoint, the following is done:
	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
	if (it_func_ptr) {
		__data = (it_func_ptr)->data;
		static_call(tp_func_##name)(__data, args);
	}
If there's just a single callback, the static_call is updated to just call
that callback directly. Once another handler is added, then the static
caller is updated to call the iterator, that simply loops over all the
funcs in the array and calls each of the callbacks like the old method
using indirect calling.
The issue was discovered with a race between updating the funcs array and
updating the static_call. The funcs array was updated first and then the
static_call was updated. This is not an issue as long as the first element
in the old array is the same as the first element in the new array. But
that assumption is incorrect, because callbacks also have a priority
field, and if there's a callback added that has a higher priority than the
callback on the old array, then it will become the first callback in the
new array. This means that it is possible to call the old callback with
the new callback data element, which can cause a kernel panic.
static_call = callback1()
funcs[] = {callback1,data1};
callback2 has higher priority than callback1
CPU 1 CPU 2
----- -----
new_funcs = {callback2,data2},
{callback1,data1}
rcu_assign_pointer(tp->funcs, new_funcs);
/*
* Now tp->funcs has the new array
* but the static_call still calls callback1
*/
it_func_ptr = tp->funcs [ new_funcs ]
data = it_func_ptr->data [ data2 ]
static_call(callback1, data);
/* Now callback1 is called with
* callback2's data */
[ KERNEL PANIC ]
update_static_call(iterator);
To prevent this from happening, always switch the static_call to the
iterator before assigning the tp->funcs to the new array. The iterator will
always properly match the callback with its data.
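In rough pseudo-kernel form the new ordering looks like the sketch below (the helper name is illustrative, not the exact kernel/tracepoint.c code; only rcu_assign_pointer() is taken from the snippet above):

	/* 1: route every tracepoint hit through the iterator, which always
	 *    pairs a callback with its own data element.
	 */
	tracepoint_update_call_to_iterator(tp);		/* illustrative name */

	/* 2: only then publish the new, possibly re-ordered, callback array. */
	rcu_assign_pointer(tp->funcs, new_funcs);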
To trigger this bug:
In one terminal:
while :; do hackbench 50; done
In another terminal
echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
while :; do
echo 1 > /sys/kernel/tracing/set_event_pid;
sleep 0.5
echo 0 > /sys/kernel/tracing/set_event_pid;
sleep 0.5
done
And it doesn't take long to crash. This is because the set_event_pid adds
a callback to the sched_waking tracepoint with a high priority, which will
be called before the sched_waking trace event callback is called.
Note, the removal down to a single callback updates the array first, before
changing the static_call to that single callback, which is the proper order,
as the first element in the array is the same as what the static_call is
being changed to.
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
Cc: stable@vger.kernel.org
Fixes: d25e37d89d ("tracepoint: Optimize using static_call()")
Reported-by: Stefan Metzmacher <metze@samba.org>
Tested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bab693a60 upstream.
kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.
However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.
On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!
Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.
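In other words, something along these lines (an illustrative sketch with made-up variable names, not the exact EFI/ITS code):

	/* Keep kexec_file_load() from placing the new kernel on top of the
	 * LPI pending/property tables: publish the region to memblock too,
	 * not only to /proc/iomem.
	 */
	if (memblock_reserve(lpi_table_base, lpi_table_size))
		pr_warn("Failed to reserve LPI table region\n");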
Reported-by: Moritz Fischer <mdf@kernel.org>
Tested-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86762ad4ab upstream.
During interrupt registration, attach state is checked. If attached,
then the Type-C state is updated with typec_set_xxx functions and role
switch is set with usb_role_switch_set_role().
If the usb_role_switch parameter is an error pointer or NULL, the function
simply returns 0.
So, to update the usb_role_switch role if a device is attached before the
irq is registered, the usb_role_switch must be registered before irq
registration.
Fixes: da0cb63100 ("usb: typec: add support for STUSB160x Type-C controller family")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Link: https://lore.kernel.org/r/20210716120718.20398-2-amelie.delaunay@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fecb3a171d upstream.
Because the dwc2_hsotg_ep_stop_xfr() function uses poll
mode, the GINTSTS_GOUTNAKEFF interrupt first needs to be masked.
In Slave mode the GINTSTS_GOUTNAKEFF interrupt is asserted
only after the OUT NAK status packet has been popped from the RxFIFO.
In the dwc2_hsotg_ep_sethalt() function, the GOUTNAKEFF interrupt
needs to be unmasked before setting DCTL_SGOUTNAK.
Tested with the USBCV CH9 and MSC test sets in Slave, BDMA and DDMA.
All tests passed.
Fixes: a4f8277145 ("usb: dwc2: gadget: Disable enabled HW endpoint in dwc2_hsotg_ep_disable")
Fixes: 6070636c49 ("usb: dwc2: Fix Stalling a Non-Isochronous OUT EP")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
Link: https://lore.kernel.org/r/e17fad802bbcaf879e1ed6745030993abb93baf8.1626152924.git.Minas.Harutyunyan@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4a0f7a6ab upstream.
Commit 0112b7ce68 ("usb: dwc2: Update dwc2_handle_usb_suspend_intr
function.") changed the way the driver handles power down modes in a such
way that it uses clock gating when no other power down mode is available.
This however doesn't work well with the DWC2 implementation used on
Samsung SoCs. When clock gating is enabled, the system hangs. It looks like
proper clock gating requires some additional glue code in the shared
USB2 PHY and/or the Samsung glue code for the DWC2. To restore driver
operation on Samsung SoCs, simply skip enabling clock gating mode
until someone finds out what is really needed to make it work reliably.
Fixes: 0112b7ce68 ("usb: dwc2: Update dwc2_handle_usb_suspend_intr function.")
Cc: stable <stable@vger.kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20210716050127.4406-1-m.szyprowski@samsung.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5719df243e upstream.
This driver has a potential issue: it can cause superfluous irqs
after usb_pkt_pop() is called. So, after
commit 3af3260528 ("usb: renesas_usbhs: fix error return
code of usbhsf_pkt_handler()") had been applied, we could observe
the following error when we used g_audio.
renesas_usbhs e6590000.usb: irq_ready run_error 1 : -22
To fix the issue, disable the tx or rx interrupt in usb_pkt_pop().
Fixes: 2743e7f90d ("usb: renesas_usbhs: fix the usb_pkt_pop()")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20210624122039.596528-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5fdf5c6e6 upstream.
The MAX-3421 USB driver remembers the state of the USB toggles for a
device/endpoint. To save SPI writes, this was only done when a new
device/endpoint was being used. Unfortunately, if the old device was
removed, this would cause writes to freed memory.
To fix this, a simpler scheme is used. The toggles are read from
hardware when a URB is completed, and the toggles are always written to
hardware when any URB transaction is started. This will cause a few more
SPI transactions, but causes no kernel panics.
Fixes: 2d53139f31 ("Add support for using a MAX3421E chip as a host driver.")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Link: https://lore.kernel.org/r/20210625031456.8632-1-mark.tomlinson@alliedtelesis.co.nz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bf2761c83 upstream.
Maximum Exit Latency (MEL) value is used by host to know how much in
advance it needs to start waking up a U1/U2 suspended link in order to
service a periodic transfer in time.
Current MEL calculation only includes the time to wake up the path from
U1/U2 to U0. This is called tMEL1 in USB 3.1 section C 1.5.2
Total MEL = tMEL1 + tMEL2 + tMEL3 + tMEL4, which should additionally include:
- tMEL2 which is the time it takes for PING message to reach device
- tMEL3 time for device to process the PING and submit a PING_RESPONSE
- tMEL4 time for PING_RESPONSE to traverse back upstream to host.
Add the missing tMEL2, tMEL3 and tMEL4 to MEL calculation.
Cc: <stable@kernel.org> # v3.5
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210715150122.1995966-1-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b7f56fbc7 upstream.
The device initiated link power management U1/U2 states should not be
enabled in case the system exit latency plus one bus interval (125us) is
greater than the shortest service interval of any periodic endpoint.
This is the case for both U1 and U2 system exit latencies and link states.
See USB 3.2 section 9.4.9 "Set Feature" for more details.
Note, before this patch the host and device initiated U1/U2 lpm states
were both enabled with lpm. After this patch it's possible to end up with
only host initiated U1/U2 lpm in case the exit latencies won't allow
device initiated lpm.
In this case we still want to set the udev->usb3_lpm_ux_enabled flag so
that sysfs users can see the link may go to U1/U2.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210715150122.1995966-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9c57d3ed5 upstream.
The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
by the L1 to switch the context of the vCPU over to that of its L2
guest, and return with an interrupt indication. The L1 is responsible
for switching some registers to guest context, and the L0 switches
others (including all the hypervisor privileged state).
If the L2 MSR has TM active, then the L1 is responsible for
recheckpointing the L2 TM state. Then the L1 exits to L0 via the
H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
and then it recheckpoints the TM state as part of the nested entry and
finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
the simplest approach for something that's horrendously complicated.
Problems arise if the L1 exits to the L0 with a TM state which does not
match the L2 TM state being requested. For example if the L1 is
transactional but the L2 MSR is non-transactional, or vice versa. The
L0's HRFID can take a TM Bad Thing interrupt and crash.
Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
ensuring that if the L1 is suspended then the L2 must have TM active,
and if the L1 is not suspended then the L2 must not have TM active.
Fixes: 360cae3137 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f62f3c2064 upstream.
The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
the rtas_args.nargs that was provided by the guest. That guest nargs
value is not range checked, so the guest can cause the host rets pointer
to be pointed outside the args array. The individual rtas function
handlers check the nargs and nrets values to ensure they are correct,
but if they are not, the handlers store a -3 (0xfffffffd) failure
indication in rets[0] which corrupts host memory.
Fix this by testing up front whether the guest supplied nargs and nret
would exceed the array size, and fail the hcall directly without storing
a failure indication to rets[0].
Also expand on a comment about why we kill the guest and try not to
return errors directly if we have a valid rets[0] pointer.
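The shape of such an up-front check, as a self-contained sketch with simplified types (the real struct rtas_args uses big-endian fields and a fixed-size args array):

	#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

	struct rtas_args_demo {
		unsigned int nargs;
		unsigned int nret;
		unsigned int args[16];
	};

	/* Reject the hcall before rets is pointed anywhere, so bogus
	 * nargs/nret values can never make a handler write outside args[].
	 */
	static int rtas_args_in_bounds(const struct rtas_args_demo *a)
	{
		return a->nargs <= ARRAY_SIZE(a->args) &&
		       a->nret <= ARRAY_SIZE(a->args) - a->nargs;
	}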
Fixes: 8e591cb720 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
Cc: stable@vger.kernel.org # v3.10+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b60557230 upstream.
When MSI is used by the ehci-hcd driver, it can cause lost interrupts which
results in EHCI only continuing to work due to a polling fallback. But the
reliance of polling drastically reduces performance of any I/O through EHCI.
Interrupts are lost as the EHCI interrupt handler does not safely handle
edge-triggered interrupts. It fails to ensure all interrupt status bits are
cleared, which works with level-triggered interrupts but not the
edge-triggered interrupts typical from using MSI.
To fix this problem, check if the driver may have raced with the hardware
setting additional interrupt status bits and clear status until it is in a
stable state.
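Conceptually this means re-reading the status register until no acknowledged bits reappear, along the lines of this sketch (STS_* are the standard EHCI status bits; this is not the exact ehci-hcd loop):

	u32 status = ehci_readl(ehci, &ehci->regs->status);

	while (status & (STS_IAA | STS_FATAL | STS_PCD | STS_ERR | STS_INT)) {
		/* Ack exactly what we saw, handle it, then look again in case
		 * the hardware set more bits in the meantime; an edge-triggered
		 * MSI would not fire again for those.
		 */
		ehci_writel(ehci, status, &ehci->regs->status);
		status = ehci_readl(ehci, &ehci->regs->status);
	}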
Fixes: 306c54d0ed ("usb: hcd: Try MSI interrupts on PCI devices")
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Link: https://lore.kernel.org/r/20210715213744.GA44506@redhat
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72f68bf5c7 upstream.
There's a small window where a USB 2 remote wake may be left unhandled
due to a race between hub thread and xhci port event interrupt handler.
When the resume event is detected in the xhci interrupt handler it kicks
the hub timer, which should move the port from resume to U0 once resume
has been signalled for long enough.
To keep the hub "thread" running we set a bus_state->resuming_ports flag.
This flag makes sure the hub timer function kicks itself.
Checking this flag was not properly protected by the spinlock. The flag was
copied to a local variable before the lock was taken. The local variable was
then checked later with the spinlock held.
If the interrupt is handled right after copying the flag to the local variable,
we end up stopping the hub thread before it can handle the USB 2 resume.
CPU0 CPU1
(hub thread) (xhci event handler)
xhci_hub_status_data()
status = bus_state->resuming_ports;
<Interrupt>
handle_port_status()
spin_lock()
bus_state->resuming_ports = 1
set_flag(HCD_FLAG_POLL_RH)
spin_unlock()
spin_lock()
if (!status)
clear_flag(HCD_FLAG_POLL_RH)
spin_unlock()
Fix this by taking the lock a bit earlier so that it covers
the resuming_ports flag copy in the hub thread.
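That is, roughly the following locking shape (a sketch, not the exact xhci-hub.c code; the names match the diagram above):

	spin_lock_irqsave(&xhci->lock, flags);
	status = bus_state->resuming_ports;	/* now read under the lock */
	/* ... scan the ports and build the hub status bitmap ... */
	if (!status)
		clear_bit(HCD_FLAG_POLL_RH, &hcd->flags);
	spin_unlock_irqrestore(&xhci->lock, flags);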
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210715150651.1996099-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0665e38731 upstream.
Commit a66d21d7db ("usb: xhci: Add support for Renesas controller with
memory") added renesas_usb_fw.mem firmware reference to xhci-pci. Thus
modinfo indicates xhci-pci.ko has "firmware: renesas_usb_fw.mem". But
the firmware is only actually used with CONFIG_USB_XHCI_PCI_RENESAS. An
unusable firmware reference can trigger safety checkers which look for
drivers with unmet firmware dependencies.
Avoid referring to renesas_usb_fw.mem in circumstances when it cannot be
loaded (when CONFIG_USB_XHCI_PCI_RENESAS isn't set).
Fixes: a66d21d7db ("usb: xhci: Add support for Renesas controller with memory")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Thelen <gthelen@google.com>
Link: https://lore.kernel.org/r/20210702071224.3673568-1-gthelen@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e2832562c upstream.
If a 32-bit application is being used with a 64-bit kernel and is using
the mmap mechanism to write data, then the SNDRV_PCM_IOCTL_SYNC_PTR
ioctl results in calling snd_pcm_ioctl_sync_ptr_compat(). Make this use
pcm_lib_apply_appl_ptr() so that the substream's ack() method, if
defined, is called.
The snd_pcm_sync_ptr() function, used in the 64-bit ioctl case, already
uses pcm_lib_apply_appl_ptr().
Fixes: 9027c4639e ("ALSA: pcm: Call ack() whenever appl_ptr is updated")
Signed-off-by: Alan Young <consult.awy@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/c441f18c-eb2a-3bdd-299a-696ccca2de9c@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c2b951915 upstream.
SB16 CSP driver may hit potentially a typical ABBA deadlock in two
code paths:
In snd_sb_csp_stop():
spin_lock_irqsave(&p->chip->mixer_lock, flags);
spin_lock(&p->chip->reg_lock);
In snd_sb_csp_load():
spin_lock_irqsave(&p->chip->reg_lock, flags);
spin_lock(&p->chip->mixer_lock);
Also the similar pattern is seen in snd_sb_csp_start().
Although the practical impact is very small (those states aren't
triggered in the same running state and this happens only on real
hardware, decades-old ISA sound boards -- which must be very difficult
to find nowadays), it's a real scenario and has to be fixed.
This patch addresses those deadlocks by splitting the locks in
snd_sb_csp_start() and snd_sb_csp_stop() for avoiding the nested
locks.
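The resulting shape is sequential rather than nested locking, roughly (a sketch, not the exact sb16_csp.c code):

	/* reg_lock section first ... */
	spin_lock_irqsave(&p->chip->reg_lock, flags);
	/* ... poke the DSP start/stop registers ... */
	spin_unlock_irqrestore(&p->chip->reg_lock, flags);

	/* ... then the mixer_lock section, never one inside the other. */
	spin_lock_irqsave(&p->chip->mixer_lock, flags);
	/* ... update the mixer registers ... */
	spin_unlock_irqrestore(&p->chip->mixer_lock, flags);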
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/7b0fcdaf-cd4f-4728-2eae-48c151a92e10@gmail.com
Link: https://lore.kernel.org/r/20210716132723.13216-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64752a95b7 upstream.
Recently we've added a new usb_mixer element type, USB_MIXER_BESPOKEN,
but it wasn't added in the table in snd_usb_mixer_dump_cval(). This
is no big problem since each bespoken type should have its own dump
method, but it still isn't disallowed to use the standard one, so we
should cover it as well. Along with it, define the table with the
explicit array initializer for avoiding other pitfalls.
Fixes: 785b6f29a7 ("ALSA: usb-audio: scarlett2: Fix wrong resume call")
Reported-by: Pavel Machek <pavel@denx.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210714084836.1977-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 463f36c76f upstream.
The DMA code section of the decompressor must be compiled with expolines
if Spectre V2 mitigation has been enabled for the decompressed kernel.
This is required because although the decompressor's image contains
the DMA code section, it is handed over to the decompressed kernel for use.
Because the DMA code is already slow w/o expolines, always use expolines
regardless of whether the decompressed kernel is using them or not. This
simplifies the DMA code by dropping the conditional compilation of
expolines.
Fixes: bf72630130 ("s390: use proper expoline sections for .dma code")
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10252bae86 upstream.
There's a chance that the IDA allocated in mmc_alloc_host() is not freed
for some time because it's freed as part of a class' release function
(see mmc_host_classdev_release() where the IDA is freed). If another
thread is holding a reference to the class, then only once all balancing
device_put() calls (in turn calling kobject_put()) have been made will
the IDA be released and usable again.
Normally this isn't a problem because the kobject is released before
anything else that may want to use the same number tries to again, but
with CONFIG_DEBUG_KOBJECT_RELEASE=y and OF aliases it becomes pretty
easy to try to allocate an alias from the IDA twice while the first time
it was allocated is still pending a call to ida_simple_remove(). It's
also possible to trigger it by using CONFIG_DEBUG_KOBJECT_RELEASE and
probe deferring a driver at boot that calls mmc_alloc_host() before
trying to get resources that may defer, like clks or regulators.
Instead of allocating from the IDA in this scenario, let's just skip it
if we know this is an OF alias. The number is already "claimed" and
devices that aren't using OF aliases won't try to use the claimed
numbers anyway (see mmc_first_nonreserved_index()). This should avoid
any issues with mmc_alloc_host() returning failures from the
ida_simple_get() in the case that we're using an OF alias.
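The allocation path then looks roughly like this sketch (illustrative only, not the exact drivers/mmc/core/host.c code; mmc_first_nonreserved_index() is the helper mentioned above):

	index = of_alias_get_id(dev->of_node, "mmc");
	if (index < 0) {
		/* No OF alias: fall back to a dynamically allocated index. */
		index = ida_simple_get(&mmc_host_ida,
				       mmc_first_nonreserved_index(), 0,
				       GFP_KERNEL);
		if (index < 0)
			return NULL;
	}
	/* An OF alias already "claims" its number, so the IDA is skipped. */
	host->index = index;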
Cc: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Cc: Sujit Kautkar <sujitka@chromium.org>
Reported-by: Zubin Mithra <zsm@chromium.org>
Fixes: fa2d0aa969 ("mmc: core: Allow setting slot index via device tree alias")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20210623075002.1746924-3-swboyd@chromium.org
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 488968a894 ]
Remove the conditional check for out_data_len and the skipping of the fallocate
if it is 0. This is wrong and would actually turn any legitimate fallocate
where the entire region is unallocated into a no-op.
Additionally, before allocating the range, if FALLOC_FL_KEEP_SIZE is set then
we need to clamp the length of the fallocate region so as to not extend the size of the file.
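The clamp described above, as a self-contained sketch (a hypothetical helper, not the cifs code):

	/* With FALLOC_FL_KEEP_SIZE the allocation must stop at EOF. */
	static long long clamp_falloc_len(long long off, long long len,
					  long long eof, int keep_size)
	{
		if (keep_size && off + len > eof)
			len = eof > off ? eof - off : 0;
		return len;
	}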
Fixes: 966a3cb7c7 ("cifs: improve fallocate emulation")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2485bd7557 ]
We only allow sending single-credit writes through the SMB2_write() synchronous
API, so split this into smaller chunks.
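The chunking itself is the usual loop pattern, shown here as a generic userspace sketch (wsize stands in for the single-credit limit; this is not the SMB2 code):

	#include <unistd.h>

	static ssize_t write_in_chunks(int fd, const char *buf, size_t count,
				       size_t wsize)
	{
		size_t done = 0;

		while (done < count) {
			size_t chunk = count - done;
			ssize_t ret;

			if (chunk > wsize)
				chunk = wsize;
			ret = write(fd, buf + done, chunk);
			if (ret < 0)
				return ret;
			done += ret;
		}
		return done;
	}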
Fixes: 966a3cb7c7 ("cifs: improve fallocate emulation")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reported-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7aaa0f311e ]
Any interaction with the buffer pool (seeding a buffer, acquiring one) is
made through a software portal (SWP, a DPIO object).
There are circumstances where the dpaa2-switch driver probes on a DPSW
before any DPIO devices have been probed. In this case, seeding of the
buffer pool will lead to a panic since no SWPs are initialized.
To fix this, seed the buffer pool after making sure that the software
portals have been probed and are ready to be used.
Fixes: 0b1b713704 ("staging: dpaa2-switch: handle Rx path on control interface")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d85a6f44b ]
The 4th parameter in tc_chain_notify() should be flags rather than seq.
Let's change it back correctly.
Fixes: 32a4f5ecd7 ("net: sched: introduce chain object to uapi")
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e40cba9490 ]
This simple series of commands:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
fails on sja1105 with the following error:
[ 33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN
[ 33.447710] sja1105 spi0.1: Invalid config, cannot upload
Warning: sja1105: Failed to change VLAN Ethertype.
For context, sja1105 has 3 operating modes:
- SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware
- SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware
- SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the
bridge_vlans are committed to hardware
Swapping one VLAN list out and another in happens in
sja1105_build_vlan_table(), which performs a delta update procedure.
That function is called from a few places, notably from
sja1105_vlan_filtering() which is called from the
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler.
The above set of 2 commands fails when run on a kernel pre-commit
8841f6e63f ("net: dsa: sja1105: make devlink property
best_effort_vlan_filtering true by default"). So the priv->vlan_state
transition that takes place is between VLAN-unaware and full VLAN
filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans
are swapped in.
So why does it fail?
Well, the bridge driver, through nbp_vlan_init(), first sets up the
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then
proceeds to call nbp_vlan_add for the default_pvid.
So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in
the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge
VLANs (yet). So we have wiped the VLAN table clean, and the low-level
static config checker complains of an invalid configuration. We _will_
add the bridge VLANs using the dynamic config interface, albeit later,
when nbp_vlan_add() calls us. So it is natural that it fails.
So why did it ever work?
Surprisingly, it looks like I only tested this configuration with 2
things set up in a particular way:
- a network manager that brings all ports up
- a kernel with CONFIG_VLAN_8021Q=y
It is widely known that commit ad1afb0039 ("vlan_dev: VLAN 0 should be
treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net
device that comes up. DSA treats these VLANs as bridge VLANs, and
therefore, in my testing, the list of bridge_vlans was never empty.
However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when
it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily
empty, and the sja1105_static_config_reload() call from
sja1105_vlan_filtering() will fail.
To fix this, the simplest thing is to keep VID 4095, the one used for
CPU-injected control packets since commit ed040abca4 ("net: dsa:
sja1105: use 4095 as the private VLAN for untagged traffic"), in the
list of bridge VLANs too, not just the list of tag_8021q VLANs. This
ensures that the list of bridge VLANs will never be empty.
Fixes: ec5ae61076 ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
Reported-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 213ad73d06 ]
Multiple complaints have been raised from the TFO users on the internet
stating that the TFO blackhole logic is too aggressive and gets falsely
triggered too often.
(e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/)
Considering that most middleboxes no longer drop TFO packets, we decide
to disable the blackhole logic by setting
/proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_sec to 0 by default.
Fixes: cf1ef3f071 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0e4dae744 ]
Commit dd2d082b57 ("riscv: Cleanup setup_bootmem()") adjusted
the calling sequence in setup_bootmem(), which invalidates the fix
commit de043da0b9 ("RISC-V: Fix usage of memblock_enforce_memory_limit")
did for 32-bit RISC-V unfortunately.
So now 32-bit RISC-V does not boot again when testing booting kernel
on QEMU 'virt' with '-m 2G', which was exactly what the original
commit de043da0b9 ("RISC-V: Fix usage of memblock_enforce_memory_limit")
tried to fix.
Fixes: dd2d082b57 ("riscv: Cleanup setup_bootmem()")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb55362bd6 ]
Commit 65d6470d13 ("ibmvnic: clean pending indirect buffs during reset")
intended to remove the call to ibmvnic_tx_scrq_flush() when the
->resetting flag is true and was tested that way. But during the final
rebase to net-next, the hunk got applied to a block few lines below
(which happened to have the same diff context) and the wrong call to
ibmvnic_tx_scrq_flush() got removed.
Fix that by removing the correct ibmvnic_tx_scrq_flush() and restoring
the one that was incorrectly removed.
Fixes: 65d6470d13 ("ibmvnic: clean pending indirect buffs during reset")
Reported-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bfce73c89 ]
Commit d26796ae58 ("udp: check udp sock encap_type in __udp_lib_err")
added checks for encapsulated sockets but it broke cases when there is
no implementation of encap_err_lookup for encapsulation, i.e. ESP in
UDP encapsulation. Fix it by calling encap_err_lookup only if the socket
implements this method, otherwise treat it as a legal socket.
Fixes: d26796ae58 ("udp: check udp sock encap_type in __udp_lib_err")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58acd10092 ]
syzbot reported a call trace:
BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
Call Trace:
sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
sctp_set_owner_w net/sctp/socket.c:131 [inline]
sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865
sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:723
This is a use-after-free issue caused by not updating asoc->shkey after
it was replaced in the key list asoc->endpoint_shared_keys, and the old
key was freed.
This patch fixes it by also updating the active_key for the asoc when the old
key is being replaced with a new one. Note that this issue doesn't exist in
sctp_auth_del_key_id(), as it's not allowed to delete the active_key
from the asoc.
Fixes: 1b1e0bc994 ("sctp: add refcnt support for sh_key")
Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aaeb7bb061 ]
When using Write Zeroes on a namespace that has protection
information enabled, the behavior without the PRACT bit is
counter-intuitive and will generally lead to validation failures
when reading the written blocks. Fix this by always setting the
PRACT bit that generates matching PI data on the fly.
Fixes: 6e02318eae ("nvme: add support for the Write Zeroes command")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9a72f874d ]
When registering the MDIO bus for a r8169 device, we use the PCI
bus/device specifier as a (seemingly) unique device identifier.
However the very same BDF number can be used on another PCI segment,
which makes the driver fail probing:
[ 27.544136] r8169 0002:07:00.0: enabling device (0000 -> 0003)
[ 27.559734] sysfs: cannot create duplicate filename '/class/mdio_bus/r8169-700'
....
[ 27.684858] libphy: mii_bus r8169-700 failed to register
[ 27.695602] r8169: probe of 0002:07:00.0 failed with error -22
Add the segment number to the device name to make it more unique.
This fixes operation on ARM N1SDP boards, with two boards connected
together to form an SMP system, and all on-board devices showing up
twice, just on different PCI segments. A similar issue would occur on
large systems with many PCI slots and multiple RTL8169 NICs.
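A hedged sketch of the resulting bus id (the exact format string is
illustrative; new_bus and pdev come from the driver's probe context):

/* Include the PCI domain/segment so two NICs with the same bus/device/
 * function numbers still get distinct MDIO bus ids. */
snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x-%x",
	 pci_domain_nr(pdev->bus), pci_dev_id(pdev));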
Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Signed-off-by: Sayanta Pattanayak <sayanta.pattanayak@arm.com>
[Andre: expand commit message, use pci_domain_nr()]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c881ca0b3 ]
To quote Alexey[1]:
I was adding custom tracepoint to the kernel, grabbed full F34 kernel
.config, disabled modules and booted whole shebang as VM kernel.
Then did
perf record -a -e ...
It crashed:
general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [#1] SMP PTI
CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:t_show+0x22/0xd0
Then reproducer was narrowed to
# cat /sys/kernel/tracing/printk_formats
Original F34 kernel with modules didn't crash.
So I started to disable options and after disabling AFS everything
started working again.
The root cause is that AFS was placing the content of char arrays into a
section full of _pointers_ to strings, with predictable consequences.
The non-canonical address 435f5346592e4243 is "CB.YFS_", which came from
the CM_NAME macro.
Steps to reproduce:
CONFIG_AFS=y
CONFIG_TRACING=y
# cat /sys/kernel/tracing/printk_formats
Fix this by the following means:
(1) Add enum->string translation tables in the event header with the AFS
and YFS cache/callback manager operations listed by RPC operation ID.
(2) Modify the afs_cb_call tracepoint to print the string from the
translation table rather than using the string at the afs_call name
pointer.
(3) Switch translation table depending on the service we're being accessed
as (AFS or YFS) in the tracepoint print clause. Will this cause
problems to userspace utilities?
Note that the symbolic representation of the YFS service ID isn't
available to this header, so I've put it in as a number. I'm not sure
if this is the best way to do this.
(4) Remove the name wrangling (CM_NAME) macro and put the names directly
into the afs_call_type structs in cmservice.c.
Fixes: 8e8d7f13b6 ("afs: Add some tracepoints")
Reported-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1]
Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@warthog.procyon.org.uk/ # v1 (repost)
Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@warthog.procyon.org.uk/ # v2
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3a1a937f7 ]
This reverts commit 0bd860493f.
While the patch was working as stated, i.e. preventing the L850-GL LTE modem
from crashing on some U3 wake-ups due to a race condition between the
host wake-up and the modem-side wake-up, when using the MBIM interface,
this would force disabling the USB runtime PM on the device.
The increased power consumption is significant for LTE laptops,
and given that with decently recent modem firmwares, when the modem hits
the bug, it automatically recovers (i.e. it drops from the bus, but
automatically re-enumerates after less than half a second, rather than being
stuck until a power cycle as it was doing with ancient firmware), for
most people, the trade-off now seems in favor of re-enabling it by
default.
For people with access to the platform code, the bug can also be worked around
successfully by changing the USB3 LFPM polling off-time for the XHCI
controller in the BIOS code.
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Link: https://lore.kernel.org/r/20210721092516.2775971-1-vpalatin@chromium.org
Fixes: 0bd860493f ("USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a47fa41381 ]
CPU affinity control added with commit 39ae3edda3 ("scsi: target: core:
Make completion affinity configurable") makes target_complete_cmd() queue
work on a CPU based on se_tpg->se_tpg_wwn->cmd_compl_affinity state.
LIO's EXTENDED COPY worker is a special case in that read/write cmds are
dispatched using the global xcopy_pt_tpg, which carries a NULL se_tpg_wwn
pointer following initialization in target_xcopy_setup_pt().
The NULL xcopy_pt_tpg->se_tpg_wwn pointer is dereferenced on completion of
any EXTENDED COPY initiated read/write cmds. E.g using the libiscsi
SCSI.ExtendedCopy.Simple test:
BUG: kernel NULL pointer dereference, address: 00000000000001a8
RIP: 0010:target_complete_cmd+0x9d/0x130 [target_core_mod]
Call Trace:
fd_execute_rw+0x148/0x42a [target_core_file]
? __dynamic_pr_debug+0xa7/0xe0
? target_check_reservation+0x5b/0x940 [target_core_mod]
__target_execute_cmd+0x1e/0x90 [target_core_mod]
transport_generic_new_cmd+0x17c/0x330 [target_core_mod]
target_xcopy_issue_pt_cmd+0x9/0x60 [target_core_mod]
target_xcopy_read_source.isra.7+0x10b/0x1b0 [target_core_mod]
? target_check_fua+0x40/0x40 [target_core_mod]
? transport_complete_task_attr+0x130/0x130 [target_core_mod]
target_xcopy_do_work+0x61f/0xc00 [target_core_mod]
This fix makes target_complete_cmd() queue work on se_cmd->cpuid if
se_tpg_wwn is NULL.
Link: https://lore.kernel.org/r/20210720225522.26291-1-ddiss@suse.de
Fixes: 39ae3edda3 ("scsi: target: core: Make completion affinity configurable")
Cc: Lee Duncan <lduncan@suse.com>
Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a8ac5e45c ]
During some transfers the bus can still be busy when an interrupt is
received. Commit 763778cd79 ("i2c: mpc: Restore reread of I2C status
register") attempted to address this by re-reading MPC_I2C_SR once but
that just made it less likely to happen without actually preventing it.
Instead of a single re-read, poll with a timeout so that the bus is given
enough time to settle but a genuine stuck SCL is still noticed.
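The general pattern is a bounded poll rather than a single re-read; a
driver-agnostic sketch (all names here are hypothetical):

/* Hypothetical poll helper: retry the status read a bounded number of times
 * so a slow bus can settle, while a genuinely stuck bus still times out. */
static int wait_for_bus_idle(unsigned int (*read_status)(void *ctx), void *ctx,
			     unsigned int busy_mask, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!(read_status(ctx) & busy_mask))
			return 0;	/* bus released */
	}
	return -1;			/* still busy: report a timeout */
}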
Fixes: 1538d82f46 ("i2c: mpc: Interrupt driven transfer")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cdb330f4b4 ]
If MDSs aren't available while mounting a filesystem, the session state
will transition from SESSION_OPENING to SESSION_CLOSING. And in that
scenario check_session_state() will be called from delayed_work() and
trigger this WARN.
Avoid this by only WARNing after a session has already been established
(i.e., the s_ttl will be different from 0).
Fixes: 62575e270f ("ceph: check session state after bumping session->s_seq")
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8fb4792f09 ]
While running the self-tests on a KASAN enabled kernel, I observed a
slab-out-of-bounds splat very similar to the one reported in
commit 821bbf79fe ("ipv6: Fix KASAN: slab-out-of-bounds Read in
fib6_nh_flush_exceptions").
We additionally need to take care of fib6_metrics initialization
failure when the caller provides an nh.
The fix is similar, explicitly free the route instead of calling
fib6_info_release on a half-initialized object.
Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 727d6a8b7e ]
Currently tcf_skbmod_act() assumes that packets use Ethernet as their L2
protocol, which is not always the case. As an example, for CAN devices:
$ ip link add dev vcan0 type vcan
$ ip link set up vcan0
$ tc qdisc add dev vcan0 root handle 1: htb
$ tc filter add dev vcan0 parent 1: protocol ip prio 10 \
matchall action skbmod swap mac
Doing the above silently corrupts all the packets. Do not perform skbmod
actions for non-Ethernet packets.
Fixes: 86da71b573 ("net_sched: Introduce skbmod action")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbfd4506f9 ]
Currently, the VF doesn't enable rx VLAN offload when initializing,
and the PF does it for the VFs. If the user disables rx VLAN offload for
the VF with ethtool -K and reloads the VF driver, the rx VLAN offload
state may become inconsistent between hardware and software.
Fix it by enabling rx VLAN offload when the VF initializes.
Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b713d14dc ]
Currently, the mailbox synchronous communication between VF and PF uses
the following fields to maintain communication:
1. origin_mbx_msg, which combines the message code and subcode and is
used to match request and response.
2. received_resp, which indicates whether a response has been received.
A mismatch between request and response is possible in the following
situation:
1. VF sends message A with code=1 subcode=1.
2. PF is blocked for about 500ms when processing message A.
3. VF detects that message A has timed out because it can't get the
response within 500ms.
4. VF sends message B with code=1 subcode=1, which equals message A.
5. PF processes the first message A and sends the response message to
VF.
6. VF identifies the response as matching message B because the
code/subcode is the same. This leads to a mismatch of request and
response.
To fix the above bug, use the following scheme:
1. The message sent from the VF is labelled with a match_id, a unique
16-bit non-zero value.
2. The response sent from the PF is labelled with the match_id taken
from the request.
3. The VF uses the match_id to match request and response messages.
As for the PF driver, it only needs to copy the match_id from the
request to the response.
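Conceptually, the matching boils down to the sketch below (the struct layout
and names are illustrative, not the hns3 definitions):

#include <stdint.h>

/* Illustrative request/response descriptor carrying a match_id. */
struct mbx_msg {
	uint8_t  code;
	uint8_t  subcode;
	uint16_t match_id;	/* unique, non-zero per request */
};

/* code/subcode alone can collide between a timed-out request and its retry;
 * the match_id cannot, so use it to pair request and response. */
static int mbx_response_matches(const struct mbx_msg *req,
				const struct mbx_msg *resp)
{
	return req->match_id != 0 && req->match_id == resp->match_id;
}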
Fixes: dde1a86e93 ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f20c8adb1 ]
tfo_active_disable_stamp is read and written locklessly.
We need to annotate these accesses appropriately.
Then, we need to perform the atomic_inc(tfo_active_disable_times)
after the timestamp has been updated, and thus add barriers
to make sure tcp_fastopen_active_should_disable() won't read
a stale timestamp.
Fixes: cf1ef3f071 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b16f3299ae ]
Building on ARCH=arc causes a "redefined" warning, so rename this
driver's CACHE_LINE_MASK to avoid the warning.
../drivers/net/ethernet/hisilicon/hip04_eth.c:134: warning: "CACHE_LINE_MASK" redefined
134 | #define CACHE_LINE_MASK 0x3F
In file included from ../include/linux/cache.h:6,
from ../include/linux/printk.h:9,
from ../include/linux/kernel.h:19,
from ../include/linux/list.h:9,
from ../include/linux/module.h:12,
from ../drivers/net/ethernet/hisilicon/hip04_eth.c:7:
../arch/arc/include/asm/cache.h:17: note: this is the location of the previous definition
17 | #define CACHE_LINE_MASK (~(L1_CACHE_BYTES - 1))
Fixes: d413779cdd ("net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11a39259ff ]
bnxt_half_open_nic() is called during ethtool self test and is
protected by rtnl_lock. Firmware reset can be happening at the same
time. Only critical portions of the entire firmware reset sequence
are protected by the rtnl_lock. It is possible that bnxt_half_open_nic()
can be called when the firmware reset sequence is aborting. In that
case, bnxt_half_open_nic() needs to check if the ABORT_ERR flag is set
and abort if it is. The ethtool self test will fail but the NIC will be
brought to a consistent IF_DOWN state.
Without this patch, if bnxt_half_open_nic() were to continue in this
error state, it may crash like this:
bnxt_en 0000:82:00.1 enp130s0f1np1: FW reset in progress during close, FW reset will be aborted
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
Process ethtool (pid: 333327, stack limit = 0x0000000046476577)
Call trace:
bnxt_alloc_mem+0x444/0xef0 [bnxt_en]
bnxt_half_open_nic+0x24/0xb8 [bnxt_en]
bnxt_self_test+0x2dc/0x390 [bnxt_en]
ethtool_self_test+0xe0/0x1f8
dev_ethtool+0x1744/0x22d0
dev_ioctl+0x190/0x3e0
sock_ioctl+0x238/0x480
do_vfs_ioctl+0xc4/0x758
ksys_ioctl+0x84/0xb8
__arm64_sys_ioctl+0x28/0x38
el0_svc_handler+0xb0/0x180
el0_svc+0x8/0xc
Fixes: a1301f08c5 ("bnxt_en: Check abort error state in bnxt_open_nic().")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96bdd4b9ea ]
Only pass supported VLAN protocol IDs for stripped VLAN tags to the
stack. The stack will hit WARN() if the protocol ID is unsupported.
Existing firmware sets up the chip to strip 0x8100, 0x88a8, 0x9100.
Only the 1st two protocols are supported by the kernel.
Fixes: a196e96bb6 ("bnxt_en: clean up VLAN feature bit handling")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3958b1da72 ]
When bnxt_open() fails in the firmware reset path, the driver needs to
gracefully abort, but it is executing code that should be invoked only
in the success path. Define a function to abort FW reset and
consolidate all error paths to call this new function.
Fixes: dab62e7c2d ("bnxt_en: Implement faster recovery for firmware fatal error.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6cd657cb3e ]
In the BNXT_FW_RESET_STATE_POLL_VF state in bnxt_fw_reset_task() after all
VFs have unregistered, we need to check for BNXT_STATE_ABORT_ERR after
we acquire the rtnl_lock. If the flag is set, we need to abort.
Fixes: 230d1f0de7 ("bnxt_en: Handle firmware reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c9f046bc3 ]
The capabilities can change after firmware upgrade/downgrade, so we
should get the up-to-date RoCE capabilities every time bnxt_ulp_probe()
is called.
Fixes: 2151fe0830 ("bnxt_en: Handle RESET_NOTIFY async event from firmware.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c81cfb6256 ]
If the device is already disabled in the reset path and a PCI I/O error is
detected before the device could be enabled, the driver could
call pci_disable_device() for an already disabled device. Fix this
problem by calling pci_disable_device() only if the device is currently
enabled.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71f6428332 ]
Currently it's possible to iterate over the dangling pointer in case the device
suddenly disappears. This may happen because callers put it at the end of a loop.
Instead, let's move that call inside acpi_dev_get_next_match_dev().
Fixes: 803abec64e ("media: ipu3-cio2: Add cio2-bridge to ipu3-cio2 driver")
Fixes: bf263f64e8 ("media: ACPI / bus: Add acpi_dev_get_next_match_dev() and helper macro")
Fixes: edbd1bc495 ("efi/dev-path-parser: Switch to use for_each_acpi_dev_match()")
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edbd1bc495 ]
Switch to use for_each_acpi_dev_match() instead of home grown analogue.
No functional change intended.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d2cbbf1fe5 ]
During a rework of the initramfs code the INITRAMFS_COMPRESSION config
option was removed in commit 65e00e04e5. A leftover dependency on it
broke the config option ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD that
is used to enable the overriding of ACPI tables from the built-in initrd.
Fix the dependency.
Fixes: 65e00e04e5 ("initramfs: refactor the initramfs build rules")
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56912da7a6 ]
The original implementation of RPM handling in probe() was mostly
correct, except it failed to call pm_runtime_get_*() to activate the
hardware. The subsequent fix, 734882a8bf ("spi: cadence: Correct
initialisation of runtime PM"), breaks the implementation further,
to the point where the system using this hard IP on ZynqMP hangs on
boot, because it accesses hardware which is gated off.
Undo 734882a8bf ("spi: cadence: Correct initialisation of runtime
PM") and instead add missing pm_runtime_get_noresume() and move the
RPM disabling all the way to the end of probe(). That makes ZynqMP
not hang on boot yet again.
Fixes: 734882a8bf ("spi: cadence: Correct initialisation of runtime PM")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Charles Keepax <ckeepax@opensource.cirrus.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210716182133.218640-1-marex@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e746f3451e ]
An ISCSI_IFACE_PARAM can have the same value as an ISCSI_NET_PARAM, so when
iscsi_iface_attr_is_visible tries to figure out the type by just checking
the value, we can collide and return the wrong type. When we call into the
driver we might not match and return that we don't want the attr visible in
sysfs. The patch fixes this by setting the type when we figure out what the
param is.
Link: https://lore.kernel.org/r/20210701002559.89533-1-michael.christie@oracle.com
Fixes: 3e0f65b34c ("[SCSI] iscsi_transport: Additional parameters for network settings")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 517a16b1a8 ]
Commit 63346650c1 ("netrom: switch to sock timer API") switched to use
sock timer API. It replaces mod_timer() by sk_reset_timer(), and
del_timer() by sk_stop_timer().
Function sk_reset_timer() will increase the refcount of sock if it is
called on an inactive timer, hence, in case the timer expires, we need to
decrease the refcount ourselves in the handler, otherwise, the sock
refcount will be unbalanced and the sock will never be freed.
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reported-by: syzbot+10f1194569953b72f1ae@syzkaller.appspotmail.com
Fixes: 63346650c1 ("netrom: switch to sock timer API")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f3fdd8d48 ]
After commit ca84bd058d ("sctp: copy the optval from user space in
sctp_setsockopt"), it does memory allocation in sctp_setsockopt with
the optlen, and it would fail the allocation and return error if the
optlen from user space is a huge value.
This breaks some sockopts, like SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and
SCTP_AUTH_KEY, as when processing these sockopts before, optlen would
be trimmed to the biggest value it needs when optlen is a huge value,
instead of failing the allocation and returning error.
This patch fixes the allocation failure when there is a huge optlen from
user space by trimming it to the biggest size sctp sockopt may need when
necessary, and this biggest size is from sctp_setsockopt_reset_streams()
for SCTP_RESET_STREAMS, which is bigger than those for SCTP_HMAC_IDENT
and SCTP_AUTH_KEY.
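In other words, instead of failing the allocation, the huge optlen is clamped
first; a minimal sketch (the limit is just a named parameter here):

/* Illustrative clamp: never allocate more than the biggest size any of the
 * affected sockopts can actually use. */
static unsigned int trim_optlen(unsigned int optlen, unsigned int max_needed)
{
	return optlen > max_needed ? max_needed : optlen;
}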
Fixes: ca84bd058d ("sctp: copy the optval from user space in sctp_setsockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5051bcece ]
Syzbot reported memory leak in tcindex_set_parms(). The problem was in
non-freed perfect hash in tcindex_partial_destroy_work().
In tcindex_set_parms() new tcindex_data is allocated and some fields from
old one are copied to new one, but not the perfect hash. Since
tcindex_partial_destroy_work() is the destroy function for old
tcindex_data, we need to free perfect hash to avoid memory leak.
Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com
Fixes: 331b72922c ("net: sched: RCU cls_tcindex")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd31ecf44b ]
When running CPU_FTR_P9_TM_HV_ASSIST, HFSCR[TM] is set for the guest
even if the host has CONFIG_TRANSACTIONAL_MEM=n, which causes it to be
unprepared to handle guest exits while transactional.
Normal guests don't have a problem because the HTM capability will not
be advertised, but a rogue or buggy one could crash the host.
Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210716024310.164448-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f119ba1d5 ]
release_sock() is a blocking function; it would change the state
after sleeping. Use wait_woken() instead.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 674a9f1f68 ]
A missing TPM final event log table is not a firmware bug.
Clearly, if providing the event log in the old format makes the final event
log invalid, it should not be provided, at least in that case.
Fixes: b4f1874c62 ("tpm: check event log version before reading final events")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59dd33f82d ]
On stream stop, currently the CPU DAI stop sequence is invoked first,
followed by the DMA. For a few platforms, it is required to stop the
DMA first, before stopping the CPU DAI.
Introduce a new flag in the dai_link structure for reordering the stop
sequence. Based on this flag, the ASoC core will re-order the stop sequence.
Fixes: 4378f1fbe9 ("ASoC: soc-pcm: Use different sequence for start/stop trigger")
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://lore.kernel.org/r/20210716123015.15697-1-vijendar.mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40ac971eab ]
xen-swiotlb can use vmalloc backed addresses for dma coherent allocations
and uses the common helpers. Properly handle them to unbreak Xen on
ARM platforms.
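A minimal kernel-style sketch of the address handling described above (the
helper name is an assumption, not necessarily the upstream one):

/* Hypothetical helper: vmalloc-backed coherent buffers need vmalloc_to_page(),
 * while linear-map addresses can keep using virt_to_page(). */
static struct page *cpu_addr_to_page(void *cpu_addr)
{
	if (is_vmalloc_addr(cpu_addr))
		return vmalloc_to_page(cpu_addr);
	return virt_to_page(cpu_addr);
}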
Fixes: 1b65c4e5a9 ("swiotlb-xen: use xen_alloc/free_coherent_pages")
Signed-off-by: Roman Skakun <roman_skakun@epam.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
[hch: split the patch, renamed the helpers]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55cef88bbf ]
Fix the division by zero warning below:
- Add an if statement because buswidth can be zero, resulting in division by zero.
- The modified code is based on another driver (atmel-quadspi).
[ 0.795337] Division by zero in kernel.
:
[ 0.834051] [<807fd40c>] (__div0) from [<804e1acc>] (Ldiv0+0x8/0x10)
[ 0.839097] [<805f0710>] (cqspi_exec_mem_op) from [<805edb4c>] (spi_mem_exec_op+0x3b0/0x3f8)
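The guard amounts to refusing the divide when buswidth is zero; a small
illustrative helper (not the driver code):

/* Illustrative: return 0 dummy cycles instead of dividing by a zero buswidth. */
static unsigned int dummy_cycles(unsigned int dummy_bytes, unsigned int buswidth)
{
	if (!buswidth)
		return 0;
	return (dummy_bytes * 8) / buswidth;
}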
Fixes: 7512eaf541 ("spi: cadence-quadspi: Fix dummy cycle calculation when buswidth > 1")
Signed-off-by: Yoshitaka Ikeda <ikeda@nskint.co.jp>
Link: https://lore.kernel.org/r/ed989af6-da88-4e0b-9ed8-126db6cad2e4@nskint.co.jp
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 991e634360 ]
When nr_segs is equal to zero in iovec_from_user, the object
msg->msg_iter.iov is uninitialized stack memory in caif_seqpkt_sendmsg,
which is defined in ___sys_sendmsg. So we can't just check
msg->msg_iter.iov->base directly. We can use nr_segs to check whether
the msg in caif_seqpkt_sendmsg has data buffers.
=====================================================
BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343
___sys_sendmsg net/socket.c:2397 [inline]
__sys_sendmmsg+0x808/0xc90 net/socket.c:2480
__compat_sys_sendmmsg net/compat.c:656 [inline]
Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b8
Fixes: bece7b2398 ("caif: Rewritten socket implementation")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54ea2f49fd ]
The proc socket stats use sk_prot->inuse_idx value to record inuse sock
stats. We currently do not set this correctly from sockmap side. The
result is reading sock stats '/proc/net/sockstat' gives incorrect values.
The socket counter is incremented correctly, but because we don't set the
counter correctly when we replace sk_prot we may omit the decrement.
To get the correct inuse_idx value move the core_initcall that initializes
the UDP proto handlers to late_initcall. This way it is initialized after
UDP has the chance to assign the inuse_idx value from the register protocol
handler.
Fixes: edc6741cc6 ("bpf: Add sockmap hooks for UDP sockets")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210714154750.528206-1-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 228a4a7ba8 ]
The proc socket stats use sk_prot->inuse_idx value to record inuse sock
stats. We currently do not set this correctly from sockmap side. The
result is reading sock stats '/proc/net/sockstat' gives incorrect values.
The socket counter is incremented correctly, but because we don't set the
counter correctly when we replace sk_prot we may omit the decrement.
To get the correct inuse_idx value move the core_initcall that initializes
the TCP proto handlers to late_initcall. This way it is initialized after
TCP has the chance to assign the inuse_idx value from the register protocol
handler.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/bpf/20210712195546.423990-3-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9109165625 ]
Currently the array jit->seen_reg[r1] is accessed before the range
checking of index r1. The range checking on r1 should be performed
first, since it avoids any potential out-of-range accesses on the
array seen_reg[] and it is also more optimal to perform the checks on r1
before fetching data from the array. Fix this by swapping the order
of the checks before the array access.
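The change is purely about evaluation order; an illustrative stand-alone
version (treat the 6..15 range as an assumption about the registers tracked):

/* Illustrative: the range test must run before the array is indexed,
 * otherwise seen_reg[r1] can be read out of bounds. */
static int reg_is_seen(const unsigned char seen_reg[16], int r1)
{
	/* wrong order: seen_reg[r1] && r1 >= 6 && r1 <= 15 */
	return r1 >= 6 && r1 <= 15 && seen_reg[r1];
}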
Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20210715125712.24690-1-colin.king@canonical.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7efc2ce3d ]
Shifting the u16 integer oct->pcie_port left by CN23XX_PKT_INPUT_CTL_MAC_NUM_POS
(29) bits promotes it to a 32-bit signed int, which is then sign-extended
to a u64. In the cases where oct->pcie_port has bit 2 set (e.g. 4..7)
the shifted value is sign-extended and the top 32 bits of the result
are set.
Fix this by casting the u16 values to a u64 before the 29 bit left shift.
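The promotion pitfall is easy to reproduce in isolation; the program below
uses 29 as the shift and 4 as a stand-in port value with bit 2 set:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint16_t port = 4;			/* any value with bit 2 set */
	uint64_t bad  = port << 29;		/* promoted to int, shifted into the
						 * sign bit, then sign-extended */
	uint64_t good = (uint64_t)port << 29;	/* cast first: no sign extension */

	printf("bad  = 0x%016llx\n", (unsigned long long)bad);
	printf("good = 0x%016llx\n", (unsigned long long)good);
	return 0;
}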
Addresses-Coverity: ("Unintended sign extension")
Fixes: 3451b97cce ("liquidio: CN23XX register setup")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99bb2ebab9 ]
Making global2 support mandatory removed the Kconfig symbol
NET_DSA_MV88E6XXX_GLOBAL2. This symbol also served as an intermediate
symbol to make NET_DSA_MV88E6XXX_PTP depend on NET_DSA_MV88E6XXX. With
the symbol removed, the user is always asked about PTP support for
Marvell 88E6xxx switches, even if the latter support is not enabled.
Fix this by reinstating the dependency.
Fixes: 63368a7416 ("net: dsa: mv88e6xxx: Make global2 support mandatory")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32a19de21a ]
The CEC interrupt handlers are registered through the
devm_request_threaded_irq function. However, while free_irq is indeed
called properly when the device is unbound or bind fails, it's called
after unbind or bind is done.
In our particular case, it means that on failure it creates a window
where our interrupt handler can be called, but we're freeing every
resource (CEC adapter, DRM objects, etc.) it might need.
In order to address this, let's switch to the non-devm variant to
control better when the handler will be unregistered and allow us to
make it safe.
Fixes: 15b4511a4a ("drm/vc4: add HDMI CEC support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210707095112.1469670-2-maxime@cerno.tech
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aebacb7f6c ]
31cd0e119d ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.
In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward, ultimately breaking the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
jiffies = 1
base->clk = 0;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), taking into account
no timer operations happened:
base->clk = 1;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
'struct timer_base' any bigger.
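A simplified illustration of why the dedicated flag is more robust than the
arithmetic check (field names follow the description above; the structure is
heavily reduced):

#include <stdbool.h>

/* Reduced model of the timer base described above. */
struct tbase {
	unsigned long clk;		/* advances with jiffies */
	unsigned long next_expiry;	/* only updated on timer operations */
	bool timers_pending;		/* updated together with next_expiry */
};

static bool base_has_no_timers(const struct tbase *b, unsigned long max_delta)
{
	/* Old check, goes stale once clk advances without a timer operation:
	 *   return b->next_expiry == b->clk + max_delta;
	 */
	(void)max_delta;
	return !b->timers_pending;
}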
Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c71f78a662 ]
The ALC5631 does not like multi-write accesses, so avoid them. This fixes:
rt5631 4-001a: Unable to sync registers 0x3a-0x3c. -121
errors on resume from suspend (and all registers after the registers in
the error not being synced).
Inspired by commit 2d30e9494f ("ASoC: rt5651: Fix regcache sync errors
on resume") from Hans de Geode, which fixed the same errors on ALC5651.
Signed-off-by: Maxim Schwalm <maxim.schwalm@gmail.com>
Link: https://lore.kernel.org/r/20210712005011.28536-1-digetx@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c73daee4b ]
Since config.dev = pdev->dev.parent in the current code, the
dev_get_drvdata(rdev->dev.parent) call in hi6421_regulator_enable
returns the drvdata of the MFD device rather than the regulator. Fix it.
This was broken while converting to use simplified DT parsing, because
config.dev changed from pdev->dev to pdev->dev.parent for parsing the
parent's of_node.
Fixes: 29dc269a85 ("regulator: hi6421: Convert to use simplified DT parsing")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20210630095959.2411543-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7999d2555c ]
Add pm_runtime calls in the probe/probe error path and in remove,
in order to be consistent in ordering in all places and to
ensure that pm_runtime is disabled before the resources used
by the SPI controller are released.
This patch also fixes the 2 following warnings on driver remove:
WARNING: CPU: 0 PID: 743 at drivers/clk/clk.c:594 clk_core_disable_lock+0x18/0x24
WARNING: CPU: 0 PID: 743 at drivers/clk/clk.c:476 clk_unprepare+0x24/0x2c
Fixes: 038ac869c9 ("spi: stm32: add runtime PM support")
Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Signed-off-by: Alain Volmat <alain.volmat@foss.st.com>
Link: https://lore.kernel.org/r/1625646426-5826-2-git-send-email-alain.volmat@foss.st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d11053dc6 ]
When building modules (CONFIG_...=m), I found some of the module versions
are incorrect and set to 0.
This can be found in the build log of the first clean build, which shows
WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed,
symbol will not be versioned.
But in the second (incremental) build, the WARNING disappeared and the
module version becomes a valid CRC, which means someone who wants to change
modules without updating the kernel image can't insert their modules.
The problematic code is
+ $(foreach n, $(filter-out FORCE,$^), \
+ $(if $(wildcard $(n).symversions), \
+ ; cat $(n).symversions >> $@.symversions))
For example:
rm -f fs/notify/built-in.a.symversions ; rm -f fs/notify/built-in.a; \
llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \
fs/notify/notification.o fs/notify/group.o ...
`foreach n` produces nothing to `cat` from $(n).symversions because
`if $(wildcard $(n).symversions)` returns nothing, but the files actually
do exist when this line is executed.
-rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o
-rw-r--r-- 1 root root 111 Jun 13 19:10 fs/notify/fsnotify.o.symversions
The reason is that the $(n).symversions files are generated while the
recipes run, but the Makefile wildcard function expands and checks whether
the file exists while the Makefile is being parsed.
Thus, fix this by using the `test` shell command to check for the file's
existence at recipe run time.
Rebase from both:
1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/]
2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/]
Fixes: 38e8918490 ("kbuild: lto: fix module versioning")
Co-developed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0f008551f ]
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to
userspace if CONFIG_SCHEDSTATS is not set, but "perf sched record" still
records the three events. As a result, the command fails.
Before:
#perf sched record sleep 1
event syntax error: 'sched:sched_stat_wait'
\___ unknown tracepoint
Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Solution:
Check whether the schedstat tracepoints are exposed. If not, these events are not recorded.
After:
# perf sched record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
# perf sched report
run measurement overhead: 4736 nsecs
sleep measurement overhead: 9059979 nsecs
the run test took 999854 nsecs
the sleep test took 8945271 nsecs
nr_run_events: 716
nr_sleep_events: 785
nr_wakeup_events: 0
...
------------------------------------------------------------
Fixes: 2a09b5de23 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 812bae32e5 ]
This device-tree was merged with a provisional vuart IRQ-polarity
property that was still under review and ended up taking a somewhat
different form. This patch updates it to match the final form of the
new vuart properties, which additionally allow specifying the SIRQ
number and LPC address.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Fixes: ca03042f0f ("serial: 8250_aspeed_vuart: add aspeed, lpc-io-reg and aspeed, lpc-interrupts DT properties")
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20210416075113.18047-1-zev@bewilderbeest.net
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59f44069e0 ]
Since commit:
bad1e1c663 ("arm64: mte: switch GCR_EL1 in kernel entry and exit")
we saved/restored the user GCR_EL1 value at exception boundaries, and
update_gcr_el1_excl() is no longer used for this. However it is used to
restore the kernel's GCR_EL1 value when returning from a suspend state.
Thus, the comment is misleading (and an ISB is necessary).
When restoring the kernel's GCR value, we need an ISB to ensure this is
used by subsequent instructions. We don't necessarily get an ISB by
other means (e.g. if the kernel is built without support for pointer
authentication). As __cpu_setup() initialised GCR_EL1.Exclude to 0xffff,
until a context synchronization event, allocation tag 0 may be used
rather than the desired set of tags.
This patch drops the misleading comment, adds the missing ISB, and for
clarity folds update_gcr_el1_excl() into its only user.
Fixes: bad1e1c663 ("arm64: mte: switch GCR_EL1 in kernel entry and exit")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210714143843.56537-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7234c362cc ]
The AMD platform does not support the function Ah CPUID leaf. The returned
results for this entry should all remain zero, just like the native does:
AMD host:
0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
(uncanny) AMD guest:
0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000
Fixes: cadbaa039b ("perf/x86/intel: Make anythread filter support conditional")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20210628074354.33848-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f28100cb9c ]
Fixes: e26ca4b535 ("sfc: reduce the number of requested xdp ev queues")
The buggy commit intended to allocate fewer channels for XDP in order to
be less likely to reach the driver's limit of 32 channels.
The idea was to use each IRQ/event queue for more XDP TX queues than
before, calculating the maximum number of TX queues that one
event queue can handle. For example, in EF10 each event queue could
handle up to 8 queues, better than the 4 it was handling before the
change. This way, it would have to allocate half as many channels as
before for XDP TX.
The problem is that the TX queues are also contained inside the channel
structs, and there are only 4 queues per channel. Reducing the number of
channels means also reducing the number of queues, resulting in not
having the desired number of 1 queue per CPU.
This leads to errors on XDP_TX and XDP_REDIRECT if they're
executed from a high-numbered CPU, because queues actually only exist for
the low half of CPUs. If XDP_TX/REDIRECT is executed on a low-numbered
CPU, the error doesn't happen. This is the error in the logs
(repeated many times, even rate limited):
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
This error happens in the function efx_xdp_tx_buffers, where it expects to
have a dedicated XDP TX queue per CPU.
Reverting the change again makes it more likely to reach the limit of 32
channels on machines with many CPUs. If this happens, no XDP_TX/REDIRECT
will be possible at all, and we will see these error messages in the log:
At interface probe:
sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)
At every subsequent XDP_TX/REDIRECT failure, rate limited:
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
However, without reverting the change, the user is led to think that
everything is OK at probe time, but later it fails in an unpredictable
way, depending on the CPU that handles the packet.
It is better to restore the predictable behaviour. If the user sees the
error message at probe time, he/she can try to configure the system in the
way that best fits his/her needs. At least, he/she will have 2 options:
- Accept that XDP_TX/REDIRECT is not available (he/she may not need it)
- Load sfc module with modparam 'rss_cpus' with a lower number, thus
creating less normal RX queues/channels, letting more free resources
for XDP, with some performance penalty.
Anyway, keep the calculation of the maximum number of TX queues that can
be handled by a single event queue, and use it only if it's less than the
number of TX queues per channel. This doesn't happen in practice, but
could happen if some constant values are tweaked in the future, such as
EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE.
Related mailing list thread:
https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e56c6bbd98 ]
The point with a *dev and a *brport_dev is that when we have a LAG net
device that is a bridge port, *dev is an ocelot net device and
*brport_dev is the bonding/team net device. The ocelot net device
beneath the LAG does not exist from the bridge's perspective, so we need
to sync the switchdev objects belonging to the brport_dev and not to the
dev.
Fixes: e4bd44e89d ("net: ocelot: replay switchdev events when joining bridge")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 251ef6f71b ]
nvme_dev_remove_admin could free dev->admin_q and the admin_tagset
while they are being accessed by nvme_dev_disable(), which can be called
by nvme_reset_work via nvme_remove_dead_ctrl.
Commit cb4bfda62a ("nvme-pci: fix hot removal during error handling")
intended to avoid requests being stuck on a removed controller by killing
the admin queue. But the later fix c8e9e9b764 ("nvme-pci: unquiesce
admin queue on shutdown"), together with nvme_dev_disable(dev, true)
right before nvme_dev_remove_admin() could help dispatch requests and
fail them early, so we don't need nvme_dev_remove_admin() any more.
Fixes: cb4bfda62a ("nvme-pci: fix hot removal during error handling")
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5de4be0aa ]
It seems that we cannot differentiate 88X3310 from 88X3340 by simply
looking at bit 3 of revision ID. This only works on revisions A0 and A1.
On revision B0, this bit is always 1.
Instead use the 3.d00d register for differentiation, since this register
contains information about number of ports on the device.
Fixes: 9885d016ff ("net: phy: marvell10g: add separate structure for 88X3340")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reported-by: Matteo Croce <mcroce@linux.microsoft.com>
Tested-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce599c5163 ]
After commit 879526030c ("mptcp: protect the rx path with
the msk socket spinlock") the rmem currently used by a given
msk is really sk_rmem_alloc - rmem_released.
The safety check in mptcp_data_ready() does not take the above
into account; as a result, legitimate incoming data is kept in the
subflow receive queue for no reason, delaying or blocking
MPTCP-level ack generation.
This change addresses the issue introducing a new helper to fetch
the rmem memory and using it as needed. Additionally add a MIB
counter for the exceptional event described above - the peer is
misbehaving.
Finally, introduce the required annotation when rmem_released is
updated.
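A rough sketch of the kind of helper described above; the structure and
field names are illustrative stand-ins, not the real struct mptcp_sock
layout:
/* Illustrative only: mirrors the description, not net/mptcp internals. */
struct example_msk {
	int sk_rmem_alloc;	/* receive memory charged to the socket */
	int rmem_released;	/* memory released but not yet uncharged */
};

/* rmem really used by the msk is sk_rmem_alloc minus rmem_released. */
static inline int example_rmem_in_use(const struct example_msk *msk)
{
	return msk->sk_rmem_alloc - msk->rmem_released;
}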
Fixes: 879526030c ("mptcp: protect the rx path with the msk socket spinlock")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/211
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fde56eea01 ]
The current cleanup rbuf tries a bit too hard to avoid acquiring
the subflow socket lock. We may end up delaying the needed ack,
or skipping the ack of a blocked subflow.
Address the above extending the conditions used to trigger the cleanup
to reflect more closely what TCP does and invoking tcp_cleanup_rbuf()
on all the active subflows.
Note that we can't replicate the exact tests implemented in
tcp_cleanup_rbuf(), as MPTCP lacks some of the required info - e.g.
ping-pong mode.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75e908c336 ]
There are a bunch of callsites where the ssk socket
lock is acquired using the full-blown version even though they are
eligible for the fast variant. Let's move to the latter.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7da441621 ]
After patch "mptcp: fix syncookie process if mptcp can not_accept new
subflow", if subflow is limited, MP_JOIN SYN is dropped, and no SYN/ACK
will be replied.
So in case "multiple subflows limited by server", the expected SYN/ACK
number should be 1.
Fixes: 00587187ad ("selftests: mptcp: add test cases for mptcp join tests with syn cookies")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6787b7e350 ]
If check_fully_established() causes a subflow reset, it should not
continue to process the packet in tcp_data_queue().
Add a return value to mptcp_incoming_options(), and return false if a
subflow has been reset, else return true. Then drop the packet in
tcp_data_queue()/tcp_rcv_state_process() if mptcp_incoming_options()
returns false.
Fixes: d582484726 ("mptcp: fix fallback for MP_JOIN subflows")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8547ea5f52 ]
Lots of "TCP: tcp_fin: Impossible, sk->sk_state=7" in client side
when doing stress testing using wrk and webfsd.
There are at least two cases that may trigger this warning:
1. MPTCP is using syncookies and the server receives an MP_JOIN SYN
request. In subflow_check_req(), mptcp_can_accept_new_subflow()
returns false, so subflow_init_req_cookie_join_save() isn't
called, i.e. the data present in the MP_JOIN SYN request and the
random nonce are not stored in the hash table join_entries[],
but a SYN/ACK is still sent. When the 3rd ACK is received,
mptcp_token_join_cookie_init_state() will return false and the
3rd ACK is dropped. Then, if the MPTCP connection is closed by the
client, the client will send a DATA_FIN and an MPTCP FIN; the DATA_FIN
doesn't have MP_CAPABLE or MP_JOIN,
so mptcp_subflow_init_cookie_req() will return 0 and pass
the cookie check, and the MP_JOIN request falls back to normal TCP.
The server will send a TCP FIN if closed; on the client side,
when the TCP FIN is processed, it will do a reset. The code path is:
tcp_data_queue()->mptcp_incoming_options()
->check_fully_established()->mptcp_subflow_reset().
mptcp_subflow_reset() will set the sock state to TCP_CLOSE,
so tcp_fin will hit TCP_CLOSE and print the warning.
2. MPTCP is using syncookies and the server receives the 3rd ACK. In
mptcp_subflow_init_cookie_req(), mptcp_can_accept_new_subflow()
returns false and subflow_req->mp_join is not set to 1,
so subflow_syn_recv_sock() will not reset the MP_JOIN
subflow but falls back to normal TCP, and then the same thing
happens when the server sends a TCP FIN on close.
For case 1, make subflow_check_req() return -EPERM,
so tcp_conn_request() will drop the MP_JOIN SYN.
For case 2, let subflow_syn_recv_sock() call
mptcp_can_accept_new_subflow(), and do a fatal fallback, sending a reset.
Fixes: 9466a1cceb ("mptcp: enable JOIN requests even if cookies are in use")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 030d37bd1c ]
In subflow_check_req(), if the subflow sport mismatches, the code puts
the msk, destroys the token, and destructs the req, then returns -EPERM.
But this cleanup is already done by subflow_req_destructor() via:
tcp_conn_request()
|--__reqsk_free()
|--subflow_req_destructor()
So we should remove this redundant code, otherwise
tcp_v4_reqsk_destructor() is called twice, which may double free
inet_rsk(req)->ireq_opt.
Fixes: 5bc56388c7 ("mptcp: add port number check for MP_JOIN")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34bd46bcf3 ]
Change
2ef4fb9236 ("drm/vmwgfx: Make sure bo's are unpinned before putting them back")
caused a conflict in one of the drm trees and the merge commit
68a32ba141 ("Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm")
accidentally re-added code that the original change was removing.
Fixed by removing the incorrect buffer unpin - it has already been unpinned
two lines above.
Fixes: 68a32ba141 ("Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210615182336.995192-4-zackr@vmware.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 015fe6fd29 ]
IRQs are requested during driver's ndo_open() and then later
freed up in disable_interrupts() during driver unload.
A race exists where the driver can set the CXGB4_FULL_INIT_DONE
flag in ndo_open() after the disable_interrupts() in the driver
unload path has checked it, and hence misses calling free_irq().
Fix by unregistering netdevice first and sync with driver's
ndo_open(). This ensures disable_interrupts() checks the flag
correctly and frees up the IRQs properly.
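A minimal sketch of the unload ordering described above; except for
unregister_netdev(), the names and structure are placeholders:
#include <linux/netdevice.h>

struct example_adapter {
	struct net_device *netdev;
	unsigned long flags;		/* holds a FULL_INIT_DONE-style bit */
};

static void example_disable_interrupts(struct example_adapter *adap)
{
	/* placeholder: check the init-done bit in adap->flags and, if set,
	 * free the IRQs exactly once
	 */
}

static void example_remove_one(struct example_adapter *adap)
{
	/* Unregister first: this synchronizes with ndo_open(), so the
	 * init-done flag can no longer be set underneath us.
	 */
	unregister_netdev(adap->netdev);

	/* The flag check inside is now stable and IRQs are freed properly. */
	example_disable_interrupts(adap);
}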
Fixes: b37987e8db ("cxgb4: Disable interrupts and napi before unregistering netdev")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65e2e6c1c2 ]
As the last call to sprd_pwm_apply() might have exited early if
state->enabled was false, the values for period and duty_cycle stored in
pwm->state might not have been written to hardware and it must be
ensured that they are configured before enabling the PWM.
Fixes: 8aae4b02e8 ("pwm: sprd: Add Spreadtrum PWM support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e02bf5de4 ]
After redirecting, it's already a new path. So the old PMTU info should
be cleared. The IPv6 test "mtu exception plus redirect" should only
have redirect info, without the old PMTU.
The IPv4 test cannot be changed because of legacy behaviour.
Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24b671aad4 ]
If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
info will not be exported to userspace in rt6_fill_node(), and the ip
command will not print "from ::" in the route output. So remove this check.
Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eca81f0914 ]
The "plat->phy_interface" variable is an enum and in this context GCC
will treat it as an unsigned int so the error handling is never
triggered.
Fixes: b9f0b2f634 ("net: stmmac: platform: fix probe for ACPI devices")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccd27f05ae ]
The goal of commit df789fe752 ("ipv6: Provide ipv6 version of
"disable_policy" sysctl") was to have the disable_policy from ipv4
available on ipv6.
However, it's not exactly the same mechanism. On IPv4, all packets coming
from an interface which has disable_policy set bypass the policy check.
For ipv6, this is done only for local packets, i.e. for packets destined to
an address configured on the incoming interface.
Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
effect for both protocols.
My first approach was to create a new kind of route cache entries, to be
able to set DST_NOPOLICY without modifying routes. This would have added a
lot of code. Because the local delivery path is already handled, I choose
to focus on the forwarding path to minimize code churn.
Fixes: df789fe752 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 168e696a36 ]
bond_ipsec_offload_ok() is called to check whether the interface supports
ipsec offload or not.
A bonding interface supports ipsec offload only in active-backup mode.
So, if a bond interface is not in active-backup mode, this callback
should return false, but it currently returns true.
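A minimal sketch of that mode check; the types and enum below are
simplified stand-ins for the bonding structures, only the overall shape
follows the description:
#include <linux/types.h>

enum { EXAMPLE_MODE_ROUNDROBIN, EXAMPLE_MODE_ACTIVEBACKUP };

struct example_bond {
	int mode;
};

static bool example_ipsec_offload_ok(const struct example_bond *bond)
{
	/* ipsec offload is only supported in active-backup mode */
	if (bond->mode != EXAMPLE_MODE_ACTIVEBACKUP)
		return false;

	/* further checks on the active slave's capabilities go here */
	return true;
}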
Fixes: a3b658cfb6 ("bonding: allow xfrm offload setup post-module-load")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a5605505d ]
bonding has been supporting ipsec offload.
When an SA is added, bonding just passes the SA to its active real
interface, but it doesn't manage the SA itself.
So, when events occur (add/del real interface, active real interface
change, etc.), bonding can't handle them well because it doesn't manage
the SA, and problems such as panics, UAFs, and refcnt leaks occur.
In order to make this stable, bonding should manage the SAs.
That's the reason why struct bond_ipsec is added.
When a new SA is added to the bonding interface, it is stored in the
bond_ipsec list, and the SA is passed to the current active real
interface. When events occur, the bond_ipsec data is used to handle them.
bond->ipsec_list is protected by bond->ipsec_lock.
If the current active real interface is changed, the following logic is
applied:
1. Delete all SAs from the old active real interface.
2. Add all SAs to the new active real interface.
3. If the new active real interface doesn't support ipsec offload or the
SA's options, set real_dev to NULL.
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b121693381 ]
A bonding interface can be nested and it supports ipsec offload,
so it currently allows setting up the nested bonding + ipsec scenario.
But the code does not support this scenario, so it should be disallowed.
interface graph:
bond2
|
bond1
|
eth0
The nested bonding + ipsec offload may not be a real use case,
so disallowing this scenario is fine.
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a22c39b831 ]
bond->curr_active_slave is dereferenced with rcu_dereference(),
but neither this function nor its caller holds the RCU read lock, so a
warning occurs. Add rcu_read_lock() to fix it.
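The locking pattern the fix applies looks roughly like the sketch below;
rcu_read_lock()/rcu_dereference() are the real APIs, the surrounding
types are illustrative:
#include <linux/rcupdate.h>

struct net_device;

struct example_slave { struct net_device *dev; };
struct example_bond  { struct example_slave __rcu *curr_active_slave; };

static void example_ipsec_op(struct example_bond *bond)
{
	struct example_slave *slave;

	rcu_read_lock();
	slave = rcu_dereference(bond->curr_active_slave);
	if (slave) {
		/* hand the xdo_dev_state_* operation to slave->dev here */
	}
	rcu_read_unlock();
}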
Test commands:
ip netns add A
ip netns exec A bash
modprobe netdevsim
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add bond0 type bond
ip link set eth0 master bond0
ip link set eth0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
ip x s f
Splat looks like:
=============================
WARNING: suspicious RCU usage
5.13.0-rc3+ #1168 Not tainted
-----------------------------
drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check()
usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ip/705:
#0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
#1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
at: xfrm_state_delete+0x16/0x30
stack backtrace:
CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
Call Trace:
dump_stack+0xa4/0xe5
bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
__xfrm_state_delete+0x51f/0x730
xfrm_state_delete+0x1e/0x30
xfrm_state_flush+0x22f/0x390
xfrm_flush_sa+0xd8/0x260 [xfrm_user]
? xfrm_flush_policy+0x290/0x290 [xfrm_user]
xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
? rcu_read_lock_sched_held+0x91/0xc0
? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
? find_held_lock+0x3a/0x1c0
? mutex_lock_io_nested+0x1210/0x1210
? sched_clock_cpu+0x18/0x170
netlink_rcv_skb+0x121/0x350
[ ... ]
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b648eba4c6 ]
bond->curr_active_slave is dereferenced with rcu_dereference(),
but neither this function nor its caller holds the RCU read lock, so a
warning occurs. Add rcu_read_lock() to fix it.
Test commands:
ip link add dummy0 type dummy
ip link add bond0 type bond
ip link set dummy0 master bond0
ip link set dummy0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
mode transport \
reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel \
src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
dev bond0 dir in
Splat looks like:
=============================
WARNING: suspicious RCU usage
5.13.0-rc3+ #1168 Not tainted
-----------------------------
drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/684:
#0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
[   55.191733][ T684] stack backtrace:
CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
Call Trace:
dump_stack+0xa4/0xe5
bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
xfrm_dev_state_add+0x2a9/0x770
? memcpy+0x38/0x60
xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
? register_lock_class+0x1750/0x1750
xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
? rcu_read_lock_sched_held+0x91/0xc0
? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
? find_held_lock+0x3a/0x1c0
? mutex_lock_io_nested+0x1210/0x1210
? sched_clock_cpu+0x18/0x170
netlink_rcv_skb+0x121/0x350
? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
? netlink_ack+0x9d0/0x9d0
? netlink_deliver_tap+0x17c/0xa50
xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
netlink_unicast+0x41c/0x610
? netlink_attachskb+0x710/0x710
netlink_sendmsg+0x6b9/0xb70
[ ... ]
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2620e92ae6 ]
Execute the following command and exit, then execute it again; the
following error will be reported:
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
^C
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
libbpf: elf: skipping unrecognized data section(16) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
libbpf: Kernel error message: XDP program already attached
ERROR: link set xdp fd failed
Commit c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock") removed
the unloading prog code because of the presence of bpf_link. This is fine if
XDP_SHARED_UMEM is disabled, but if it is enabled, unloading the prog is still
needed.
Fixes: c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210628091815.2373487-1-wanghai38@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2342ae10d1 ]
If the 'register_netdev()' call fails, we must release the resources
allocated by the previous 'gve_init_priv()' call, as already done in the
remove function.
Add a new label and the missing 'gve_teardown_priv_resources()' in the
error handling path.
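A condensed sketch of the error-path shape described above; the
example_*() helpers stand in for gve_init_priv() and
gve_teardown_priv_resources(), only register_netdev() is the real API:
#include <linux/netdevice.h>

struct example_priv { int dummy; };

static int example_init_priv(struct example_priv *priv)
{
	return 0;	/* placeholder: allocate queues, counters, ... */
}

static void example_teardown_priv_resources(struct example_priv *priv)
{
	/* placeholder: undo everything example_init_priv() set up */
}

static int example_probe_tail(struct example_priv *priv, struct net_device *dev)
{
	int err;

	err = example_init_priv(priv);
	if (err)
		return err;

	err = register_netdev(dev);
	if (err)
		goto abort_with_priv_init;	/* the new label added by the fix */

	return 0;

abort_with_priv_init:
	example_teardown_priv_resources(priv);
	return err;
}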
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b28a86d6c ]
Add stmmac_fpe_stop_wq() in stmmac_suspend() to terminate FPE workqueue
during suspend. So, in suspend mode, there will be no FPE workqueue
available. Without this fix, an additional FPE workqueue will be created
on every suspend->resume cycle.
Fixes: 5a5586112b ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af30cbd2f4 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
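The shape of the fix, sketched; pci_enable/disable_pcie_error_reporting()
are the real PCI AER helpers, the rest is a placeholder:
#include <linux/pci.h>
#include <linux/aer.h>

static int example_setup_rest_of_device(struct pci_dev *pdev)
{
	return 0;	/* placeholder for the remaining probe steps */
}

static int example_probe(struct pci_dev *pdev)
{
	int err;

	pci_enable_pcie_error_reporting(pdev);

	err = example_setup_rest_of_device(pdev);
	if (err)
		goto err_disable_aer;

	return 0;

err_disable_aer:
	/* mirror what the remove function already does */
	pci_disable_pcie_error_reporting(pdev);
	return err;
}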
Fixes: 5eae00c57f ("i40evf: main driver core")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e85e14d68f ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: 19ae1b3fb9 ("fm10k: Add support for PCI power management and error handling")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fea03b1ceb ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: 40a914fa72 ("igb: Add support for pci-e Advanced Error Reporting")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd2aefcd5e ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: 6fabd715e6 ("ixgbe: Implement PCIe AER support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05682a0a61 ]
Static analysis reports this problem
igc_main.c:4944:20: warning: The left operand of '&'
is a garbage value
if (!(phy_data & SR_1000T_REMOTE_RX_STATUS) &&
~~~~~~~~ ^
phy_data is set by the call to igc_read_phy_reg() only if
there is a read_reg() op, else it is unset and a 0 is
returned. Change the return to -EOPNOTSUPP.
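A sketch of the guard described above; the ops structure is simplified,
but -EOPNOTSUPP and the overall shape match the description:
#include <linux/errno.h>
#include <linux/types.h>

struct example_phy_ops {
	int (*read_reg)(u32 offset, u16 *data);
};

static int example_read_phy_reg(const struct example_phy_ops *ops,
				u32 offset, u16 *data)
{
	if (!ops->read_reg)
		return -EOPNOTSUPP;	/* was 0, leaving *data uninitialised */

	return ops->read_reg(offset, data);
}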
Fixes: 208983f099 ("igc: Add watchdog")
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b292608db ]
Clear the next descriptor to watch (next_to_watch) when cleaning the
TX ring.
Failure to do so can cause invalid memory accesses. If igb_poll() runs
while the controller is being reset, this can lead to the driver trying
to free an skb that was already freed.
(The crash is harder to reproduce with the igb driver, but the same
potential problem exists as the code is identical to igc)
Fixes: 7cc6fd4c60 ("igb: Don't bother clearing Tx buffer_info in igb_clean_tx_ring")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reported-by: Erez Geva <erez.geva.ext@siemens.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c342690411 upstream.
When the system is rebooted without a power cycle, the firmware is
already downloaded, and returning -EIO breaks the driver probe with the
error:
mt7921e: probe of 0000:03:00.0 failed with error -5
Skip firmware download and continue to probe.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: 1c099ab447 ("mt76: mt7921: add MCU support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b43c8909be upstream.
If a UDP packet enters the GRO engine but is not eligible
for aggregation and is not targeting a UDP tunnel,
udp_gro_receive() will not set the flush bit, and the packet
could be delayed until the next napi flush.
Fix the issue by ensuring non-GROed packets traverse
skb_gro_flush_final().
Reported-and-tested-by: Matthias Treydte <mt@waldheinz.de>
Fixes: 18f25dc399 ("udp: skip L4 aggregation for UDP tunnel packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 187a002b07 upstream.
The scmi_resp_sensor_reading_complete structure is meant to represent an
SCMI asynchronous reading complete message. The readings field, with
its 64-bit type, forces padding and breaks reads in scmi_sensor_reading_get.
Split it into two adjacent 32-bit readings_low/high subfields to avoid the
padding within the structure. Alternatively, we could mark the structure
as packed.
Link: https://lore.kernel.org/r/20210628170042.34105-1-cristian.marussi@arm.com
Fixes: e2083d3673 ("firmware: arm_scmi: Add SCMI v3.0 sensors timestamped reads")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f263a81451 upstream.
Subprograms are calling map_poke_track(), but on program release there is no
hook to call map_poke_untrack(). However, on program release, the aux memory
(and poke descriptor table) is freed even though we still have a reference to
it in the element list of the map aux data. When we run map_poke_run(), we then
end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():
[...]
[ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
[ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
[ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399
[ 402.824715] Call Trace:
[ 402.824719] dump_stack+0x93/0xc2
[ 402.824727] print_address_description.constprop.0+0x1a/0x140
[ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824744] kasan_report.cold+0x7c/0xd8
[ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824757] prog_array_map_poke_run+0xc2/0x34e
[ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0
[...]
The elements concerned are walked as follows:
for (i = 0; i < elem->aux->size_poke_tab; i++) {
poke = &elem->aux->poke_tab[i];
[...]
The access to size_poke_tab is a 4 byte read, verified by checking offsets
in the KASAN dump:
[ 402.825004] The buggy address belongs to the object at ffff8881905a7800
which belongs to the cache kmalloc-1k of size 1024
[ 402.825008] The buggy address is located 320 bytes inside of
1024-byte region [ffff8881905a7800, ffff8881905a7c00)
The pahole output of bpf_prog_aux:
struct bpf_prog_aux {
[...]
/* --- cacheline 5 boundary (320 bytes) --- */
u32 size_poke_tab; /* 320 4 */
[...]
In general, subprograms do not necessarily manage their own data structures.
For example, BTF func_info and linfo are just pointers to the main program
structure. This allows reference counting and cleanup to be done on the latter
which simplifies their management a bit. The aux->poke_tab struct, however,
did not follow this logic. The initial proposed fix for this use-after-free
bug further embedded poke data tracking into the subprogram with proper
reference counting. However, Daniel and Alexei questioned why we were treating
these objects as special; I agree, it's unnecessary. The fix here removes the
per-subprogram poke table allocation and map tracking and instead simply points
the aux->poke_tab pointer at the main program's poke table. This way, map
tracking is simplified to the main program and we do not need to manage them
per subprogram.
This also means, bpf_prog_free_deferred(), which unwinds the program reference
counting and kfrees objects, needs to ensure that we don't try to double free
the poke_tab when free'ing the subprog structures. This is easily solved by
NULL'ing the poke_tab pointer. The second detail is to ensure that per
subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do
this, we add a pointer in the poke structure to point at the subprogram value
so JITs can easily check while walking the poke_tab structure if the current
entry belongs to the current program. The aux pointer is stable and therefore
suitable for such comparison. On the jit_subprogs() error path, we omit
cleaning up the poke->aux field because these are only ever referenced from
the JIT side, but on error we will never make it to the JIT, so it's fine to
leave them dangling. Removing these pointers would complicate the error path
for no reason. However, we do need to untrack all poke descriptors from the
main program as otherwise they could race with the freeing of JIT memory from
the subprograms. Lastly, a748c6975d ("bpf: propagate poke descriptors to
subprograms") had an off-by-one on the subprogram instruction index range
check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
However, subprog_end is the next subprogram's start instruction.
Fixes: a748c6975d ("bpf: propagate poke descriptors to subprograms")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d719254c1 upstream.
make[2]: *** No rule to make target 'install'. Stop.
make[1]: *** [Makefile:122: runqslower_install] Error 2
make: *** [Makefile:116: bpf_install] Error 2
There is no rule for target 'install' in tools/bpf/runqslower/Makefile,
and there is no need to install it, so just remove 'runqslower_install'.
Fixes: 9c01546d26 ("tools/bpf: Add runqslower tool to tools/bpf")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210628030409.3459095-1-liwei391@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 358ed62420 upstream.
sk_wmem_schedule makes sure that sk_forward_alloc has enough
bytes for charging that is going to be done by sk_mem_charge.
In the transmit zerocopy path, there is sk_mem_charge but there was
no call to sk_wmem_schedule. This change adds that call.
Without this call to sk_wmem_schedule, sk_forward_alloc can go
negative, which is a bug because sk_forward_alloc is a per-socket
space that has been forward charged, so this can't be negative.
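The idea of the fix, sketched; sk_wmem_schedule() and sk_mem_charge() are
the real socket memory-accounting helpers, the wrapper is condensed from
the zerocopy transmit path:
#include <linux/errno.h>
#include <net/sock.h>

/* Make sure the forward-alloc budget exists before charging it, so that
 * sk_forward_alloc cannot go negative.
 */
static int example_charge_copy(struct sock *sk, int copy)
{
	if (!sk_wmem_schedule(sk, copy))
		return -ENOMEM;		/* caller waits for memory instead */

	sk_mem_charge(sk, copy);
	return 0;
}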
Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Talal Ahmad <talalahmad@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7bb4b8903 upstream.
While TCP stack scales reasonably well, there is still one part that
can be used to DDOS it.
IPv6 Packet too big messages have to lookup/insert a new route,
and if abused by attackers, can easily put hosts under high stress,
with many cpus contending on a spinlock while one is stuck in fib6_run_gc()
ip6_protocol_deliver_rcu()
icmpv6_rcv()
icmpv6_notify()
tcp_v6_err()
tcp_v6_mtu_reduced()
inet6_csk_update_pmtu()
ip6_rt_update_pmtu()
__ip6_rt_update_pmtu()
ip6_rt_cache_alloc()
ip6_dst_alloc()
dst_alloc()
ip6_dst_gc()
fib6_run_gc()
spin_lock_bh() ...
Some of our servers have been hit by malicious ICMPv6 packets
trying to _increase_ the MTU/MSS of TCP flows.
We believe these ICMPv6 packets are a result of a bug in one ISP stack,
since they were blindly sent back for _every_ (small) packet sent to them.
These packets are for one TCP flow:
09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
The TCP stack can filter some silly requests:
1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err()
2) tcp_v6_mtu_reduced() can drop requests trying to increase the current MSS.
These tests happen before the IPv6 routing stack is entered, thus
removing the potential contention and route exhaustion.
Note that the IPv6 stack was performing these checks, but too late
(i.e. after the route has been added, and after the potential
garbage collect war).
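A rough, illustrative sketch of the two filters; IPV6_MIN_MTU,
tcp_mtu_to_mss() and mss_cache are real kernel names, but the function
below only condenses the description and is not the actual
tcp_v6_err()/tcp_v6_mtu_reduced() code:
#include <net/tcp.h>
#include <linux/ipv6.h>

static bool example_ignore_pkt_too_big(struct sock *sk, u32 mtu)
{
	/* 1) MTU below the IPv6 minimum: drop the request early */
	if (mtu < IPV6_MIN_MTU)
		return true;

	/* 2) a "reduction" that would not actually shrink the current MSS */
	if (tcp_mtu_to_mss(sk, mtu) >= tcp_sk(sk)->mss_cache)
		return true;

	return false;
}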
v2: fix typo caught by Martin, thanks !
v3: exports tcp_mtu_to_mss(), caught by David, thanks !
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be5d1b61a2 upstream.
This commit fixes a bug (found by syzkaller) that could cause spurious
double-initializations for congestion control modules, which could cause
memory leaks or other problems for congestion control modules (like CDG)
that allocate memory in their init functions.
The buggy scenario constructed by syzkaller was something like:
(1) create a TCP socket
(2) initiate a TFO connect via sendto()
(3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
which calls:
tcp_set_congestion_control() ->
tcp_reinit_congestion_control() ->
tcp_init_congestion_control()
(4) receive ACK, connection is established, call tcp_init_transfer(),
set icsk_ca_initialized=0 (without first calling cc->release()),
call tcp_init_congestion_control() again.
Note that in this sequence tcp_init_congestion_control() is called
twice without a cc->release() call in between. Thus, for CC modules
that allocate memory in their init() function, e.g, CDG, a memory leak
may occur. The syzkaller tool managed to find a reproducer that
triggered such a leak in CDG.
The bug was introduced when commit 8919a9b31e ("tcp: Only init
congestion control if not initialized already")
introduced icsk_ca_initialized and set icsk_ca_initialized to 0 in
tcp_init_transfer(), missing the possibility for a sequence like the
one above, where a process could call setsockopt(TCP_CONGESTION) in
state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()),
which would call tcp_init_congestion_control(). It did not intend to
reset any initialization that the user had already explicitly made;
it just missed the possibility of that particular sequence (which
syzkaller managed to find).
Fixes: 8919a9b31e ("tcp: Only init congestion control if not initialized already")
Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 561022acb1 upstream.
While tp->mtu_info is read while socket is owned, the write
sides happen from err handlers (tcp_v[46]_mtu_reduced)
which only own the socket spinlock.
Fixes: 563d34d057 ("tcp: dont drop MTU reduction indications")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71158bb1f2 upstream.
The MPTCP receive path is hooked only into the TCP slow-path.
The DSS presence allows plain MPTCP traffic to hit that
consistently.
Since commit e1ff9e82e2 ("net: mptcp: improve fallback to TCP"),
when an MPTCP socket falls back to TCP, it can hit the TCP receive
fast-path, and delay or stop triggering the event notification.
Address the issue by explicitly disabling header prediction
for MPTCP sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/200
Fixes: e1ff9e82e2 ("net: mptcp: improve fallback to TCP")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit faffd1b2bd upstream.
The values were determined experimentally via boot tests, not by
measuring the bus behaviour with a scope. We plan to do scope
measurements to confirm or refine the values and will update the
devicetree if necessary once these have been obtained.
However, with the patch we can write and read data without issue, whereas
booting the system without the patch failed at the point of mounting
the rootfs.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20210628013605.1257346-1-andrew@aj.id.au
Fixes: 2fc88f9235 ("mmc: sdhci-of-aspeed: Expose clock phase controls")
Fixes: a5c5168478 ("ARM: dts: aspeed: Add Everest BMC machine")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d6608b57c upstream.
The degree values were reversed out from the magic tap values of 7 (in)
and 15 + inversion (out) initially suggested by Aspeed.
With the patch Tacoma survives several gigabytes of reads and writes
using dd, while without it the system locks up randomly during the boot
process.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20210625061017.1149942-1-andrew@aj.id.au
Fixes: 2fc88f9235 ("mmc: sdhci-of-aspeed: Expose clock phase controls")
Fixes: 961216c135 ("ARM: dts: aspeed: Add Rainier system")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca46ad2214 upstream.
Tacoma and Rainier both have a line-names array that is too long:
gpio gpiochip0: gpio-line-names is length 232 but should be at most length 208
This was probably copied from an AST2500 device tree that did have more
GPIOs on the controller.
Fixes: e9b24b55ca ("ARM: dts: aspeed: rainier: Add gpio line names")
Fixes: 2f68e4e7df ("ARM: dts: aspeed: tacoma: Add gpio line names")
Link: https://lore.kernel.org/r/20210624090742.56640-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52dfd86aa5 upstream.
Opening a new file is done in 2 steps on regular filesystems:
1. Call the create inode-op on the parent-dir to create an inode
to hold the meta-data related to the file.
2. Call the open file-op to get a handle for the file.
vboxsf however does not really use disk-backed inodes because it
is based on passing through file-related system-calls through to
the hypervisor. So both steps translate to an open(2) call being
passed through to the hypervisor. With the handle returned by
the first call immediately being closed again.
Making 2 open calls for a single open(..., O_CREATE, ...) calls
has 2 problems:
a) It is not really efficient.
b) It actually breaks some apps.
An example of b) is doing a git clone inside a vboxsf mount.
When git clone tries to create a tempfile to store the pack
files which it is downloading, the following happens:
1. vboxsf_dir_mkfile() gets called with a mode of 0444 and succeeds.
2. vboxsf_file_open() gets called with file->f_flags containing
O_RDWR. When the host is a Linux machine this fails because doing
a open(..., O_RDWR) on a file which exists and has mode 0444 results
in an -EPERM error.
Other network-filesystems and fuse avoid the problem of needing to
pass 2 open() calls to the other side by using the atomic_open
directory-inode op.
This commit fixes git clone not working inside a vboxsf mount,
by adding support for the atomic_open directory-inode op.
As an added bonus this should also make opening new files faster.
The atomic_open implementation is modelled after the atomic_open
implementations from the 9p and fuse code.
Fixes: 0fd1695766 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
Reported-by: Ludovic Pouzenc <bugreports@pouzenc.fr>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02f840f907 upstream.
Factor out the code to create / release a struct vboxsf_handle into
2 new helper functions.
This is a preparation patch for adding atomic_open support.
Fixes: 0fd1695766 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab0c29687b upstream.
Make vboxsf_dir_create() optionally return the vboxsf-handle for
the created file. This is a preparation patch for adding atomic_open
support.
Fixes: 0fd1695766 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc3ddee97c upstream.
Honor the excl flag to the dir-inode create op, instead of behaving
as if it is always set.
Note the old behavior still worked most of the time since a non-exclusive
open only calls the create op, if there is a race and the file is created
between the dentry lookup and the calling of the create call.
While at it change the type of the is_dir parameter to the
vboxsf_dir_create() helper from an int to a bool, to be consistent with
the use of bool for the excl parameter.
Fixes: 0fd1695766 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d952cfaf0c upstream.
When a new CONFIG option is available, Kbuild shows a prompt to get
the user input.
$ make
[ snip ]
Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW)
This is the only interactive place in the build process.
Commit 174a1dcc96 ("kbuild: sink stdout from cmd for silent build")
suppressed Kconfig prompts as well because syncconfig is invoked by
the 'cmd' macro. You cannot notice the fact that Kconfig is waiting
for the user input.
Use 'kecho' to show the equivalent short log without suppressing stdout
from sub-make.
Fixes: 174a1dcc96 ("kbuild: sink stdout from cmd for silent build")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit deb7178eb9 upstream.
fp is netdev private data and cannot be
used after the free_netdev() call. Using fp after free_netdev()
can cause a UAF bug. Fix it by moving free_netdev() after the error message.
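A sketch of the ordering the fix establishes; only free_netdev(),
netdev_priv() and netdev_err() are the real helpers, the private
structure and its field are hypothetical:
#include <linux/netdevice.h>

struct example_priv { int state; };	/* hypothetical private data */

static void example_error_path(struct net_device *dev)
{
	struct example_priv *fp = netdev_priv(dev);

	/* use the private data for the error message first ... */
	netdev_err(dev, "initialisation failed, state %d\n", fp->state);

	/* ... and only then release the netdev, which frees fp with it */
	free_netdev(dev);
}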
Fixes: 61414f5ec9 ("FDDI: defza: Add support for DEC FDDIcontroller 700
TURBOchannel adapter")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcb9928a15 upstream.
This was not caught because there is no switch driver which implements
the .port_bridge_join but not .port_bridge_leave method, but it should
nonetheless be fixed, as in certain conditions (driver development) it
might lead to NULL pointer dereference.
Fixes: f66a6a69f9 ("net: dsa: permit cross-chip bridging between all trees in the system")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67a9c94317 upstream.
skb_tunnel_info() returns the pointer lwtstate->data as ip_tunnel_info
type without validation. lwtstate->data can have various types, such as
mpls_iptunnel_encap, etc., and these are not compatible.
So skb_tunnel_info() should validate the type before returning that pointer.
Splat looks like:
BUG: KASAN: slab-out-of-bounds in vxlan_get_route+0x418/0x4b0 [vxlan]
Read of size 2 at addr ffff888106ec2698 by task ping/811
CPU: 1 PID: 811 Comm: ping Not tainted 5.13.0+ #1195
Call Trace:
dump_stack_lvl+0x56/0x7b
print_address_description.constprop.8.cold.13+0x13/0x2ee
? vxlan_get_route+0x418/0x4b0 [vxlan]
? vxlan_get_route+0x418/0x4b0 [vxlan]
kasan_report.cold.14+0x83/0xdf
? vxlan_get_route+0x418/0x4b0 [vxlan]
vxlan_get_route+0x418/0x4b0 [vxlan]
[ ... ]
vxlan_xmit_one+0x148b/0x32b0 [vxlan]
[ ... ]
vxlan_xmit+0x25c5/0x4780 [vxlan]
[ ... ]
dev_hard_start_xmit+0x1ae/0x6e0
__dev_queue_xmit+0x1f39/0x31a0
[ ... ]
neigh_xmit+0x2f9/0x940
mpls_xmit+0x911/0x1600 [mpls_iptunnel]
lwtunnel_xmit+0x18f/0x450
ip_finish_output2+0x867/0x2040
[ ... ]
Fixes: 61adedf3e3 ("route: move lwtunnel state to dst_entry")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28b34f01a7 upstream.
Some socket buffers allocated in the fclone cache (in __alloc_skb) can
end-up in the following path[1]:
napi_skb_finish
__kfree_skb_defer
napi_skb_cache_put
The issue is napi_skb_cache_put is not fclone friendly and will put
those skbuff in the skb cache to be reused later, although this cache
only expects skbuff allocated from skbuff_head_cache. When this happens
the skbuff is eventually freed using the wrong origin cache, and we can
see traces similar to:
[ 1223.947534] cache_from_obj: Wrong slab cache. skbuff_head_cache but object is from skbuff_fclone_cache
[ 1223.948895] WARNING: CPU: 3 PID: 0 at mm/slab.h:442 kmem_cache_free+0x251/0x3e0
[ 1223.950211] Modules linked in:
[ 1223.950680] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.13.0+ #474
[ 1223.951587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
[ 1223.953060] RIP: 0010:kmem_cache_free+0x251/0x3e0
Leading sometimes to other memory related issues.
Fix this by using __kfree_skb for fclone skbuffs, similar to what is done
in the other place where __kfree_skb_defer is called.
[1] At least in setups using veth pairs and tunnels. Building a kernel
with KASAN we can for example see packets allocated in
sk_stream_alloc_skb hit the above path and later the issue arises
when the skbuff is reused.
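A hedged sketch of the dispatch described above; skb->fclone,
SKB_FCLONE_UNAVAILABLE and __kfree_skb() are the real kernel names, while
the cache-put helper is a placeholder for the NAPI recycling fast path:
#include <linux/skbuff.h>

static void example_napi_cache_put(struct sk_buff *skb)
{
	/* placeholder for the skbuff_head_cache recycling fast path */
	__kfree_skb(skb);
}

static void example_kfree_skb_defer(struct sk_buff *skb)
{
	/* fclone skbs come from skbuff_fclone_cache and must not enter the
	 * NAPI skb cache, which only recycles skbuff_head_cache objects
	 */
	if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
		__kfree_skb(skb);
		return;
	}

	example_napi_cache_put(skb);
}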
Fixes: 9243adfc31 ("skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing")
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0336f8ffec upstream.
priv is netdev private data and cannot be
used after the free_netdev() call. Using priv after free_netdev()
can cause a UAF bug. Fix it by moving free_netdev() to the end of the
function.
Fixes: 1e0a8b13d3 ("tlan: cancel work at remove path")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad297cd2db upstream.
adpt is netdev private data and it cannot be
used after free_netdev() call. Using adpt after free_netdev()
can cause UAF bug. Fix it by moving free_netdev() at the end of the
function.
Fixes: 54e19bc74f ("net: qcom/emac: do not use devm on internal phy pdev")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c78eaeebe8 upstream.
In case of netdev registration failure the code path will
jump to init_fail label:
init_fail:
netdev_err(ndev, "init failed\n");
moxart_mac_free_memory(ndev);
irq_map_fail:
free_netdev(ndev);
return ret;
So, there is no need to call free_netdev() before jumping
to the error handling path, since doing so can cause a UAF or double-free
bug.
Fixes: 6c821bd9ed ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9992a078b1 upstream.
Commit 28e104d002 ("net: ip_tunnel: fix mtu calculation") removed
dev->hard_header_len subtraction when calculate MTU for tunnel devices
as there is an overhead for device that has header_ops.
But there are ETHER tunnel devices, like gre_tap or erspan, which don't
have header_ops but set dev->hard_header_len during setup. This means
packets greater than (MTU - ETH_HLEN) cannot be transmitted. Fix it by
subtracting the ETHER tunnel devices' dev->hard_header_len for MTU
calculation.
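A sketch of the adjustment, hedged: ARPHRD_ETHER and the net_device
fields are the real kernel definitions, but the function below only
condenses the description and is not the in-tree ip_tunnel code:
#include <linux/netdevice.h>
#include <linux/if_arp.h>

static int example_tunnel_mtu(const struct net_device *dev, int link_mtu,
			      int tunnel_overhead)
{
	int mtu = link_mtu - tunnel_overhead;

	/* Devices with header_ops account for their header here; ETHER
	 * tunnel devices (gre_tap, erspan) have no header_ops but do set
	 * hard_header_len, so subtract it for them as well.
	 */
	if (dev->header_ops || dev->type == ARPHRD_ETHER)
		mtu -= dev->hard_header_len;

	return mtu;
}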
Fixes: 28e104d002 ("net: ip_tunnel: fix mtu calculation")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b452550a2 upstream.
Make sure that we disable each of the TX and RX queues in the TDMA and
RDMA control registers. This is a correctness change to be symmetrical
with the code that enables the TX and RX queues.
Tested-by: Maxime Ripard <maxime@cerno.tech>
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09adf7566d upstream.
There are two pointers in struct xfrm_state_offload, *dev and *real_dev.
These are used in the callback functions of struct xfrmdev_ops.
*dev points to either the bonding interface or the real interface:
if bonding ipsec offload is used, it points to the bonding interface,
otherwise it points to the real interface.
real_dev always points to the real interface.
So, netdevsim should always use real_dev instead of dev.
Of course, real_dev is never null.
Test commands:
ip netns add A
ip netns exec A bash
modprobe netdevsim
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add bond0 type bond mode active-backup
ip link set eth0 master bond0
ip link set eth0 up
ip link set bond0 up
ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Splat looks like:
BUG: spinlock bad magic on CPU#5, kworker/5:1/53
lock: 0xffff8881068c2cc8, .magic: 11121314, .owner: <none>/-1,
.owner_cpu: -235736076
CPU: 5 PID: 53 Comm: kworker/5:1 Not tainted 5.13.0-rc3+ #1168
Workqueue: events linkwatch_event
Call Trace:
dump_stack+0xa4/0xe5
do_raw_spin_lock+0x20b/0x270
? rwlock_bug.part.1+0x90/0x90
_raw_spin_lock_nested+0x5f/0x70
bond_get_stats+0xe4/0x4c0 [bonding]
? rcu_read_lock_sched_held+0xc0/0xc0
? bond_neigh_init+0x2c0/0x2c0 [bonding]
? dev_get_alias+0xe2/0x190
? dev_get_port_parent_id+0x14a/0x360
? rtnl_unregister+0x190/0x190
? dev_get_phys_port_name+0xa0/0xa0
? memset+0x1f/0x40
? memcpy+0x38/0x60
? rtnl_phys_switch_id_fill+0x91/0x100
dev_get_stats+0x8c/0x270
rtnl_fill_stats+0x44/0xbe0
? nla_put+0xbe/0x140
rtnl_fill_ifinfo+0x1054/0x3ad0
[ ... ]
Fixes: 272c2330ad ("xfrm: bail early on slave pass over skb")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ff63a150b upstream.
Always set skb_shared_info data structure in mvneta_swbm_add_rx_fragment
routine even if the fragment contains only the ethernet FCS.
Fixes: 039fbc47f9 ("net: mvneta: alloc skb_shared_info on the mvneta_rx_swbm stack")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a019abd802 upstream.
Since commit 2796d0c648 ("bridge: Automatically manage
port promiscuous mode.")
bridges with `vlan_filtering 1` and only 1 auto-port don't
set IFF_PROMISC for unicast-filtering-capable ports.
Normally on port changes `br_manage_promisc` is called to
update the promisc flags and unicast filters if necessary,
but it cannot distinguish between *new* ports and ones
losing their promisc flag, and new ports end up not
receiving the MAC address list.
Fix this by calling `br_fdb_sync_static` in `br_add_if`
after the port promisc flags are updated and the unicast
filter was supposed to have been filled.
Fixes: 2796d0c648 ("bridge: Automatically manage port promiscuous mode.")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77ac5e40c4 upstream.
When cleaning up the nf_table in tcf_ct_flow_table_cleanup_work
there is no guarantee that the callback list, added to by
nf_flow_table_offload_add_cb, is empty. This means that it is
possible that the flow_block_cb memory allocated will be lost.
Fix this by iterating the list and freeing the flow_block_cb entries
before freeing the nf_table entry (via freeing ct_ft).
Fixes: 978703f425 ("netfilter: flowtable: Add API for registering to flow table events")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22580233d upstream.
Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, the inner
offload capability is to be restricted to UDP tunnels with default
Vxlan and Geneve ports.
This patch fixes the issue for tunnels with non-default ports using
features check capability and filtering appropriate features for such
tunnels.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ca041f919 upstream.
In the case where chain->flags & NFT_CHAIN_HW_OFFLOAD is false then
nft_flow_rule_create is not called and flow is NULL. The subsequent
error handling execution via label err_destroy_flow_rule will lead
to a null pointer dereference on flow when calling nft_flow_rule_destroy.
Since the error path to err_destroy_flow_rule has to cater for null
and non-null flows, only call nft_flow_rule_destroy if flow is non-null
to fix this issue.
Addresses-Coverity: ("Explicity null dereference")
Fixes: 3c5e446220 ("netfilter: nf_tables: memleak in hw offload abort path")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8955b90c3c upstream.
The confirm operation should be checked. If any of them fail,
the packet should be dropped, as is done in ovs and netfilter.
Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40fc3054b4 upstream.
Commit 628a5c5618 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced
ip6_skb_dst_mtu with a signed int return value, which is inconsistent
with the values actually returned. Also, 2 users of this function
assign its value to an unsigned int variable, and only __xfrm6_output
assigns the result of this function to a signed variable but actually
uses it as unsigned in further comparisons and calls. Change this
function to return an unsigned int value.
Fixes: 628a5c5618 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 953b0dcbe2 upstream.
Commit bf3504cea7 ("net: dsa: mv88e6xxx: Add 6390 family PCS
registers to ethtool -d") added support for dumping SerDes PCS registers
via ethtool -d for Peridot.
The same implementation is also valid for Topaz, but was not
enabled at the time.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: bf3504cea7 ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a03b98d683 upstream.
Commit 0df9528736 ("mv88e6xxx: Add serdes Rx statistics") added
support for RX statistics on SerDes ports for Peridot.
This same implementation is also valid for Topaz, but was not enabled
at the time.
We need to use the generic .serdes_get_lane() method instead of the
Peridot specific one in the stats methods so that on Topaz the proper
one is used.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 0df9528736 ("mv88e6xxx: Add serdes Rx statistics")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c07fff3492 upstream.
Commit 23e8b470c7 ("net: dsa: mv88e6xxx: Add devlink param for ATU
hash algorithm.") introduced ATU hash algorithm access via devlink, but
did not enable it for Topaz.
Enable this feature also for Topaz.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 23e8b470c7 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3709488790 upstream.
Commit 9e5baf9b36 ("net: dsa: mv88e6xxx: add RMU disable op")
introduced .rmu_disable() method with implementation for several models,
but forgot to add Topaz, which can use the Peridot implementation.
Use the Peridot implementation of .rmu_disable() on Topaz.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 9e5baf9b36 ("net: dsa: mv88e6xxx: add RMU disable op")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7da467d82d upstream.
Commit f3a2cd326e ("net: dsa: mv88e6xxx: introduce .port_set_policy")
introduced .port_set_policy() method with implementation for several
models, but forgot to add Topaz, which can use the 6352 implementation.
Use the 6352 implementation of .port_set_policy() on Topaz.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: f3a2cd326e ("net: dsa: mv88e6xxx: introduce .port_set_policy")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a3c680aa2 upstream.
Setting the EXT_ENERGY_DET_MASK bit allows the port energy detection
logic of the internal PHY to prevent the system from sleeping. Some
internal PHYs will report that energy is detected when the network
interface is closed which can prevent the system from going to sleep
if WoL is enabled when the interface is brought down.
Since the driver does not support waking the system on this logic,
this commit clears the bit whenever the internal PHY is powered up
and the other logic for manipulating the bit is removed since it
serves no useful function.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f34f1eac3 upstream.
We tried to do something similar in b569a17607 ("userfaultfd: wp: drop
_PAGE_UFFD_WP properly when fork") previously, but it did not get everything
right. A few fixes around the code path:
1. We were referencing VM_UFFD_WP vm_flags on the _old_ vma rather
than the new vma. That's overlooked in b569a17607, so it won't work
as expected. Thanks to the recent rework on fork code
(7a4830c380), we can easily get the new vma now, so switch the
checks to that.
2. Dropping the uffd-wp bit in copy_huge_pmd() could be wrong if the
huge pmd is a migration huge pmd. When it happens, instead of using
pmd_uffd_wp(), we should use pmd_swp_uffd_wp(). The fix is simply to
handle them separately.
3. We forgot to carry over the uffd-wp bit for a write migration huge pmd
entry. This also happens in copy_huge_pmd(), where we convert a
write huge migration entry into a read one.
4. In copy_nonpresent_pte(), drop uffd-wp if necessary for swap ptes.
5. In copy_present_page(), when COW is enforced during fork(), we also
need to pass over the uffd-wp bit if VM_UFFD_WP is armed on the new
vma and the pte to be copied has the uffd-wp bit set.
Remove the comment in copy_present_pte() about this. Commenting only there
would not help much, and commenting everywhere would be overkill.
Let's assume the commit messages will suffice.
[peterx@redhat.com: fix a few thp pmd missing uffd-wp bit]
Link: https://lkml.kernel.org/r/20210428225030.9708-4-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210428225030.9708-3-peterx@redhat.com
Fixes: b569a17607 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork")
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fc7a5f6fd upstream.
Patch series "mm/uffd: Misc fix for uffd-wp and one more test".
This series tries to fix some corner case bugs for uffd-wp on either thp
or fork(). It then introduces a new test with pagemap/pageout.
Patch layout:
Patch 1: cleanup for THP, it'll slightly simplify the follow up patches
Patch 2-4: misc fixes for uffd-wp here and there; please refer to each patch
Patch 5: add pagemap support for uffd-wp
Patch 6: add pagemap/pageout test for uffd-wp
The last test introduced can also verify some of the fixes in previous
patches, as the test will fail without the fixes. However it's not easy
to verify all the changes in patches 2-4, but hopefully they can still be
properly reviewed.
Note that if considering the ongoing uffd-wp shmem & hugetlbfs work, patch
5 will be incomplete as it's missing e.g. hugetlbfs part or the special
swap pte detection. However that's not needed in this series, and since
that series is still during review, this series does not depend on that
one (the last test only runs with anonymous memory, not file-backed). So
this series can be merged even before that series.
This patch (of 6):
The huge zero page is handled in a special path in copy_huge_pmd(); however,
it should share most of its code with a normal thp page. Try to share more
code with it by removing the special path. The only leftover so far is the
huge zero page refcounting (mm_get_huge_zero_page()), because that's
separately done with a global counter.
This prepares for a future patch to modify the huge pmd to be installed,
so that we don't need to duplicate it explicitly into huge zero page case
too.
Link: https://lkml.kernel.org/r/20210428225030.9708-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210428225030.9708-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mike Kravetz <mike.kravetz@oracle.com>, peterx@redhat.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3a52a48973 ]
Move the turris-mox-rwtm firmware node from Turris MOX' device tree into
the generic armada-37xx.dtsi file and use the generic compatible string
'marvell,armada-3700-rwtm-firmware' instead of the current one.
Turris MOX DTS file contains also old compatible string for backward
compatibility.
The Turris MOX rWTM firmware can be used on any Armada 37xx device,
giving them access to the rWTM hardware random number generator, which
is otherwise unavailable.
This change allows Linux to load the turris-mox-rwtm.ko module on these
boards.
Tested on ESPRESSObin v5 with both default Marvell WTMI firmware and
CZ.NIC's firmware. With default WTMI firmware the turris-mox-rwtm fails
to probe, while with CZ.NIC's firmware it registers the HW random number
generator.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90ae47215d ]
Add more generic compatible string 'marvell,armada-3700-rwtm-firmware' for
this driver, since it can also be used on other Armada 3720 devices.
Current compatible string 'cznic,turris-mox-rwtm' is kept for backward
compatibility.
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03313d1c3a ]
The optional @ref parameter might contain a NULL node_name, so
prevent dereferencing it in cifs_compose_mount_options().
Addresses-Coverity: 1476408 ("Explicit null dereferenced")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41d71fe59c ]
The existing CALL_ON_STACK() macro allows for subtle bugs:
- There is no type checking of the function that is being called. That
is: missing or too many arguments do not cause any compile error or
warning. The same is true if the return type of the called function
changes. This can lead to quite random bugs.
- Sign and zero extension of arguments is missing. Given that the s390
C ABI requires that the caller of a function performs proper sign
and zero extension this can also lead to subtle bugs.
- If arguments to the CALL_ON_STACK() macro contain function calls,
register corruption can happen due to the register asm constructs being
used.
Therefore introduce a new call_on_stack() macro which is supposed to
fix all these problems.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8e9cc20b8 ]
tinyconfig fails to boot, because without CONFIG_BUG report_bug()
always returns BUG_TRAP_TYPE_BUG, which causes mc 0,0 in
test_monitor_call() to panic. Fix by skipping the test without
CONFIG_BUG.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e5c9f9a75 ]
The zone device is enabled after thermal_zone_of_sensor_register() completes,
but it's not disabled before the sensor is unregistered, leaving temperature
polling active. This results in accessing a disabled zone device and
produces a warning about this problem. Stop the zone device before
unregistering it in order to fix this "use-after-free" problem.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210616190417.32214-3-digetx@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72d0ad7cb5 ]
The time remaining until expiry of the refresh_timer can be negative.
Casting the type to an unsigned 64-bit value will cause integer
underflow, making runtime_refresh_within() return false instead of
true. These situations are rare, but they do happen.
This does not cause user-facing issues or errors, other than
possibly unthrottling cfs_rq's using runtime from the previous period(s),
making the CFS bandwidth enforcement less strict in those (special)
situations.
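The underflow is easy to reproduce outside the scheduler; a small
self-contained demonstration (the values are arbitrary):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          int64_t remaining = -100000;    /* timer already expired */
          uint64_t min_left = 2000000;    /* refresh window */

          /* Casting the negative remaining time to unsigned wraps it to a
           * huge value, so the "within window" check wrongly fails. */
          printf("unsigned compare: %s\n",
                 (uint64_t)remaining < min_left ? "within" : "not within (wrong)");

          /* Comparing as signed gives the intended answer. */
          printf("signed compare:   %s\n",
                 remaining < (int64_t)min_left ? "within (correct)" : "not within");
          return 0;
  }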
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49da96d779 ]
Offlining a SATA device connected to a hisi SAS controller and then
scanning the host will result in detecting 255 non-existent devices:
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc
# echo "offline" > /sys/block/sdb/device/state
# echo "- - -" > /sys/class/scsi_host/host2/scan
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh
...
[2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb
After a REPORT LUN command issued to the offline device fails, the SCSI
midlayer tries to do a sequential scan of all devices whose LUN number is
not 0. However, SATA does not support LUN numbers at all.
Introduce a generic sas_slave_alloc() handler which will return -ENXIO for
SATA devices if the requested LUN number is larger than 0 and make libsas
drivers use this function as their .slave_alloc callback.
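In essence the new handler only needs a check along these lines (the struct
and field names below are simplified stand-ins for the libsas types, not the
actual implementation):

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>

  struct scsi_device_stub {
          unsigned int lun;
          bool is_sata;
  };

  /* SATA end devices have no LUNs: reject anything but LUN 0 so that a
   * sequential scan cannot create hundreds of phantom devices. */
  static int sas_slave_alloc_sketch(struct scsi_device_stub *sdev)
  {
          if (sdev->is_sata && sdev->lun)
                  return -ENXIO;
          return 0;
  }

  int main(void)
  {
          struct scsi_device_stub sata_lun3 = { .lun = 3, .is_sata = true };

          printf("%d\n", sas_slave_alloc_sketch(&sata_lun3));   /* -ENXIO */
          return 0;
  }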
Link: https://lore.kernel.org/r/20210622034037.1467088-1-yuyufen@huawei.com
Reported-by: Wu Bo <wubo40@huawei.com>
Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 332a9dd1d8 ]
The shifting of the u8 integer returned from ahc_inb(ahc, port+3) by 24 bits
to the left will be promoted to a 32 bit signed int and then sign-extended
to a u64. In the event that the top bit of the u8 is set, all the upper
32 bits of the u64 end up being set because of the sign-extension. Fix
this by casting the u8 values to a u64 before the 24 bit left shift.
[ This dates back to 2002, I found the offending commit from the git
history git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git,
commit f58eb66c0b0a ("Update aic7xxx driver to 6.2.10...") ]
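The promotion is easy to demonstrate in a standalone program (note that
shifting a set top bit into the sign bit of an int is itself formally an
overflow, which is another reason the cast matters):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint8_t hi = 0x80;      /* top bit set */

          /* "hi << 24" promotes hi to a signed int first; with the top bit
           * set the result is negative, and the conversion to a 64-bit
           * unsigned value then sign-extends it. */
          uint64_t wrong = (uint64_t)(hi << 24);
          /* Casting to a 64-bit unsigned type before the shift avoids it. */
          uint64_t right = (uint64_t)hi << 24;

          printf("wrong = 0x%016llx\n", (unsigned long long)wrong);
          printf("right = 0x%016llx\n", (unsigned long long)right);
          return 0;
  }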
Link: https://lore.kernel.org/r/20210621151727.20667-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unintended sign extension")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 742b0d7e15 ]
The interrupt line can be configured differently on different hardware,
even inverted. Therefore the driver should not enforce a specific trigger
type - edge falling - but instead rely on the Devicetree to configure it.
The Maxim 77686 datasheet describes the interrupt line as active low
with a requirement of acknowledgement from the CPU, therefore edge
falling is not correct.
The interrupt line is shared between the PMIC and the RTC driver, so using
a level-sensitive interrupt is especially important here to avoid races.
With an edge configuration, if the PMIC signals an interrupt followed
shortly after by one from the RTC, the first interrupt might not yet be
cleared/acked, so the second one would not be noticed.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210526172036.183223-6-krzysztof.kozlowski@canonical.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a979522a1a ]
To avoid unnecessary recompilations, mkcompile_h does not regenerate
compile.h if just the timestamp changed.
However, if KBUILD_BUILD_TIMESTAMP is set, an explicit timestamp for the
build was requested, in which case we should not ignore it.
If a user follows the documentation for reproducible builds [1] and
defines KBUILD_BUILD_TIMESTAMP as the git commit timestamp, a clean
build will have the correct timestamp. A subsequent cherry-pick (or
amend) changes the commit timestamp and if an incremental build is done
with a different KBUILD_BUILD_TIMESTAMP now, that new value is not taken
into consideration. But it should be, for reproducibility.
Hence, whenever KBUILD_BUILD_TIMESTAMP is explicitly set, do not ignore
UTS_VERSION when making a decision about whether the regenerated version
of compile.h should be moved into place.
[1] https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ae5950db6 ]
With -Wshadow:
drivers/thermal/rcar_gen3_thermal.c: In function ‘rcar_gen3_thermal_probe’:
drivers/thermal/rcar_gen3_thermal.c:310:13: warning: declaration of ‘rcar_gen3_ths_tj_1’ shadows a global declaration [-Wshadow]
310 | const int *rcar_gen3_ths_tj_1 = of_device_get_match_data(dev);
| ^~~~~~~~~~~~~~~~~~
drivers/thermal/rcar_gen3_thermal.c:246:18: note: shadowed declaration is here
246 | static const int rcar_gen3_ths_tj_1 = 126;
| ^~~~~~~~~~~~~~~~~~
To add to the confusion, the local variable has a different type.
Fix the shadowing by renaming the local variable to ths_tj_1.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/9ea7e65d0331daba96f9a7925cb3d12d2170efb1.1623076804.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfda1fd16a ]
enet_clk_ref is actually sourced from internal gpr clocks,
which need a default rate. Also update the enet lpcg clock
output names to be more straightforward.
Cc: Abel Vesa <abel.vesa@nxp.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15a5261e4d ]
This fixes multiple issues with the current non-existent PCIe clock setup:
The controller can run at up to 250MHz, so use a parent that provides this
clock.
The PHY needs an exact 100MHz reference clock to function if the PCIe
refclock is not fed in via the refclock pads. While this mode is not
supported (yet) in the driver it doesn't hurt to make sure we are
providing a clock with the right rate.
The AUX clock is specified to have a maximum clock rate of 10MHz. So
the current setup, which drives it straight from the 25MHz oscillator, is
actually overclocking the AUX input.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1de3aa8611 ]
The PMIC throws an error because the clock isn't assigned to it.
Fix this by assigning the clock info.
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8240c972c1 ]
On LS2088A-RDB board, if the spi-fsl-dspi driver is built as module
then its probe fails with the following warning:
[ 10.471363] couldn't get idr
[ 10.471381] WARNING: CPU: 4 PID: 488 at drivers/spi/spi.c:2689 spi_register_controller+0x73c/0x8d0
...
[ 10.471651] fsl-dspi 2100000.spi: Problem registering DSPI ctlr
[ 10.471708] fsl-dspi: probe of 2100000.spi failed with error -16
The reason for the failure is that the bus-num property is set for the dspi
node. However, it is not set for the qspi node. If the probe for
spi-fsl-qspi happens first, then id 0 is dynamically allocated to it.
The call to spi_register_controller() from the spi-fsl-dspi driver then fails.
Since commit 29d2daf2c3 ("spi: spi-fsl-dspi: Make bus-num property
optional") bus-num property is optional. Remove bus-num property from
dspi node to fix the issue.
Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8efe01b438 ]
The PCIe host bridge on RK3399 advertises a single 64-bit memory
address range even though it lies entirely below 4GB.
Previously the OF PCI range parser treated 64-bit ranges more
leniently (i.e., as 32-bit), but since commit 9d57e61bf7 ("of/pci:
Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
the code takes a stricter view and treats the ranges as advertised in
the device tree (i.e, as 64-bit).
The change in behaviour causes failure when allocating bus addresses
to devices connected behind a PCI-to-PCI bridge that require
non-prefetchable memory ranges. The allocation failure was observed
for certain Samsung NVMe drives connected to RockPro64 boards.
Update the host bridge window attributes to treat it as 32-bit address
memory. This fixes the allocation failure observed since commit
9d57e61bf7.
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210607112856.3499682-5-punitagrawal@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd778b8939 ]
The tegra186_bpmp_ops symbol is used on Tegra234, so make sure it's
available.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2d0ee225e ]
The tegra30_fuse_read() symbol is used on Tegra234, so make sure it's
available.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 093a474ce1 ]
If the PM-related modules are not loaded and the PM firmware is not
configured, system suspend fails to resume. Let's fix this by adding initial
platform_suspend_ops to block suspend and warn about missing modules.
When pm33xx and wkup_m3_ipc have been loaded and the m3 coprocessor booted
with its firmware, pm33xx sets up working platform_suspend_ops. Note
that we need to configure at least PM_SUSPEND_STANDBY to have
suspend_set_ops().
Cc: Dave Gerlach <d-gerlach@ti.com>
Cc: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bf4abe190 ]
On some STM32 MP15 boards, stpmic node is not correct which generates
warnings running "make dtbs_check W=1" command. Issues are:
-"regulator-active-discharge" is not a boolean but an uint32.
-"regulator-over-current-protection" is not a valid entry for vref_ddr.
-LDO4 has a fixed voltage (3v3) so min/max entries are not allowed.
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f493162319 ]
The Seeed SoM-STM32MP157C device tree had the eMMC’s (SDMMC2) VQMMC
supply set to v3v3 (buck4), the same as the VMMC supply. That was
incorrect, as on the SoM, the VQMMC supply is provided from vdd (buck3)
instead.
Signed-off-by: Grzegorz Szymaszek <gszymaszek@short.pl>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11aaf2a0f8 ]
It prevents the following warning:
pin-controller@50002000: 'ltdc' does not match any of the regexes:
'-[0-9]*$', '^gpio@[0-9a-f]*$', 'pinctrl-[0-9]+'
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0740fb869 ]
Fix a compilation warning on 64bit platforms caused by implicit promotion
of a 32bit signed integer to a 64bit unsigned value, which happens after
enabling compile-testing of the EMC drivers.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf24b91f4b ]
Fix the following warning observed with the "make dtbs_check W=1" command.
It concerns the f429 eval and disco boards, and the f769 disco board.
Warning (unit_address_vs_reg): /gpio_keys/button@0: node has a unit name,
but no reg or ranges property
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0171b07373 ]
The microSD card detect pin is physically connected to the MPU pin PI3.
The Device Tree configuration of the card detect pin was wrong: it was
set to pin PB7 instead. If such a configuration was used, the kernel would
hang on “Waiting for root device” when booting from a microSD card.
Signed-off-by: Grzegorz Szymaszek <gszymaszek@short.pl>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1dee9e3b09 ]
Some node unit addresses in the dts were wrong, resulting in the
warnings below when run with W=1
arch/arm64/boot/dts/qcom/sm8350.dtsi:693.34-702.5: Warning (simple_bus_reg): /soc@0/thermal-sensor@c222000: simple-bus unit address format error, expected "c263000"
arch/arm64/boot/dts/qcom/sm8350.dtsi:704.34-713.5: Warning (simple_bus_reg): /soc@0/thermal-sensor@c223000: simple-bus unit address format error, expected "c265000"
arch/arm64/boot/dts/qcom/sm8350.dtsi:1180.32-1185.5: Warning (simple_bus_reg): /soc@0/interconnect@90e0000: simple-bus unit address format error, expected "90c0000"
Fix by correcting them to the addresses given in the reg properties.
Reviewed-by: Robert Foss <robert.foss@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20210513060733.382420-1-vkoul@kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4dd6066bc ]
The 3v3 regulator GPIO is GP6 and not GP7, which is the DDR regulator.
Both regulators are always-on; nevertheless, the DT model needs to be
corrected, so fix it.
Reported-by: Svyatoslav Ryhel <clamor95@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f45da704d ]
All Tegra boards which use WM8903 audio codec are specifying a wrong
polarity for the headphones detection GPIO. The kernel driver hardcodes
the polarity to active-low, which is the correct polarity, so we can fix
the device-trees safely.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69db725cdb ]
The MCU RGMII MCU_RGMII1_TXC pin is defined as an input by mistake. Although
this does not make any functional difference, it's better to update it to
avoid confusion.
Hence fix the MCU RGMII MCU_RGMII1_TXC pin pinmux definitions to be an output
in the K3 am654x/j721e/j7200 board files.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210526132041.6104-1-grygorii.strashko@ti.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e79e58f33 ]
Populate the device-tree node for the PMU device on Tegra194. This also
fixes the following warning that is observed on booting Tegra194.
ERR KERN kvm: pmu event creation failed -2
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e7f5ee113 ]
A number of sub-mailbox node names in various OMAP2+ dts files are
currently using underscores. This does not adhere to the node name
convention; fix all of these to use hyphens.
These nodes are already using the prefix mbox, so they will be in
compliance with the sub-mailbox node name convention being added in
the OMAP Mailbox YAML binding as well.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7d30b8fcd ]
The ti,no-reset-on-init flag need to be at the interconnect target module
level for the modules that have it defined.
The ti-sysc driver handles this case, but produces warning, not a critical
issue.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2566d5b8c1 ]
The ti,no-reset-on-init flag needs to be at the interconnect target module
level for the modules that have it defined.
The ti-sysc driver handles this case, but produces a warning; it is not a
critical issue.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b644c5e01c ]
The ti,no-reset-on-init flag needs to be at the interconnect target module
level for the modules that have it defined.
The ti-sysc driver handles this case, but produces a warning; it is not a
critical issue.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 174a1dcc96 ]
When building with 'make -s', no output to stdout should be printed.
As Arnd Bergmann reported [1], mkimage shows the detailed information
of the generated images.
I think this should be suppressed by the 'cmd' macro instead of by
individual scripts.
Insert 'exec >/dev/null;' in order to redirect stdout to /dev/null for
silent builds.
[Note about this implementation]
'exec >/dev/null;' may look somewhat tricky, but there is a reason for it.
Appending '>/dev/null' at the end of command line is a common way for
redirection, so I first tried this:
cmd = @set -e; $(echo-cmd) $(cmd_$(1)) >/dev/null
... but it would not work if $(cmd_$(1)) itself contains a redirection.
For example, cmd_wrap in scripts/Makefile.asm-generic redirects the
output from the 'echo' command into the target file.
It would be expanded into:
echo "#include <asm-generic/$*.h>" > $@ >/dev/null
Then the target file ends up empty because the string goes to /dev/null
instead of to $@.
Next, I tried this:
cmd = @set -e; $(echo-cmd) { $(cmd_$(1)); } >/dev/null
The form above would be expanded into:
{ echo "#include <asm-generic/$*.h>" > $@; } >/dev/null
This works as expected. However, it would be a syntax error if
$(cmd_$(1)) is empty.
When CONFIG_TRIM_UNUSED_KSYMS is disabled, $(call cmd,gen_ksymdeps) in
scripts/Makefile.build would be expanded into:
set -e; { ; } >/dev/null
..., which causes a syntax error.
I also tried this:
cmd = @set -e; $(echo-cmd) ( $(cmd_$(1)) ) >/dev/null
... but this causes a syntax error for the same reason.
So, finally I adopted:
cmd = @set -e; $(echo-cmd) exec >/dev/null; $(cmd_$(1))
[1]: https://lore.kernel.org/lkml/20210514135752.2910387-1-arnd@kernel.org/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebc666f39f ]
The RZ/G2 boards expect there to be an external clock reference for
USB2 EHCI controllers. For the Beacon boards, this reference clock
is controlled by a programmable versaclock. Because the RZ/G2
family has a special clock driver when using an external clock,
the third clock reference in the EHCI node needs to point to this
special clock, called usb2_clksel.
Since the usb2_clksel does not keep the usb_extal clock enabled,
the 4th clock entry for the EHCI nodes needs to reference it to
keep the clock running and make USB functional.
Signed-off-by: Adam Ford <aford173@gmail.com>
Link: https://lore.kernel.org/r/20210513114617.30191-2-aford173@gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e409c1e1d5 ]
This fixes up the axis on the Janice accelerometer to give
the right orientation according to tests.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 723de6a412 ]
Without SoC specific PHY fixups the network interface on this board will
fail to work. Provide missing DT properties to make it work again.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59ba546d16 ]
The Golden and Skomer phones have BCM4334 WLAN+BT chips,
so make the compatible strings reflect the newly available
bindings for these.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4beba40119 ]
This adds a mounting matrix to the accelerometer
on the TVK1281618 R3.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4917b70281 ]
Rename the AB8500 gpio controller node from ab8500-gpio to
ab8500-gpiocontroller, since -gpio is a common suffix for
gpio consumers.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b82f8e2992 ]
A test with the command below gives this error:
/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dt.yaml:
sdmmcio-regulator: states:0:
[1800000, 1, 3300000, 0] is too long
dtbs_check expects regulator-gpio states in a format
of 2 per item, so fix them all.
make ARCH=arm64 dtbs_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/
regulator/gpio-regulator.yaml
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210510215840.16270-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14cdc1f243 ]
Serial interface uart3 on phyFLEX board is capable of 5-wire connection
including signals RTS and CTS for hardware flow control.
Fix signals UART3_CTS_B and UART3_RTS_B padmux assignments and add
missing property "uart-has-rtscts" to allow serial interface to be
configured and used with the hardware flow control.
Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05cf8fffcd ]
The to_ti_syscon_reset_data macro currently only works if the
parameter passed into it is called 'rcdev'.
Fixes a checkpatch --strict issue:
CHECK: Macro argument reuse 'rcdev' - possible side-effects?
#53: FILE: drivers/reset/reset-ti-syscon.c:53:
+#define to_ti_syscon_reset_data(rcdev) \
+ container_of(rcdev, struct ti_syscon_reset_data, rcdev)
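One way to resolve the argument reuse, sketched below, is to rename the
macro parameter so that the trailing "rcdev" can only mean the struct member
(container_of() is redefined locally just to keep the example self-contained;
the upstream change may differ in detail):

  #include <stddef.h>
  #include <stdio.h>

  #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

  struct reset_controller_dev_stub { int nr_resets; };

  struct ti_syscon_reset_data_stub {
          int placeholder;
          struct reset_controller_dev_stub rcdev;
  };

  /* With the parameter named _rcdev, the final "rcdev" unambiguously
   * refers to the struct member, whatever the caller's variable is called. */
  #define to_ti_syscon_reset_data(_rcdev) \
          container_of(_rcdev, struct ti_syscon_reset_data_stub, rcdev)

  int main(void)
  {
          struct ti_syscon_reset_data_stub data = { .placeholder = 42 };
          struct reset_controller_dev_stub *some_rcdev = &data.rcdev;

          printf("%d\n", to_ti_syscon_reset_data(some_rcdev)->placeholder);
          return 0;
  }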
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7ecfad495 ]
A test with the command below aimed at powerpc generates
notifications in the Rockchip arm64 tree.
Fix pinctrl "sleep" nodename by renaming it to "suspend"
for rk3399.dtsi
make ARCH=arm64 dtbs_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/powerpc/sleep.yaml
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210126110221.10815-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfbfb86a43 ]
A test with the command below aimed at powerpc generates
notifications in the Rockchip ARM tree.
Fix pinctrl "sleep" nodename by renaming it to "suspend"
for rk3036-kylin and rk3288
make ARCH=arm dtbs_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/powerpc/sleep.yaml
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210126110221.10815-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c47a5599ed ]
Some different PMU types may have the same substring. For example, on
Icelake server we have PMU types "uncore_imc" and
"uncore_imc_free_running". Both PMU types have the substring
"uncore_imc". But the parser wrongly thinks they are the same PMU type.
We enable an imc event,
perf stat -e uncore_imc/event=0xe3/ -a -- sleep 1
Perf actually expands the event to:
uncore_imc_0/event=0xe3/
uncore_imc_1/event=0xe3/
uncore_imc_2/event=0xe3/
uncore_imc_3/event=0xe3/
uncore_imc_4/event=0xe3/
uncore_imc_5/event=0xe3/
uncore_imc_6/event=0xe3/
uncore_imc_7/event=0xe3/
uncore_imc_free_running_0/event=0xe3/
uncore_imc_free_running_1/event=0xe3/
uncore_imc_free_running_3/event=0xe3/
uncore_imc_free_running_4/event=0xe3/
That's because the "uncore_imc_free_running" matches the
pattern "uncore_imc*".
Now we check that the last characters of the PMU name are '_<digit>'.
For example, for pattern "uncore_imc*", "uncore_imc_0" is parsed ok, but
"uncore_imc_free_running_0" fails.
Fixes: b2b9d3a3f0 ("perf pmu: Support wildcards on pmu name in dynamic pmu events")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210701064253.1175-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47ce8527fb ]
Accessing raw timers (currently only CLOCK_MONOTONIC_RAW) through VDSO
doesn't return the correct time when using the GIC as clock source.
The address of the GIC mapped page is in this case not calculated
correctly. The GIC mapped page is calculated from the VDSO data by
subtracting PAGE_SIZE:
void *get_gic(const struct vdso_data *data)
{
        return (void __iomem *)data - PAGE_SIZE;
}
However, the data pointer is not page aligned for raw clock sources.
This is because the VDSO data for raw clock sources (CS_RAW = 1) is
stored after the VDSO data for coarse clock sources (CS_HRES_COARSE = 0).
Therefore, only the VDSO data for CS_HRES_COARSE is page aligned:
+--------------------+
|                    |
| vd[CS_RAW]         | ---+
| vd[CS_HRES_COARSE] |    |
+--------------------+    | -PAGE_SIZE
|                    |    |
| GIC mapped page    | <--+
|                    |
+--------------------+
When __arch_get_hw_counter() is called with &vd[CS_RAW], get_gic returns
the wrong address (somewhere inside the GIC mapped page). The GIC counter
values are not returned, which results in an invalid time.
Fixes: a7f4df4e21 ("MIPS: VDSO: Add implementations of gettimeofday() and clock_gettime()")
Signed-off-by: Martin Fäcknitz <faecknitz@hotsplots.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0aa4ff7688 ]
This is no longer true since we switched to generic entry. The code
switches to the IRQ stack before calling do_IRQ, but switches back
to the kernel stack before calling irq_exit().
Fixes: 56e62a7370 ("s390: convert to generic entry")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dea8cfcc33 ]
Commit 48a1f56526 ("perf script python: Add more PMU fields to
event handler dict") added functionality to report fields like weight,
iregs, uregs etc via perf report. That commit predefined a buffer size of
512 bytes to print those fields.
But in PowerPC, since we added extended regs support in:
068aeea377 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
d735599a06 ("powerpc/perf: Add extended regs support for power10 platform")
Now iregs can carry more bytes of data and this predefined buffer size
can result in data loss in the perf script output.
This patch resolves this issue by making the buffer size dynamic, based
on the number of registers needed to print. It also changes the
regs_map() return type from int to void, as it is not being used by
set_regs_in_dict(), its only caller.
Fixes: 068aeea377 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210628062341.155839-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97e488073c ]
Use DISABLE_BRANCH_PROFILING for arch/mips/boot/compressed/decompress.o
to prevent linkage errors.
mips64-linux-ld: arch/mips/boot/compressed/decompress.o: in function `LZ4_decompress_fast_extDict':
decompress.c:(.text+0x8c): undefined reference to `ftrace_likely_update'
mips64-linux-ld: decompress.c:(.text+0xf4): undefined reference to `ftrace_likely_update'
mips64-linux-ld: decompress.c:(.text+0x200): undefined reference to `ftrace_likely_update'
mips64-linux-ld: decompress.c:(.text+0x230): undefined reference to `ftrace_likely_update'
mips64-linux-ld: decompress.c:(.text+0x320): undefined reference to `ftrace_likely_update'
mips64-linux-ld: arch/mips/boot/compressed/decompress.o:decompress.c:(.text+0x3f4): more undefined references to `ftrace_likely_update' follow
Fixes: e76e1fdfa8 ("lib: add support for LZ4-compressed kernel")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cddc40f561 ]
My series to clean up the unaligned access implementation
across architectures caused some mips randconfig builds to
fail with:
mips64-linux-ld: arch/mips/boot/compressed/decompress.o: in function `decompress_kernel':
decompress.c:(.text.decompress_kernel+0x54): undefined reference to `__bswapsi2'
It turns out that this problem has already been fixed for the XZ
decompressor but now it also shows up in (at least) LZO and LZ4. From my
analysis I concluded that the compiler could always have emitted those
calls, but the different implementation allowed it to make otherwise
better decisions about not inlining the byteswap, which results in the
link error when the out-of-line code is missing.
It could be addressed by adding it only to the two decompressor
implementations that are known to be affected, but as this only adds
112 bytes to the kernel, the safer choice is to always add them.
Fixes: c50ec67875 ("MIPS: zboot: Fix the build with XZ compression on older GCC versions")
Fixes: 0652035a57 ("asm-generic: unaligned: remove byteshift helpers")
Link: https://lore.kernel.org/linux-mm/202106301304.gz2wVY9w-lkp@intel.com/
Link: https://lore.kernel.org/linux-mm/202106260659.TyMe8mjr-lkp@intel.com/
Link: https://lore.kernel.org/linux-mm/202106172016.onWT6Tza-lkp@intel.com/
Link: https://lore.kernel.org/linux-mm/202105231743.JJcALnhS-lkp@intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e667624c2 ]
It turns out that jump_label_text_reserved() was reporting __init text
as being reserved past the time when the __init text was freed and
re-used.
For a long time, this resulted in, at worst, not being able to kprobe
text that happened to land at the re-used address. However a recent
commit e7bf1ba97a ("jump_label, x86: Emit short JMP") made it a
fatal mistake because it now needs to read the instruction in order to
determine the conflict -- an instruction that's no longer there.
Fixes: 4c3ef6d793 ("jump label: Add jump_label_text_reserved() to reserve jump points")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210628113045.045141693@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e1493f463 ]
When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with rq value. But since there is no task enqueued yet, the
values are stale based on the last task that was running. When the new
task actually wakes up and enqueued, then the rq uclamp values should
reflect that of the newly woken up task effective uclamp values.
This is a problem particularly for uclamp_max because it default to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.
Fix that by ignoring max aggregation if the rq is idle since in that
case the effective uclamp value of the rq will be the ones of the task
that will wake up.
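Schematically the fix amounts to skipping the rq-side value when the rq is
idle; a simplified model (real uclamp tracks both min and max clamps and uses
the kernel's rq/task structures, which are omitted here):

  #include <stdbool.h>
  #include <stdio.h>

  /* Effective uclamp_max used to cap a waking task's utilization. */
  static unsigned long effective_uclamp_max(bool rq_idle,
                                            unsigned long rq_uclamp_max,
                                            unsigned long task_uclamp_max)
  {
          /* An idle rq still carries the clamp of the last task that ran,
           * so its value is stale; use the waking task's clamp directly. */
          if (rq_idle)
                  return task_uclamp_max;
          return rq_uclamp_max > task_uclamp_max ? rq_uclamp_max
                                                 : task_uclamp_max;
  }

  int main(void)
  {
          /* A stale rq value of 1024 would defeat a task capped at 512. */
          printf("%lu\n", effective_uclamp_max(true, 1024, 512));    /* 512 */
          printf("%lu\n", effective_uclamp_max(false, 1024, 512));   /* 1024 */
          return 0;
  }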
Fixes: 9d20ad7dfc ("sched/uclamp: Add uclamp_util_with()")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
[qias: Changelog]
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb607cd495 ]
Re-add the regulator-always-on property for vcc_sdio which supplies sdmmc,
since it gets disabled during reboot now and the bootrom expects it to be
enabled when booting from SD card. This makes rebooting impossible in that
case and requires a hard reset to boot again.
Fixes: 04a0077fdb ("arm64: dts: rockchip: Remove always-on properties from regulator nodes on rk3399-roc-pc.")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20210619121306.7740-1-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06b2818678 ]
This might be a limitation of either the current panfrost driver
devfreq implementation or of how the gpu is implemented in the RK3399 SoC.
The gpu regulator must never get disabled or the registers become
(randomly?) inaccessible to the driver. (see all other RK3399 boards)
Fixes: ec7d731d81 ("arm64: dts: rockchip: Add node for gpu on rk3399-roc-pc")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20210619121446.7802-1-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fae2016099 ]
Currently it is hard to determine whether the HWRNG on an Armada 3720
device is accessible to the running kernel or not. So print an informational
message into dmesg when the HWRNG is available and registration was
successful.
Fixes: 389711b374 ("firmware: Add Turris Mox rWTM firmware driver")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2eab59cf0d ]
When Marvell's rWTM firmware, which does not support the GET_RANDOM
command, is used, the kernel prints an error message
hwrng: no data available
every 10 seconds.
Fail probing of this driver if the rWTM firmware does not support the
GET_RANDOM command.
Fixes: 389711b374 ("firmware: Add Turris Mox rWTM firmware driver")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72f9988894 ]
Report a notice-level message if a command is not supported by the rWTM
firmware.
This should not be an error, merely a notice, because the firmware can
be used on boards that do not have manufacturing information burned in.
Fixes: 389711b374 ("firmware: Add Turris Mox rWTM firmware driver")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e34e60253d ]
The status decoding function mox_get_status() currently contains an
incorrect check: if the error status is not MBOX_STS_SUCCESS, it always
returns -EIO, so the comparison to MBOX_STS_FAIL is never executed and
we don't get the actual error code sent by the firmware.
Fix this.
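A condensed model of the corrected decode order (the status codes and value
handling are simplified stand-ins; the point is only that FAIL must be tested
before the generic not-SUCCESS bail-out):

  #include <stdio.h>

  enum { STS_SUCCESS, STS_FAIL, STS_BADCMD };

  static int decode_status(int status, int value)
  {
          if (status == STS_FAIL)
                  return -value;          /* propagate the firmware error code */
          if (status != STS_SUCCESS)
                  return -5;              /* treat anything unknown as -EIO */
          return value;
  }

  int main(void)
  {
          printf("%d\n", decode_status(STS_FAIL, 22));    /* -22, not -EIO */
          printf("%d\n", decode_status(STS_BADCMD, 0));   /* -5 */
          printf("%d\n", decode_status(STS_SUCCESS, 7));  /* 7 */
          return 0;
  }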
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 389711b374 ("firmware: Add Turris Mox rWTM firmware driver")
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 74ee585b7e ]
M= (or KBUILD_EXTMOD) generally expects a directory path without any
trailing slashes, like M=a/b/c.
If you add a trailing slash, like M=a/b/c/, you will get ugly build
logs (two slashes in a series), but it still works fine as long as it
is consistent between 'make modules' and 'make modules_install'.
The following commands correctly build and install the modules.
$ make M=a/b/c/ modules
$ sudo make M=a/b/c/ modules_install
Since commit ccae4cfa7b ("kbuild: refactor scripts/Makefile.modinst"),
a problem happens if you add a trailing slash only for modules_install.
$ make M=a/b/c modules
$ sudo make M=a/b/c/ modules_install
No module is installed in this case, as Johannes Berg reported. [1]
Trim any trailing slashes from $(KBUILD_EXTMOD).
I used the 'dirname' command to remove all the trailing slashes in
case someone adds more slashes, like M=a/b/c/////. Make's built-in
function, $(dir ...), cannot take care of such a case.
[1]: https://lore.kernel.org/lkml/10cc8522b27a051e6a9c3e158a4c4b6414fd04a0.camel@sipsolutions.net/
Fixes: ccae4cfa7b ("kbuild: refactor scripts/Makefile.modinst")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50c9bfca1b ]
The current device tree CPSW3g node adds a non-zero "mac-address" property
to the ports, which prevents random MAC address assignment to network devices
if the bootloader failed to update the DT. This may cause more than one host
to have the same MAC in the network.
mac-address = [00 00 de ad be ef];
mac-address = [00 01 de ad be ef];
In addition, there is one MAC address available in eFuse registers which
can be used for default port 1.
Hence, fix ports MAC properties by:
- resetting "mac-address" property to 0
- adding ti,syscon-efuse = <&main_conf 0x200> to Port 1
Fixes: 3753b12877 ("arm64: dts: ti: k3-am64-main: Add CPSW DT node")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210608184940.25934-1-grygorii.strashko@ti.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b11fec734 ]
ti,pindir-d0-out-d1-in property is expected to be of type boolean.
Therefore, fix the property accordingly.
Fixes: b0b0395154 ("ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1")
Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 414bfe1d26 ]
ti,pindir-d0-out-d1-in property is expected to be of type boolean.
Therefore, fix the property accordingly.
Fixes: 444d66fafa ("ARM: dts: add spi wifi support to cm-t335")
Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78b4b16528 ]
With [1] the USB4 target-module node got defined in the dra74x.dtsi file.
However, the earlier definition in [2] was not removed, and this
duplication of the target module causes a boot failure on dra74
variant boards - dra7-evm, dra76-evm.
USB4 is only present in DRA74x variants, so keep the entry in
dra74x.dtsi and remove it from the top-level interconnect hierarchy
file dra7-l4.dtsi. This change makes the USB4 target module no longer
visible to AM5718, DRA71x and DRA72x, so remove references to it in
their respective dts files.
[1]: commit c7b72abca6 ("ARM: OMAP2+: Drop legacy platform data for
dra7 dwc3")
[2]: commit 549fce068a ("ARM: dts: dra7: Add l4 interconnect
hierarchy and ti-sysc data")
Fixes: c7b72abca6 ("ARM: OMAP2+: Drop legacy platform data for dra7 dwc3")
Signed-off-by: Gowtham Tammana <g-tammana@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd5431b2f9 ]
Although the schematics of the Pine A64-LTS and SoPine Baseboard show that
both the RX and TX internal delays are enabled, they're using the same broken
RTL8211E chip batch as the Pine A64+, so they should use TXID instead, not
ID.
In addition, checking the real components soldered on both a SoPine
Baseboard and a Pine A64-LTS shows that the RX delay is not enabled (GR69
soldered and GR70 NC) despite the schematics saying it's enabled. It's a
common situation for Pine64 boards that the NC information on the schematics
is not the same as on the board.
So the RGMII delay mode should be TXID on these boards.
Fixes: c2b111e59a ("arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210609083843.463750-1-icenowy@aosc.io
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5c6d0b55b4 ]
Rename the external refclk inputs to the SERDES from
dummy_cmn_refclk/dummy_cmn_refclk1 to cmn_refclk/cmn_refclk1
respectively. Also move the external refclk DT nodes outside the
cbass_main DT node. Since on the j721e common processor board only
cmn_refclk1 is connected to a 100MHz clock, fix the clock frequency.
Fixes: afd094ebe6 ("arm64: dts: ti: k3-j721e-main: Add WIZ and SERDES PHY nodes")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Reviewed-by: Aswath Govindraju <a-govindraju@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210603143427.28735-2-kishon@ti.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1771a33b34 ]
"make dtbs_check":
arm64/boot/dts/renesas/r8a779a0-falcon.dt.yaml: interrupt-controller@f1000000: 'power-domains' does not match any of the regexes: '^(msi-controller|gic-its|interrupt-controller)@[0-9a-f]+$', '^gic-its@', '^interrupt-controller@[0-9a-f]+$', 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
Remove the "power-domains" property, as the GIC on R-Car V3U is
always-on, and not part of a clock domain.
Fixes: 834c310f54 ("arm64: dts: renesas: Add Renesas R8A779A0 SoC support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/a9ae5cbc7c586bf2c6b18ddc665ad7051bd1d206.1622560236.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5aede3e6d ]
1. Reorder interrupts
2. Fix typo: s/spi_lr_overhead/spi_lr_overread/
3. Rename node: s/spi-nor@0/flash@0/
This fixes:
arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dt.yaml: spi@18029200: interrupt-names: 'oneOf' conditional failed, one must be fixed:
['spi_lr_fullness_reached', 'spi_lr_session_aborted', 'spi_lr_impatient', 'spi_lr_session_done', 'spi_lr_overhead', 'mspi_done', 'mspi_halted'] is too long
Additional items are not allowed ('spi_lr_session_aborted', 'spi_lr_impatient', 'spi_lr_session_done', 'spi_lr_overhead', 'mspi_done', 'mspi_halted' were unexpected)
'mspi_done' was expected
'spi_l1_intr' was expected
'mspi_halted' was expected
'spi_lr_fullness_reached' was expected
'spi_lr_session_aborted' was expected
'spi_lr_impatient' was expected
'spi_lr_session_done' was expected
'spi_lr_overread' was expected
From schema: Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.yaml
arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dt.yaml: spi-nor@0: $nodename:0: 'spi-nor@0' does not match '^flash(@.*)?$'
From schema: Documentation/devicetree/bindings/mtd/jedec,spi-nor.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92e669017f ]
The SCL gpio pin used by the I2C bus for recovery needs to be configured as
open drain, so fix the binding example accordingly.
In relation to fix c5a2838025 ("ARM: dts: at91: Configure I2C SCL
gpio as open drain").
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 19e5cef058 ("dt-bindings: i2c: at91: document optional bus recovery properties")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cb7af474e ]
During an async command execution the Rx buffer length is initially set
to max_msg_sz when the synchronous part of the command is sent.
However, once the synchronous part completes, the transport layer waits
for the delayed response, which will be processed using the same xfer
descriptor initially allocated. Since the synchronous response received at
the end of the xfer shrinks the Rx buffer length to the effective
payload response length, it needs to be reset again.
Raise the Rx buffer length back to max_msg_sz before fetching the
delayed response to ensure the full response is read correctly from the
shared memory.
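A minimal sketch of the reset, with field names assumed from the SCMI
transport structures described above (xfer descriptor and transport
descriptor):
xfer->rx.len = info->desc->max_msg_size;	/* restore before reading the delayed response */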
Link: https://lore.kernel.org/r/20210601102421.26581-2-cristian.marussi@arm.com
Fixes: 58ecdf03db ("firmware: arm_scmi: Add support for asynchronous commands and delayed response")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
[sudeep.holla: moved reset to scmi_handle_response as it could race with
do_xfer_with_response]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0fce06e34 ]
In this new power domain driver, when adding a power domain
the dependent clocks are prepared at the same time,
so we only do clk_bulk_enable/disable control during power ON/OFF.
During system suspend, the pm runtime framework will forcibly power off
the power domains. However, the dependent clocks are then disabled but kept
prepared.
In MediaTek clock drivers, PLL would be turned ON when we do
clk_bulk_prepare control.
Clock hierarchy:
PLL --> DIV_CK --> CLK_MUX (may be dependent clocks) --> SUBSYS_CG (may be dependent clocks)
This leads to unexpected clock states during system suspend.
This patch fixes it by doing prepare_enable/disable_unprepare on the
dependent clocks at the same time as we power on/off
any power domain.
Fixes: 59b644b01c ("soc: mediatek: Add MediaTek SCPSYS power domains")
Signed-off-by: Weiyi Lu <weiyi.lu@mediatek.com>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: chun-jie.chen <chun-jie.chen@mediatek.com>
Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20210601035905.2970384-1-hsinyi@chromium.org
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eed6ff1bb2 ]
Mediatek requires mmsys clocks to be unprepared during suspend,
otherwise the system may hang.
syscon_regmap_lookup_by_phandle_optional() will attach and prepare the
first clock in smi node, leading to additional prepare to the clock
which is not balanced with the prepare/unprepare pair in resume/suspend
callbacks.
If a power domain node requests an smi node and the smi node's first
clock is an mmsys clock, it will result in unstable suspend/resume.
Fixes: f414854c88 ("soc: mediatek: pm-domains: Add SMI block as bus protection block")
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: chun-jie.chen <chun-jie.chen@mediatek.com>
Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20210601035905.2970384-2-hsinyi@chromium.org
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7fea67710e ]
When the call to irq_get_irq_data() to get the IRQ's irq_data fails, an
appropriate error code -ENOENT should be returned. However, we directly
return 'err', which holds the IRQ number instead of an error code.
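A minimal sketch of the corrected error path (variable names follow the
description above, so 'err' still holds the IRQ number here):
data = irq_get_irq_data(err);
if (!data)
	return -ENOENT;		/* was: return err, i.e. the IRQ number */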
Fixes: 139251fc22 ("firmware: tegra: add bpmp driver for Tegra210")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cebcf9932 ]
The Microchip LAN8710Ai PHY requires XTAL1/CLKIN external clock to be
enabled when the nRST is toggled according to datasheet Microchip
LAN8710A/LAN8710Ai DS00002164B page 35 section 3.8.5.1 Hardware Reset:
"
A Hardware reset is asserted by driving the nRST input pin low. When
driven, nRST should be held low for the minimum time detailed in
Section 5.5.3, "Power-On nRST & Configuration Strap Timing," on page
59 to ensure a proper transceiver reset. During a Hardware reset, an
external clock must be supplied to the XTAL1/CLKIN signal.
"
This is accidentally fulfilled in the current setup, where ETHCK_K is used
to supply the PHY XTAL1/CLKIN and is also fed back through eth_clk_fb to
supply the ETHRX clock of the DWMAC. Hence, the DWMAC enables the ETHRX
clock, which has ETHCK_K as its parent, so the ETHCK_K clock is also
enabled, and then the PHY reset toggles.
However, this is not always the case, e.g. in case the PHY XTAL1/CLKIN
clock is supplied by some clock source other than ETHCK_K, or in case the
ETHRX clock is not supplied by ETHCK_K. In the latter case, ETHCK_K would
be kept disabled while the ETHRX clock would be enabled, so the PHY would
not receive the XTAL1/CLKIN clock and the reset would fail.
Improve the DT by adding the PHY clock phandle into the PHY node, which
then also requires moving the PHY reset GPIO specifier in the same place
and that then also requires correct PHY reset GPIO timing, so add that
too.
A brief note regarding the timing: the datasheet says the reset should
stay asserted for at least 100 us and software should wait at least 200 ns
after deassertion. Set both delays to 500 us, which should be plenty.
Fixes: 34e0c7847d ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: Patrick Delaunay <patrick.delaunay@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e60fd5ac1f ]
The rmtfs_mem region is a weird one, downstream allocates it
dynamically, and supports a "qcom,guard-memory" property which when set
will reserve 4k above and below the rmtfs memory.
A comment from the qcom 4.9 kernel msm_sharedmem driver:
/*
* If guard_memory is set, then the shared memory region
* will be guarded by SZ_4K at the start and at the end.
* This is needed to overcome the XPU limitation on few
* MSM HW, so as to make this memory not contiguous with
* other allocations that may possibly happen from other
* clients in the system.
*/
When the kernel tries to touch memory that is too close to the
rmtfs region it may cause an XPU violation. Such is the case on the
OnePlus 6, where random crashes would usually occur after boot.
Reserve 4k above and below the rmtfs_mem to avoid hitting these XPU
violations.
This doesn't entirely solve the random crashes on the OnePlus 6/6T but
it does seem to prevent the ones which happen shortly after modem
bringup.
Fixes: 288ef8a426 ("arm64: dts: sdm845: add oneplus6/6t devices")
Signed-off-by: Caleb Connolly <caleb@connolly.tech>
Link: https://lore.kernel.org/r/20210502014146.85642-4-caleb@connolly.tech
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54b909436e ]
The scnprintf() function silently truncates the output and returns
the number of bytes that it was able to copy (not counting the NUL
terminator). Thus, the highest value it can return here is
"NAME_SIZE - 1" and the overflow check is dead code. Fix this by
using the snprintf() function which returns the number of bytes that
would have been copied if there was enough space and changing the
condition from "> NAME_SIZE" to ">= NAME_SIZE".
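A standalone C illustration of the difference (buffer and device name are
made up):
#include <stdio.h>

#define NAME_SIZE 16

/* snprintf() reports the length that would have been written, so a result
 * >= NAME_SIZE reliably detects truncation; a scnprintf()-style helper can
 * never return more than NAME_SIZE - 1, so the same test would be dead code. */
static int format_name(char buf[NAME_SIZE], const char *dev)
{
	int needed = snprintf(buf, NAME_SIZE, "rtc-%s", dev);

	return needed >= NAME_SIZE ? -1 : 0;	/* -1: truncated */
}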
Fixes: 92589c986b ("rtc-proc: permit the /proc/driver/rtc device to use other devices")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/YJov/pcGmhLi2pEl@mwanda
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0ddc5b170 ]
The added Kconfig dependency is slightly incorrect, which can
lead to a link failure when the watchdog is a loadable module:
arm-linux-gnueabi-ld: drivers/rtc/rtc-bd70528.o: in function `bd70528_set_rtc_based_timers':
rtc-bd70528.c:(.text+0x6cc): undefined reference to `bd70528_wdt_set'
arm-linux-gnueabi-ld: drivers/rtc/rtc-bd70528.o: in function `bd70528_set_time':
rtc-bd70528.c:(.text+0xaa0): undefined reference to `bd70528_wdt_lock'
arm-linux-gnueabi-ld: rtc-bd70528.c:(.text+0xab8): undefined reference to `bd70528_wdt_unlock'
arm-linux-gnueabi-ld: drivers/rtc/rtc-bd70528.o: in function `bd70528_alm_enable':
rtc-bd70528.c:(.text+0xfc0): undefined reference to `bd70528_wdt_lock'
arm-linux-gnueabi-ld: rtc-bd70528.c:(.text+0x1030): undefined reference to `bd70528_wdt_unlock'
The problem is that the RTC driver is allowed to be built-in if
MFD_ROHM_BD71828 is built-in, even when the watchdog is a loadable module.
Rework this so that having the watchdog as a loadable module always
forces the RTC driver to be a module as well instead of built-in,
regardless of bd71828.
Fixes: c56dc069f2 ("rtc: bd70528: Support RTC on ROHM BD71815")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210422151545.2403356-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0e70d4946 ]
Commit fbbe38309d56 ("arm64: defconfig: Allow mt8173-based boards to boot
from usb") made the MTK_PMIC_WRAP config built-in. It needs to be
built-in in order to be able to boot from USB or the MMC without needing
a ramdisk, but that symbol was already defined as a module, so now we are
getting the following warning:
arch/arm64/configs/defconfig:996:warning: override: reassigning to symbol MTK_PMIC_WRAP
Remove MTK_PMIC_WRAP=m from the defconfig to get rid of the warning.
Fixes: fbbe38309d ("arm64: defconfig: Allow mt8173-based boards to boot from usb")
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20210423075201.2616023-1-enric.balletbo@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ab9d6219f ]
The Intel Gateway reset controller is only present on Intel Gateway
platforms. Hence add a dependency on X86, to prevent asking the user
about this driver when configuring a kernel without Intel Gateway
support.
Fixes: c9aef213e3 ("reset: intel: Add system reset controller driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42f6a76fbe ]
The Broadcom STB RESCAL reset controller is only present on Broadcom
BCM7216 platforms. Hence add a dependency on ARCH_BRCMSTB, to prevent
asking the user about this driver when configuring a kernel without
BCM7216 support.
Also, merely enabling CONFIG_COMPILE_TEST should not enable additional
code, and thus should not enable this driver by default.
Fixes: 4cf176e523 ("reset: Add Broadcom STB RESCAL reset controller")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d3bb3d27c ]
Two ethernet nodes were added by
commit 95220046a6 ("ARM: dts: Add ethernet to a bunch of platforms")
and commit d6d0cef55e ("ARM: dts: Add the FOTG210 USB host to Gemini boards").
This patch removes the duplicate one.
Fixes: d6d0cef55e ("ARM: dts: Add the FOTG210 USB host to Gemini boards")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 681ba73c72 ]
ld.lld warns that the '.modinfo' section is not currently handled:
ld.lld: warning: kernel/built-in.a(workqueue.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(printk/printk.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(irq/spurious.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(rcu/update.o):(.modinfo) is being placed in '.modinfo'
The '.modinfo' section was added in commit 898490c010 ("moduleparam:
Save information about built-in modules in separate file") to the DISCARDS
macro but Hexagon has never used that macro. The unification of DISCARDS
happened in commit 023bf6f1b8 ("linker script: unify usage of discard
definition") in 2009, prior to Hexagon being added in 2011.
Switch Hexagon over to the DISCARDS macro so that anything that is
expected to be discarded gets discarded.
Link: https://lkml.kernel.org/r/20210521011239.1332345-3-nathan@kernel.org
Fixes: e95bf452a9 ("Hexagon: Add configuration and makefiles for the Hexagon architecture.")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6fef087d0d ]
Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN".
This series fixes an error with ARCH=hexagon that was pointed out by the
patch "mm/slub: use stackdepot to save stack trace in objects".
The first patch fixes that error by handling the '.irqentry.text' and
'.softirqentry.text' sections.
The second patch switches Hexagon over to the common DISCARDS macro, which
should have been done when Hexagon was merged into the tree to match
commit 023bf6f1b8 ("linker script: unify usage of discard definition").
The third patch selects CONFIG_ARCH_WANT_LD_ORPHAN_WARN so that something
like this does not happen again.
This patch (of 3):
Patch "mm/slub: use stackdepot to save stack trace in objects" in -mm
selects CONFIG_STACKDEPOT when CONFIG_STACKTRACE_SUPPORT is selected and
CONFIG_STACKDEPOT requires IRQENTRY_TEXT and SOFTIRQENTRY_TEXT to be
handled after commit 505a0ef15f ("kasan: stackdepot: move
filter_irq_stacks() to stackdepot.c") due to the use of the
__{,soft}irqentry_text_{start,end} section symbols. If those sections are
not handled, the build is broken.
$ make ARCH=hexagon CROSS_COMPILE=hexagon-linux- LLVM=1 LLVM_IAS=1 defconfig all
...
ld.lld: error: undefined symbol: __irqentry_text_start
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
ld.lld: error: undefined symbol: __irqentry_text_end
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
ld.lld: error: undefined symbol: __softirqentry_text_start
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
ld.lld: error: undefined symbol: __softirqentry_text_end
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>> stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
...
Add these sections to the Hexagon linker script so the build continues to
work. ld.lld's orphan section warning would have caught this prior to the
-mm commit mentioned above:
ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'
ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'
ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'
Link: https://lkml.kernel.org/r/20210521011239.1332345-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20210521011239.1332345-2-nathan@kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1381
Fixes: 505a0ef15f ("kasan: stackdepot: move filter_irq_stacks() to stackdepot.c")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f46f84931a ]
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for
whether or not ds->ds_clp has already been initialised, so we can end up
adding the same transports multiple times.
Fixes: fc821d5920 ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b77f97a7e ]
If the layout gets invalidated, we should wait for any outstanding
layoutget requests for that layout to complete, and we should resend
them only after re-establishing the layout stateid.
Fixes: d29b468da4 ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa95edf309 ]
If we have multiple outstanding layoutget requests, the current code to
update the layout barrier assumes that the outstanding layout stateids
are updated in order. That's not necessarily the case.
Instead of using the value of lo->plh_outstanding as a guesstimate for
the window of values we need to accept, just wait to update the window
until we're processing the last one. The intention here is just to
ensure that we don't process 2^31 seqid updates without also updating
the barrier.
Fixes: 1bcf34fdac ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba512c1bc3 ]
Earlier commits refactored some NFS read code and removed
nfs_readpage_async(), but neglected to properly fixup
nfs_readpage_from_fscache_complete(). The code path is
only hit when something unusual occurs with the cachefiles
backing filesystem, such as an IO error or while a cookie
is being invalidated.
Mark page with PG_checked if fscache IO completes in error,
unlock the page, and let the VM decide to re-issue based on
PG_uptodate. When the VM reissues the readpage, PG_checked
allows us to skip over fscache and read from the server.
Link: https://marc.info/?l=linux-nfs&m=162498209518739
Fixes: 1e83b173b2 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0340f16a0 upstream.
A previous refactoring of nfs_readpage() might end up calling
wait_on_page_locked_killable() even if readpage_async_filler() failed
with an internal error and pg_error was non-zero (for example, if
nfs_create_request() failed). In the case of an internal error,
skip over wait_on_page_locked_killable() as this is only needed
when the read is sent and an error occurs during completion handling.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 500817bf5e ]
We are reading a Big Block Mode value while in Sub Block Mode
when initializing. Fortunately, vm->bbm.bb_size maps to some counter
in the vm->sbm.mb_count array, which is 0 at that point in time.
No harm done; still, this was unintended and is not future-proof.
Fixes: 4ba50cd335 ("virtio-mem: Big Block Mode (BBM) memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3aadf2e16 ]
After device reset, the virtqueues are not ready so clear the ready
field.
Failing to do so can result in virtio_vdpa failing to load if the device
was previously used by vhost_vdpa and the old values are ready.
virtio_vdpa expects to find VQs in "not ready" state.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210606053128.170399-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab1016d39c ]
In error cases the dentry may be NULL.
Before 20798dfe24, the encoder also checked dentry and
d_really_is_positive(dentry), but that looks like overkill to me--zero
status should be enough to guarantee a positive dentry.
This isn't the first time we've seen an error-case NULL dereference
hidden in the initialization of a local variable in an xdr encoder. But
I went back through the other recent rewrites and didn't spot any
similar bugs.
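A rough sketch of the shape of the fix (the response fields follow the
NFSv3 ACL result structure; this is not the literal diff): only look up
the inode once the status is known to be OK.
if (resp->status == nfs_ok) {
	struct inode *inode = d_inode(resp->fh.fh_dentry);

	/* ... encode the ACL result using inode ... */
}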
Reported-by: JianHong Yin <jiyin@redhat.com>
Reviewed-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 20798dfe24 ("NFSD: Update the NFSv3 GETACL result encoder...")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b08cf62b1 ]
The double copy of the string is a mistake, plus __assign_str()
uses strlen(), which is wrong to do on a string that isn't
guaranteed to be NUL-terminated.
Fixes: 6019ce0742 ("NFSD: Add a tracepoint to record directory entry encoding")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 474bc33469 ]
When flushing out the unstable file writes as part of a COMMIT call, try
to perform most of the data writes and waits outside the semaphore.
This means that if the client is sending the COMMIT as part of a memory
reclaim operation, then it can continue performing I/O, with contention
for the lock occurring only once the data sync is finished.
Fixes: 5011af4c69 ("nfsd: Fix stable writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 934bd07fae ]
This was causing a "sleeping function called from invalid context"
warning.
I don't think we need the set_and_test_bit() here; clients move from
unconfirmed to confirmed only once, under the client_lock.
The (conf == unconf) is a way to check whether we're in that confirming
case, hopefully that's not too obscure.
Fixes: 472d155a06 ("nfsd: report client confirmation status in "info" file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 419ac82176 ]
Commit 91c960b005 ("bpf: Rename BPF_XADD and prepare to encode other
atomics in .imm") converted BPF_XADD to BPF_ATOMIC and added a way to
distinguish instructions based on the immediate field. Existing JIT
implementations were updated to check for the immediate field and to
reject programs utilizing anything more than BPF_ADD (such as BPF_FETCH)
in the immediate field.
However, the check added to powerpc64 JIT did not look at the correct
BPF instruction. Due to this, such programs would be accepted and
incorrectly JIT'ed resulting in soft lockups, as seen with the atomic
bounds test. Fix this by looking at the correct immediate value.
Fixes: 91c960b005 ("bpf: Rename BPF_XADD and prepare to encode other atomics in .imm")
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4117b430ffaa8cd7af042496f87fd7539e4f17fd.1625145429.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0755d3be2d ]
The sk_user_data pointer is supposed to be modified only while
holding the write_lock "sk_callback_lock", otherwise
we could race with other threads and crash the kernel.
We can't take the write_lock in nvmet_tcp_state_change()
because it would cause a deadlock, but the release_work queue
will set the pointer to NULL later, so we can simply remove
the assignment.
Fixes: b5332a9f3f ("nvmet-tcp: fix incorrect locking in state_change sk callback")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a2f966d0f ]
It's unsafe to operate a vq from multiple threads.
Unfortunately this is exactly what we do when invoking
clean tx poll from rx napi.
Same happens with napi-tx even without the
opportunistic cleaning from the receive interrupt: that races
with processing the vq in start_xmit.
As a fix move everything that deals with the vq to under tx lock.
Fixes: b92f1e6751 ("virtio-net: transmit napi")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d23dcdf21 ]
Before SF support was introduced, the DMA device was equal to
mdev->device which was in essence equal to pdev->dev.
With SF introduction this is no longer true. It has already been
handled for vhost_vdpa, since the reference to the dma device can come from
within mlx5_vdpa. With virtio_vdpa this broke. To fix this, we set the
real dma device when initializing the device.
In addition, for the sake of consistency, previous references in the
code to the dma device are changed to vdev->dma_dev.
Fixes: d13a15d544 ("vdpa/mlx5: Use the correct dma device when registering memory")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210606053150.170489-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71ab6a7cfb ]
umem size is a 32 bit unsigned value so assigning it to an int could
cause false failures. Set the calculated value inside the function and
modify the function name to reflect the fact that it updates the size.
This bug was found during code review but never had real impact to this
date.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210530090349.8360-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3011776af ]
Fix copy paste bug assigning umem1 size to umem2 and umem3. The issue
was discovered when trying to use a 1:1 MR that covers the entire
address space where firmware complained that provided sizes are not
large enough. 1:1 MRs are required to support virtio_vdpa.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210530090317.8284-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 83150f5d05 ]
Currently topology_scale_freq_tick() (which gets called from
scheduler_tick()) may end up using a pointer to "struct
scale_freq_data", which was previously cleared by
topology_clear_scale_freq_source(), as there is no protection in place
here. The users of topology_clear_scale_freq_source(), though, need a
guarantee that the previously cleared scale_freq_data isn't used
anymore, so they can free the related resources.
Since topology_scale_freq_tick() is called from scheduler tick, we don't
want to add locking in there. Use the RCU update mechanism instead
(which is already used by the scheduler's utilization update path) to
guarantee race free updates here.
synchronize_rcu() makes sure that all RCU critical sections that started
before it was called will finish before it returns. And so the callers
of topology_clear_scale_freq_source() don't need to worry about their
callback getting called anymore.
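A condensed sketch of the pattern with a single pointer (the real code
keeps one pointer per CPU; names and the callback are illustrative):
static struct scale_freq_data __rcu *sfd_ptr;

/* tick path, runs from scheduler_tick(), must not sleep */
struct scale_freq_data *sfd;

rcu_read_lock();
sfd = rcu_dereference(sfd_ptr);
if (sfd)
	sfd->set_freq_scale();	/* cannot be freed while the read lock is held */
rcu_read_unlock();

/* clearing path */
rcu_assign_pointer(sfd_ptr, NULL);
synchronize_rcu();		/* waits for all readers like the one above */
/* the old scale_freq_data and its resources can now be freed */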
Cc: Paul E. McKenney <paulmck@kernel.org>
Fixes: 01e055c120 ("arch_topology: Allow multiple entities to provide sched_freq_tick() callback")
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f67092eff2 ]
tegra_pcie_ep_raise_msi_irq() shifted a signed 32-bit value left by 31
bits. The behavior of this is implementation-defined.
Replace the shift by BIT(), which is well-defined.
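A sketch of the resulting line, assuming BIT() from <linux/bits.h>, which
expands to an unsigned long shift:
appl_writel(pcie, BIT(irq), APPL_MSI_CTRL_1);	/* 1UL << irq: well-defined for irq == 31 */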
Found by cppcheck:
$ cppcheck --enable=all drivers/pci/controller/dwc/pcie-tegra194.c
Checking drivers/pci/controller/dwc/pcie-tegra194.c ...
drivers/pci/controller/dwc/pcie-tegra194.c:1829:23: portability: Shifting signed 32-bit value by 31 bits is implementation-defined behaviour. See condition at line 1826. [shiftTooManyBitsSigned]
appl_writel(pcie, (1 << irq), APPL_MSI_CTRL_1);
^
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20210618160219.303092-1-jonathanh@nvidia.com
Fixes: c57247f940 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 132e320978 ]
This patch removes setting SBI_NEED_FSCK when GC gets an error from f2fs_iget,
since f2fs_iget can return ENOMEM and other errors due to race conditions.
If we set this critical fsck flag, we'll get EIO during fsync via the below
code path.
In f2fs_inplace_write_data(),
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) {
err = -EIO;
goto drop_bio;
}
Fixes: 9557727876 ("f2fs: drop inplace IO if fs status is abnormal")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eae00c5d6e ]
After calling security_sb_clone_mnt_opts() in nfs_get_root(), it's
necessary to copy the value of has_sec_mnt_opts from the cloned
super_block's nfs_server. Otherwise, calls to nfs_compare_super()
using this super_block may not return the correct result, leading to
mount failures.
For example, mounting an nfs server with the following in /etc/exports:
/export *(rw,insecure,crossmnt,no_root_squash,security_label)
and having /export/scratch on a separate block device.
mount -o v4.2,context=system_u:object_r:root_t:s0 server:/export/test /mnt/test
mount -o v4.2,context=system_u:object_r:swapfile_t:s0 server:/export/scratch /mnt/scratch
The second mount would fail with "mount.nfs: /mnt/scratch is busy or
already mounted or sharecache fail" and "SELinux: mount invalid. Same
superblock, different security settings for..." would appear in the
syslog.
Also while we're in there, replace several instances of "NFS_SB(s)"
with "server", which was already declared at the top of
nfs_get_root().
Fixes: ec1ade6a04 ("nfs: account for selinux security context when deciding to share superblock")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bc6ea31cb ]
The .remove() callback disables clocks that were not enabled in
.probe(). So just probing and then unbinding the driver results in a clk
enable imbalance.
So just drop the call to disable the clocks. (Which BTW was also in the
wrong order because the call makes the PWM unfunctional and so should
have come only after pwmchip_remove()).
Fixes: 9f4c8f9607 ("pwm: imx: Add ipg clock operation")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9625895011 ]
The gap handling in copy_xstate_to_kernel() is wrong when XSAVES is in
use.
Using init_fpstate for copying the init state of features which are
not set in the xstate header is only correct for the legacy area, but
not for the extended features area because when XSAVES is in use then
init_fpstate is in compacted form which means the xstate offsets which
are used to copy from init_fpstate are not valid.
Fortunately, this is not a real problem today because all extended
features in use have an all-zeros init state, but it is wrong
nevertheless and with a potentially dynamically sized init_fpstate this
would result in an access outside of the init_fpstate.
Fix this by keeping track of the last copied state in the target buffer and
explicitly zero it when there is a feature or alignment gap.
Use the compacted offset when accessing the extended feature space in
init_fpstate.
As this is not a functional issue on older kernels this is intentionally
not tagged for stable.
Fixes: b8be15d588 ("x86/fpu/xstate: Re-enable XSAVES")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121451.294282032@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc1c56e9bb ]
If an RPC client is created without RPC_CLNT_CREATE_REUSEPORT, it should
not reuse the source port when a TCP connection is re-established.
This is currently implemented by preventing the source port being
recorded after a successful connection (the call to xs_set_srcport()).
However the source port is also recorded after a successful bind in xs_bind().
This may not be needed at all and certainly is not wanted when
RPC_CLNT_CREATE_REUSEPORT wasn't requested.
So avoid that assignment when xprt.reuseport is not set.
With this change, NFSv4.1 and later mounts use a different port number on
each connection. This is helpful with some firewalls which don't cope
well with port reuse.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: e6237b6feb ("NFSv4.1: Don't rebind to the same source port when reconnecting to the server")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2c2a622d4 ]
Fix to return a negative error code from the error handling case instead
of 0, as done elsewhere in this function.
Fixes: 9ca2d73264 ("ubifs: Limit number of xattrs per inode")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 558f9b2f94 ]
GCC assumes that stack is aligned to 16-byte on call sites [1].
Since GCC 8, GCC began using 16-byte aligned SSE instructions to
implement assignments to structs on stack. When
CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects
os-Linux/sigio.c, write_sigio_thread:
struct pollfds *fds, tmp;
tmp = current_poll;
Note that struct pollfds is exactly 16 bytes in size.
GCC 8+ generates assembly similar to:
movdqa (%rdi),%xmm0
movaps %xmm0,-0x50(%rbp)
This is an issue, because movaps will #GP if -0x50(%rbp) is not
aligned to 16 bytes [2], and how rbp gets assigned to is via glibc
clone thread_start, then function prologue, going though execution
trace similar to (showing only relevant instructions):
sub $0x10,%rsi
mov %rcx,0x8(%rsi)
mov %rdi,(%rsi)
syscall
pop %rax
pop %rdi
callq *%rax
push %rbp
mov %rsp,%rbp
The stack pointer always points to the topmost element on the stack,
rather than the space right above the topmost. On push, the
pointer decrements first before writing to the memory pointed to
by it. Therefore, there is no need to have the stack pointer
always point to valid memory unless the stack is popped;
so the `- sizeof(void *)` in the code is unnecessary.
On the other hand, glibc reserves the 16 bytes it needs on the stack
and pops them itself, so by the call instruction the stack pointer
is exactly the caller-supplied sp. It then pushes the 16 bytes of
the return address and the saved stack pointer, so the base
pointer will be 16-byte aligned if and only if the caller
supplied sp is 16-byte aligned. Therefore, the caller must supply
a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE`
already satisfies.
On a side note, musl is unaffected by this issue because it forces
16 byte alignment via `and $-16,%rsi` in its clone wrapper.
Similarly, glibc i386 is also unaffected because it has
`andl $0xfffffff0, %ecx`.
To reproduce this bug, enable CONFIG_UML_RTC and
CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call
add_sigio_fd, which will then cause write_sigio_thread to either go
into a segfault loop or panic with "Segfault with no mm".
Similarly, signal stacks will be aligned by the host kernel upon
signal delivery. `- sizeof(void *)` to sigaltstack is
unconventional and extraneous.
On a related note, initialization of longjmp buffers does require
`- sizeof(void *)`. This is to account for the return address
that would have been pushed to the stack at the call site.
The reason for uml to respect 16-byte alignment, rather than
telling GCC to assume 8-byte alignment like the host kernel since
commit d9b0cde91c ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported"), is because uml links
against libc. There is no reason to assume libc is also compiled
with that flag and assumes 8-byte alignment rather than 16-byte.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2] https://c9x.me/x86/html/file_module_x86_id_180.html
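A small userspace sketch of the resulting rule for the stack argument
handed to clone() (STACK_SIZE and the allocation are stand-ins for UML's
stack page):
#include <stdlib.h>

#define STACK_SIZE 4096

/* Return the 16-byte aligned end of the allocation, with no
 * "- sizeof(void *)": glibc's clone wrapper pushes the return address and
 * saved frame pointer itself, so only an aligned top keeps %rbp 16-byte
 * aligned in the new thread and keeps GCC's movaps stores from faulting. */
static void *stack_top(void)
{
	char *stack = aligned_alloc(16, STACK_SIZE);

	return stack ? stack + STACK_SIZE : NULL;
}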
Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d1c0f3d28 ]
This seems to happen fairly easily during READ_PLUS testing on NFS v4.2.
I found that we could end up accessing xdr->buf->pages[pgnr] with a pgnr
greater than the number of pages in the array. So let's just return
early if we're setting base to a point at the end of the page data and
let xdr_set_tail_base() handle setting up the buffer pointers instead.
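A minimal sketch of the early return, with names following the
xdr_set_page_base() description above:
if (base >= xdr->buf->page_len)
	return;		/* nothing left in the page data; xdr_set_tail_base() sets up the pointers */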
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: 8d86e373b0 ("SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3731d44bba ]
Fix an Oopsable condition in pnfs_mark_request_commit() when we're
putting a set of writes on the commit list to reschedule them after a
failed pNFS attempt.
Fixes: 9c455a8c1e ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd99e9f98f ]
Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.
Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com>
Fixes: 5c6e5b60aa ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3076cd8d1 ]
The fuel gauge in the RT5033 PMIC has its own I2C bus and interrupt
line. Therefore, it is not actually part of the RT5033 MFD and needs
its own of_match_table to probe properly.
Also, given that it's independent of the MFD, there is actually
no need to make the Kconfig depend on MFD_RT5033. Although the driver
uses the shared <linux/mfd/rt5033.h> header, there is no compile
or runtime dependency on the RT5033 MFD driver.
Cc: Beomho Seo <beomho.seo@samsung.com>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Fixes: b847dd96e6 ("power: rt5033_battery: Add RT5033 Fuel gauge device driver")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdcdaa13ad ]
"utf16s_to_utf8s(..., buf, PAGE_SIZE)" puts up to PAGE_SIZE bytes into
"buf" and returns the number of bytes it actually put there. If it wrote
PAGE_SIZE bytes, the newline added by dsm_label_utf16s_to_utf8s() would
overrun "buf".
Reduce the size available for utf16s_to_utf8s() to use so there is always
space for the newline.
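A minimal sketch of the adjustment, assuming the usual utf16s_to_utf8s()
prototype from <linux/nls.h>; utf16_str/utf16_len stand in for the
_DSM-provided UTF-16 string:
len = utf16s_to_utf8s(utf16_str, utf16_len, UTF16_LITTLE_ENDIAN,
		      buf, PAGE_SIZE - 1);	/* was PAGE_SIZE */
buf[len++] = '\n';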
[bhelgaas: reorder patch in series, commit log]
Fixes: 6058989bad ("PCI: Export ACPI _DSM provided firmware instance number and string name to sysfs")
Link: https://lore.kernel.org/r/20210603000112.703037-7-kw@linux.com
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 601423bc0c ]
The ac->state field is __le32, not u32. So change the variable we're
temporarily storing it in to __le32 as well.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: e61ffb3445 ("power: supply: Add AC driver for Surface Aggregator Module")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e25e407a1 ]
A phys_addr_t may be wider than an int or pointer:
drivers/remoteproc/stm32_rproc.c: In function 'stm32_rproc_da_to_pa':
drivers/remoteproc/stm32_rproc.c:583:30: error: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'phys_addr_t' {aka 'long long unsigned int'} [-Werror=format=]
583 | dev_dbg(dev, "da %llx to pa %#x\n", da, *pa);
Print it by reference using the special %pap format string.
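A sketch of the resulting debug print, assuming 'pa' is the phys_addr_t
pointer already in scope in this function:
dev_dbg(dev, "da %llx to pa %pap\n", da, pa);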
Reviewed-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Fixes: 8a471396d2 ("remoteproc: stm32: Move resource table setup to rproc_ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210421140053.3727528-1-arnd@kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dd571785d ]
As marcosfrm reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213089
Initramfs generators rely on "pre" softdeps (and "depends") to include
additional required modules.
F2FS does not declare "pre: crc32" softdep. Then every generator (dracut,
mkinitcpio...) has to maintain a hardcoded list for this purpose.
Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code.
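A sketch of the declaration, placed next to the module's other MODULE_*()
macros:
MODULE_SOFTDEP("pre: crc32");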
Fixes: 43b6573bac ("f2fs: use cryptoapi crc32 functions")
Reported-by: marcosfrm <marcosfrm@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c0f0f3639 ]
Commit 013c1667cf ("kallsyms: refactor
{,module_}kallsyms_on_each_symbol") replaced the return inside the
nested loop with a break, changing the semantics of the function: the
break only exits the innermost loop, so the code continues iterating the
symbols of the next module instead of exiting.
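A standalone C illustration of the difference (the module/symbol layout is
made up; each "module" is a NULL-terminated array of symbol names):
#include <stddef.h>

/* With "break" a nonzero fn() result only stops the inner loop and the
 * walk continues with the next module's symbols; "return ret" stops the
 * whole iteration, which is what the fix restores. */
static int on_each_symbol(const char *const *mods[], size_t nmods,
			  int (*fn)(const char *sym))
{
	for (size_t m = 0; m < nmods; m++) {
		for (size_t s = 0; mods[m][s]; s++) {
			int ret = fn(mods[m][s]);

			if (ret)
				return ret;	/* not "break" */
		}
	}
	return 0;
}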
Fixes: 013c1667cf ("kallsyms: refactor {,module_}kallsyms_on_each_symbol")
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jon Mediero <jmdr@disroot.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 937efa29e7 ]
With the original code, requests for period = 65536000 ns and period =
32768000 ns yield the same register settings (which result in 32768000
ns) because the value for pwmc0 was miscalculated.
Also simplify using that fls(0) is 0.
Fixes: 721b595744 ("pwm: visconti: Add Toshiba Visconti SoC PWM support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2beb4a53fc ]
The kernel pushes context on to the userspace stack to prepare for the
user's signal handler. When the user has supplied an alternate signal
stack, via sigaltstack(2), it is easy for the kernel to verify that the
stack size is sufficient for the current hardware context.
Check if writing the hardware context to the alternate stack would exceed
its size. If so, then instead of corrupting user data and proceeding with
the original signal handler, an immediate SIGSEGV signal is delivered.
Refactor the stack pointer check code from on_sig_stack() and use the new
helper.
While the kernel allows new source code to discover and use a sufficient
alternate signal stack size, this check is still necessary to protect
binaries with insufficient alternate signal stack size from data
corruption.
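A rough sketch of the new check (helper and size names are hypothetical;
the real code refactors the pointer test out of on_sig_stack()):
/* sp is the switched-to top of the alternate stack, frame_size the space
 * the hardware context needs. */
bool frame_fits = !on_alt_stack || sp - frame_size >= current->sas_ss_sp;

if (!frame_fits)
	return NULL;	/* caller treats this as failure and delivers SIGSEGV */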
Fixes: c2bc11f10a ("x86, AVX-512: Enable AVX-512 States Context Switch")
Reported-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a948b1142c ]
Since commit 9a6944fee6 ("tracing: Add a verifier to check string
pointers for trace events"), which was merged in v5.13-rc1,
TP_printk() no longer tacitly supports the "%.*s" format specifier.
These are low value tracepoints, so just remove them.
Reported-by: David Wysochanski <dwysocha@redhat.com>
Fixes: dd5e3fbc1f ("NFSD: Add tracepoints to the NFSD state management code")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e633f33d26 ]
The battery subsystem of the Surface Aggregator Module EC requires us to
register the battery notifier with instance ID 0. However, battery
events are actually sent with the instance ID corresponding to the
device, which is nonzero. Thus, the strict-matching approach doesn't
work here and will discard events that the driver is expected to handle.
To fix this we have to fall back on notifier matching by target-category
only and have to manually check the instance ID in the notifier
callback.
Fixes: 167f77f7d0 ("power: supply: Add battery driver for Surface Aggregator Module")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d80c228d44 ]
On the IO submission path, blk_account_io_start() may be interrupted
by a system interrupt. When the interrupt returns, the value
of part->stamp may have been updated by other cores, so the time
value collected before the interruption may be less than part->stamp.
When this happens we should do nothing, so that io_ticks stays
accurate. For kernels older than 5.0, this may cause io_ticks
to become smaller, which in turn may cause abnormal ioutil values.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 558d6450c7 ]
If a writeback of the superblock fails with an I/O error, the buffer
is marked not uptodate. However, this can cause a WARN_ON to trigger
when we attempt to write superblock a second time. (Which might
succeed this time, for cerrtain types of block devices such as iSCSI
devices over a flaky network.)
Try to detect this case in flush_stashed_error_work(), and also change
__ext4_handle_dirty_metadata() so we always set the uptodate flag, not
just in the nojournal case.
Before this commit, this problem could be replicated via:
1. dmsetup create dust1 --table '0 2097152 dust /dev/sdc 0 4096'
2. mount /dev/mapper/dust1 /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t
After a few seconds, we got following warning:
[ 80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[ 80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[ 91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[ 91.417038] ------------[ cut here ]------------
[ 91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[ 91.440322] Call Trace:
[ 91.440652] __jbd2_journal_temp_unlink_buffer+0x135/0x220
[ 91.441354] __jbd2_journal_unfile_buffer+0x24/0x90
[ 91.441981] __jbd2_journal_refile_buffer+0x134/0x1d0
[ 91.442628] jbd2_journal_commit_transaction+0x249a/0x3240
[ 91.443336] ? put_prev_entity+0x2a/0x200
[ 91.443856] ? kjournald2+0x12e/0x510
[ 91.444324] kjournald2+0x12e/0x510
[ 91.444773] ? woken_wake_function+0x30/0x30
[ 91.445326] kthread+0x150/0x1b0
[ 91.445739] ? commit_timeout+0x20/0x20
[ 91.446258] ? kthread_flush_worker+0xb0/0xb0
[ 91.446818] ret_from_fork+0x1f/0x30
[ 91.447293] ---[ end trace 66f0b6bf3d1abade ]---
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cf5f7ab23 ]
An IRQ handler may be called at any time after it is registered, so
anything it relies on must be ready before registration.
rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
read registers in the PCIe controller, but we registered them before
turning on clocks to the controller. If either is called before the clocks
are turned on, the register reads fail and the machine hangs.
Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
but we installed it before initializing irq_domain.
Register IRQ handlers after their data structures are initialized and
clocks are enabled.
Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
is being unregistered. An error during the probe path might cause this
unregistration and IRQ handler execution before the device or data
structure init has finished.
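A condensed sketch of the reordered probe flow (the clock/irq-domain
helpers are assumptions; the handlers are the ones named above, and
devm_request_irq() keeps its usual <linux/interrupt.h> signature):
err = rockchip_pcie_enable_clocks(rockchip);		/* 1: clocks on */
if (err)
	return err;

err = rockchip_pcie_init_irq_domain(rockchip);		/* 2: irq_domain ready */
if (err)
	goto err_disable_clocks;

err = devm_request_irq(dev, irq, rockchip_pcie_subsys_irq_handler,
		       IRQF_SHARED, "pcie-sys", rockchip);	/* 3: register last */
if (err)
	goto err_disable_clocks;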
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20210608080409.1729276-1-javierm@redhat.com
Reported-by: Peter Robinson <pbrobinson@gmail.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9249c32ec9 ]
The Dell Vostro 3350 ACPI video-bus device reports spurious
ACPI_VIDEO_NOTIFY_CYCLE events resulting in spurious KEY_SWITCHVIDEOMODE
events being reported to userspace (and causing trouble there).
Add a quirk setting the report_key_events mask to
REPORT_BRIGHTNESS_KEY_EVENTS so that the ACPI_VIDEO_NOTIFY_CYCLE
events will be ignored, while still reporting brightness up/down
hotkey-presses to userspace normally.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1911763
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7718629432 ]
In function amba_handler_attach(), dev->res.name is initialized by
amba_device_alloc(). But when address_found is false, dev->res.name is
assigned a null value, which leads to a wrong resource name being displayed
in /proc/iomem: "<BAD>" is seen for those resources.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86f7fa71cd ]
A consumer is expected to disable a PWM before calling pwm_put(). And if
they didn't there is hopefully a good reason (or the consumer needs
fixing). Also if disabling an enabled PWM was the right thing to do,
this should better be done in the framework instead of in each low level
driver.
So drop the hardware modification from the .remove() callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fde25294df ]
pm_runtime_get_sync() will increment the pm usage counter even when it
fails. Forgetting the corresponding put operation will result in a
reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get() to keep the usage
counter balanced.
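A minimal sketch of the replacement (the device pointer is whichever one
the driver already uses):
ret = pm_runtime_resume_and_get(chip->dev);	/* was pm_runtime_get_sync() */
if (ret < 0)
	return ret;	/* usage counter was already dropped on failure */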
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcdb4d904b ]
Fix 3 cases of kobject leaks, which cause memory leaks:
A kobj_type must have a release() method to free memory from the release
callback. A NULL default_attrs is not needed to init the kobject.
Sysfs files created under kobj_status should be removed with kobj_status
as the parent kobject.
Remove the queue sysfs files when releasing the queue from the process MMU
notifier release callback.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a06b912a5 ]
It turns out that the "T3 MRD" DMI_BOARD_NAME value is used in a lot of
different Cherry Trail x5-z8300 / x5-z8350 based Mini-PC / HDMI-stick
models from Ace PC / Meegopad / MinisForum / Wintel (and likely also
other vendors).
Most of the other DMI strings on these boxes unfortunately contain various
generic values like "Default string" or "$(DEFAULT_STRING)", so we cannot
match on them. These devices do have their chassis-type correctly set to a
value of "3" (desktop) which is a pleasant surprise, so also match on that.
This should avoid the quirk accidentally also getting applied to laptops /
tablets (which do actually have a battery). Although in my quite large
database of Bay and Cherry Trail based devices DMIdecode dumps I don't
have any laptops / tablets with a board-name of "T3 MRD", so this should
not be an issue.
BugLink: https://askubuntu.com/questions/1206714/how-can-a-mini-pc-be-stopped-from-being-detected-as-a-laptop-with-a-battery/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfe52db13a ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
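For illustration only (compatible string and table name are hypothetical),
the missing piece in such cases typically has this shape:

    static const struct of_device_id example_of_match[] = {
            { .compatible = "vendor,example-device" },
            { /* sentinel */ }
    };
    MODULE_DEVICE_TABLE(of, example_of_match);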
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 073b5d5b1f ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4465b3a621 ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e97bc66377 ]
If a file has already been closed, then it should not be selected to
support further I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Trond: Fix an invalid pointer deref reported by Colin Ian King]
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22d41cdcd3 ]
The checks for page->mapping are odd, as set_page_dirty is an
address_space operation, and I don't see where it would be called on a
non-pagecache page.
The warning about the page lock also seems bogus. The comment over
set_page_dirty() says that it can be called without the page lock in
some rare cases. I don't think we want to warn if that's the case.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0fdec1b3c9 ]
Orangefs df output is whacky. Walt Ligon suggested this might fix it.
It seems way more in line with reality now...
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20cf7d4ea4 ]
If the change attribute update is declared to be non-atomic by the
server, or our cached value does not match the server's value before the
operation was performed, then we should declare the inode cache invalid.
On the other hand, if the change to the directory raced with a lookup or
getattr which already updated the change attribute, then optimise away
the revalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 213bb58475 ]
The inode is considered revalidated when we've checked the value of the
change attribute against our cached value since that suffices to
establish whether or not the other cached values are valid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aee8c67a4f ]
When *RSTOR from user memory raises an exception, there is no way to
differentiate them. That's bad because it forces the slow path even when
the failure was not a fault. If the operation raised eg. #GP then going
through the slow path is pointless.
Use _ASM_EXTABLE_FAULT() which stores the trap number and let the exception
fixup return the negated trap number as error.
This allows to separate the fast path and let it handle faults directly and
avoid the slow path for all other exceptions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121457.601480369@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15db16837a ]
Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with ourarg containing nodeid and generation.
If a fuse inode is found in inode cache with the same nodeid but different
generation, the existing fuse inode should be unhashed and marked "bad" and
a new inode with the new generation should be hashed instead.
This can happen, for example, with a passthrough fuse filesystem that returns
the real filesystem ino/generation on lookup and where real inode numbers
can get recycled due to real files being unlinked not via the fuse
passthrough filesystem.
With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used to
access a completely new inode, which should be accessed only via the new
dentry.
Note that because the FORGET message carries the nodeid w/o generation, the
server should wait to get FORGET counts for the nlookup counts of the old
and reused inodes combined, before it can free the resources associated to
that nodeid.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d82ab251e ]
Even if POSIX doesn't mandate it, linux users legitimately expect sync() to
flush all data and metadata to physical storage when it is located on the
same system. This isn't happening with virtiofs though: sync() inside the
guest returns right away even though data still needs to be flushed from
the host page cache.
This is easily demonstrated by doing the following in the guest:
$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync() = 0 <0.024068>
and start the following in the host when the 'dd' command completes
in the guest:
$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3) = 0 <10.371640>
There are no good reasons not to honor the expected behavior of sync()
actually: it gives an unrealistic impression that virtiofs is super fast
and that data has safely landed on HW, which isn't the case obviously.
Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
request type for this purpose. Provision a 64-bit placeholder for possible
future extensions. Since the file server cannot handle the wait == 0 case,
we skip it to avoid a gratuitous roundtrip. Note that this is
per-superblock: a FUSE_SYNCFS is sent for the root mount and for each
submount.
Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
the file server is treated as permanent success. This ensures
compatibility with older file servers: the client will get the current
behavior of sync() not being propagated to the file server.
Note that such an operation allows the file server to DoS sync(). Since a
typical FUSE file server is an untrusted piece of software running in
userspace, this is disabled by default. Only enable it with virtiofs for
now since virtiofsd is supposedly trusted by the guest kernel.
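As a rough sketch of where such an operation plugs in (names and the
request plumbing are illustrative, not the actual FUSE client code):

    static int example_sync_fs(struct super_block *sb, int wait)
    {
            /* The file server cannot handle wait == 0, so skip that
             * case to avoid a gratuitous roundtrip. */
            if (!wait)
                    return 0;

            /* Build and send a FUSE_SYNCFS request here; treat lack of
             * support in the file server as permanent success. */
            return 0;
    }

    static const struct super_operations example_super_ops = {
            .sync_fs        = example_sync_fs,
            /* ... */
    };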
Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb011044e3 ]
A fix for this was already attempted via 1fccb73011: if the BIOS did not
enable TCO SMIs, the timer definitely needs to trigger twice in order to
cause a reboot. If TCO SMIs are on, as well as SMIs in general, we can
continue to assume that the BIOS will perform a reboot on the first
timeout.
QEMU with its ICH9 and related BIOS falls into the former category,
currently taking twice the configured timeout in order to reboot the
machine. For iTCO versions that fall under turn_SMI_watchdog_clear_off,
this is also true, but it is currently only addressed for v1, irrespective
of the turn_SMI_watchdog_clear_off value.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/0b8bb307-d08b-41b5-696c-305cdac6789c@siemens.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8fe55ef233 ]
Attempting to boot 32-bit ARM kernels under QEMU's 3.x virt models fails
when we have more than 512M of RAM in the model as we run out of vmalloc
space for the PCI ECAM regions. This failure will be silent when running
libvirt, as the console in that situation is a PCI device.
In this configuration, the kernel maps the whole ECAM, which QEMU sets up
for 256 buses, even when maybe only seven buses are in use. Each bus uses
1M of ECAM space, and ioremap() adds an additional guard page between
allocations. The kernel vmap allocator will align these regions to 512K,
resulting in each mapping eating 1.5M of vmalloc space. This means we need
384M of vmalloc space just to map all of these, which is very wasteful of
resources.
Fix this by only mapping the ECAM for buses we are going to be using. In
my setups, this is around seven buses in most guests, which is 10.5M of
vmalloc space - way smaller than the 384M that would otherwise be required.
This also means that the kernel can boot without forcing extra RAM into
highmem with the vmalloc= argument, or decreasing the virtual RAM available
to the guest.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/E1lhCAV-0002yb-50@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a97396c6eb ]
Downstream Port Containment (PCIe r5.0, sec. 6.2.10) disables the link upon
an error and attempts to re-enable it when instructed by the DPC driver.
A slot which is both DPC- and hotplug-capable is currently powered off by
pciehp once DPC is triggered (due to the link change) and powered back up
on successful recovery. That's undesirable, the slot should remain powered
so the hotplugged device remains bound to its driver. DPC notifies the
driver of the error and of successful recovery in pcie_do_recovery() and
the driver may then restore the device to working state.
Moreover, Sinan points out that turning off slot power by pciehp may foil
recovery by DPC: Power off/on is a cold reset concurrently to DPC's warm
reset. Sathyanarayanan reports extended delays or failure in link
retraining by DPC if pciehp brings down the slot.
Fix by detecting whether a Link Down event is caused by DPC and awaiting
recovery if so. On successful recovery, ignore both the Link Down and the
subsequent Link Up event.
Afterwards, check whether the link is down to detect surprise-removal or
another DPC event immediately after DPC recovery. Ensure that the
corresponding DLLSC event is not ignored by synthesizing it and invoking
irq_wake_thread() to trigger a re-run of pciehp_ist().
The IRQ threads of the hotplug and DPC drivers, pciehp_ist() and
dpc_handler(), race against each other. If pciehp is faster than DPC, it
will wait until DPC recovery completes.
Recovery consists of two steps: The first step (waiting for link
disablement) is recognizable by pciehp through a set DPC Trigger Status
bit. The second step (waiting for link retraining) is recognizable through
a newly introduced PCI_DPC_RECOVERING flag.
If DPC is faster than pciehp, neither of the two flags will be set and
pciehp may glean the recovery status from the new PCI_DPC_RECOVERED flag.
The flag is zero if DPC didn't occur at all, hence DLLSC events are not
ignored by default.
pciehp waits up to 4 seconds before assuming that DPC recovery failed and
bringing down the slot. This timeout is not taken from the spec (it
doesn't mandate one) but based on a report from Yicong Yang that DPC may
take a bit more than 3 seconds on HiSilicon's Kunpeng platform.
The timeout is necessary because the DPC Trigger Status bit may never
clear: On Root Ports which support RP Extensions for DPC, the DPC driver
polls the DPC RP Busy bit for up to 1 second before giving up on DPC
recovery. Without the timeout, pciehp would then wait indefinitely for DPC
to complete.
This commit draws inspiration from previous attempts to synchronize DPC
with pciehp:
By Sinan Kaya, August 2018:
https://lore.kernel.org/linux-pci/20180818065126.77912-1-okaya@kernel.org/
By Ethan Zhao, October 2020:
https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/
By Kuppuswamy Sathyanarayanan, March 2021:
https://lore.kernel.org/linux-pci/59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@linux.intel.com/
Link: https://lore.kernel.org/r/0be565d97438fe2a6d57354b3aa4e8626952a00b.1619857124.git.lukas@wunner.de
Reported-by: Sinan Kaya <okaya@kernel.org>
Reported-by: Ethan Zhao <haifeng.zhao@intel.com>
Reported-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be20037725 ]
If we're unable to immediately recover all locks because the server is
unable to immediately service our reclaim calls, then we want to retry
after we've finished servicing all the other asynchronous delegation
returns on our queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ec0c3ec2d ]
In order to use upstream_bridge_distance_warn() from a dma_map function, it
must not sleep. However, pci_get_slot() takes the pci_bus_sem so it might
sleep.
In order to avoid this, try to get the host bridge's device from the first
element in the device list. It should be impossible for the host bridge's
device to go away while references are held on child devices, so the first
element should not be able to change and, thus, this should be safe.
Introduce a static function called pci_host_bridge_dev() to obtain the host
bridge's root device.
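A sketch of the idea (return type and exact walk are simplified; the
assumption is that the first device on the root bus cannot disappear while
references to child devices are held):

    static struct pci_dev *pci_host_bridge_dev(struct pci_bus *bus)
    {
            /* No need to take pci_bus_sem via pci_get_slot(): the first
             * entry on the root bus is pinned by the child references
             * held by the caller. */
            return list_first_entry_or_null(&bus->devices,
                                            struct pci_dev, bus_list);
    }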
Link: https://lore.kernel.org/r/20210610160609.28447-7-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed3443fb4d ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bb2314fc2 ]
Interrupt line can be configured on different hardware in different way,
even inverted. Therefore driver should not enforce specific trigger
type - edge falling - but instead rely on Devicetree to configure it.
The Maxim 14577/77836 datasheets describe the interrupt line as active
low with a requirement of acknowledge from the CPU therefore the edge
falling is not correct.
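In code terms the change usually amounts to dropping the hard-coded trigger
flag from the IRQ request (handler and names are illustrative):

    /* Before: the driver forces a trigger type. */
    ret = devm_request_threaded_irq(dev, irq, NULL, example_irq_thread,
                                    IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
                                    "example", priv);

    /* After: the trigger type comes from the Devicetree interrupts property. */
    ret = devm_request_threaded_irq(dev, irq, NULL, example_irq_thread,
                                    IRQF_ONESHOT, "example", priv);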
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Iskren Chernev <iskren.chernev@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7fbf6b731b ]
Interrupt line can be configured on different hardware in different way,
even inverted. Therefore driver should not enforce specific trigger
type - edge falling - but instead rely on Devicetree to configure it.
The Maxim 17047/77693 datasheets describe the interrupt line as active
low with a requirement of acknowledge from the CPU therefore the edge
falling is not correct.
The interrupt line is shared between PMIC and RTC driver, so using level
sensitive interrupt is here especially important to avoid races. With
an edge configuration in case if first PMIC signals interrupt followed
shortly after by the RTC, the interrupt might not be yet cleared/acked
thus the second one would not be noticed.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d6e705027 ]
Previously, the last used PWM channel could change the global prescale
setting, even if other channels are already in use.
Fix it by only allowing the first enabled PWM to change the global
chip-wide prescale setting. If there is more than one channel in use,
the prescale settings resulting from the chosen periods must match.
GPIOs do not count as enabled PWMs as they are not using the prescaler
and can't change it.
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d2276320 ]
On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().
This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.
Fix this by flushing/destroying the workqueue of hbus before doing hbus remove.
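A sketch of that ordering (teardown details elided; hbus->wq is the per-bus
workqueue mentioned above):

    static int hv_pci_remove(struct hv_device *hdev)
    {
            struct hv_pcibus_device *hbus = hv_get_drvdata(hdev);

            /* Wait for any hv_pci_devices_present() / hv_pci_eject_device()
             * work still in flight; destroy_workqueue() also flushes. */
            destroy_workqueue(hbus->wq);

            /* ... rest of the hbus teardown ... */
            return 0;
    }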
Link: https://lore.kernel.org/r/1620806800-30983-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9184a228d ]
The vendor code tree supplies platform data to enable the
USB charging for the AB8500 and AB8505 and disable the AC
charging on the AB8505. This was missed when the driver
was submitted to the mainline kernel.
Fix this by doing what the vendor kernel does: always
register the USB charger, do not register the AC charger
on the AB8505.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5bcb5087c9 ]
Sometimes the code will crash because we haven't enabled
AC or USB charging and thus not created the corresponding
psy device. Fix it by checking that it is there before
notifying.
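Conceptually the guard is just a NULL check before the notification (field
names are hypothetical for this driver):

    /* Only notify chargers whose psy device was actually registered. */
    if (di->ac_chg.psy)
            power_supply_changed(di->ac_chg.psy);
    if (di->usb_chg.psy)
            power_supply_changed(di->usb_chg.psy);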
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c1f13a006 ]
The driver has problems with the different components of
the charging code racing with each other to probe().
This results in all four subdrivers populating battery
information to ascertain that it is populated for their
own needs for example.
Fix this by using component probing and thus expressing
to the kernel that these are dependent components.
The probes can happen in any order and will only acquire
resources such as state container, regulators and
interrupts and initialize the data structures, but no
execution happens until the .bind() callback is called.
The charging driver is the main component and binds
first, then bind in order the three subcomponents:
ab8500-fg, ab8500-btemp and ab8500-chargalg.
Do some housekeeping while we are moving the code around.
Like use devm_* for IRQs so as to cut down on some
boilerplate.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b601a18f12 ]
A consumer is expected to disable a PWM before calling pwm_put(). And if
they didn't there is hopefully a good reason (or the consumer needs
fixing). Also if disabling an enabled PWM was the right thing to do,
this should better be done in the framework instead of in each low level
driver.
So drop the hardware modification from the .remove() callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2aac79d14d ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 603fcfb9d4 ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c484419ef ]
lz4 compatible decompressor is simple. The format is underspecified and
relies on EOF notification to determine when to stop. Initramfs buffer
format[1] explicitly states that it can have arbitrary number of zero
padding. Thus when operating without a fill function, be extra careful to
ensure that sizes less than 4, or apparently empty chunk sizes, are treated
as EOF.
To test this I have created two cpio initrds: first a normal one,
main.cpio, and a second one, second.cpio, with just a single /test-file
with the content "second". Then I compressed both of them with gzip, and with
lz4 -l. Then I created a padding of 4 bytes (dd if=/dev/zero of=pad4 bs=1
count=4). To create four testcase initrds:
1) main.cpio.gzip + extra.cpio.gzip = pad0.gzip
2) main.cpio.lz4 + extra.cpio.lz4 = pad0.lz4
3) main.cpio.gzip + pad4 + extra.cpio.gzip = pad4.gzip
4) main.cpio.lz4 + pad4 + extra.cpio.lz4 = pad4.lz4
The pad4 test-cases replicate the initrd load by grub, as it pads and
aligns every initrd it loads.
All of the above boot; however, /test-file was not accessible in the initrd
for testcase #4, as decoding in the lz4 decompressor failed. Also, an
error message was printed, which usually is harmless.
With a patched kernel, all of the above testcases now pass, and
/test-file is accessible.
This fixes lz4 initrd decompress warning on every boot with grub. And
more importantly this fixes inability to load multiple lz4 compressed
initrds with grub. This patch has been shipping in Ubuntu kernels since
January 2021.
[1] ./Documentation/driver-api/early-userspace/buffer-format.rst
BugLink: https://bugs.launchpad.net/bugs/1835660
Link: https://lore.kernel.org/lkml/20210114200256.196589-1-xnox@ubuntu.com/ # v0
Link: https://lkml.kernel.org/r/20210513104831.432975-1-dimitri.ledkov@canonical.com
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Rajat Asthana <thisisrast7@gmail.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b64210f2f7 ]
If an i2c client receives an interrupt during reboot or shutdown it may
be too late to service it by making an i2c transaction on the bus
because the i2c controller has already been shutdown. This can lead to
system hangs if the i2c controller tries to make a transfer that is
doomed to fail because the access to the i2c pins is already shut down,
or an iommu translation has been torn down so i2c controller register
access doesn't work.
Let's simply disable the irq if there isn't a shutdown callback for an
i2c client when there is an irq associated with the device. This will
make sure that irqs don't come in later than the time that we can handle
them. We don't do this if the i2c client device already has a shutdown
callback because presumably they're doing the right thing and quieting
the device so irqs don't come in after the shutdown callback returns.
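A simplified sketch of the resulting core logic (error handling and tracing
omitted):

    static void i2c_device_shutdown(struct device *dev)
    {
            struct i2c_client *client = i2c_verify_client(dev);
            struct i2c_driver *driver;

            if (!client || !dev->driver)
                    return;

            driver = to_i2c_driver(dev->driver);
            if (driver->shutdown)
                    driver->shutdown(client);
            else if (client->irq > 0)
                    /* No shutdown callback: keep late interrupts from
                     * reaching a controller that is already shut down. */
                    disable_irq(client->irq);
    }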
Reported-by: kernel test robot <lkp@intel.com>
[swboyd@chromium.org: Dropped newline, added commit text, added
interrupt.h for robot build error]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d0150db97 ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa4db23233 ]
The devices in protocol version 2 have a register with a flag for IEC 60958
signal detection as the source of the sampling clock, without discrimination
between coaxial and optical interfaces. On the other hand, the current
driver implementation interprets it as the type of signal on the optical
interface only.
This commit fixes the detection of the optical/coaxial interface for the
S/PDIF signal.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210623075941.72562-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9481a667a ]
According to <linux/backlight.h> .update_status() is supposed to
return 0 on success and a negative error code otherwise. Adapt
lm3630a_bank_a_update_status() and lm3630a_bank_b_update_status() to
actually do it.
While touching that also add the error code to the failure message.
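A sketch of the expected convention (the brightness-write helper is
hypothetical):

    static int lm3630a_bank_a_update_status(struct backlight_device *bl)
    {
            int ret = example_write_brightness(bl);   /* hypothetical helper */

            if (ret < 0) {
                    dev_err(&bl->dev, "failed to set brightness: %d\n", ret);
                    return ret;
            }
            return 0;       /* success: 0, not the brightness value */
    }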
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b5ddea9ce ]
The private->vol_updated flag was being checked outside of the
mutex_lock/unlock() of private->data_mutex leading to the volume data
being fetched twice from the device unnecessarily or old volume data
being returned.
Update scarlett2_*_ctl_get() and include the private->vol_updated flag
check inside the critical region.
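The general shape of the change (simplified; the refresh helper name is
illustrative):

    /* Before: the flag is tested outside the lock, racing with writers. */
    if (private->vol_updated)
            example_update_volumes(mixer);
    mutex_lock(&private->data_mutex);
    val = private->vol[index];
    mutex_unlock(&private->data_mutex);

    /* After: test and act on the flag inside the critical region. */
    mutex_lock(&private->data_mutex);
    if (private->vol_updated)
            example_update_volumes(mixer);
    val = private->vol[index];
    mutex_unlock(&private->data_mutex);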
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Link: https://lore.kernel.org/r/20210620164643.GA9216@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f605ee17f ]
The kernel test robot reports that the "screen_bits" splash screen data
is missing for the dragen platform:
arch/m68k/68000/dragen2.c:73:16: error: 'screen_bits' undeclared (first use in this function)
73 | LSSA = (long) screen_bits;
| ^~~~~~~~~~~
arch/m68k/68000/dragen2.c:73:16: note: each undeclared identifier is reported only once for each function it appears in
Digging around a bit I found the screen_bits data structure was originally
in a screen.h file that was generated from a screen.xbm file. That was
removed in commit 0c0e6db806 ("m68k: drop unused parts of 68VZ328
Makefile").
Other splash screen initializers for 68000 targets are kept here as the
C data structures so lets do the same for this one. Add the missing
screen.h file and include it in the dragen specific startup code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50ebe56222 ]
A user of the FFADO project reported an issue with the ToneWeal FW66. As a
result, the device was identified as one of the applications of the BeBoB
solution.
I note that in the report the device returns contradictory results in the
plug discovery process for the audio subunit. Fortunately, the ALSA BeBoB
driver doesn't perform it, thus it is likely to handle the device without
issues.
I have received no reaction to the test request for this patch yet;
however, it is worth adding support for it.
daniel@gibbonmoon:/sys/bus/firewire/devices/fw1$ grep -r . *
Binary file config_rom matches
dev:244:1
guid:0x0023270002000000
hardware_version:0x000002
is_local:0
model:0x020002
model_name:FW66
power/runtime_active_time:0
power/runtime_active_kids:0
power/runtime_usage:0
power/runtime_status:unsupported
power/async:disabled
power/runtime_suspended_time:0
power/runtime_enabled:disabled
power/control:auto
subsystem/drivers_autoprobe:1
uevent:MAJOR=244
uevent:MINOR=1
uevent:DEVNAME=fw1
units:0x00a02d:0x010001
vendor:0x002327
vendor_name:ToneWeal
fw1.0/uevent:MODALIAS=ieee1394:ven00002327mo00020002sp0000A02Dver00010001
fw1.0/power/runtime_active_time:0
fw1.0/power/runtime_active_kids:0
fw1.0/power/runtime_usage:0
fw1.0/power/runtime_status:unsupported
fw1.0/power/async:disabled
fw1.0/power/runtime_suspended_time:0
fw1.0/power/runtime_enabled:disabled
fw1.0/power/control:auto
fw1.0/model:0x020002
fw1.0/rom_index:15
fw1.0/specifier_id:0x00a02d
fw1.0/model_name:FW66
fw1.0/version:0x010001
fw1.0/modalias:ieee1394:ven00002327mo00020002sp0000A02Dver00010001
Cc: Daniel Jozsef <daniel.jozsef@gmail.com>
Reference: https://lore.kernel.org/alsa-devel/20200119164335.GA11974@workstation/
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210619083922.16060-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cac7100d4c ]
Inside the function hideep_nvm_unlock(), the variable "unmask_code" could
be uninitialized if hideep_pgm_r_reg() returns an error; however, it is
used in a later if statement after an "and" operation, which is
potentially unsafe.
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da90575767 ]
The tprot() inline asm temporarily changes the program check new psw
to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86807f348f ]
The __diag260() inline asm temporarily changes the program check new
psw to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88c2510cec ]
The __diag308() inline asm temporarily changes the program check new
psw to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c9a915afd ]
s390 is the only architecture which makes use of the __no_kasan_or_inline
attribute for two functions. Given that both stap() and __load_psw_mask()
are very small functions they can and should be always inlined anyway.
Therefore get rid of __no_kasan_or_inline and always inline these
functions.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5eb7bf0c4 ]
fix the following smatch warnings:
goya_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
gaudi_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f37510ca3 ]
When converting a virtual address to a physical one we need to add the
correct offset to the physical page.
For this we need to use a mask that includes ALL bits of the page offset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b92c637c5f ]
This fix was applied since there was an incorrectly reported CPU ID to the
GIC, such that an error in MME2 QMAN was aliased to appear as arriving
from DMA0_QM.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 190ec49710 ]
Fix an issue in which the input to the function is_asic_secured was the
device PCI_IDS number instead of the asic_type enumeration.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07d8ad6fd8 ]
Update _tlbiel_pid() such that we can avoid build errors like below when
using this function in other places.
arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’:
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints
114 | asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
| ^~~
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’
make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize
which was added by commit e12d6d7d46 ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210610083639.387365-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d49b3a0f3 ]
The On Semi pca9655 is a 16 bit variant of the On Semi pca9654 GPIO
expander, with 16 GPIOs and interrupt functionality.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
[Bartosz: fixed indentation as noted by Andy]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45677c9aeb ]
The "no_handler_test" in the ebb selftests attempts to read the PMU
registers twice via the helper function "dump_ebb_state". The first dump
is just before closing of the event and the second invocation is done
after closing of the event. The original intention of the second
dump_ebb_state was to dump the state of the registers at the end of
the test when the counters are frozen. But this is already achieved
with the first call itself, since the sample period is set to a low value
and the PMU will be frozen by then. Hence the patch removes the
dump which was done before closing of the event.
Reported-by: Shirisha Ganta <shirisha.ganta1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1621950703-1532-2-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52e1b3b3da ]
Hyper-V is observed to sometimes set multiple flags in the srb_status, such
as ABORTED and ERROR. Current code in storvsc_handle_error() handles only a
single flag being set, and does nothing when multiple flags are set. Fix
this by changing the case statement into a series of "if" statements
testing individual flags. The functionality for handling each flag is
unchanged.
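The shape of the change (flag names as used by the driver; the per-flag
handling is elided):

    /* Before: only a single srb_status value could match. */
    switch (srb_status) {
    case SRB_STATUS_ABORTED:
            /* handle abort */
            break;
    case SRB_STATUS_ERROR:
            /* handle error */
            break;
    }

    /* After: flags that may be set together are tested individually. */
    if (srb_status & SRB_STATUS_ABORTED) {
            /* handle abort */
    }
    if (srb_status & SRB_STATUS_ERROR) {
            /* handle error */
    }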
Link: https://lore.kernel.org/r/1622827263-12516-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b18f901382 ]
The DROM for a USB4 host/device has a shorter header than the Thunderbolt
DROM header. This patch addresses hosts/devices with a USB4 DROM (according
to the spec: Universal Serial Bus 4 (USB4) Device ROM Specification, Rev
1.0, Feb-2021).
While there, correct the data_len field to be 12 bits and rename
__unknown1 to reserved, following the spec.
Signed-off-by: Gil Fine <gil.fine@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 35d7b72a63 ]
In two different instances the return value of "irq_get_irq_data"
API was neither captured nor checked.
Fixed it by capturing the return value and then checking for any error.
Addresses-Coverity: "returned_null"
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a51b2fb94b ]
Return value of "pm_runtime_get_sync" API was neither captured nor checked.
Fixed it by capturing the return value and then checking for any warning.
Addresses-Coverity: "check_return"
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1adf30f198 ]
arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even when the return value is less than 0.
The reference counting issue happens in some error handling paths of
arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
fails, the caller functions forget to decrease the refcount of "smmu"
increased by arm_smmu_rpm_get(), causing a refcount leak.
Fix this issue by calling pm_runtime_resume_and_get() instead of
pm_runtime_get_sync() in arm_smmu_rpm_get(), which can keep the refcount
balanced in case of failure.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Link: https://lore.kernel.org/r/1623293672-17954-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9733862e50 ]
Commit f959dcd6dd (dma-direct: Fix
potential NULL pointer dereference) added a null check on the
dma_mask pointer of the kernel's device structure.
Add a dma_mask variable to the ps3_dma_region structure and set
the device structure's dma_mask pointer to point to this new variable.
Fixes runtime errors like these:
# WARNING: Fixes tag on line 10 doesn't match correct format
ps3_system_bus_match:349: dev=8.0(sb_01), drv=8.0(ps3flash): match
WARNING: CPU: 0 PID: 1 at kernel/dma/mapping.c:151 .dma_map_page_attrs+0x34/0x1e0
ps3flash sb_01: ps3stor_setup:193: map DMA region failed
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/562d0c9ea0100a30c3b186bcc7adb34b0bbd2cd7.1622746428.git.geoff@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c305366a37 ]
snd_sb_qsound_destroy() contains the calls of removing the previously
created mixer controls, but it doesn't clear the pointers. As
snd_sb_qsound_destroy() itself may be repeatedly called via ioctl,
this could lead to double-free potentially.
Fix it by clearing the struct fields properly afterwards.
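The pattern of the fix, as a sketch (field and card pointer names
illustrative):

    if (p->qsound_switch) {
            snd_ctl_remove(card, p->qsound_switch);
            p->qsound_switch = NULL;        /* guard against double free */
    }
    if (p->qsound_space) {
            snd_ctl_remove(card, p->qsound_space);
            p->qsound_space = NULL;
    }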
Link: https://lore.kernel.org/r/20210608140540.17885-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d3e5a0579 ]
This test requires /dev/rtc0, the default RTC device, or one specified by
the user, to run. Since this default RTC is not guaranteed to exist on all
devices, check its existence first, otherwise skip this test with the
kselftest skip code 4 (a sketch of such a check follows the sample output
below).
Without this patch this test will fail like this on a s390x zVM:
$ selftests: timers: rtcpie
$ /dev/rtc0: No such file or directory
not ok 1 selftests: timers: rtcpie # exit=22
With this patch:
$ selftests: timers: rtcpie
$ Default RTC /dev/rtc0 does not exist. Test Skipped!
not ok 9 selftests: timers: rtcpie # SKIP
Fixed up change log so "With this patch" text doesn't get dropped.
Shuah Khan <skhan@linuxfoundation.org>
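A sketch of such an existence check (KSFT_SKIP is defined here for
self-containment; kselftest.h provides the same value, 4):

    #include <stdio.h>
    #include <unistd.h>

    #define KSFT_SKIP 4

    static int check_rtc_exists(const char *rtc_file)
    {
            if (access(rtc_file, F_OK) != 0) {
                    printf("Default RTC %s does not exist. Test Skipped!\n",
                           rtc_file);
                    return KSFT_SKIP;
            }
            return 0;
    }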
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d157fca711 ]
Remove the hack to assign the global console_port variable at probe time.
This assumption that cons->index is -1 is wrong for systems that specify
'console=' in the cmdline (or 'stdout-path' in dts). Hence, on such system
the actual console assignment is ignored, and the first UART that happens
to be probed is used as console instead.
Move the logic to console_setup() and map the console to the correct port
through the array of available ports instead.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Link: https://lore.kernel.org/r/20210528133321.1859346-1-daniel@zonque.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b3fc051cd ]
Asking to advance by more than we have left in the iov_iter should
move to the very end; it should *not* leave negative i->count and
it should not spew into syslog, etc. - it's a legitimate operation.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b7cbe811c ]
We have started to get a bunch of pointless dmamask not set warnings
that makes the output of dmesg -l err,warn hard to read with many
extra warnings:
cpcap-regulator cpcap-regulator.0: DMA mask not set
cpcap_adc cpcap_adc.0: DMA mask not set
cpcap_battery cpcap_battery.0: DMA mask not set
cpcap-charger cpcap-charger.0: DMA mask not set
cpcap-pwrbutton cpcap-pwrbutton.0: DMA mask not set
cpcap-led cpcap-led.0: DMA mask not set
cpcap-led cpcap-led.1: DMA mask not set
cpcap-led cpcap-led.2: DMA mask not set
cpcap-led cpcap-led.3: DMA mask not set
cpcap-led cpcap-led.4: DMA mask not set
cpcap-rtc cpcap-rtc.0: DMA mask not set
cpcap-usb-phy cpcap-usb-phy.0: DMA mask not set
This seems to have started with commit 4d8bde883b ("OF: Don't set
default coherent DMA mask"). We have the parent SPI controller use
DMA, while the CPCAP driver and its children do not. For audio, the
DMA is handled over I2S bus with the McBSP driver.
Cc: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Ivan Jelincic <parazyd@dyne.org>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sicelo A. Mhlongo <absicsz@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4700ef3265 ]
This patch adds/modifies MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2819b4ae28 ]
Drivers shouldn't be calling block/unblock session for tmf handling because
the functions can change the session state from under libiscsi.
iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu->
iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi
after we've started handling a TMF. So we don't need to try and block it in
the driver, and we can remove these block calls.
Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ce002366a ]
If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then
the qedi qedi_cmd->task_id could be freed and used for another cmd. If we
then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up
the wrong cmd.
Wait to release the task_id until the last put has been done on the
iscsi_task. Because libiscsi grabs a ref to the task when sending the
abort, we know that for the non-abort timeout case that the task_id we are
referencing is for the cmd that was supposed to be aborted.
A later commit will fix the case where the abort times out while we are
running qedi_tmf_work.
Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdd4aad7ff ]
The iscsi offload drivers are setting the shost->max_id to the max number
of sessions they support. The problem is that max_id is not the max number
of targets but the highest identifier the targets can have. To use it to
limit the number of targets we need to set it to max sessions - 1, or we
can end up with a session we might not have preallocated resources for.
Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec29d0ac29 ]
If we haven't done an unbind target call we can race where
iscsi_conn_teardown wakes up the EH thread and then frees the conn while
those threads are still accessing the conn ehwait.
We can only do one TMF per session so this just moves the TMF fields from
the conn to the session. We can then rely on the
iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call
to remove the target and it's devices, and know after that point there is
no device or scsi-ml callout trying to access the session.
Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae6874ba4b ]
Consider the case where a VD is deleted and the targetID of that VD is
assigned to a newly created VD. If the sequence of deletion/addition of VD
happens very quickly there is a possibility that second event (VD add)
occurs even before the driver processes the first event (VD delete). As
event processing is done in deferred context the device list remains the
same (but targetID is re-used) so driver will not learn the VD
deletion/additon. I/Os meant for the older VD will be directed to new VD
which may lead to data corruption.
Make driver detect the deleted VD as soon as possible based on the RaidMap
update and block further I/O to that device.
Link: https://lore.kernel.org/r/20210528131307.25683-4-chandrakanth.patil@broadcom.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8793613de9 ]
The description for scsi_mode_sense() claims to return the number of valid
bytes on success, which is not what the code does. Additionally there is
no gain in returning the SCSI status, as everything the callers do is to
check against scsi_result_is_good(), which is what scsi_mode_sense() does
already. So change the calling convention to return a standard error code
on failure, and 0 on success, and adapt the description and all callers.
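For callers the adaptation is mechanical, roughly (argument list
abbreviated, error label illustrative):

    /* Before: the result had to be interpreted as a SCSI status. */
    res = scsi_mode_sense(sdev, dbd, modepage, buffer, len,
                          timeout, retries, &data, &sshdr);
    if (!scsi_status_is_good(res))
            goto bad_sense;

    /* After: 0 means success, a negative value is an error code. */
    res = scsi_mode_sense(sdev, dbd, modepage, buffer, len,
                          timeout, retries, &data, &sshdr);
    if (res < 0)
            goto bad_sense;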
Link: https://lore.kernel.org/r/20210427083046.31620-4-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e30d55137e ]
An 'unexpected timeout' message may be seen in a point-2-point topology.
The message occurs when a PLOGI is received before the driver is notified
of FLOGI completion. The FLOGI completion failure causes discovery to be
triggered for a second time. The discovery timer is restarted but no new
discovery activity is initiated, thus the timeout message eventually
appears.
In point-2-point, when discovery has progressed before the FLOGI completion
is processed, it is not a failure. Add code to FLOGI completion to detect
that discovery has progressed and exit the FLOGI handling (noop'ing it).
Link: https://lore.kernel.org/r/20210514195559.119853-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d6fb80a14 ]
This reverts commit 0edabdfe89.
I've explained that the optional FireWire card for the d.2 is also built
into the d.2 Pro; however, that is wrong. The optional card uses a DM1000
ASIC and has 'Mackie DJ Mixer' in the model name of its configuration ROM.
On the other hand, the built-in FireWire card for the d.2 Pro and d.4 Pro
uses an OXFW971 ASIC and has 'd.Pro' in its model name according to manuals
and user experiences. The former card is not the card for the d.2 Pro. They
are similar in appearance but different internally.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210518084557.102681-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bae3ce4942 ]
To improve readability, convert camelCase fields, variables and
functions to plain names with underscores. Also change the macros
to use capital letters.
All done via sed, no functional changes.
Note that you'll still see many coding style issues even after this
patch; the fixes will follow.
Link: https://lore.kernel.org/r/20210517131545.27252-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 264da51243 ]
probe() error paths after runtime pm is enabled should disable it.
remove() should not call pm_runtime_put_noidle() as there is no
matching get() to have raised the reference count. This case
has no effect as the runtime pm core protects against going negative.
Whilst here, use pm_runtime_resume_and_get() to tidy things up a little.
The coccicheck script didn't catch this one due to the complex code
structure, so it was found by inspection.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/20210509113354.660190-12-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b1774998b ]
A simplification of get_unaligned() clashes with callers that pass
in a character pointer, causing a harmless warning like:
block/partitions/msdos.c: In function 'msdos_partition':
include/asm-generic/unaligned.h:13:22: warning: 'packed' attribute ignored for field of type 'u8' {aka 'unsigned char'} [-Wattributes]
Remove the SYS_IND() macro with the get_unaligned() call
and just use the ->ind field directly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ce3e45cc3 ]
There is an issue with the ASPM (optional) capability checking function.
A device might be attached to the root complex directly; in this case,
bus->self (the bridge) will be NULL, thus priv->parent_pdev is NULL.
Since alcor_pci_init_check_aspm(priv->parent_pdev) checks the PCI link's
ASPM capability and populates parent_cap_off, which will be used later by
alcor_pci_aspm_ctrl() to dynamically turn the device on/off, what we can do
here is to avoid checking the capability if we are on the root complex.
This will make pdev_cap_off 0 and alcor_pci_aspm_ctrl() will simply
return when being called, effectively disabling ASPM for the device
(a sketch of this guard follows the trace below).
[ 1.246492] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 1.248731] RIP: 0010:pci_read_config_byte+0x5/0x40
[ 1.253998] Call Trace:
[ 1.254131] ? alcor_pci_find_cap_offset.isra.0+0x3a/0x100 [alcor_pci]
[ 1.254476] alcor_pci_probe+0x169/0x2d5 [alcor_pci]
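The guard itself is a simple early return when there is no upstream bridge
(a sketch of the intent, not the exact diff):

    /* Device sits directly on the root complex: there is no parent
     * bridge whose ASPM capability could be inspected. */
    if (!priv->parent_pdev)
            return;   /* parent_cap_off stays 0; alcor_pci_aspm_ctrl() no-ops */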
Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Link: https://lore.kernel.org/r/20210513040732.1310159-1-ztong0001@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7272b591c4 ]
In ibmasm_init_one, it calls ibmasm_init_remote_input_dev().
Inside ibmasm_init_remote_input_dev, mouse_dev and keybd_dev are
allocated by input_allocate_device(), and assigned to
sp->remote.mouse_dev and sp->remote.keybd_dev respectively.
In the err_free_devices error branch of ibmasm_init_one,
mouse_dev and keybd_dev are freed by input_free_device(), and return
error. Then the execution runs into error_send_message error branch
of ibmasm_init_one, where ibmasm_free_remote_input_dev(sp) is called
to unregister the freed sp->remote.mouse_dev and sp->remote.keybd_dev.
My patch adds an "error_init_remote" label to handle the error of
ibmasm_init_remote_input_dev(), to avoid the use-after-free bugs.
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Link: https://lore.kernel.org/r/20210426170620.10546-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5b08386de ]
Our SoCs have always had an NS16550A UART core, and older SoCs would
have a compatible string of: 'compatible = "ns16550a"' and use the
8250_of driver. Our newer SoCs have added enhancements to the base
core to add support for DMA and accurate high speed baud rates and use
this newer 8250_bcm7271 driver. The Device Tree node for our enhanced
UARTs has a compatible string of: 'compatible = "brcm,bcm7271-uart",
"ns16550a"'. With both drivers running and the link order set up so
that the 8250_bcm7271 driver is initialized before the 8250_of driver,
we should bind the 8250_bcm7271 driver to the enhanced UART, or, for
upstream kernels that don't have the 8250_bcm7271 driver, bind to
the 8250_of driver.
The problem is that when both the 8250_of and 8250_bcm7271 drivers
were running, occasionally the 8250_of driver would be bound to the
enhanced UART instead of the 8250_bcm7271 driver. This was happening
because we use SCMI based clocks which come up late in initialization
and cause probe DEFER's when the two drivers get their clocks.
Occasionally the SCMI clock would become ready between the 8250_bcm7271
probe and the 8250_of probe and the 8250_of driver would be bound. To
fix this we decided to configure only our 8250_bcm7271 driver and added
"ns16665a0" to the compatible string so the driver would work on our
older systems.
This commit has of_platform_serial_probe() check specifically for the
"brcm,bcm7271-uart" and whether its companion driver is enabled. If it
is the case, and the clock provider is not ready, we want to make sure
that when the 8250_bcm7271.c driver returns EPROBE_DEFER, we are not
getting the UART registered via 8250_of.c.
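A hedged sketch of that check, assuming the usual OF and clock helpers (IS_ENABLED(), of_device_is_compatible(), devm_clk_get()); the exact placement and return value inside the real of_platform_serial_probe() may differ.
```
static int of_platform_serial_probe(struct platform_device *ofdev)
{
	/*
	 * If this node is one of Broadcom's enhanced UARTs and the
	 * dedicated 8250_bcm7271 driver is built, don't register the port
	 * here while its clock provider is still deferring: leave it for
	 * 8250_bcm7271 to pick up once the SCMI clock is ready.
	 */
	if (IS_ENABLED(CONFIG_SERIAL_8250_BCM7271) &&
	    of_device_is_compatible(ofdev->dev.of_node, "brcm,bcm7271-uart")) {
		struct clk *clk = devm_clk_get(&ofdev->dev, NULL);

		if (IS_ERR(clk) && PTR_ERR(clk) == -EPROBE_DEFER)
			return -ENODEV;
	}

	/* ... regular 8250_of probing continues here ... */
	return 0;
}
```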
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210423183206.3917725-1-f.fainelli@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cac2f6eb8 ]
SYSRQ doesn't work with DMA. This is because there is no error
indication whether a symbol had a framing error or not. Actually,
this is not completely correct: there is a bit in the data register
which is set in this case, but we'd have to change the DMA access
to 16 bit and post-process the data, thus making the DMA
pointless in the first place.
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20210512141255.18277-10-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fcb10ee27f ]
We should be very careful about register values that will be used
for division or modulo operations. Although the possibility that the
UARTBAUD register value is zero is very low, we had better deal
with such "bad data" from the hardware in advance, to avoid a division or
modulo by zero leading to undefined kernel behavior.
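A minimal, self-contained illustration of that defensive check (plain C; the register accessor, clock value and field mask are made up for the example and are not the driver's):
```
#include <stdio.h>
#include <stdint.h>

/* pretend the UARTBAUD register read back as zero ("bad data") */
static uint32_t lpuart_read_baud_reg(void)
{
	return 0;
}

int main(void)
{
	uint32_t sbr = lpuart_read_baud_reg() & 0x1fff;	/* divisor field */

	if (!sbr) {
		fprintf(stderr, "bogus UARTBAUD value, skipping baud calculation\n");
		return 1;
	}

	/* only divide once we know the divisor is non-zero */
	printf("baud = %u\n", 80000000u / (16u * sbr));
	return 0;
}
```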
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20210427021226.27468-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b50bb8ba36 ]
If a device lost sync and can no longer ACK a command, it may not be
able to enter a lower-power state but it will still be able to resync
when the clock restarts. In those cases, we want to continue with the
clock stop sequence.
This patch modifies the behavior during clock stop sequences to only
log errors unrelated to -ENODATA/Command_Ignored. The flow is also
modified so that loops continue to prepare/deprepare other devices
even when one seems to have lost sync.
When resuming the clocks, all issues are logged with a dev_warn();
previously only some of them were checked. This is the only part that
now differs between the clock stop entry and clock stop exit
sequences: while we don't want to stop the suspend flow, we do want
information on potential issues while resuming, as they may have
ripple effects.
For consistency the log messages are also modified to be unique and
self-explanatory. Errors in sdw_slave_clk_stop_callback() were
removed; they are now handled in the caller.
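The error policy during clock stop entry can be sketched like this (plain C with a hypothetical prepare_clock_stop() helper): -ENODATA/Command_Ignored from a device that lost sync is tolerated, other errors are logged, and the loop keeps going to the remaining devices either way.
```
#include <stdio.h>
#include <errno.h>

static int prepare_clock_stop(int dev)
{
	return dev == 1 ? -ENODATA : 0;	/* pretend device 1 lost sync */
}

int main(void)
{
	for (int dev = 0; dev < 3; dev++) {
		int ret = prepare_clock_stop(dev);

		if (ret == -ENODATA)
			continue;	/* will resync when the clock restarts */
		if (ret < 0)
			fprintf(stderr, "dev %d: clock stop prepare failed: %d\n",
				dev, ret);
		/* keep preparing the remaining devices in both cases */
	}
	return 0;
}
```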
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210511030048.25622-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 345e9f5ca7 ]
Existing devices and implementations only support the required
CLOCK_STOP_MODE0. All the code related to CLOCK_STOP_MODE1 has not
been tested and is highly questionable, with a clear confusion between
CLOCK_STOP_MODE1 and the simple clock stop state machine.
This patch removes all usages of CLOCK_STOP_MODE1 - which has no
impact on any solution - and fixes the use of the simple clock stop
state machine. The resulting code should be a lot more symmetrical and
easier to maintain.
Note that CLOCK_STOP_MODE1 is not supported in the SoundWire Device
Class specification so it's rather unlikely that we need to re-add
this mode later.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210511030048.25622-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3066820034 ]
If another lockdep report runs concurrently with an RCU lockdep report
from RCU_LOCKDEP_WARN(), the following sequence of events can occur:
1. debug_lockdep_rcu_enabled() sees that lockdep is enabled
when called from (say) synchronize_rcu().
2. Lockdep is disabled by a concurrent lockdep report.
3. debug_lockdep_rcu_enabled() evaluates its lockdep-expression
argument, for example, lock_is_held(&rcu_bh_lock_map).
4. Because lockdep is now disabled, lock_is_held() plays it safe and
returns the constant 1.
5. But in this case, the constant 1 is not safe, because invoking
synchronize_rcu() under rcu_read_lock_bh() is disallowed.
6. debug_lockdep_rcu_enabled() wrongly invokes lockdep_rcu_suspicious(),
resulting in a false-positive splat.
This commit therefore changes RCU_LOCKDEP_WARN() to check
debug_lockdep_rcu_enabled() after checking the lockdep expression,
so that any "safe" returns from lock_is_held() are rejected by
debug_lockdep_rcu_enabled(). This requires memory ordering, which is
supplied by READ_ONCE(debug_locks). The resulting volatile accesses
prevent the compiler from reordering and the fact that only one variable
is being accessed prevents the underlying hardware from reordering.
The combination works for IA64, which can reorder reads to the same
location, but this is defeated by the volatile accesses, which compile
to load instructions that provide ordering.
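A simplified, hedged sketch of the reordered macro (the real kernel version also has a once-only guard and reports file/line slightly differently); the point is only the evaluation order of the two checks.
```
/*
 * Evaluate the lockdep expression "c" first, then debug_lockdep_rcu_enabled().
 * If lockdep gets disabled concurrently and lock_is_held() returns its "safe"
 * constant 1, the second check now rejects it and no false splat is emitted.
 */
#define RCU_LOCKDEP_WARN(c, s)						\
	do {								\
		if ((c) && debug_lockdep_rcu_enabled())			\
			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
	} while (0)
```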
Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5befe842e ]
An srcu_struct structure that is initialized before rcu_init_geometry()
will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once
rcu_init_geometry() is called, this hierarchy is compressed as needed
for the actual maximum number of CPUs for this system.
Later on, that srcu_struct structure is confused, sometimes referring
to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead
to the new num_possible_cpus() hierarchy. For example, each of its
->mynode fields continues to reference the original leaf rcu_node
structures, some of which might no longer exist. On the other hand,
srcu_for_each_node_breadth_first() traverses to the new node hierarchy.
There are at least two bad possible outcomes to this:
1) a) A callback enqueued early on an srcu_data structure (call it
*sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in
srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf
(say 3 levels).
b) The grace period ends after rcu_init_geometry() shrinks the
nodes level to a single one. srcu_gp_end() walks through the new
srcu_node hierarchy without ever reaching the old leaves so the
callback is never executed.
This is easily reproduced on an 8-CPU machine with CONFIG_NR_CPUS >= 32
and "rcupdate.rcu_self_test=1". The srcu_barrier() after the early tests
verification never completes and the boot hangs:
[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
[ 5413.147564] Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
[ 5413.168099] Call Trace:
[ 5413.170555] __schedule+0x36c/0x930
[ 5413.174057] ? wait_for_completion+0x88/0x110
[ 5413.178423] schedule+0x46/0xf0
[ 5413.181575] schedule_timeout+0x284/0x380
[ 5413.185591] ? wait_for_completion+0x88/0x110
[ 5413.189957] ? mark_held_locks+0x61/0x80
[ 5413.193882] ? mark_held_locks+0x61/0x80
[ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173] ? wait_for_completion+0x88/0x110
[ 5413.206535] wait_for_completion+0xb4/0x110
[ 5413.210724] ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610] srcu_barrier+0x187/0x200
[ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244] ? rdinit_setup+0x2b/0x2b
[ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700] do_one_initcall+0x63/0x310
[ 5413.236541] ? rdinit_setup+0x2b/0x2b
[ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912] kernel_init_freeable+0x253/0x28f
[ 5413.249273] ? rest_init+0x250/0x250
[ 5413.252846] kernel_init+0xa/0x110
[ 5413.256257] ret_from_fork+0x22/0x30
2) An srcu_struct structure that is initialized before rcu_init_geometry()
and used afterward will always have stale rdp->mynode references,
resulting in callbacks to be missed in srcu_gp_end(), just like in
the previous scenario.
This commit therefore causes init_srcu_struct_nodes to initialize the
geometry, if needed. This ensures that the srcu_node hierarchy is
properly built and distributed from the get-go.
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee522bcf02 ]
After device_get_match_data(), tlc591xx is not checked. Add a check
for it, and also check np after dev_of_node().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 000b7287b6 upstream.
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
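A hedged sketch of the locking pattern the fix applies (the wrapper name is illustrative; the real change places the lock/unlock directly in the MRD receive path):
```
/*
 * The router port list is protected by br->multicast_lock, so marking a
 * port as a router from the MRD advertisement path must take that lock,
 * just like the query/timer paths already do.
 */
static void br_mrd_mark_router(struct net_bridge *br,
			       struct net_bridge_port *port)
{
	spin_lock(&br->multicast_lock);
	br_multicast_mark_router(br, port);
	spin_unlock(&br->multicast_lock);
}
```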
Cc: stable@vger.kernel.org
Cc: linus.luessing@c0d3.blue
Fixes: 4b3087c7e3 ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04bef83a33 upstream.
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable@vger.kernel.org
Fixes: 91b02d3d13 ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3769e4c0af upstream.
[Why]
After unplugging/hotplugging a hub from the system, userspace might start to
clear stale payloads gradually. If we call drm_dp_mst_deallocate_vcpi()
to release the stale VCPI of ports which are not related to the current
topology, we have a chance of wrongly clearing an active payload table entry
for the current topology.
E.g.
We have allocated VCPI 1 in the current payload table and we call
drm_dp_mst_deallocate_vcpi() to clean VCPI 1 in the stale topology. In
drm_dp_mst_deallocate_vcpi(), it will call drm_dp_mst_put_payload_id()
to put VCPI 1, which means ID 1 is available again. Thereafter, if we
want to allocate a new payload stream, it will find ID 1 available via
drm_dp_mst_assign_payload_id(). However, ID 1 is still in use by the
current topology.
[How]
Check whether the target sink relates to the current topology before doing
any payload table update.
Search upward to find the target sink's root branch device.
If the found root branch device is not the root of the current
topology, don't update the payload table.
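A hedged sketch of that upward search, using the drm_dp_mst topology structures; the helper name and the exact form used by the real patch may differ.
```
/*
 * Walk from the port's parent branch device up to the root of its
 * topology; only if that root is the manager's current primary branch
 * device does the port belong to the current topology and may the
 * payload table be updated.
 */
static bool port_in_current_topology(struct drm_dp_mst_topology_mgr *mgr,
				     struct drm_dp_mst_port *port)
{
	struct drm_dp_mst_branch *mstb = port->parent;

	while (mstb->port_parent)
		mstb = mstb->port_parent->parent;

	return mstb == mgr->mst_primary;
}
```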
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Cc tag.
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-3-Wayne.Lin@amd.com
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35d3e8cb35 upstream.
[Why]
When we receive CSN message to notify one port is disconnected, we will
implicitly set its corresponding num_slots to 0. Later on, we will
eventually call drm_dp_update_payload_part1() to arrange down streams.
In drm_dp_update_payload_part1(), we iterate over all proposed_vcpis[]
to do the update; it is not specific to one target sink only. For example, if we
light up 2 monitors, Monitor_A and Monitor_B, and then we unplug
Monitor_B. Later on, when we call drm_dp_update_payload_part1() to try
to update payload for Monitor_A, we'll also implicitly clean payload for
Monitor_B at the same time. And finally, when we try to call
drm_dp_update_payload_part1() to clean payload for Monitor_B, we will do
nothing at this time since payload for Monitor_B has been cleaned up
previously.
For StarTech 1to3 DP hub, it seems like if we didn't update DPCD payload
ID table then polling for "ACT Handled"(BIT_1 of DPCD 002C0h) will fail
and this polling will last for 3 seconds.
Therefore, it seems the best way is to not set the proposed_vcpi[]
ourselves. Let the user of these helper functions set the proposed_vcpi
directly.
[How]
1. Revert commit 7617e9621b ("drm/dp_mst: clear time slots for ports
invalid")
2. Tackle the issue in the previous commit by skipping those transient
proposed VCPIs. These stale VCPIs should be explicitly cleared by the
user later on.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Fixed & Cc tags
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Fixes: 7617e9621b ("drm/dp_mst: clear time slots for ports invalid")
Cc: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.5+
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-2-Wayne.Lin@amd.com
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea32af47f0 upstream.
When syncing the log, if we fail to allocate the root node for the log
root tree:
1) We are unlocking fs_info->tree_log_mutex, but at this point we have
not yet locked this mutex;
2) We have locked fs_info->tree_root->log_mutex, but we end up not
unlocking it;
So fix this by unlocking fs_info->tree_root->log_mutex instead of
fs_info->tree_log_mutex.
Fixes: e75f9fd194 ("btrfs: zoned: move log tree node allocation out of log_root_tree->log_mutex")
CC: stable@vger.kernel.org # 5.13+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cc0b837e1 upstream.
If we can't acquire the reclaim_bgs_lock on block group reclaim, we
block until it is free. This can potentially stall for a long time.
While reclaim of block groups is necessary for a good user experience on
a zoned file system, there still is no need to block as it is best
effort only, just like when we're deleting unused block groups.
CC: stable@vger.kernel.org # 5.13
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79bd37120b upstream.
Commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations") fixed a problem that resulted in
exhausting the system chunk array in the superblock when there are many
tasks allocating chunks in parallel. Basically too many tasks enter the
first phase of chunk allocation without previous tasks having finished
their second phase of allocation, resulting in too many system chunks
being allocated. That was originally observed when running the fallocate
tests of stress-ng on a PowerPC machine, using a node size of 64K.
However that commit also introduced a deadlock where a task in phase 1 of
the chunk allocation waited for another task that had allocated a system
chunk to finish its phase 2, but that other task was waiting on an extent
buffer lock held by the first task, therefore resulting in both tasks not
making any progress. That change was later reverted by a patch with the
subject "btrfs: fix deadlock with concurrent chunk allocations involving
system chunks", since there is no simple and short solution to address it
and the deadlock is relatively easy to trigger on zoned filesystems, while
the system chunk array exhaustion is not so common.
This change reworks the chunk allocation to avoid the system chunk array
exhaustion. It accomplishes that by making the first phase of chunk
allocation do the updates of the device items in the chunk btree and the
insertion of the new chunk item in the chunk btree. This is done while
under the protection of the chunk mutex (fs_info->chunk_mutex), in the
same critical section that checks for available system space, allocates
a new system chunk if needed and reserves system chunk space. This way
we do not have chunk space reserved until the second phase completes.
The same logic is applied to chunk removal as well, since it keeps
reserved system space long after it is done updating the chunk btree.
For direct allocation of system chunks, the previous behaviour remains,
because otherwise we would deadlock on extent buffers of the chunk btree.
Changes to the chunk btree are by and large done by chunk allocation and chunk
removal, which first reserve chunk system space and then later do changes
to the chunk btree. The other remaining cases are uncommon and correspond
to adding a device, removing a device and resizing a device. All these
other cases do not pre-reserve system space, they modify the chunk btree
right away, so they don't hold reserved space for a long period like chunk
allocation and chunk removal do.
The diff of this change is huge, but more than half of it is just addition
of comments describing both how things work regarding chunk allocation and
removal, including both the new behavior and the parts of the old behavior
that did not change.
CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cb3db1cf3 upstream.
When a task attempting to allocate a new chunk verifies that there is not
currently enough free space in the system space_info and there is another
task that allocated a new system chunk but it did not finish yet the
creation of the respective block group, it waits for that other task to
finish creating the block group. This is to avoid exhaustion of the system
chunk array in the superblock, which is limited, when we have a thundering
herd of tasks allocating new chunks. This problem was described and fixed
by commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations").
However there are two very similar scenarios where this can lead to a
deadlock:
1) Task B allocated a new system chunk and task A is waiting on task B
to finish creation of the respective system block group. However before
task B ends its transaction handle and finishes the creation of the
system block group, it attempts to allocate another chunk (like a data
chunk for an fallocate operation for a very large range). Task B will
be unable to progress and allocate the new chunk, because task A set
space_info->chunk_alloc to 1 and therefore it loops at
btrfs_chunk_alloc() waiting for task A to finish its chunk allocation
and set space_info->chunk_alloc to 0, but task A is waiting on task B
to finish creation of the new system block group, therefore resulting
in a deadlock;
2) Task B allocated a new system chunk and task A is waiting on task B to
finish creation of the respective system block group. By the time that
task B enters the final phase of block group allocation, which happens
at btrfs_create_pending_block_groups(), when it modifies the extent
tree, the device tree or the chunk tree to insert the items for some
new block group, it needs to allocate a new chunk, so it ends up at
btrfs_chunk_alloc() and keeps looping there because task A has set
space_info->chunk_alloc to 1, but task A is waiting for task B to
finish creation of the new system block group and release the reserved
system space, therefore resulting in a deadlock.
In short, the problem is if a task B needs to allocate a new chunk after
it previously allocated a new system chunk and if another task A is
currently waiting for task B to complete the allocation of the new system
chunk.
Unfortunately this deadlock scenario introduced by the previous fix for
the system chunk array exhaustion problem does not have a simple and short
fix, and requires a big change to rework the chunk allocation code so that
chunk btree updates are all made in the first phase of chunk allocation.
And since this deadlock regression is being frequently hit on zoned
filesystems and the system chunk array exhaustion problem is triggered
in more extreme cases (originally observed on PowerPC with a node size
of 64K when running the fallocate tests from stress-ng), revert the
changes from that commit. The next patch in the series, with a subject
of "btrfs: rework chunk allocation to avoid exhaustion of the system
chunk array" does the necessary changes to fix the system chunk array
exhaustion problem.
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://lore.kernel.org/linux-btrfs/20210621015922.ewgbffxuawia7liz@naota-xeon/
Fixes: eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations")
CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 54afaae34e upstream.
The types in calculation of the used percentage in the reclaiming
messages are both u64, though bg->length is either 1GiB (non-zoned) or
the zone size in the zoned mode. The upper limit on zone size is 8GiB, so
this could theoretically overflow in the future; right now the values
fit.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abb99cfdaf upstream.
Damien reported a test failure with btrfs/209. The test itself ran fine,
but the fsck ran afterwards reported a corrupted filesystem.
The filesystem corruption happens because we're splitting an extent and
then writing the extent twice. We have to split the extent though, because
we're creating too large extents for a REQ_OP_ZONE_APPEND operation.
When dumping the extent tree, we can see two EXTENT_ITEMs at the same
start address but different lengths.
$ btrfs inspect dump-tree /dev/nullb1 -t extent
...
item 19 key (269484032 EXTENT_ITEM 126976) itemoff 15470 itemsize 53
refs 1 gen 7 flags DATA
extent data backref root FS_TREE objectid 257 offset 786432 count 1
item 20 key (269484032 EXTENT_ITEM 262144) itemoff 15417 itemsize 53
refs 1 gen 7 flags DATA
extent data backref root FS_TREE objectid 257 offset 786432 count 1
The duplicated EXTENT_ITEMs originally come from wrongly split extent_map in
extract_ordered_extent(). Since extract_ordered_extent() uses
create_io_em() to split an existing extent_map, we will have
split->orig_start != split->start. Then, it will be logged with non-zero
"extent data offset". Finally, the logged entries are replayed into
a duplicated EXTENT_ITEM.
Introduce and use a proper splitting function for extent_map. The function is
intended to be simple and specific to the extract_ordered_extent() usage, e.g.
not supporting the compression case (we do not allow splitting a compressed
extent_map anyway).
There was a question raised by Qu, in summary why we want to split the
extent map (and not the bio):
The problem is not the limit on the zone end, which as you mention is
the same as the block group end. The problem is that data write use zone
append (ZA) operations. ZA BIOs cannot be split so a large extent may
need to be processed with multiple ZA BIOs. While that is also true for
regular writes, the major difference is that ZA is a "nameless" write
operation giving back the written sectors on completion. And ZA
operations may be reordered by the block layer (not intentionally
though). Combine both of these characteristics and you can see that the
data for a large extent may end up being shuffled when written resulting
in data corruption and the impossibility to map the extent to some start
sector.
To avoid this problem, zoned btrfs uses the principle "one data extent
== one ZA BIO". So large extents need to be split. This is unfortunate,
but we can revisit this later and optimize, e.g. merge back together the
fragments of an extent once written if they actually were written
sequentially in the zone.
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: d22002fd37 ("btrfs: zoned: split ordered extent when bio is sent")
CC: stable@vger.kernel.org # 5.12+
CC: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 295cf15623 upstream.
Al reminds us that the usercopy API must only return complete failure
if absolutely nothing could be copied. Currently, if userspace does
something silly like giving us an unaligned pointer to Device memory,
or a size which overruns MTE tag bounds, we may fail to honour that
requirement when faulting on a multi-byte access even though a smaller
access could have succeeded.
Add a mitigation to the fixup routines to fall back to a single-byte
copy if we faulted on a larger access before anything has been written
to the destination, to guarantee making *some* forward progress. We
needn't be too concerned about the overall performance since this should
only occur when callers are doing something a bit dodgy in the first
place. Particularly broken userspace might still be able to trick
generic_perform_write() into an infinite loop by targeting write() at
an mmap() of some read-only device register where the fault-in load
succeeds but any store synchronously aborts such that copy_to_user() is
genuinely unable to make progress, but, well, don't do that...
CC: stable@vger.kernel.org
Reported-by: Chen Huang <chenhuang5@huawei.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b0462726e upstream.
The following sequence can be used to trigger a UAF:
int fscontext_fd = fsopen("cgroup");
int fd_null = open("/dev/null", O_RDONLY);
fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
close_range(3, ~0U, 0);
The cgroup v1 specific fs parser expects a string for the "source"
parameter. However, it is perfectly legitimate to e.g. specify a file
descriptor for the "source" parameter. The fs parser doesn't know what
a filesystem allows there. So it's a bug to assume that "source" is
always of type fs_value_is_string when it can reasonably also be
fs_value_is_file.
This assumption in the cgroup code causes a UAF because struct
fs_parameter uses a union for the actual value. Access to that union is
guarded by the param->type member. Since the cgroup parameter parser
didn't check param->type but unconditionally moved param->string into
fc->source, a close on the fscontext_fd would trigger a UAF during
put_fs_context(), which frees fc->source, thereby freeing the file stashed
in param->file and causing a UAF during a close of the fd_null.
Fix this by verifying that param->type is actually a string and report
an error if not.
In follow up patches I'll add a new generic helper that can be used here
and by other filesystems instead of this error-prone copy-pasta fix.
But fixing it here first makes backporting it to stable a lot
easier.
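A hedged sketch of the check, in the style of the cgroup1 option parser (the helper name here is illustrative and the upstream hunk may differ slightly): only consume param->string when the parameter really carries a string.
```
static int cgroup1_parse_source(struct fs_context *fc, struct fs_parameter *param)
{
	if (strcmp(param->key, "source"))
		return 0;		/* not the "source" parameter */

	/* "source" may legitimately arrive as an fd; only accept strings here */
	if (param->type != fs_value_is_string)
		return invalf(fc, "Non-string source");

	if (fc->source)
		return invalf(fc, "Multiple sources not supported");

	fc->source = param->string;
	param->string = NULL;		/* ownership moved to fc->source */
	return 0;
}
```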
Fixes: 8d2451f499 ("cgroup1: switch to option-by-option parsing")
Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@kernel.org>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2feeb52859 upstream.
The conversion to ww mutexes failed to address the fence code which
already returns -EDEADLK when we run out of fences. Ww mutexes on
the other hand treat -EDEADLK as an internal errno value indicating
a need to restart the operation due to a deadlock. So now when the
fence code returns -EDEADLK the higher level code erroneously
restarts everything instead of returning the error to userspace
as is expected.
To remedy this let's switch the fence code to use a different errno
value for this. -ENOBUFS seems like a semi-reasonable unique choice.
Apart from igt the only user of this I could find is sna, and even
there all we do is dump the current fence registers from debugfs
into the X server log. So no user visible functionality is affected.
If we really cared about preserving this we could of course convert
back to -EDEADLK higher up, but doesn't seem like that's worth
the hassle here.
Not quite sure which commit specifically broke this, but I'll
just attribute it to the general gem ww mutex work.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Testcase: igt/gem_pread/exhaustion
Testcase: igt/gem_pwrite/basic-exhaustion
Testcase: igt/gem_fenced_exec_thrash/too-many-fences
Fixes: 80f0b679d6 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210630164413.25481-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
(cherry picked from commit 78d2ad7eb4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0abb33bfca upstream.
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation (along with some others) should be possible.
Fixes: 1482667324 ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matthew.auld@intel.com
(cherry picked from commit 8f88ca76b3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b3bdd99c0 upstream.
On remote cable pull, a zfcp_port keeps its status and only gets
ZFCP_STATUS_PORT_LINK_TEST added. Only after an ADISC timeout, we would
actually start port recovery and remove ZFCP_STATUS_COMMON_UNBLOCKED which
zfcp_sysfs_port_fc_security_show() detected and reported as "unknown"
instead of the old and possibly stale zfcp_port->connection_info.
Add check for ZFCP_STATUS_PORT_LINK_TEST for timely "unknown" report.
Link: https://lore.kernel.org/r/20210702160922.2667874-1-maier@linux.ibm.com
Fixes: a17c784600 ("scsi: zfcp: report FC Endpoint Security in sysfs")
Cc: <stable@vger.kernel.org> #5.7+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93aa71ad73 upstream.
Commit 66a834d092 ("scsi: core: Fix error handling of scsi_host_alloc()")
changed the allocation logic to call put_device() to perform host cleanup
with the assumption that IDA removal and stopping the kthread would
properly be performed in scsi_host_dev_release(). However, in the unlikely
case that the error handler thread fails to spawn, shost->ehandler is set
to ERR_PTR(-ENOMEM).
The error handler cleanup code in scsi_host_dev_release() will call
kthread_stop() if shost->ehandler != NULL which will always be the case
whether the kthread was successfully spawned or not. In the case that it
failed to spawn this has the nasty side effect of trying to dereference an
invalid pointer when kthread_stop() is called. The following splat provides
an example of this behavior in the wild:
scsi host11: error handler thread failed to spawn, error = -4
Kernel attempted to read user page (10c) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x0000010c
Faulting instruction address: 0xc00000000818e9a8
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvscsi(+) scsi_transport_srp dm_multipath dm_mirror dm_region
hash dm_log dm_mod fuse overlay squashfs loop
CPU: 12 PID: 274 Comm: systemd-udevd Not tainted 5.13.0-rc7 #1
NIP: c00000000818e9a8 LR: c0000000089846e8 CTR: 0000000000007ee8
REGS: c000000037d12ea0 TRAP: 0300 Not tainted (5.13.0-rc7)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28228228
XER: 20040001
CFAR: c0000000089846e4 DAR: 000000000000010c DSISR: 40000000 IRQMASK: 0
GPR00: c0000000089846e8 c000000037d13140 c000000009cc1100 fffffffffffffffc
GPR04: 0000000000000001 0000000000000000 0000000000000000 c000000037dc0000
GPR08: 0000000000000000 c000000037dc0000 0000000000000001 00000000fffff7ff
GPR12: 0000000000008000 c00000000a049000 c000000037d13d00 000000011134d5a0
GPR16: 0000000000001740 c0080000190d0000 c0080000190d1740 c000000009129288
GPR20: c000000037d13bc0 0000000000000001 c000000037d13bc0 c0080000190b7898
GPR24: c0080000190b7708 0000000000000000 c000000033bb2c48 0000000000000000
GPR28: c000000046b28280 0000000000000000 000000000000010c fffffffffffffffc
NIP [c00000000818e9a8] kthread_stop+0x38/0x230
LR [c0000000089846e8] scsi_host_dev_release+0x98/0x160
Call Trace:
[c000000033bb2c48] 0xc000000033bb2c48 (unreliable)
[c0000000089846e8] scsi_host_dev_release+0x98/0x160
[c00000000891e960] device_release+0x60/0x100
[c0000000087e55c4] kobject_release+0x84/0x210
[c00000000891ec78] put_device+0x28/0x40
[c000000008984ea4] scsi_host_alloc+0x314/0x430
[c0080000190b38bc] ibmvscsi_probe+0x54/0xad0 [ibmvscsi]
[c000000008110104] vio_bus_probe+0xa4/0x4b0
[c00000000892a860] really_probe+0x140/0x680
[c00000000892aefc] driver_probe_device+0x15c/0x200
[c00000000892b63c] device_driver_attach+0xcc/0xe0
[c00000000892b740] __driver_attach+0xf0/0x200
[c000000008926f28] bus_for_each_dev+0xa8/0x130
[c000000008929ce4] driver_attach+0x34/0x50
[c000000008928fc0] bus_add_driver+0x1b0/0x300
[c00000000892c798] driver_register+0x98/0x1a0
[c00000000810eb60] __vio_register_driver+0x80/0xe0
[c0080000190b4a30] ibmvscsi_module_init+0x9c/0xdc [ibmvscsi]
[c0000000080121d0] do_one_initcall+0x60/0x2d0
[c000000008261abc] do_init_module+0x7c/0x320
[c000000008265700] load_module+0x2350/0x25b0
[c000000008265cb4] __do_sys_finit_module+0xd4/0x160
[c000000008031110] system_call_exception+0x150/0x2d0
[c00000000800d35c] system_call_common+0xec/0x278
Fix this by nulling shost->ehandler when the kthread fails to spawn.
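A hedged sketch of the fix inside the host-allocation path (surrounding scsi_host_alloc() context and the fail label omitted): record the spawn failure but leave shost->ehandler NULL so the release path never hands an ERR_PTR to kthread_stop().
```
shost->ehandler = kthread_run(scsi_error_handler, shost,
			      "scsi_eh_%d", shost->host_no);
if (IS_ERR(shost->ehandler)) {
	shost_printk(KERN_WARNING, shost,
		     "error handler thread failed to spawn, error = %ld\n",
		     PTR_ERR(shost->ehandler));
	/* keep scsi_host_dev_release() from calling kthread_stop() on an ERR_PTR */
	shost->ehandler = NULL;
	goto fail;
}
```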
Link: https://lore.kernel.org/r/20210701195659.3185475-1-tyreld@linux.ibm.com
Fixes: 66a834d092 ("scsi: core: Fix error handling of scsi_host_alloc()")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 896707c212 upstream.
The kernel never sends a real INIT event to CPUs, other than on boot.
Thus INIT interception is an error which should be caught
by a check for an unknown VMexit reason.
On top of that, the current INIT VM exit handler skips
the current instruction which is wrong.
That was added in commit 5ff3a351f6 ("KVM: x86: Move trivial
instruction-based exit handlers to common code").
Fixes: 5ff3a351f6 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 991afbbee8 upstream.
Commit 5ff3a351f6 ("KVM: x86: Move trivial instruction-based
exit handlers to common code"), unfortunately made a mistake of
treating nop_on_interception and nop_interception in the same way.
The former does truly nothing while the latter skips the instruction.
SMI VM exit handler should do nothing.
(SMI itself is handled by the host when we do STGI)
Fixes: 5ff3a351f6 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f85d401606 upstream.
When the host is using debug registers but the guest is not using them
nor is the guest in guest-debug state, the kvm code does not reset
the host debug registers before kvm_x86->run(). Rather, it relies on
the hardware vmentry instruction to automatically reset the dr7 registers
which ensures that the host breakpoints do not affect the guest.
This however violates the non-instrumentable nature around VM entry
and exit; consider, for example, a host breakpoint set on vcpu->arch.cr2.
Another issue is consistency. When the guest debug registers are active,
the host breakpoints are reset before kvm_x86->run(). But when the
guest debug registers are inactive, the host breakpoints are delayed to
be disabled. The host tracing tools may see different results depending
on what the guest is doing.
To fix the problems, we clear %db7 unconditionally before kvm_x86->run()
if the host has set any breakpoints, no matter if the guest is using
them or not.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
[Only clear %db7 instead of reloading all debug registers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fce7e152ff upstream.
APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when
the supplied address is not page-aligned or is outside of "maximum
supported physical address for this implementation".
page_address_valid() check seems suitable. Also, forcefully page-align
the address when it's written from VMM.
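A hedged sketch of the resulting MSR handling in svm_set_msr() (the exact surrounding code differs): guest writes are validated with page_address_valid(), while host-initiated writes are page-aligned forcefully.
```
case MSR_VM_HSAVE_PA:
	/* #GP for a guest-supplied value that is unaligned or beyond MAXPHYADDR */
	if (!msr->host_initiated && !page_address_valid(vcpu, data))
		return 1;

	/* values written from the VMM are forcefully page-aligned */
	svm->nested.hsave_msr = data & PAGE_MASK;
	break;
```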
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210628104425.391276-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
[Add comment about behavior for host-provided values. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76ff371b67 upstream.
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs that collide with the host's C-bit.
Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.
Note that because guest physical addresses are always translated
through the nested page tables, the size of the guest physical address
space is not impacted by any physical address space reduction indicated
in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
the guest physical address space is effectively reduced by 1 bit.
And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Consequently, the hypervisor does not need to be aware of which pages
the guest has chosen to mark private.
Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d04 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").
Fixes: 0ede79e132 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210625020354.431829-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc9bf2e087 upstream.
Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".
For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.
For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.
Fixes: d0ec49d4de ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-5-seanjc@google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e39f00f60e upstream.
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068 ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bf48e3c0a upstream.
Ignore the guest MAXPHYADDR reported by CPUID.0x8000_0008 if TDP, i.e.
NPT, is disabled, and instead use the host's MAXPHYADDR. Per AMD's APM:
Maximum guest physical address size in bits. This number applies only
to guests using nested paging. When this field is zero, refer to the
PhysAddrSize field for the maximum guest physical address size.
Fixes: 24c82e576b ("KVM: Sanitize cpuid")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 506c1da44f upstream.
We recently fixed DNS resolution of the server hostname during reconnect.
However, the server IP address may change, even when the old one continues
to serve (although sub-optimally).
We should schedule the next DNS resolution based on the TTL of
the DNS record used for the last resolution. This way, we resolve the
server hostname again when a DNS record expires.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cad83c968c upstream.
As syzbot reported, there is a use-after-free issue during f2fs recovery:
Use-after-free write at 0xffff88823bc16040 (in kfence-#10):
kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486
f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869
f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945
mount_bdev+0x26c/0x3a0 fs/super.c:1367
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1497
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is that multiple f2fs filesystem instances can race on
accessing the global fsync_entry_slab pointer, resulting in a use-after-free
of the slab cache. Fix this by initializing/destroying this slab cache only
once, during module init/exit, to avoid the issue.
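A hedged sketch of the new lifecycle (the helper names follow f2fs's usual create/destroy pattern and may not match the patch exactly): the cache is created once at module init and destroyed once at module exit, instead of per mount inside f2fs_recover_fsync_data().
```
static struct kmem_cache *fsync_entry_slab;

int __init f2fs_create_recovery_cache(void)
{
	fsync_entry_slab = kmem_cache_create("f2fs_fsync_inode_entry",
					     sizeof(struct fsync_inode_entry),
					     0, 0, NULL);
	return fsync_entry_slab ? 0 : -ENOMEM;
}

void f2fs_destroy_recovery_cache(void)
{
	kmem_cache_destroy(fsync_entry_slab);
}
```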
Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b53cca764 upstream.
As seen from a recent syzbot bug report, mistakes in the compat ioctl
implementation can lead to uninitialized kernel stack data getting used
as input for driver ioctl handlers.
The reported bug is now fixed, but it's possible that other related
bugs are still present or get added in the future. As the drivers need
to check user input already, the possible impact is fairly low, but it
might still cause an information leak.
To be on the safe side, always clear the entire ioctl buffer before
calling the conversion handler functions that are meant to initialize
them.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 867de40c4c upstream.
SSDs perform badly with sub-4k writes (because they perform
read-modify-write internally), so make sure writecache writes at least
4k when committing.
Fixes: 991bd8d7bc ("dm writecache: commit just one block, not a full page")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c6e097629 upstream.
The Elgato Cam Link 4K HDMI video capture card reports to support three
different pixel formats, where the first format depends on the connected
HDMI device.
```
$ v4l2-ctl -d /dev/video0 --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
Type: Video Capture
[0]: 'NV12' (Y/CbCr 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
[1]: 'NV12' (Y/CbCr 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
[2]: 'YU12' (Planar YUV 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
```
Changing the pixel format to anything besides the first pixel format
does not work:
```
$ v4l2-ctl -d /dev/video0 --try-fmt-video pixelformat=YU12
Format Video Capture:
Width/Height : 3840/2160
Pixel Format : 'NV12' (Y/CbCr 4:2:0)
Field : None
Bytes per Line : 3840
Size Image : 12441600
Colorspace : sRGB
Transfer Function : Rec. 709
YCbCr/HSV Encoding: Rec. 709
Quantization : Default (maps to Limited Range)
Flags :
```
User space applications like VLC might show an error message on the
terminal in that case:
```
libv4l2: error set_fmt gave us a different result than try_fmt!
```
Depending on the error handling of the user space applications, they
might display a distorted video, because they use the wrong pixel format
for decoding the stream.
The Elgato Cam Link 4K responds to the USB video probe
VS_PROBE_CONTROL/VS_COMMIT_CONTROL with a malformed data structure: The
second byte contains bFormatIndex (instead of being the second byte of
bmHint). The first byte is always zero. The third byte is always 1.
The firmware bug was reported to Elgato on 2020-12-01 and it was
forwarded by the support team to the developers as feature request.
There is no firmware update available since then. The latest firmware
for Elgato Cam Link 4K as of 2021-03-23 has MCU 20.02.19 and FPGA 67.
Therefore correct the malformed data structure for this device. The
change was successfully tested with VLC, OBS, and Chromium using
different pixel formats (YUYV, NV12, YU12), resolutions (3840x2160,
1920x1080), and frame rates (29.970 and 59.940 fps).
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Drung <bdrung@posteo.de>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4bb4d425b upstream.
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.
Fix the single zero-length control request which was using the
read-register helper, and update the helper so that zero-length reads
fail with an error message instead.
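The direction rule can be sketched as follows (hedged; the request number and helper name are illustrative, not the subdriver's actual ones): a control transfer with no data stage is an OUT transfer and therefore must use usb_sndctrlpipe().
```
static int reg_write_zero_len(struct usb_device *udev, u16 index)
{
	/* no data stage: this is an OUT request, so use the send pipe */
	return usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
			       0x0e,	/* illustrative vendor request */
			       USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
			       0, index, NULL, 0, 1000);
}
```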
Fixes: 6a7eba24e4 ("V4L/DVB (8157): gspca: all subdrivers")
Cc: stable@vger.kernel.org # 2.6.27
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53ae298fde upstream.
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Fix the USB_REQ_SYNCH_FRAME request which erroneously used
usb_sndctrlpipe().
Fixes: 27d35fc3fb ("V4L/DVB (10639): gspca - sq905: New subdriver.")
Cc: stable@vger.kernel.org # 2.6.30
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c8b9a9be2 upstream.
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Fix the control requests which erroneously used usb_rcvctrlpipe().
Fixes: 8466028be7 ("V4L/DVB (8734): Initial support for AME DTV-5100 USB2.0 DVB-T")
Cc: stable@vger.kernel.org # 2.6.28
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da3a1858c3 upstream.
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
There is a bug at ccs_pm_get_init(): when this function returns
an error, the stream is not started, and RPM usage_count
should not be incremented. However, if the calls to
v4l2_ctrl_handler_setup() return errors, it will be kept
incremented.
At ccs_suspend(), the best fix is to replace it with the new
pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter automatically,
in the case of errors.
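A minimal sketch of the replacement, assuming the usual runtime-PM calling convention in the driver's resume/stream-start path:
```
/*
 * pm_runtime_resume_and_get() drops the usage count it took when the
 * resume fails, so no pm_runtime_put_noidle() is needed on the error path.
 */
rval = pm_runtime_resume_and_get(&client->dev);
if (rval < 0)
	return rval;
```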
Fixes: 96e3a6b92f ("media: smiapp: Avoid maintaining power state information")
Cc: stable@vger.kernel.org
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a7790be18 upstream.
The saa6588_ioctl() function expects to get called from other kernel
functions with a 'saa6588_command' pointer, but I found nothing stops it
from getting called from user space instead, which seems rather dangerous.
The same thing happens in the davinci vpbe driver with its VENC_GET_FLD
command.
As a quick fix, add a separate .command() callback pointer for this
driver and change the two callers over to that. This change can easily
get backported to stable kernels if necessary, but since there are only
two drivers, we may want to eventually replace this with a set of more
specialized callbacks in the long run.
Fixes: c3fda7f835 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f71a409fe upstream.
Marvell Armada 3700 Functional Errata, Guidelines, and Restrictions
document describes in erratum 4.1 PCIe value of vendor ID (Ref #: 243):
The readback value of VEND_ID (RD0070000h [15:0]) is 1B4Bh, while it
should read 11ABh.
The firmware can write the correct value, 11ABh, through VEND_ID
(RD0076044h [15:0]).
Implement this workaround in aardvark driver for both PCI vendor id and PCI
subsystem vendor id.
This change affects and fixes PCI vendor id of emulated PCIe root bridge.
After this change emulated PCIe root bridge has correct vendor id.
Link: https://lore.kernel.org/r/20210624222621.4776-5-pali@kernel.org
Fixes: 8a3ebd8de3 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4694ae373d upstream.
On Macbook 2013, resuming from suspend-to-idle or standby resulted in the
external monitor no longer being detected, a stacktrace, and errors like
this in dmesg:
pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
The reason is that we know how to turn power to the Thunderbolt controller
*off* via the SXIO/SXFP/SXLF methods, but we don't know how to turn power
back on. We have to rely on firmware to turn the power back on.
When going to the "suspend-to-idle" or "standby" system sleep states,
firmware is not involved either on the suspend side or the resume side, so
we can't use SXIO/SXFP/SXLF to turn the power off.
Skip SXIO/SXFP/SXLF when firmware isn't involved in suspend, e.g., when
we're going to the "suspend-to-idle" or "standby" system sleep states.
Fixes: 1df5172c5c ("PCI: Suspend/resume quirks for Apple thunderbolt")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
Link: https://lore.kernel.org/r/20210520235501.917397-1-Hi-Angel@yandex.ru
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6e58b5466 upstream.
remove_raw() in dm_btree_remove() may fail due to an IO read error
(e.g. reading the content of the origin block fails during shadowing),
leaving the value of shadow_spine::root uninitialized, yet that
uninitialized value is still assigned to new_root at the end of
dm_btree_remove().
For dm-thin, the value of pmd->details_root or pmd->root then becomes
uninitialized, so trying to read the details_info tree again may cause an
out-of-bounds memory access, as shown below:
general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
Hardware name: QEMU Standard PC
RIP: 0010:metadata_ll_load_ie+0x14/0x30
Call Trace:
sm_metadata_count_is_more_than_one+0xb9/0xe0
dm_tm_shadow_block+0x52/0x1c0
shadow_step+0x59/0xf0
remove_raw+0xb2/0x170
dm_btree_remove+0xf4/0x1c0
dm_pool_delete_thin_device+0xc3/0x140
pool_message+0x218/0x2b0
target_message+0x251/0x290
ctl_ioctl+0x1c4/0x4d0
dm_ctl_ioctl+0xe/0x20
__x64_sys_ioctl+0x7b/0xb0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix it by only assigning new_root when the removal succeeds.
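A hedged sketch of the idea (remove_and_publish() and do_remove() are
illustrative; only the "assign on success" shape reflects the fix):

    static int remove_and_publish(struct dm_btree_info *info, dm_block_t root,
                                  uint64_t key, dm_block_t *new_root)
    {
        dm_block_t result;
        int r;

        r = do_remove(info, root, key, &result); /* may fail on an IO error */
        if (!r)
            *new_root = result; /* publish only when the removal succeeded */

        return r;
    }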
Signed-off-by: Hou Tao <houtao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee55b92a73 upstream.
Commit d53f1fafec ("dm writecache: do
direct write if the cache is full") changed dm-writecache, so that it
writes directly to the origin device if the cache is full.
Unfortunately, it doesn't forward flush requests to the origin device,
so that there is a bug where flushes are being ignored.
Fix this by adding missing flush forwarding.
For PMEM mode, we fix this bug by disabling direct writes to the origin
device, because it performs better.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: d53f1fafec ("dm writecache: do direct write if the cache is full")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bab6849942 upstream.
The dm-zoned target cannot support zoned block devices with zones that
have a capacity smaller than the zone size (e.g. NVMe zoned namespaces)
due to the current chunk zone mapping implementation as it is assumed
that zones and chunks have the same size with all blocks usable.
If a zoned drive is found to have zones with a capacity different from
the zone size, fail the target initialization.
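A hedged sketch of the check, assuming a blk_report_zones() callback
(example_init_zone() is illustrative):

    static int example_init_zone(struct blk_zone *blkz, unsigned int num,
                                 void *data)
    {
        /* dm-zoned assumes every block in a zone is usable, so reject
         * devices whose zones expose a capacity smaller than the zone size.
         */
        if (blkz->capacity != blkz->len)
            return -ENXIO;

        /* ... regular zone initialization continues here ... */
        return 0;
    }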
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fae8a946a upstream.
commit 6f755e85c3 ("coresight: Add helper for inserting synchronization
packets") removed the trailing '\0' from the barrier_pkt array and updated
call sites like etb_update_buffer() to properly check the barrier_pkt size
before reading, but missed updating tmc_update_etf_buffer(), which still
reads barrier_pkt past the array size, resulting in a KASAN out-of-bounds
bug. Fix this by adding a check on the barrier_pkt size before accessing
it, as is done in etb_update_buffer().
BUG: KASAN: global-out-of-bounds in tmc_update_etf_buffer+0x4b8/0x698
Read of size 4 at addr ffffffd05b7d1030 by task perf/2629
Call trace:
dump_backtrace+0x0/0x27c
show_stack+0x20/0x2c
dump_stack+0x11c/0x188
print_address_description+0x3c/0x4a4
__kasan_report+0x140/0x164
kasan_report+0x10/0x18
__asan_report_load4_noabort+0x1c/0x24
tmc_update_etf_buffer+0x4b8/0x698
etm_event_stop+0x248/0x2d8
etm_event_del+0x20/0x2c
event_sched_out+0x214/0x6f0
group_sched_out+0xd0/0x270
ctx_sched_out+0x2ec/0x518
__perf_event_task_sched_out+0x4fc/0xe6c
__schedule+0x1094/0x16a0
preempt_schedule_irq+0x88/0x170
arm64_preempt_schedule_irq+0xf0/0x18c
el1_irq+0xe8/0x180
perf_event_exec+0x4d8/0x56c
setup_new_exec+0x204/0x400
load_elf_binary+0x72c/0x18c0
search_binary_handler+0x13c/0x420
load_script+0x500/0x6c4
search_binary_handler+0x13c/0x420
exec_binprm+0x118/0x654
__do_execve_file+0x77c/0xba4
__arm64_compat_sys_execve+0x98/0xac
el0_svc_common+0x1f8/0x5e0
el0_svc_compat_handler+0x84/0xb0
el0_svc_compat+0x10/0x50
The buggy address belongs to the variable:
barrier_pkt+0x10/0x40
Memory state around the buggy address:
ffffffd05b7d0f00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
ffffffd05b7d0f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffd05b7d1000: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 03
^
ffffffd05b7d1080: fa fa fa fa 00 02 fa fa fa fa fa fa 03 fa fa fa
ffffffd05b7d1100: fa fa fa fa 00 00 00 00 05 fa fa fa fa fa fa fa
==================================================================
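A hedged illustration of the bound (insert_barriers() and its parameters are
illustrative; the point is that barrier_pkt no longer has a terminating entry,
so reads must stop at ARRAY_SIZE(barrier_pkt)):

    static void insert_barriers(u32 *buf, size_t words_lost)
    {
        const u32 *barrier = barrier_pkt;
        size_t i;

        /* Bound by the array size instead of scanning for a '\0' entry. */
        for (i = 0; i < words_lost && i < ARRAY_SIZE(barrier_pkt); i++)
            buf[i] = *barrier++;
    }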
Link: https://lore.kernel.org/r/20210505093430.18445-1-saiprakash.ranjan@codeaurora.org
Fixes: 0c3fc4d5fa ("coresight: Add barrier packet for synchronisation")
Cc: stable@vger.kernel.org
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20210614175901.532683-6-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9272e5d002 upstream.
In the out_err_bus_register error branch of tpci200_pci_probe,
tpci200->info->cfg_regs is first freed by tpci200_uninstall()->
tpci200_unregister()->pci_iounmap(.., tpci200->info->cfg_regs).
But later, iounmap() is called to free tpci200->info->cfg_regs
again.
My patch sets tpci200->info->cfg_regs to NULL after tpci200_uninstall()
to avoid the double free.
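A hedged sketch of the resulting error path (labels and surrounding code are
illustrative):

    out_err_bus_register:
        tpci200_uninstall(tpci200);
        /* tpci200_uninstall() already unmapped cfg_regs via pci_iounmap();
         * clear the pointer so it is not unmapped a second time below.
         */
        tpci200->info->cfg_regs = NULL;
    out_err_install:
        if (tpci200->info->cfg_regs)
            iounmap(tpci200->info->cfg_regs);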
Fixes: cea2f7cdff ("Staging: ipack/bridges/tpci200: Use the TPCI200 in big endian mode")
Cc: stable <stable@vger.kernel.org>
Acked-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Link: https://lore.kernel.org/r/20210524093205.8333-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4030a6e6a6 upstream.
Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.
In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.
Increase the size of tgid_map to the configured pid_max instead,
allowing it to cover the full range of PIDs up to the maximum value of
PID_MAX_LIMIT if the system is configured that way.
On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
memory overhead sounds significant, 64 bit systems are presumably best
placed to accommodate it, and since tgid_map is only allocated when the
record-tgid option is actually used, presumably the user would rather it
spend sufficient memory to actually record the tgids they expect.
The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are both configuring a large pid_max and running ftrace with record-tgid
anyway.
Of note is that we only allocate tgid_map once, the first time that the
record-tgid option is enabled. Therefore its size is only set once, to
the value of pid_max at the time the record-tgid option is first
enabled. If a user increases pid_max after that point, the saved_tgids
file will not contain entries for any tasks with pids beyond the earlier
value of pid_max.
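A hedged sketch of the allocation (tgid_map_max is assumed to be the new
bookkeeping variable; only "size by pid_max, allocate once" reflects the
described change):

    int *map;

    tgid_map_max = pid_max;
    map = kvcalloc(tgid_map_max + 1, sizeof(*map), GFP_KERNEL);
    if (!map)
        return -ENOMEM;
    /* Published once; a later pid_max increase is deliberately not tracked. */
    tgid_map = map;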
Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Burton <paulburton@google.com>
[ Fixed comment coding style ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b81b3e959a upstream.
The tgid_map array records a mapping from pid to tgid, where the index
of an entry within the array is the pid & the value stored at that index
is the tgid.
The saved_tgids_next() function iterates over pointers into the tgid_map
array & dereferences the pointers which results in the tgid, but then it
passes that dereferenced value to trace_find_tgid() which treats it as a
pid & does a further lookup within the tgid_map array. It seems likely
that the intent here was to skip over entries in tgid_map for which the
recorded tgid is zero, but instead we end up skipping over entries for
which the thread group leader hasn't yet had its own tgid recorded in
tgid_map.
A minimal fix would be to remove the call to trace_find_tgid, turning:
if (trace_find_tgid(*ptr))
into:
if (*ptr)
..but it seems like this logic can be much simpler if we simply let
seq_read() iterate over the whole tgid_map array & filter out empty
entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
approach, removing the incorrect logic here entirely.
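A hedged sketch of the show callback (the exact variable handling is
illustrative):

    static int saved_tgids_show(struct seq_file *m, void *v)
    {
        int *entry = (int *)v;
        int pid = entry - tgid_map;   /* index in tgid_map is the pid */
        int tgid = *entry;

        if (tgid == 0)
            return SEQ_SKIP;          /* let seq_read() skip empty entries */

        seq_printf(m, "%d %d\n", pid, tgid);
        return 0;
    }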
Link: https://lkml.kernel.org/r/20210630003406.4013668-1-paulburton@google.com
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Burton <paulburton@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11c7aa0dde upstream.
Commit 545fbd0775 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:
CPU1 (waiter1): rq_qos_wait()
CPU2 (waiter2): rq_qos_wait()
CPU1: acquire_inflight_cb() -> fails
CPU2: acquire_inflight_cb() -> fails
CPU3 (waker): completes IOs, inflight decreased
CPU1: prepare_to_wait_exclusive()
CPU2: prepare_to_wait_exclusive()
CPU1: has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
CPU2: has_sleeper = !wq_has_single_sleeper() -> true
CPU1: io_schedule()
CPU2: io_schedule()
Deadlock, as now there's nobody to wake up the two waiters. The logic of
automatically blocking when there are already sleepers is really subtle,
and the only way to make it work reliably is to check whether there
are already waiters in the queue when adding ourselves to it. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
Fixes: 545fbd0775 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56a1188159 upstream.
Commit 529a1101212a ("mfd: syscon: Don't free allocated name
for regmap_config") stopped freeing the allocated name field of struct
regmap_config, but thereby introduced a memory leak. Another
commit, 94cc89eb8fa5 ("regmap: debugfs: Fix handling of name string
for debugfs init delays"), fixes the debugfs init issue at its root
cause; with that fix, the name field in struct regmap_debugfs_node
is removed, and the name field of struct regmap_config is no longer used
when initializing debugfs for the syscon driver. So the allocated
name field of struct regmap_config needs to be freed directly after
regmap initialization to avoid the memory leak.
Cc: stable@vger.kernel.org
Fixes: 529a1101212a("mfd: syscon: Don't free allocated name for regmap_config")
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1c74a6c07 upstream.
While trying to get the AB8500 charging driver working, I ran into a bit
of bitrot: we haven't used the driver for a while, so errors in
refactorings go unnoticed.
This one is pretty self-evident: use the argument to the macro, or we
end up with a random pointer to something else.
Cc: stable@vger.kernel.org
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marcus Cooper <codekipper@gmail.com>
Fixes: 297d716f62 ("power_supply: Change ownership from driver to core")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2253042d86 upstream.
When an IPMI watchdog timer is being stopped in ipmi_close() or
ipmi_ioctl(WDIOS_DISABLECARD), the current watchdog action is updated to
WDOG_TIMEOUT_NONE and _ipmi_set_timeout(IPMI_SET_TIMEOUT_NO_HB) is called
to install this action. The latter function ends up invoking
__ipmi_set_timeout() which makes the actual 'Set Watchdog Timer' IPMI
request.
For IPMI 1.0, this operation results in fully stopping the watchdog timer.
For IPMI >= 1.5, __ipmi_set_timeout() always specifies the "don't
stop" flag in the prepared 'Set Watchdog Timer' IPMI request. As a result,
the watchdog timer has its action correctly updated to 'none' but the
timer continues to run. A problem is that IPMI firmware can then still log
an expiration event when the configured timeout is reached, which is
unexpected because the watchdog timer was requested to be stopped.
The patch fixes this problem by not setting the "don't stop" flag in
__ipmi_set_timeout() when the current action is WDOG_TIMEOUT_NONE, which
results in stopping the watchdog timer. This makes the behaviour for
IPMI >= 1.5 consistent with IPMI 1.0. It also matches the logic in
__ipmi_heartbeat(), which does not allow resetting the watchdog if the
current action is WDOG_TIMEOUT_NONE, as that would start the timer.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Message-Id: <10a41bdc-9c99-089c-8d89-fa98ce5ea080@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fca41af18e upstream.
fw_cfg_showrev() is called by an indirect call in kobj_attr_show(),
which violates clang's CFI checking because fw_cfg_showrev()'s second
parameter is 'struct attribute', whereas the ->show() member of 'struct
kobj_attribute' expects the second parameter to be of type 'struct
kobj_attribute'.
$ cat /sys/firmware/qemu_fw_cfg/rev
3
$ dmesg | grep "CFI failure"
[ 26.016832] CFI failure (target: fw_cfg_showrev+0x0/0x8):
Fix this by converting fw_cfg_rev_attr to 'struct kobj_attribute', where
this mismatch would have been caught automatically by the
incompatible-pointer-types compiler warning, and update fw_cfg_showrev()
accordingly.
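A hedged sketch of the conversion (the attribute macro and mode are
illustrative; fw_cfg_rev holds the value exposed through sysfs):

    static ssize_t fw_cfg_showrev(struct kobject *kobj,
                                  struct kobj_attribute *attr, char *buf)
    {
        return sprintf(buf, "%u\n", fw_cfg_rev);
    }

    static const struct kobj_attribute fw_cfg_rev_attr =
        __ATTR(rev, 0444, fw_cfg_showrev, NULL);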
Fixes: 75f3e8e47f ("firmware: introduce sysfs driver for QEMU's fw_cfg device")
Link: https://github.com/ClangBuiltLinux/linux/issues/1299
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210211194258.4137998-1-nathan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f9482aa8d upstream.
We can deadlock when rmmod'ing the driver or going through firmware
reset, because cfg80211_unregister_wdev() has to bring down the link
for us, ... which then grabs the same wiphy lock.
nl80211_del_interface() already handles a very similar case, with a nice
description:
/*
* We hold RTNL, so this is safe, without RTNL opencount cannot
* reach 0, and thus the rdev cannot be deleted.
*
* We need to do it for the dev_close(), since that will call
* the netdev notifiers, and we need to acquire the mutex there
* but don't know if we get there from here or from some other
* place (e.g. "ip link set ... down").
*/
mutex_unlock(&rdev->wiphy.mtx);
...
Do similarly for mwifiex teardown, by ensuring we bring the link down
first.
Sample deadlock trace:
[ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
[ 247.110630] Not tainted 5.12.4 #5
[ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.124557] task:rmmod state:D stack: 0 pid: 2119 ppid: 2114 flags:0x00400208
[ 247.133905] Call trace:
[ 247.136644] __switch_to+0x130/0x170
[ 247.140643] __schedule+0x714/0xa0c
[ 247.144548] schedule_preempt_disabled+0x88/0xf4
[ 247.149714] __mutex_lock_common+0x43c/0x750
[ 247.154496] mutex_lock_nested+0x5c/0x68
[ 247.158884] cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
[ 247.165769] raw_notifier_call_chain+0x4c/0x78
[ 247.170742] call_netdevice_notifiers_info+0x68/0xa4
[ 247.176305] __dev_close_many+0x7c/0x138
[ 247.180693] dev_close_many+0x7c/0x10c
[ 247.184893] unregister_netdevice_many+0xfc/0x654
[ 247.190158] unregister_netdevice_queue+0xb4/0xe0
[ 247.195424] _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
[ 247.201816] cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
[ 247.208016] mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
[ 247.214174] mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
[ 247.219747] mwifiex_remove_card+0x38/0xa0 [mwifiex]
[ 247.225316] mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
[ 247.231451] pci_device_remove+0x50/0xe0
[ 247.235849] device_release_driver_internal+0x110/0x1b0
[ 247.241701] driver_detach+0x5c/0x9c
[ 247.245704] bus_remove_driver+0x84/0xb8
[ 247.250095] driver_unregister+0x3c/0x60
[ 247.254486] pci_unregister_driver+0x2c/0x90
[ 247.259267] cleanup_module+0x18/0xcdc [mwifiex_pcie]
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-wireless/98392296-40ee-6300-369c-32e16cff3725@gmail.com/
Link: https://lore.kernel.org/linux-wireless/ab4d00ce52f32bd8e45ad0448a44737e@bewaar.me/
Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
Reported-by: dave@bewaar.me
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Dave Olsthoorn <dave@bewaar.me>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210515024227.2159311-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6eb84fa59 upstream.
The driver_name="tegra" is now required by the newer ALSA UCMs, otherwise
Tegra UCMs don't match by the path/name.
All Tegra machine drivers are specifying the card's name, but it has no
effect if model name is specified in the device-tree since it overrides
the card's name. We need to set the driver_name to "tegra" in order to
get a usable lookup path for the updated ALSA UCMs. The new UCM lookup
path has a form of driver_name/card_name.
The old lookup paths that are based on driver module name continue to
work as before. Note that UCM matching never worked for Tegra ASoC drivers
if they were compiled as built-in, this is fixed by supporting the new
naming scheme.
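A hedged sketch of what a machine driver ends up carrying
(tegra_example_card and its .name are illustrative):

    static struct snd_soc_card tegra_example_card = {
        .name        = "tegra-example",  /* may be overridden by the DT model */
        .driver_name = "tegra",          /* stable key for the UCM lookup path */
        .owner       = THIS_MODULE,
    };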
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20210529154649.25936-2-digetx@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b33dfe0ba upstream.
Bad counter reads are experienced sometimes when bit 10 or greater rolls
over. Originally, testing showed that at least 10 lower bits would be
set to the same value during these bad reads. However, some users still
reported time skips.
Wider testing revealed that on some chips, occasionally only the lowest
9 bits would read as the anomalous value. During these reads (which
still happen only when bit 10 rolls over), bit 9 would read as the
correct value. Reduce the mask by one bit to cover these cases as well.
Cc: stable@vger.kernel.org
Fixes: c950ca8c35 ("clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability")
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210515021439.55316-1-samuel@sholland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22afcdf04 upstream.
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code already has a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dumping ground for hack-of-the-day workarounds of half-thought-out locking
schemes. Also document the implications vs. hotplug operations and
serialization, or the lack thereof.
Thanks to Alexey and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov@redhat.com>
Reported-by: Joshua Baker <jobaker@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <aklimov@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52218fcd61 upstream.
The TTL field indicates the level of page table walk holding the *leaf*
entry for the address being invalidated. But currently, the TTL field
may be set to an incorrect value in the following stack:
pte_free_tlb
__pte_free_tlb
tlb_remove_table
tlb_table_invalidate
tlb_flush_mmu_tlbonly
tlb_flush
In this case, we just want to flush a PTE page, but the tlb->cleared_pmds
is set and we get tlb_level = 2 in the tlb_get_level() function. This may
cause some unexpected problems.
This patch sets the TTL field to 0 if tlb->freed_tables is set;
tlb->freed_tables indicates that page table pages were freed, not the
leaf entry.
Cc: <stable@vger.kernel.org> # 5.9.x
Fixes: c4ab2cbc1d ("arm64: tlb: Set the TTL field in flush_tlb_range")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: ZhuRui <zhurui3@huawei.com>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/b80ead47-1f88-3a00-18e1-cacc22f54cc4@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09247e110b upstream.
While initializing an UHS-I SD card, the mmc core first tries to switch to
1.8V I/O voltage, before it continues to change the settings for the bus
speed mode.
However, the current behaviour in the mmc core is inconsistent and doesn't
conform to the SD spec. More precisely, an SD card that supports UHS-I must
set both the SD_OCR_CCS bit and the SD_OCR_S18R bit in the OCR register
response. When switching to 1.8V I/O the mmc core correctly checks both of
the bits, but only the SD_OCR_S18R bit when changing the settings for bus
speed mode.
Rather than actually fixing the code to conform to the SD spec, let's
deliberately deviate from it by requiring only the SD_OCR_S18R bit for both
parts. This enables us to support UHS-I for SDSC cards (outside the spec),
which is actually supported by some existing SDSC cards. Moreover,
this fixes the inconsistent behaviour.
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Link: https://lore.kernel.org/r/CWXP265MB26803AE79E0AD5ED083BF2A6C4529@CWXP265MB2680.GBRP265.PROD.OUTLOOK.COM
Cc: stable@vger.kernel.org
[Ulf: Rewrote commit message and comments to clarify the changes]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0244847f9 upstream.
When an eMMC device is being run in HS400 mode, any access to the
RPMB device will cause the error message "mmc1: Invalid UHS-I mode
selected". This happens as a result of tuning being disabled before
RPMB access and then re-enabled after the RPMB access is complete.
When tuning is re-enabled, the system has to switch from HS400
to HS200 to do the tuning and then back to HS400. As part of the
sequence to switch from HS400 to HS200, the system is temporarily
put into HS mode. When switching to HS mode, sdhci_get_preset_value()
is called; it does not have support for HS mode, so it prints the warning
message and returns the preset for SDR12. The fix is to add support
for MMC and SD HS modes to sdhci_get_preset_value().
This can be reproduced on any system running eMMC in HS400 mode
(not HS400ES) by using the "mmc" utility to run the following
command: "mmc rpmb read-counter /dev/mmcblk0rpmb".
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 52983382c7 ("mmc: sdhci: enhance preset value function")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210624163045.33651-1-alcooperx@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47a5074303 upstream.
The vc4_set_crtc_possible_masks() function is meant to run over all the
encoders and set their possible_crtcs mask to their associated pixelvalve.
However, since commit 39fcb28083 ("drm/vc4: txp: Turn the TXP into
a CRTC of its own"), the TXP has been turned into a CRTC and encoder of
its own, and while it does indeed register an encoder, it no longer has
an associated pixelvalve. The code will thus run over the TXP encoder
and set a bogus possible_crtcs mask, overriding the one set in the TXP
bind function.
In order to fix this, let's skip any virtual encoder.
Cc: <stable@vger.kernel.org> # v5.9+
Fixes: 39fcb28083 ("drm/vc4: txp: Turn the TXP into a CRTC of its own")
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210507150515.257424-3-maxime@cerno.tech
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1bfd74bfe upstream.
On the Loongson64 platform used with Radeon GPU, shutdown or reboot failed
when console=tty is in the boot cmdline.
radeon_suspend_kms() puts the hw in the suspend state; in particular it
sets the fb state to FBINFO_STATE_SUSPENDED:

    if (fbcon) {
        console_lock();
        radeon_fbdev_set_suspend(rdev, 1);
        console_unlock();
    }

Any further fb operations in the related functions are then avoided:

    if (p->state != FBINFO_STATE_RUNNING)
        return;

So call radeon_suspend_kms() in radeon_pci_shutdown() for Loongson64 to fix
this issue; it is a workaround similar to what powerpc does.
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit adcf949e66 upstream.
SMU had set all the necessary fields for a link width switch
but the width switch wasn't occurring because the link was idle
in the L1 state. Setting LC_L1_RECONFIG_EN=0x1 will allow width
switches to also be initiated while in L1 instead of waiting until
the link is back in L0.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa61581126 upstream.
Navi series GPUs have 2 SIMDs per CU (and then 2 CUs per WGP).
The NV enum headers incorrectly listed this as 4, which later meant
we were incorrectly reporting the number of SIMDs in the HSA
topology. This could cause problems down the line for user-space
applications that want to launch a fixed amount of work to each
SIMD.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91cdbb955a upstream.
The kernel handles the NX fault by updating CSB or sending
signal to process. In multithread applications, children can
open VAS windows and can exit without closing them. But the
parent can continue to send NX requests with these windows. To
prevent pid reuse, reference will be taken on pid and tgid
when the window is opened and release them during window close.
The current code is not releasing the tgid reference which can
cause pid leak and this patch fixes the issue.
Fixes: db1c08a740 ("powerpc/vas: Take reference to PID and mm for user space windows")
Cc: stable@vger.kernel.org # 5.8+
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6020fc4d444864fe20f7dcdc5edfe53e67480a1c.camel@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 015d98149b upstream.
A change in clang 13 results in the __lwsync macro being defined as
__builtin_ppc_lwsync, which emits 'lwsync' or 'msync' depending on what
the target supports. This breaks the build because of -Werror in
arch/powerpc, along with thousands of warnings:
In file included from arch/powerpc/kernel/pmc.c:12:
In file included from include/linux/bug.h:5:
In file included from arch/powerpc/include/asm/bug.h:109:
In file included from include/asm-generic/bug.h:20:
In file included from include/linux/kernel.h:12:
In file included from include/linux/bitops.h:32:
In file included from arch/powerpc/include/asm/bitops.h:62:
arch/powerpc/include/asm/barrier.h:49:9: error: '__lwsync' macro redefined [-Werror,-Wmacro-redefined]
#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
^
<built-in>:308:9: note: previous definition is here
#define __lwsync __builtin_ppc_lwsync
^
1 error generated.
Undefine this macro so that the runtime patching introduced by
commit 2d1b202762 ("powerpc: Fixup lwsync at runtime") continues to
work properly with clang and the build no longer breaks.
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1386
Link: 62b5df7fe2
Link: https://lore.kernel.org/r/20210528182752.1852002-1-nathan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd5d5e602f upstream.
The powerpc kernel is not prepared to handle exec faults from kernel.
Especially, the function is_exec_fault() will return 'false' when an
exec fault is taken by kernel, because the check is based on reading
current->thread.regs->trap which contains the trap from user.
For instance, when provoking a LKDTM EXEC_USERSPACE test,
current->thread.regs->trap is set to SYSCALL trap (0xc00), and
the fault taken by the kernel is not seen as an exec fault by
set_access_flags_filter().
Commit d7df2443cd ("powerpc/mm: Fix spurious segfaults on radix
with autonuma") made it clear and handled it properly. But later on
commit d3ca587404 ("powerpc/mm: Fix reporting of kernel execute
faults") removed that handling, introducing test based on error_code.
And here is the problem, because on the 603 all upper bits of SRR1
get cleared when the TLB instruction miss handler bails out to ISI.
Until commit cbd7e6ca02 ("powerpc/fault: Avoid heavy
search_exception_tables() verification"), an exec fault from kernel
at a userspace address was indirectly caught by the lack of entry for
that address in the exception tables. But after that commit the
kernel mainly relies on KUAP or on core mm handling to catch wrong
user accesses. Here the access is not wrong, so mm handles it.
It is a minor fault because PAGE_EXEC is not set,
set_access_flags_filter() should set PAGE_EXEC and voila.
But as is_exec_fault() returns false, as explained in the beginning,
set_access_flags_filter() bails out without setting the PAGE_EXEC flag,
so the minor exec fault is taken forever.
As the kernel is not prepared to handle such exec faults, the thing to
do is to fire in bad_kernel_fault() for any exec fault taken by the
kernel, as it was prior to commit d3ca587404.
Fixes: d3ca587404 ("powerpc/mm: Fix reporting of kernel execute faults")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/024bb05105050f704743a0083fe3548702be5706.1625138205.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97113eb39f upstream.
To avoid a race between rmap walk and mremap, mremap does
take_rmap_locks(). The lock was taken to ensure that the rmap walk doesn't
miss a page table entry due to PTE moves via move_pagetables(). The kernel
does a further optimization of this lock such that if we are going to find
the newly added vma after the old vma, the rmap lock is not taken. This
is because the rmap walk would find the vmas in the same order, and if we
don't find the page table attached to the older vma, we would find it with
the new vma, which we iterate later.
As explained in commit eb66ae0308 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page. The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock. This can result in stale TLB entries as shown
below.
This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::
Optimized PMD move:

CPU 1: mremap(old_addr, new_addr)
CPU 2: page_shrinker/try_to_unmap_one
CPU 1: mmap_write_lock_killable()
CPU 2: addr = old_addr
CPU 2: lock(pte_ptl)
CPU 1: lock(pmd_ptl)
CPU 1: pmd = *old_pmd
CPU 1: pmd_clear(old_pmd)
CPU 1: flush_tlb_range(old_addr)
CPU 1: *new_pmd = pmd
CPU 3: *new_addr = 10; and fills TLB with new addr and old pfn
CPU 1: unlock(pmd_ptl)
CPU 2: ptep_clear_flush()
CPU 2: old pfn is free.
       Stale TLB entry
Optimized PUD move also suffers from a similar race. Both of the above
race conditions can be fixed if we force the mremap path to take the rmap
lock.
Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd34018 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cad065ed8d upstream.
MIPS MT extensions were added with the MIPS 34K processor, which was
based on the MIPS32r2 ISA.
This fixes a build error when building a generic kernel for a MIPS32r1
CPU.
Fixes: c434b9f80b ("MIPS: Kconfig: add MIPS_GENERIC_KERNEL symbol")
Cc: stable@vger.kernel.org # v5.9
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6817c94443 ]
The cause of the problem is as follows:
1. When /sys/devices/system/memory/memory0/valid_zones is read (cat),
test_pages_in_a_zone() will be called.
2. test_pages_in_a_zone() finds the zone according to start_pfn = 0.
The smallest pfn of the numa node in the mips architecture is 128,
and the pages corresponding to the preceding pfns 0~127 are not
initialized (page->flags is 0xFFFFFFFF).
3. The nid and zonenum obtained using page_zone(pfn_to_page(0)) are out
of bounds in the corresponding array,
&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)],
and accessing the out-of-bounds zone member variables is abnormal,
resulting in an Oops.
Therefore, it is necessary to keep the pages between 0 and the minimum
pfn, to prevent the Oops from appearing.
Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50619dbf8d ]
The first chunk in a packet is ensured to be present at the beginning of
sctp_rcv(), as a packet needs to have at least 1 chunk. But the second
one may not be completely available, and ch->length can refer to
uninitialized memory.
The fix here is to only walk on the next chunk if there is enough data to
hold at least the header, and then proceed with the ch->length validation
that is already there.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c5dc070ff ]
Ilja reported that, simply put, nothing was validating that
from_addr_param functions were operating on initialized memory. That is,
the parameter itself was being validated by sctp_walk_params, but it
doesn't check for types and their specific sizes, and it could be a
0-length one, causing from_addr_param to potentially work over the next
parameter or even uninitialized memory.
The fix here is to, in all calls to from_addr_param, check if enough space
is there for the wanted IP address type.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccff81e1d0 ]
kmemleak scans struct page, but it does not scan the page content. If we
allocate some memory with kmalloc(), then allocate a page with
alloc_page(), and put the kmalloc pointer somewhere inside that page,
kmemleak will report the kmalloc pointer as a false positive.
We can instruct kmemleak to scan the memory area by calling kmemleak_alloc()
and kmemleak_free(), but part of struct bpf_ringbuf is mmaped to user space,
and if struct bpf_ringbuf changes we would have to revisit and review the size
argument in kmemleak_alloc(), because we do not want kmemleak to scan the
user space memory. Let's simplify things and use kmemleak_not_leak() here.
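A hedged sketch of the annotation (pages, array_size and numa_node are
illustrative):

    pages = bpf_map_area_alloc(array_size, numa_node);
    if (pages)
        /* The pages array legitimately holds pointers kmemleak cannot see;
         * exclude it from leak reports instead of asking for a scan.
         */
        kmemleak_not_leak(pages);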
For posterity, also adding additional prior analysis from Andrii:
I think either kmemleak or syzbot are misreporting this. I've added a
bunch of printks around all allocations performed by BPF ringbuf. [...]
On repro side I get these two warnings:
[vmuser@archvm bpf]$ sudo ./repro
BUG: memory leak
unreferenced object 0xffff88810d538c00 (size 64):
comm "repro", pid 2140, jiffies 4294692933 (age 14.540s)
hex dump (first 32 bytes):
00 af 19 04 00 ea ff ff c0 ae 19 04 00 ea ff ff ................
80 ae 19 04 00 ea ff ff c0 29 2e 04 00 ea ff ff .........)......
backtrace:
[<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0
[<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218
[<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90
[<00000000f601d565>] do_syscall_64+0x2d/0x40
[<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak
unreferenced object 0xffff88810d538c80 (size 64):
comm "repro", pid 2143, jiffies 4294699025 (age 8.448s)
hex dump (first 32 bytes):
80 aa 19 04 00 ea ff ff 00 ab 19 04 00 ea ff ff ................
c0 ab 19 04 00 ea ff ff 80 44 28 04 00 ea ff ff .........D(.....
backtrace:
[<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0
[<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218
[<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90
[<00000000f601d565>] do_syscall_64+0x2d/0x40
[<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Note that both reported leaks (ffff88810d538c80 and ffff88810d538c00)
correspond to pages array bpf_ringbuf is allocating and tracking properly
internally. Note also that syzbot repro doesn't close FD of created BPF
ringbufs, and even when ./repro itself exits with error, there are still
two forked processes hanging around in my system. So clearly ringbuf maps
are alive at that point. So reporting any memory leak looks weird at that
point, because that memory is being used by active referenced BPF ringbuf.
It's also a question why repro doesn't clean up its forks. But if I do a
`pkill repro`, I do see that all the allocated memory is /properly/ cleaned
up [and the] "leaks" are deallocated properly.
BTW, if I add close() right after bpf() syscall in syzbot repro, I see that
everything is immediately deallocated, like designed. And no memory leak
is reported. So I don't think the problem is anywhere in bpf_ringbuf code,
rather in the leak detection and/or repro itself.
Reported-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
[ Daniel: also included analysis from Andrii to the commit log ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzYk+dqs+jwu6VKXP-RttcTEGFe+ySTGWT9CRNkagDiJVA@mail.gmail.com
Link: https://lore.kernel.org/lkml/YNTAqiE7CWJhOK2M@nuc10
Link: https://lore.kernel.org/lkml/20210615101515.GC26027@arm.com
Link: https://syzkaller.appspot.com/bug?extid=5d895828587f49e7fe9b
Link: https://lore.kernel.org/bpf/20210626181156.1873604-1-rkovhaev@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c35b07e6d ]
The _sum and _avg values are in general kept in sync together with the PELT
divider. They are, however, not always completely in perfect sync,
resulting in situations where _sum gets to zero while _avg stays
positive. Such situations are undesirable.
This comes from the fact that PELT will increase period_contrib, also
increasing the PELT divider, without updating _sum and _avg values to
stay in perfect sync where (_sum == _avg * divider). However, such PELT
change will never lower _sum, making it impossible to end up in a
situation where _sum is zero and _avg is not.
Therefore, we need to ensure that when subtracting load outside PELT,
that when _sum is zero, _avg is also set to zero. This occurs when
(_sum < _avg * divider), and the subtracted (_avg * divider) is bigger
or equal to the current _sum, while the subtracted _avg is smaller than
the current _avg.
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210624111815.57937-1-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f00bfb372 ]
This is a BT SoC timing issue: after the host starts downloading the BT
firmware, ep2 needs time to switch from the ACL function to the DFU
function, so the host adds a 20 ms delay as a workaround.
Signed-off-by: Tim Jiang <tjiang@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02ce2c2c24 ]
When the Get Device Flags command fails, it returns the error status
with the parameters filled with garbage values. Although the
parameters are not used, it is better to fill them with zeros than with
random values.
Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e848dbd364 ]
mSBC frames do not need to be aligned to the SCO packet boundary.
Using USB ALT 3 lets the HCI payload be >= 60 bytes, so the mSBC data
fits within 60 bytes; this avoids unaligned payloads and fixes a
no-voice issue on some headsets.
USB Alt 3 support also requires HFP to support a transparent MTU of
72 bytes.
Signed-off-by: Hilda Wu <hildawu@realtek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ef36a52b0 ]
0x2B, 0x31 and 0x33 are reserved for future use but were not present in
the HCI to MGMT conversion table; this caused the conversion to be
incorrect for HCI status codes greater than 0x2A.
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Yu Liu <yudiliu@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cfdf8fcaa ]
When cmtp_attach_device fails, cmtp_add_connection returns the error value,
which leads the caller to do an fput through sockfd_put. But the
cmtp_session kthread, which is stopped in this path, will also call fput,
leading to a potential refcount underflow or a use-after-free.
Add a refcount before we signal the kthread to stop. The kthread will try
to grab the cmtp_session_sem mutex before doing the fput, which is held
when get_file is called, so there should be no races there.
Reported-by: Ryota Shiga
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d123b81ac ]
Dave observed a number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has an MTU of 64k, the kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse, if
the message length is <32k the allocation may trigger the
OOM killer.
This is entirely avoidable; we can use an skb with page frags.
af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if the underlying device supports SG. If extra
allocator pressure causes regressions in real workloads,
we can switch to trying the large allocation first and
falling back.
v4: pre-calculate all the additions to alloclen so
we can be sure it won't go over order-2
Reported-by: Dave Jones <dsj@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8b9eab992 ]
It's getting more common to run nested container environments for
testing cloud software. One of such examples is Kind [1] which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and thus can run any Pod (aka container)
inside the former. This approach simplifies testing a lot, as it
eliminates complicated VM setups.
Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
netns cookies with the one of the init ns via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used given the Kubernetes node's netns is outside the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.
Thus, this is following up on Eric's commit 3d368ab87c ("net:
initialize net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE. This is also in line with how we retrieve the socket
cookie via SO_COOKIE.
[1] https://kind.sigs.k8s.io/
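A hedged user-space sketch of retrieving the cookie (the fallback define is
illustrative and only needed with older uapi headers):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_NETNS_COOKIE
    #define SO_NETNS_COOKIE 71
    #endif

    int print_netns_cookie(int fd)
    {
        uint64_t cookie = 0;
        socklen_t len = sizeof(cookie);

        if (getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE, &cookie, &len) < 0)
            return -1;

        printf("netns cookie: %llu\n", (unsigned long long)cookie);
        return 0;
    }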
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 647d446d66 ]
The syscall bpf(BPF_PROG_QUERY, &attr) should use the prog_cnt field to
see how many entries user space provided and return ENOSPC if there are
more programs than that. Before this patch, this is not checked and
ENOSPC is never returned.
Note that one lirc device is limited to 64 bpf programs, and the user space
I'm aware of -- ir-keytable -- always gives enough space for 64 entries
already. However, we should not copy more program ids than are requested.
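A hedged sketch of the intended check (variable names are illustrative; only
"report the count, never copy more than requested, return ENOSPC when the
buffer is too small" reflects the description):

    u32 cnt = bpf_prog_array_length(progs);

    if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
        return -EFAULT;
    if (attr->query.prog_cnt < cnt)
        return -ENOSPC;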
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210623213754.632-1-sean@mess.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45daaa1318 ]
The following race was possible:
1. The device driver requests HW restart.
2. A scan is requested from user space and is propagated
to the driver. During this flow HW_SCANNING flag is set.
3. The thread that handles the HW restart is scheduled,
and before starting the actual reconfiguration it
checks that HW_SCANNING is not set. The flow does so
without acquiring any lock, and thus the WARN fires.
Fix this by checking that HW_SCANNING is on only after RTNL is
acquired, i.e., user space scan request handling is no longer
in transit.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.8238ab3e19ab.I2693c581c70251472b4f9089e37e06fb2c18268f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d656a4c6ea ]
If we have been keeping per-CPU statistics, consider them
regardless of USES_RSS, because we may not actually fill
those, for example in non-fast-RX cases when the connection
is not compatible with fast-RX. If we didn't fill them, the
additional data will be zero and not affect anything, and
if we did fill them then it's more correct to consider them.
This fixes an issue in mesh mode where some statistics are
not updated due to USES_RSS being set, but fast-RX isn't
used.
Reported-by: Thiraviyam Mariyappan <tmariyap@codeaurora.org>
Link: https://lore.kernel.org/r/20210610220814.13b35f5797c5.I511e9b33c5694e0d6cef4b6ae755c873d7c22124@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e93bdd7840 ]
Fix the following out-of-bounds warning:
net/wireless/wext-spy.c:178:2: warning: 'memcpy' offset [25, 28] from the object at 'threshold' is out of the bounds of referenced subobject 'low' with type 'struct iw_quality' at offset 20 [-Warray-bounds]
The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &threshold.low and &spydata->spy_thr_low. As
these are just a couple of struct members, fix this by using direct
assignments, instead of memcpy().
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422200032.GA168995@embeddedor
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ebe4feb8b ]
If SRIOV cannot be disabled during device removal or module unloading,
return an error code so it can be logged properly in the calling function.
Note that this can only happen if a VF is currently attached to a guest
using Xen, but not with vfio/KVM. Even though in that case the VFs won't
work properly with the PF removed and/or the module unloaded, I have left
it as is because I don't know what side effects changing it may have, and
it also seems to be what other drivers do in this situation.
In the case of being called during SRIOV reconfiguration, the behavior
hasn't changed because the function is called with force=false.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45423cff1d ]
If pci_remove was called for a PF with VFs, the removal of the VFs was
triggered twice from efx_ef10_sriov_fini: once directly with
pci_driver->remove and again implicitly via pci_disable_sriov, which also
performs the VF removal. This was crashing the kernel on the second
attempt.
Given that pci_disable_sriov already calls the PCI remove function, get
rid of the direct call to pci_driver->remove from the driver.
Two different ways to trigger the bug:
- Create one or more VFs, then attach the PF to a virtual machine (at
least with qemu/KVM)
- Create one or more VFs, then remove the PF with:
echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove
Removing the sfc module does not trigger the error, at least for me,
because it removes the VF first and then the PF.
Example of a log with the error:
list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:47!
[...trimmed...]
RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
[...trimmed...]
Call Trace:
efx_dissociate+0x1f/0x140 [sfc]
efx_pci_remove+0x27/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
pci_stop_bus_device+0x69/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pci_iov_remove_virtfn+0xba/0x120
sriov_disable+0x2f/0xe0
efx_ef10_pci_sriov_disable+0x52/0x80 [sfc]
? pcie_aer_is_native+0x12/0x40
efx_ef10_sriov_fini+0x72/0x110 [sfc]
efx_pci_remove+0x62/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
unbind_store+0xf6/0x130
kernfs_fop_write+0x116/0x190
vfs_write+0xa5/0x1a0
ksys_write+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 956c6d4f20 ]
8821CE with ASPM cannot work properly on Protempo Ltd L116HTN6SPW. Add a
quirk to disable the cap.
The reporter describes the symptom: this module (driver) causes
frequent freezes, randomly but usually within a few minutes of running
(thus very soon after boot): screen display remains frozen, no response
to either keyboard or mouse input. All I can do is to hold the power
button to power off, then reboot.
Reported-by: Paul Szabo <psz2036@gmail.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210607012254.6306-1-pkshih@realtek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8835a64f74 ]
When we have a P2P Device active, we attempt to only change the
PHY context it uses when we get a new remain-on-channel, if the
P2P Device is the only user of the PHY context.
This is fine if we're switching within a band, but if we're
switching bands then the switch implies a removal and re-add
of the PHY context, which isn't permitted by the firmware while
it's bound to an interface.
Fix the code to skip the unbind/release/... cycle only if the
band doesn't change (or we have old devices that can switch the
band on the fly as well.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210612142637.e9ac313f70f3.I713b9d109957df7e7d9ed0861d5377ce3f8fccd3@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20ec0a6d60 ]
rxe_mr_init_user() always returns a fixed -EINVAL when ib_umem_get()
fails, so it's hard for the user to know which actual error happened in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.
Return the actual error as mlx4/mlx5 do.
Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7ff9cff70 ]
The client's sk_state will be set to TCP_ESTABLISHED if the server
replies to the client's connect request.
However, if the client has a pending signal, its sk_state will be set
to TCP_CLOSE without notifying the server, so the server will hold the
corrupted connection.
client server
1. sk_state=TCP_SYN_SENT |
2. call ->connect() |
3. wait reply |
| 4. sk_state=TCP_ESTABLISHED
| 5. insert to connected list
| 6. reply to the client
7. sk_state=TCP_ESTABLISHED |
8. insert to connected list |
9. *signal pending* <--------------------- the user kill client
10. sk_state=TCP_CLOSE |
client is exiting... |
11. call ->release() |
virtio_transport_close
if (!(sk->sk_state == TCP_ESTABLISHED ||
sk->sk_state == TCP_CLOSING))
return true; *returns here, so the server cannot notice that the connection is corrupt*
So the client should notify the peer in this case.
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Norbert Slusarek <nslusarek@gmx.net>
Cc: Andra Paraschiv <andraprs@amazon.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lkml.org/lkml/2021/5/17/418
Signed-off-by: lixianming <lixianming5@huawei.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52c4a1a85f ]
As we know, AVB is enabled by default, and the ENET IP design uses queue 0
for best effort and queues 1 and 2 for AVB Class A and B. The bandwidth
set in the driver for each of queues 1 and 2 is 50%, so TX bandwidth
fluctuates when tx queues are selected randomly with the
FEC_QUIRK_HAS_AVB quirk available.
This patch adds an ndo_select_queue callback to select queues for
transmitting to fix this issue. It always returns queue 0 if this is not
a VLAN packet, and returns queue 1 or 2 based on the priority of the VLAN
packet.
You may complain that in fact we only use a single queue for transmitting
if the traffic is not VLAN tagged. Yes, but we seem to have no choice:
since AVB is enabled when the driver is probed, we can't switch this
feature dynamically, and comparing multiple queues to a single queue shows
almost no TX throughput improvement.
One way we could implement this is to configure the driver for multiple
queues with a round-robin scheme by default, and then add an ndo_setup_tc
callback to let users enable/disable the AVB feature. Unfortunately, the
ENET AVB IP does not seem to follow the standard 802.1Qav spec. We can
only program the DMAnCFG[IDLE_SLOPE] field to calculate the bandwidth
fraction, and the idle slope is restricted to certain values (a total of
19). That is far away from the CBS qdisc implemented in the Linux TC
framework. If you strongly suggest doing this, I think we could only
support a limited number of bandwidths and reject the others, but that
would be really ugly and weird.
With this patch, VLAN tagged packets route to queue 0/1/2 based on vlan
priority; VLAN untagged packets route to queue 0.
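A minimal sketch of the selection logic described above (a stand-alone
helper, not the actual fec callback; the priority-to-queue split here is
an assumption made for the example):
#include <linux/skbuff.h>
#include <linux/if_vlan.h>

/* Illustrative helper: queue 0 for untagged traffic, queue 1 or 2 for
 * VLAN-tagged traffic depending on its PCP value. The exact mapping
 * here is a guess for the sketch, not the driver's table. */
static u16 fec_pick_tx_queue_sketch(struct sk_buff *skb)
{
        if (!skb_vlan_tag_present(skb))
                return 0;                       /* best-effort queue */

        return skb_vlan_tag_get_prio(skb) < 4 ? 1 : 2;
}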
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 471ff4455d ]
Frieder Schrempf reported a TX throughput issue [1]; it happens quite often
that the measured bandwidth in TX direction drops from its expected/nominal
value to something like ~50% (for 100M) or ~67% (for 1G) connections.
[1] https://lore.kernel.org/linux-arm-kernel/421cc86c-b66f-b372-32f7-21e59f9a98bc@kontron.de/
The issue becomes clear after digging into it: the net core selects
queues when transmitting packets. Since the FEC driver has not
implemented the ndo_select_queue callback yet, it calls netdev_pick_tx
to select queues randomly.
For the i.MX6SX ENET IP with AVB support, the driver enables this feature
by default. According to the settings of the QOS/RCMRn/DMAnCFG registers,
AVB is configured to the credit-based scheme with 50% bandwidth for each
of queues 1 and 2.
The tests below made me think further:
1) With the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue can be reproduced.
2) Without the FEC_QUIRK_HAS_AVB quirk, the issue cannot be reproduced.
The relevant difference with or without the FEC_QUIRK_HAS_AVB quirk is
whether we program the FTYPE field of the TxBD. As described above, the
AVB feature is enabled by default. With the FEC_QUIRK_HAS_AVB quirk,
frames in queue 0 are marked as non-AVB, and frames in queues 1 and 2 are
marked as AVB Class A and B. This is unreasonable if frames in queues 1
and 2 are not required to be time-sensitive. So when the net core selects
tx queues randomly, the credit-based scheme takes effect and TX bandwidth
fluctuates. On the other hand, without the FEC_QUIRK_HAS_AVB quirk, frames
in queues 1 and 2 are all marked as non-AVB, so the credit-based scheme
does not take effect.
So how can we fix this TX throughput issue? One option is to remove the
FEC_QUIRK_HAS_AVB quirk if you suffer from this with time-insensitive
networking. However, this quirk is used to identify i.MX6SX and other
settings depend on it. So this patch adds a new quirk,
FEC_QUIRK_HAS_MULTI_QUEUES, to represent i.MX6SX, which makes it safe to
remove the FEC_QUIRK_HAS_AVB quirk now.
The FEC_QUIRK_HAS_AVB quirk is set by default in the driver, and users may
not know much about driver details; they would waste effort finding the
root cause, which is not what we want. The following patch is an
implementation that fixes it without requiring users to modify the driver.
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed914d48b6 ]
This fixes a page table accounting bug.
MIPS is the ONLY arch that defines __HAVE_ARCH_PMD_ALLOC_ONE alone.
Since commit b2b29d6d01 (mm: account PMD tables like PTE tables),
"pmd_free" in asm-generic does PMD table accounting while "pmd_alloc_one"
in MIPS does not, which drives the PageTable accounting counter negative;
when read via global_zone_page_state(), it then always returns 0.
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c240b044ed ]
Based on 2001:3319 and 2357:0109 which I used to test the fix and
0bda:818b and 2357:0108 for which I found efuse dumps online.
== 2357:0109 ==
=== Before ===
Vendor: Realtek
Product: \x03802.11n NI
Serial:
=== After ===
Vendor: Realtek
Product: 802.11n NIC
Serial not available.
== 2001:3319 ==
=== Before ===
Vendor: Realtek
Product: Wireless N
Serial: no USB Adap
=== After ===
Vendor: Realtek
Product: Wireless N Nano USB Adapter
Serial not available.
Signed-off-by: Pascal Terjan <pterjan@google.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210424172959.1559890-1-pterjan@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c368362c36 ]
The iv from RXD is only for TKIP_RSC/CCMP_PN/GCMP_PN, and it needs a
check for CCMP header insertion. Move mt76_cipher_type to mt76.h to
reduce duplicated code.
Signed-off-by: Xing Song <xing.song@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abded041a0 ]
The maximum interval (in seconds) for a scheduled scan plan supported by
the offload firmware can be U16_MAX.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01f7da4091 ]
This patch fixes an mcu hang during device probe on the
Marvell ESPRESSObin after a hot reboot.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b244163f2c ]
This node pointer is returned by of_parse_phandle() with its refcount
incremented in this function. Call of_node_put() on it before exiting
this function.
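The general get/use/put pattern looks like this hedged sketch (the
property name is made up; this is not the driver's code):
#include <linux/errno.h>
#include <linux/of.h>

static int demo_use_phandle(struct device_node *np)
{
        struct device_node *child;

        child = of_parse_phandle(np, "some-phandle", 0);
        if (!child)
                return -ENODEV;

        /* ... use 'child' here ... */

        of_node_put(child);     /* drop the reference taken by of_parse_phandle() */
        return 0;
}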
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d8ea148e5 ]
The string arrays netdev_features_strings, tunable_strings, and
phy_tunable_strings have been moved to net/ethtool/common.c.
So fix the comment accordingly.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94e4f57946 ]
Fixes AQL issues on full queues, especially with 802.3 encap offload
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec8f1a90d0 ]
Rely on the txs fixed-rate bit instead of info->control.rates
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c526d440f ]
The entry for PTYPE 2 in the ice_ptype_lkup table incorrectly states
that this is an L2 packet with no payload. According to the datasheet,
this PTYPE is actually unused and reserved.
Fix the lookup entry to indicate this is an unused entry that is
reserved.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 638a0c8c88 ]
The entry for PTYPE 90 indicates that the payload is layer 3. This does
not match the specification in the datasheet which indicates the packet
is a MAC, IPv6, UDP packet, with a payload in layer 4.
Fix the lookup table to match the data sheet.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28131e9d93 ]
syzbot reported a shift-out-of-bounds that KUBSAN observed in the
interpreter:
[...]
UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
shift exponent 255 is too large for 64-bit type 'long long unsigned int'
CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
__bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
xmit_one net/core/dev.c:3588 [inline]
dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
__dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
__bpf_tx_skb net/core/filter.c:2116 [inline]
__bpf_redirect_no_mac net/core/filter.c:2141 [inline]
__bpf_redirect+0x548/0xc80 net/core/filter.c:2164
____bpf_clone_redirect net/core/filter.c:2448 [inline]
bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
__bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
__do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
[...]
Generally speaking, KUBSAN reports from the kernel should be fixed.
However, in the case of BPF, this particular report caused concerns since
the large shift is not wrong from the BPF point of view, just undefined.
In the verifier, K-based shifts that are >= {64,32} (depending on the
bitwidth of the instruction) are already rejected. The register-based
cases were not, given that their content might not be known at
verification time. Ideas such as a verifier instruction rewrite with an additional
AND instruction for the source register were brought up, but regularly
rejected due to the additional runtime overhead they incur.
As Edward Cree rightly put it:
Shifts by more than insn bitness are legal in the BPF ISA; they are
implementation-defined behaviour [of the underlying architecture],
rather than UB, and have been made legal for performance reasons.
Each of the JIT backends compiles the BPF shift operations to machine
instructions which produce implementation-defined results in such a
case; the resulting contents of the register may be arbitrary but
program behaviour as a whole remains defined.
Guard checks in the fast path (i.e. affecting JITted code) will thus
not be accepted.
The case of division by zero is not truly analogous here, as division
instructions on many of the JIT-targeted architectures will raise a
machine exception / fault on division by zero, whereas (to the best
of my knowledge) none will do so on an out-of-bounds shift.
Given the KUBSAN report only affects the BPF interpreter, but not JITs,
one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run().
That would make the shifts defined, and thus shuts up KUBSAN, and the
compiler would optimize out the AND on any CPU that interprets the shift
amounts modulo the width anyway (e.g., confirmed from disassembly that
on x86-64 and arm64 the generated interpreter code is the same before
and after this fix).
The BPF interpreter is slow path, and most likely compiled out anyway
as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of
BPF instructions by the interpreter. Given the main argument was to
avoid sacrificing performance, the fact that the AND is optimized away
from compiler for mainstream archs helps as well as a solution moving
forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors
to provide guidance when they see the ___bpf_prog_run() interpreter
code and use it as a model for a new JIT backend.
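As a plain C illustration of the masking idea (not the interpreter code
itself), AND-ing the shift amount with 63 or 31 keeps the C-level
operation defined while matching what most hardware does anyway:
#include <stdint.h>
#include <stdio.h>

static uint64_t lsh64(uint64_t val, uint32_t amt)
{
        return val << (amt & 63);       /* shift amount taken modulo 64 */
}

static uint32_t lsh32(uint32_t val, uint32_t amt)
{
        return val << (amt & 31);       /* shift amount taken modulo 32 */
}

int main(void)
{
        /* 255 & 63 == 63, 33 & 31 == 1 */
        printf("%llu\n", (unsigned long long)lsh64(1, 255));
        printf("%u\n", lsh32(1, 33));
        return 0;
}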
Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Cc: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2ae34a7de ]
Don't check the sequence number when deciding when to update time_in in
the node table if tag removal is offloaded since the sequence number is
part of the tag. This fixes a problem where the times in the node table
wouldn't update when 0 appeared to be before or equal to seq_out when
tag removal was offloaded.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7b2451d31 ]
Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a
circular lock. destroy_queue_nocpsch_locked is called under a DQM lock,
which is taken in MMU notifiers, potentially in FS reclaim context.
Taking another lock, which is BO reservation lock from free_mqd, while
causing an FS reclaim inside the DQM lock creates a problematic circular
lock dependency. Therefore move free_mqd out of
destroy_queue_nocpsch_locked and call it after unlocking DQM.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63f6e01237 ]
get_wave_state acquires the mmap_lock on copy_to_user but so do
mmu_notifiers. mmu_notifiers allows dqm locking so do get_wave_state
outside the dqm_lock to prevent circular locking.
v2: squash in unused variable removal.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b5f0f29b1 ]
The SJA1110 switch integrates TJA1103 PHYs, but in SJA1110 switch rev B
silicon, there is a bug in that the registers for selecting the 100base-T1
autoneg master/slave roles are not writable.
To enable write access to the master/slave registers, these additional
PHY writes are necessary during initialization.
The issue has been corrected in later SJA1110 silicon versions and is
not present in the standalone PHY variants, but applying the workaround
unconditionally in the driver should not do any harm.
Suggested-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11ef6bc846 ]
At least on wl12xx, reading the MAC after boot can fail with a warning
at drivers/net/wireless/ti/wlcore/sdio.c:78 wl12xx_sdio_raw_read.
The failed call comes from wl12xx_get_mac() that wlcore_nvs_cb() calls
after request_firmware_work_func().
After the error, no wireless interface is created. Reloading the wl12xx
module makes the interface work.
Turns out the wlan controller can be in a low-power ELP state after the
boot from the bootloader or kexec, and needs to be woken up first.
Let's wake the hardware and add a sleep after that, similar to what
wl12xx_pre_boot() is already doing.
Note that a similar issue could exist for wl18xx, but I have not seen it
so far. And a search for wl18xx_get_mac and wl12xx_sdio_raw_read did not
produce similar errors.
Cc: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210603062814.19464-1-tony@atomide.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 991bd8d7bc ]
Some architectures have pages larger than 4k and committing a full
page causes needless overhead.
Fix this by writing a single block when committing the superblock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6fd06963fa ]
When memory allocation for XFRMA_ENCAP or XFRMA_COADDR fails,
the error will not be reported because the -ENOMEM assignment
to the err variable is overwritten before it is returned. Fix this
by moving these two allocations to the front of the function so that
memory allocation failures will be reported.
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73214a690c ]
Fix the following kernel build warnings:
drivers/net/ethernet/ibm/ibmvnic.c:1516: warning: Function parameter or member 'skb' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1516: warning: Function parameter or member 'indir_arr' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1516: warning: Excess function parameter 'txbuff' description in 'build_hdr_descs_arr'
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03fc4cf45d ]
For each CRTC state, check the size of Gamma and Degamma LUTs so
unexpected and larger sizes wouldn't slip through.
TEST: IGT:kms_color::pipe-invalid-gamma-lut-sizes
v2: fix assignments in if clauses, Mark's email.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ee8856de8 ]
It has been reported that on RTL8106e the link-up interrupt may be
significantly delayed if the user enables ASPM L1. Per default ASPM
is disabled. The change leaves L1 enabled on the PCIe link (thus still
allowing to reach higher package power saving states), but the
NIC won't actively trigger it.
Reported-by: Koba Ko <koba.ko@canonical.com>
Tested-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a55a73433 ]
Use devm_platform_get_and_ioremap_resource() to simplify
code and avoid a null-ptr-deref by checking 'res' in it.
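The probe-time pattern is roughly the following (a hedged sketch, not the
driver's actual probe function):
#include <linux/err.h>
#include <linux/io.h>
#include <linux/platform_device.h>

static int demo_probe(struct platform_device *pdev)
{
        struct resource *res;
        void __iomem *base;

        /* Fetches the resource (checking it internally) and ioremaps it. */
        base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
        if (IS_ERR(base))
                return PTR_ERR(base);

        /* ... use 'base' and, if needed, 'res' (e.g. resource_size(res)) ... */
        return 0;
}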
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f18c11812c ]
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc05716d4f ]
Fixes handling when page tables are in system memory.
v3: remove struct amdgpu_vm_parser.
v2: remove unwanted variable.
change amdgpu_amdkfd_validate instead of amdgpu_amdkfd_bo_validate.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db8f7be1e1 ]
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e67dfb8d15 ]
Several tests do not set some ports down as part of their cleanup(),
resulting in IPv6 link-local addresses and associated routes not being
deleted.
These leaks were found using a BPF tool that monitors ASIC resources.
Solve this by setting the ports down at the end of the tests.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6813cc8cfd ]
The PHY delays about 11.5 ms before generating the RXC clock when
switching from power down to normal operation. Reading or writing
registers would also cause RXC to become unstable and stop for a while
during this process. A Realtek engineer suggests that a delay of 15 ms
or more can work around this issue.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 665f28507a ]
[Why]
When calculating recout width for an MPO plane on a mode that's using
ODM combine, driver can calculate a negative value, resulting in a
crash.
[How]
For negative widths, use zero such that validation will prune the
configuration correctly and disallow MPO.
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f8518b60c ]
[why]
When the OS overrides link training parameters for an MST
device to SST mode, MST resources are not released, and the
resulting leak may cause a crash and incorrect MST discovery
during subsequent hot plugs.
[how]
Retain the sink object to be reused by the SST link and
release the MST resources.
Signed-off-by: Vladimir Stempen <vladimir.stempen@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c521fc316d ]
[Why]
We update the scaling settings when the scaling mode has changed.
However, when changing the mode from the native resolution, the
previously set scaling mode gets ignored.
[How]
Perform scaling settings update on modeset.
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 346cf627fb ]
[why]
DSCCLK validation is not necessary because DSCCLK is derived from
DISPCLK; therefore, if DISPCLK validation passes, DSCCLK is valid, too.
Doing DSCCLK validation in addition to DISPCLK leads to modes being
wrongly rejected when DSCCLK was incorrectly set outside of DML.
[how]
Remove DSCCLK validation because it's implicitly validated under DISPCLK
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 35cba15a50 ]
Use devm_platform_get_and_ioremap_resource() to simplify
code and avoid a null-ptr-deref by checking 'res' in it.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20f1932e22 ]
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bb51a3a38 ]
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85eb138945 ]
We should not directly BUG() when there is a header error; it is
better to output a log message when such an error happens. Currently,
the caller of xmit_skb() already does that.
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb3612840d ]
It may be necessary to hold the Global Config Lock for a longer time when
downloading the DDP package file. Extend the timeout value to 5000 ms to
ensure that the download can finish before other AQ commands get time to
run; this fixes the issue below when probing the device. 5000 ms is a
tested value that works with both Backplane and BreakoutCable NVM images:
ice 0000:f4:00.0: VSI 12 failed lan queue config, error ICE_ERR_CFG
ice 0000:f4:00.0: Failed to delete VSI 12 in FW - error: ICE_ERR_AQ_TIMEOUT
ice 0000:f4:00.0: probe failed due to setup PF switch: -12
ice: probe of 0000:f4:00.0 failed with error -12
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 897120d41e ]
Checking the value of MCP_INTF in mcp23s08_irq suggests that the handler
may be called even when there is no interrupt pending. But the actual
interrupt could have happened between reading MCP_INTF and MCP_GPIO.
In this situation we get nothing from MCP_INTF, yet the event gets
acknowledged on the expander by reading MCP_GPIO. This leads to lost
events.
Fix the problem by not reading any register until we see something in
MCP_INTF. The error was reproduced and the fix tested on an MCP23017.
Signed-off-by: Radim Pavlik <radim.pavlik@tbs-biometrics.com>
Link: https://lore.kernel.org/r/AM7PR06MB6769E1183F68DEBB252F665ABA3E9@AM7PR06MB6769.eurprd06.prod.outlook.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fcb3463585 ]
According to the standard IEC 62439-2, the number of transitions needs
to be counted for each transition 'between' ring state open and ring
state closed and not from open state to closed state.
Therefore fix this for both ring and interconnect ring.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ffbbb435d ]
Introduce the BIO flag BIO_ZONE_WRITE_LOCKED to indicate that a BIO owns
the write lock of the zone it is targeting. This is the counterpart of
the struct request flag RQF_ZONE_WRITE_LOCKED.
This new BIO flag is reserved for now for zone write locking control
for device mapper targets exposing a zoned block device. Since in this
case, the lock flag must not be propagated to the struct request that
will be used to process the BIO, a BIO private flag is used rather than
changing the RQF_ZONE_WRITE_LOCKED request flag into a common REQ_XXX
flag that could be used for both BIO and request. This avoids conflicts
down the stack with the block IO scheduler zone write locking
(in mq-deadline).
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6842d264aa ]
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.
White lines are added around the group of BUG_ON() calls to make the
code more legible.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee50cc19d8 ]
If dm-writecache overwrites existing cached data, it splits the
incoming bio into many block-sized bios. The I/O scheduler does merge
these bios into one large request but this needless splitting and
merging causes performance degradation.
Fix this by avoiding bio splitting if the cache target area that is
being overwritten is contiguous.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5faafc77f7 ]
Current commit code resets the place where the search for free blocks
will begin back to the start of the metadata device. There are a couple
of repercussions to this:
- The first allocation after the commit is likely to take longer than
normal as it searches for a free block in an area that is likely to
have very few free blocks (if any).
- Any free blocks it finds will have been recently freed. Reusing them
means we have fewer old copies of the metadata to aid recovery from
hardware error.
Fix these issues by leaving the cursor alone, only resetting when the
search hits the end of the metadata device.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e32ea44c7a ]
Including <linux/in.h> and <netinet/in.h> in the dependencies breaks
compilation of trinity due to multiple definitions. <linux/in.h> is only
used in <linux/icmp.h> to provide the definition of the struct in_addr,
but this can be substituted out by using the datatype __be32.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a29cb69146 ]
This patch aims to improve the situation when reordering and loss are
occurring in the same flight of packets.
Previously the reordering would first induce a spurious recovery, then
the subsequent ACK may undo the cwnd (based on the timestamps e.g.).
However the current loss recovery does not proceed to invoke
RACK to install a reordering timer. If some packets are also lost, this
may lead to a long RTO-based recovery. An example is
https://groups.google.com/g/bbr-dev/c/OFHADvJbTEI
The solution is, after reverting the recovery, to always invoke RACK to
either arm the RACK timer to fast retransmit after the reordering window,
or restart the recovery if new loss is identified. Hence it is possible
that the sender may go from Recovery to Disorder/Open to Recovery again
in one ACK.
Reported-by: mingkun bian <bianmingkun@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d9c70d235 ]
Take the situation with gfxoff: the optimization may cause corrupt
CE ram contents. In addition, the emit_cntxcntl callback has a similar
optimization which the firmware can handle properly even with the power
feature.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06888d571b ]
Instead of reading the desired 5 bytes of the actual target field,
the code was reading 8. This could result in a corrupted value if the
trailing 3 bytes were non-zero, so instead use an appropriately sized
and zero-initialized bounce buffer, and read only 5 bytes before casting
to u64.
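The idea, shown as a self-contained sketch rather than the driver code,
is to copy only the 5 valid bytes into a zeroed 8-byte buffer and then
read the result as a 64-bit value:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: read a 5-byte field into a u64 (host byte order). */
static uint64_t read_u40(const uint8_t *field)
{
        uint8_t bounce[8] = { 0 };      /* zero-initialized bounce buffer */
        uint64_t val;

        memcpy(bounce, field, 5);       /* copy only the valid 5 bytes */
        memcpy(&val, bounce, sizeof(val));
        return val;
}

int main(void)
{
        /* 5 valid bytes followed by trailing junk that must be ignored. */
        const uint8_t raw[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0xff, 0xff, 0xff };

        printf("0x%llx\n", (unsigned long long)read_u40(raw));
        return 0;
}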
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eba9852372 ]
Although kfd_ioctl_get_dmabuf_info() still fails, it will indicate the
right metadata_size to the caller, which is useful for the same kfd
ioctl next time.
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb3849370a ]
The clock driving the XBurst CPUs in Ingenic SoCs is integer divided
from the main PLL. As such, it is possible to control the frequency of
the CPU, either by changing the divider, or by changing the rate of the
main PLL.
The XBurst CPUs also lack the CP0 timer; the TCU, a separate piece of
hardware in the SoC, provides this functionality.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc52f92a65 ]
Ingenic JZ4760 and JZ4760B do have a FPU, but the config registers don't
report it. Force the FPU detection in case the processor ID match the
JZ4760(B) one.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62f20e068c ]
This is a complement to commit aa6dd211e4 ("inet: use bigger hash
table for IP ID generation"), but focusing on some specific aspects
of IPv6.
Contrary to IPv4, IPv6 only uses packet IDs with fragments, and with a
minimum MTU of 1280, it's much less easy to force a remote peer to
produce many fragments to explore its ID sequence. In addition packet
IDs are 32-bit in IPv6, which further complicates their analysis. On
the other hand, it is often easier to choose among plenty of possible
source addresses and partially work around the bigger hash table the
commit above permits, which leaves IPv6 partially exposed to some
possibilities of remote analysis at the risk of weakening some
protocols like DNS if some IDs can be predicted with a good enough
probability.
Given the wide range of permitted IDs, the risk of collision is extremely
low so there's no need to rely on the positive increment algorithm that
is shared with the IPv4 code via ip_idents_reserve(). We have a fast
PRNG, so let's simply call prandom_u32() and be done with it.
Performance measurements at 10 Gbps couldn't show any difference with
the previous code, even when using a single core, because due to the
large fragments, we're limited to only ~930 kpps at 10 Gbps and the cost
of the random generation is completely offset by other operations and by
the network transfer time. In addition, this change removes the need to
update a shared entry in the idents table so it may even end up being
slightly faster on large scale systems where this matters.
The risk of at least one collision here is about 1/80 million among
10 IDs, 1/850k among 100 IDs, and still only 1/8.5k among 1000 IDs,
which remains very low compared to IPv4 where all IDs are reused
every 4 to 80ms on a 10 Gbps flow depending on packet sizes.
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210529110746.6796-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7196048cd ]
The PLLU (USB) consists of the PLL configuration itself and configuration
of the PLLU outputs. The PLLU programming is inconsistent on T30 vs T114,
where T114 immediately bails out if PLLU is enabled and T30 re-enables
a potentially already enabled PLL (left after bootloader) and then fully
reprograms it, which could be unsafe to do. The correct way should be to
skip enabling of the PLL if it's already enabled and then apply
configuration to the outputs. This patch doesn't fix any known problems,
it's a minor improvement.
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c592c8a28f ]
The refcounting of the gate clocks has a bug causing the enable_refcnt
to underflow when unused clocks are disabled. This happens because clk
provider erroneously bumps the refcount if clock is enabled at a boot
time, which it shouldn't be doing, and it does this only for the gate
clocks, while peripheral clocks are using the same gate ops and the
peripheral clocks are missing the initial bump. Hence the refcount of
the peripheral clocks is 0 when unused clocks are disabled and then the
counter is decremented further by the gate ops, causing the integer
underflow.
Fix this problem by removing the erroneous bump and by implementing the
disable_unused() callback, which disables the unused gates properly.
The visible effect of the bug is such that the unused clocks are never
gated if a loaded kernel module grabs the unused clocks and starts to use
them. In practice this shouldn't cause any real problems for the drivers
and boards supported by the kernel today.
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e16efff4e5 ]
The driver currently exposes several YUV formats but fails to properly
program all the registers needed to display such formats. Add the right
programming sequences so that overlay windows can be used to accelerate
color format conversions in multimedia playback use-cases.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4751d2aa32 ]
stmmac_mdio_register() has logic to search for PHYs on the MDIO bus and
assign them IRQ lines, as well as to set priv->plat->phy_addr.
If no PHY is found, the "found" variable remains set to 0 and the
function errors out.
After the introduction of commit f213bbe8a9 ("net: stmmac: Integrate
it with DesignWare XPCS"), the "found" variable was immediately reused
for searching for a PCS on the same MDIO bus.
This can result in 2 types of potential problems (none of them seems to
be seen on the only Intel system that sets has_xpcs = true, otherwise it
would have been reported):
1. If a PCS is found but a PHY is not, then the code happily exits with
no error. One might say "yes, but this is not possible, because
of_mdiobus_register will probe a PHY for all MDIO addresses,
including for the XPCS, so if an XPCS exists, then a PHY certainly
exists too". Well, that is not true, see intel_mgbe_common_data():
/* Ensure mdio bus scan skips intel serdes and pcs-xpcs */
plat->mdio_bus_data->phy_mask = 1 << INTEL_MGBE_ADHOC_ADDR;
plat->mdio_bus_data->phy_mask |= 1 << INTEL_MGBE_XPCS_ADDR;
2. A PHY is found but an MDIO device with the XPCS PHY ID isn't, and in
that case, the error message will be "No PHY found". Confusing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210527155959.3270478-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8613641063 ]
Fix the logic so that if both ports' netdevices are enabled or disabled,
the trivial mapping without swapping is used.
If only one of the netdevices has tx enabled, use it to remap traffic to
that port.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c07274ab1a ]
rep_tc copies REG_C1 to REG_B. IPsec crypto utilizes the whole REG_B
register, with BIT31 as the IPsec marker. rep_tc_update_skb drops IPsec
packets because it thinks REG_B contains a bad value.
In the previous patch, BIT31 of REG_C1 was reserved for IPsec.
Skip rep_tc_update_skb if BIT31 of REG_B is set.
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6566cae7ae ]
There are two issues with the scaling calculations: the ODM recout
calculation and matching the viewport to the actual recout.
This change fixes both issues: the ODM recout calculation via special
casing, and the viewport matching issue by reworking the viewport
calculation to use the scaling ratios and the recout to derive the
required offset and size.
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b514958dd1 ]
The igb driver was trying hard to be sparse correct, but somehow
ended up converting a variable into little endian order and then
trying to OR something with it.
A much plainer way of doing things is to leave all variables and
OR operations in CPU (non-endian) mode, and then convert to
little endian only once, which is what this change does.
This probably fixes a bug that might have been seen only on
big endian systems.
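For illustration only, the same idea in userspace C, with glibc's
htole32() standing in for the kernel's cpu_to_le32():
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t flags = 0;
        uint32_t wire;

        /* Build the value entirely in CPU byte order... */
        flags |= 1u << 3;
        flags |= 1u << 16;

        /* ...and convert to little endian exactly once, at the end. */
        wire = htole32(flags);

        printf("cpu=0x%08x wire=0x%08x\n", flags, wire);
        return 0;
}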
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7cbfb028b ]
The sparse build (C=2) finds some issues with how the driver
dealt with the (very difficult) hardware that in some generations
uses little-endian, and in others uses big endian, for the VLAN
field. The code as written picks __le16 as a type and for some
hardware revisions we override it to __be16 as done in this
patch. This impacted the VF driver as well so fix it there too.
Also change the vlan_tci assignment to override the sparse
warning without changing functionality.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4ef55288a ]
Sparse tool was warning on some implicit conversions from
little endian data read from the EEPROM on the e100 cards.
Fix these by being explicit about the conversions using
le16_to_cpu().
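For illustration only, the userspace analogue of that conversion, using
glibc's le16toh() in place of le16_to_cpu() and a made-up EEPROM word:
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* Two raw bytes as read from a little-endian EEPROM word. */
        const uint8_t raw[2] = { 0x12, 0x34 };
        uint16_t le_word;

        memcpy(&le_word, raw, sizeof(le_word));

        /* Explicit conversion to CPU order, mirroring le16_to_cpu(). */
        printf("0x%04x\n", le16toh(le_word));   /* 0x3412 on any host */
        return 0;
}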
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0d62baa7f ]
Printing kernel pointers is discouraged because they might leak the
kernel memory layout. This fixes the following smatch warning:
drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
argument 4 to %08lX specifier is cast from pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa236c2b2d ]
In function udf_symlink, epos.bh is assigned with the value returned
by udf_tgetblk. The function udf_tgetblk is defined in udf/misc.c
and returns the value of sb_getblk function that could be NULL.
Then, epos.bh is used without any check, causing a possible
NULL pointer dereference when sb_getblk fails.
This fix adds a check to validate the value of epos.bh.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213083
Signed-off-by: Arturo Giusti <koredump@protonmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c61cdbdbff ]
Problem: If the scheduler is already stopped by the time the sched_entity
is released and the entity's job_queue is not empty, I encountered a hang
in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.
Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wakeup all those processes stuck in sched_entity flushing
as the scheduler main thread which wakes them up is stopped by now.
v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinsertion back to the rq due
to race.
v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20e76d3d04 ]
In case of error, the function mdiobus_get_phy() returns a NULL
pointer, not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with a NULL test.
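A minimal sketch of the corrected check (illustrative only, not the
driver's actual code):
#include <linux/errno.h>
#include <linux/phy.h>

/* mdiobus_get_phy() returns NULL on error, so test for NULL, not IS_ERR(). */
static int demo_attach_phy(struct mii_bus *bus, int addr)
{
        struct phy_device *phydev = mdiobus_get_phy(bus, addr);

        if (!phydev)
                return -ENODEV;

        /* ... use phydev ... */
        return 0;
}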
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05ff8435e5 ]
Modern userspace applications, like OVN, can configure the TC datapath to
"recirculate" packets several times. If more than 4 "recirculation" rules
are configured, packets can be dropped by __tcf_classify().
Changing the maximum number of reclassifications (from 4 to 16) should be
sufficient to prevent drops in most use cases, and guard against loops at
the same time.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86544c3de6 ]
Similar to the way in which of_mdiobus_register() has a fallback to the
non-DT based mdiobus_register() when CONFIG_OF is not set, we can create
a shim for the device-managed devm_of_mdiobus_register() which calls
devm_mdiobus_register() and discards the struct device_node *.
In particular, this solves a build issue with the qca8k DSA driver which
uses devm_of_mdiobus_register and can be compiled without CONFIG_OF.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b097bea102 ]
mdio drivers should not use REGCACHE. Also disable locking since it's
handled by the mdio users and the regmap is always accessed atomically.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09da319182 ]
GPUs with more than a single regulator (e.g. G72 on MT8183) will
require platform-specific handling for devfreq, for 2 reasons:
1. The opp core (drivers/opp/core.c:_generic_set_opp_regulator)
does not support multiple regulators, so we'll need custom
handlers.
2. Generally, platforms with 2 regulators have platform-specific
constraints on how the voltages should be set (e.g.
minimum/maximum voltage difference between them), so we
should not just create generic handlers that simply
change the voltages without taking care of those constraints.
Disable devfreq for now on those GPUs.
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210421132841.v13.3.I3af068abe30c9c85cabc4486385c52e56527a509@changeid
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69777e6ca3 ]
pm_runtime_get_sync will increment the pm usage counter even when it
fails. Forgetting the put operation will result in a reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep the usage
counter balanced.
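The replacement pattern looks roughly like this (a hedged sketch, not the
mediatek driver code):
#include <linux/device.h>
#include <linux/pm_runtime.h>

/* pm_runtime_resume_and_get() drops the usage count again on failure,
 * so no explicit put is needed in the error path. */
static int demo_enable(struct device *dev)
{
        int ret = pm_runtime_resume_and_get(dev);

        if (ret < 0)
                return ret;

        /* ... do work while the device is runtime-resumed ... */

        pm_runtime_put(dev);
        return 0;
}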
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Li <wangli74@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8380c81d5c ]
__napi_schedule_irqoff() is an optimized version of __napi_schedule()
which can be used where it is known that interrupts are disabled,
e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
callbacks.
On PREEMPT_RT enabled kernels this assumption is not true. Force-
threaded interrupt handlers and spinlocks do not disable interrupts,
and the NAPI hrtimer callback is forced into softirq context, which runs
with interrupts enabled as well.
Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
game so make __napi_schedule_irqoff() invoke __napi_schedule() for
PREEMPT_RT kernels.
The callers of ____napi_schedule() in the networking core have been
audited and are correct on PREEMPT_RT kernels as well.
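A sketch of the described redirection, close to (but not necessarily identical with) the upstream change in net/core/dev.c:

#if defined(CONFIG_PREEMPT_RT)
/* On PREEMPT_RT the "interrupts are off" assumption does not hold, so
 * defer to the variant that disables interrupts itself. */
void __napi_schedule_irqoff(struct napi_struct *n)
{
        __napi_schedule(n);
}
#else
void __napi_schedule_irqoff(struct napi_struct *n)
{
        ____napi_schedule(this_cpu_ptr(&softnet_data), n);
}
#endif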
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34e7434ba4 ]
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished and is unable to re-schedule itself.
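The same one-line change applies to each of the three drivers fixed below; a generic sketch of the pattern, with a hypothetical driver and timer field:

struct example_priv {
        struct timer_list poll_timer;
};

static int example_remove(struct platform_device *pdev)
{
        struct example_priv *priv = platform_get_drvdata(pdev);

        /*
         * del_timer() only deactivates a pending timer; its handler may
         * still be running on another CPU. del_timer_sync() additionally
         * waits for a running handler and keeps it from re-arming itself.
         */
        del_timer_sync(&priv->poll_timer);
        return 0;
}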
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 009fc857c5 ]
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished and is unable to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c72e6ab66 ]
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished and is unable to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33ae8f801a ]
If multiple threads are accessing the same huge page at the same
time, hugetlb_cow will be called if one thread writes to the COW huge
page. The function huge_ptep_clear_flush is then called to notify other
threads to clear the huge pte TLB entry. The other threads clear
the huge pte TLB entry and reload it from the page table, but the
reloaded huge pte entry may be stale.
This patch fixes the issue on the MIPS platform: it clears the huge
pte entry before notifying other threads to flush the current huge
page entry, similar to what other architectures do.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a73022ee3 ]
On the Loongson64 platform, if CONFIG_SMP is not set, the following build
error will occur:
arch/mips/loongson64/reset.c:133:2: error:'secondary_kexec_args' undeclared
Because the definition and declaration of secondary_kexec_args are guarded
by CONFIG_SMP, the secondary_kexec_args variable should only be used when
CONFIG_SMP is set.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c31bef1cb1 ]
[Why]
Currently, the code that fills the clock table can miss filling
information about some of the higher voltage states advertised
by the SMU. This, in turn, may cause some of the higher pixel clock
modes (e.g. 8k60) to fail validation.
[How]
Fill the table with one entry per DCFCLK level instead of one entry
per FCLK level. This is needed because the maximum FCLK does not
necessarily need maximum voltage, whereas DCFCLK values from SMU
cover the full voltage range.
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 443ef39b49 ]
Sparse is not happy about handling of strict types in pch_ptp_match():
.../pch_gbe_main.c:158:33: warning: incorrect type in argument 2 (different base types)
.../pch_gbe_main.c:158:33: expected unsigned short [usertype] uid_hi
.../pch_gbe_main.c:158:33: got restricted __be16 [usertype]
.../pch_gbe_main.c:158:45: warning: incorrect type in argument 3 (different base types)
.../pch_gbe_main.c:158:45: expected unsigned int [usertype] uid_lo
.../pch_gbe_main.c:158:45: got restricted __be32 [usertype]
.../pch_gbe_main.c:158:56: warning: incorrect type in argument 4 (different base types)
.../pch_gbe_main.c:158:56: expected unsigned short [usertype] seqid
.../pch_gbe_main.c:158:56: got restricted __be16 [usertype]
Fix that by switching to using the proper accessors for BE data.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Flavio Suligoi <f.suligoi@asem.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94dfec48fc ]
Some standard resolutions like 1366x768 do not work properly with
i.MX6 SoCs, since the horizontal resolution needs to be aligned
to 8 pixels (so 1360x768 or 1368x768 would work).
This patch allocates framebuffers aligned to 8 pixels. The extra
time required to send the extra pixels is removed from the blank
time. In order to expose the correct display size to userspace,
the stride is increased without increasing the width.
Without this patch systems with this display resolution hang
indefinitely during boot up.
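A sketch of the alignment idea with hypothetical names: the pitch grows to a multiple of 8 pixels while the advertised width does not.

#include <linux/kernel.h>

/* Bytes per scanline for an allocation padded to 8 pixels. */
static u32 example_aligned_pitch(u32 hdisplay, u32 cpp)
{
        u32 aligned_width = ALIGN(hdisplay, 8);

        return aligned_width * cpp;     /* stride grows; visible width stays */
}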
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20210428222953.235280-3-sebastian.reichel@collabora.com
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95ea3dbc4e ]
Set all IPs' hw status to false before any hw_init.
Only set it to true once its hw_init has been executed.
The old 5.9 branch has this change but somehow the 5.11 kernel does
not have this fix.
Without this change, SR-IOV TDR hits gfx IB test failures.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Review-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5433a9f69 ]
[Why]
This change only fixes the dpp clock switch-to-lower case. A later
solution, "dc: skip program clock when allow seamless boot", can fix
both cases.
[How]
This reverts commit "dc: wait vblank when stream enabled
and update dpp clock"
Signed-off-by: Lewis Huang <Lewis.Huang@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Wayne Lin <waynelin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba4e0339a6 ]
[Bug][DP501]
If the ASPEED P2A (PCI to AHB) bridge is disabled and disallowed due to
CVE-2019-6260 item 3, then the monitor's EDID cannot be read through the
Parade DP501.
The reason is that the DP501's FW is mapped to the BMC addressing space
rather than the Host addressing space.
The resolution is to use "pci_iomap_range()" to map the DP501's FW that
is stored at the end of the FB (Frame Buffer).
In this case, the FrameBuffer reserves its last 2MB for the image of the
DP501.
Signed-off-by: KuoHsiang Chou <kuohsiang_chou@aspeedtech.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210421085859.17761-1-kuohsiang_chou@aspeedtech.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2c669ef697 upstream.
Powerpc currently resets a CPU's idle task preempt_count to 0 before
said task starts executing the secondary startup routine (and becomes an
idle task proper).
This conflicts with commit f1a0a376ca ("sched/core: Initialize the
idle task with preemption disabled"), which initializes all of the
idle tasks' preempt_count to
PREEMPT_DISABLED during smp_init(). Note that this was superfluous
before said commit, as back then the hotplug machinery would invoke
init_idle() via idle_thread_get(), which would have already reset the
CPU's idle task's preempt_count to PREEMPT_ENABLED.
Get rid of this preempt_count write.
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210707183831.2106509-1-valentin.schneider@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7154cbd31c upstream.
Compiling the recent dma-iommu changes under 32-bit x86 triggers this
compile warning:
drivers/iommu/dma-iommu.c:249:5: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘unsigned int’} [-Wformat=]
The reason is that %llx is used to print a variable of type
phys_addr_t. Fix it by using the correct %pa format specifier for
phys_addr_t.
Cc: Srinath Mannam <srinath.mannam@broadcom.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Oza Pawandeep <poza@codeaurora.org>
Fixes: 571f316074 ("iommu/dma: Fix IOVA reserve dma ranges")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210607124905.27525-1-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8ea6fc9b0 upstream.
Commit:
00b89fe019 ("sched: Make the idle task quack like a per-CPU kthread")
... added PF_KTHREAD | PF_NO_SETAFFINITY to the idle kernel threads.
Unfortunately these properties are inherited to the init/0 children
through kernel_thread() calls: init/1 and kthreadd. There are several
side effects to that:
1) kthreadd affinity can not be reset anymore from userspace. Also
PF_NO_SETAFFINITY propagates to all kthreadd children, including
the unbound kthreads. Therefore it's not possible anymore to overwrite
the affinity of any of them. Here is an example of a warning reported
by rcutorture:
WARNING: CPU: 0 PID: 116 at kernel/rcu/tree_nocb.h:1306 rcu_bind_current_to_nocb+0x31/0x40
Call Trace:
rcu_torture_fwd_prog+0x62/0x730
kthread+0x122/0x140
ret_from_fork+0x22/0x30
2) init/1 does an exec() in the end which clears both
PF_KTHREAD and PF_NO_SETAFFINITY so we are fine once kernel_init()
escapes to userspace. But until then, no initcall or init code can
successfully call sched_setaffinity() on init/1.
Also PF_KTHREAD looks legit on init/1 before it calls exec(), but we
had better be careful about unknown side effects being introduced.
One way to solve the PF_NO_SETAFFINITY issue is to not inherit this flag
on copy_process() at all. The cases where it matters are:
* fork_idle(): explicitly set the flag already.
* fork() syscalls: userspace tasks that shouldn't be concerned by that.
* create_io_thread(): the callers explicitly attribute the flag to the
newly created tasks.
* kernel_thread():
- Fix the issues on init/1 and kthreadd
- Fix the issues on kthreadd children.
- Usermode helper created by an unbound workqueue. This shouldn't
matter. In the worst case it gives more control to userspace
on setting affinity to these short living tasks although this can
be tuned with inherited unbound workqueues affinity already.
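A hedged sketch of the non-inheritance approach in copy_process(); the exact set of flags cleared upstream may differ:

        /*
         * Do not let PF_NO_SETAFFINITY leak from the parent into every
         * child; the few callers that really want it on the new task
         * (fork_idle(), create_io_thread()) set it again explicitly.
         */
        p->flags &= ~PF_NO_SETAFFINITY;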
Fixes: 00b89fe019 ("sched: Make the idle task quack like a per-CPU kthread")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210525235849.441842-1-frederic@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a942f5780 upstream.
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the
preempt_count of the requested CPU's idle task: it unconditionally writes
to the current CPU's. This clearly conflicts with idle_threads_init(),
which intends to initialize *all* the idle tasks, including their
preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count).
Unfortunately, it seems the way s390 does things doesn't let us initialize
every possible CPU's preempt_count early on, as the pages where this
resides are only allocated when a CPU is brought up and are freed when it
is brought down.
Let the arch-specific code set a CPU's preempt_count when its lowcore is
allocated, and turn init_idle_preempt_count() into an empty stub.
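A sketch of the resulting arch stub, assuming the preempt_count is now seeded when the lowcore is allocated:

/* s390: preempt_count lives in the lowcore and is initialized when that
 * lowcore is allocated, so the generic hook becomes a no-op. */
#define init_idle_preempt_count(p, cpu) do { } while (0)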
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29dd19e3ac upstream.
The usage of pm_runtime_resume_and_get() removed the need of a
temporary integer. So, drop it.
Fixes: 59f96244af ("media: exynos4-is: fix pm_runtime_get_sync() usage count")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70b52f0908 upstream.
According to the eMMC Spec:
"When command queuing is enabled (CMDQ Mode En bit in CMDQ_MODE_EN
field is set to ‘1’) class 11 commands are the only method through
which data transfer tasks can be issued. Existing data transfer
commands, namely CMD18/CMD17 and CMD25/CMD24, are not supported when
command queuing is enabled."
which means if CMDQ is enabled, the FFU commands will not be supported.
To fix this issue, just simply disable CMDQ on the ioctl path, and
re-enable CMDQ once ioctl request is completed.
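A sketch of that toggle around the ioctl path, assuming the existing mmc_cmdq_enable()/mmc_cmdq_disable() helpers (not the literal upstream diff):

        int err = 0;

        /* Class 11 queued commands exclude CMD17/18/24/25-style transfers
         * (and FFU), so switch CMDQ off for the ioctl window. */
        if (card->ext_csd.cmdq_en) {
                err = mmc_cmdq_disable(card);
                if (err)
                        return err;
        }

        /* ... issue the ioctl command(s) here ... */

        /* Restore CMDQ afterwards if the card normally runs with it. */
        if (card->reenable_cmdq && !card->ext_csd.cmdq_en)
                err = mmc_cmdq_enable(card);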
Tested-by: Michael Brunner <Michael.Brunner@kontron.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 1e8e55b670 (mmc: block: Add CQE support)
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210504203209.361597-1-huobean@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22634bc562 upstream.
We can't support IOPOLL with non-pollable request types, and we should
check for unused/reserved fields like we do for other request types.
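A hedged sketch of what this prep-time validation looks like, for both the UNLINKAT case here and the RENAMEAT case below (the exact set of SQE fields rejected upstream may differ):

        if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
                return -EINVAL;         /* these ops cannot be polled */
        if (sqe->ioprio || sqe->off || sqe->len || sqe->buf_index)
                return -EINVAL;         /* reject unused/reserved fields */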
Fixes: 14a1143b68 ("io_uring: add support for IORING_OP_UNLINKAT")
Cc: stable@vger.kernel.org
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed7eb25922 upstream.
We can't support IOPOLL with non-pollable request types, and we should
check for unused/reserved fields like we do for other request types.
Fixes: 80a261fd00 ("io_uring: add support for IORING_OP_RENAMEAT")
Cc: stable@vger.kernel.org
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9c9762d4d upstream.
After commit 07173c3ec2 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes one page bvec while
checking for merging. If the pages in the bvec go across the
seg_boundary_mask, this check for merging can potentially succeed if only
the 1st page is tested, and can fail if all the pages are tested.
Later, when SCSI builds the SG list the same check for merging is done in
__blk_segment_map_sg_merge() with all the pages in the bvec tested. This
time the check may fail if the pages in bvec go across the
seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
BIOs were merged). If this check fails, we end up with a broken SG list
for drivers assuming the SG list not having offsets in intermediate pages.
This results in incorrect pages written to the disk.
Fix this by returning the multi-page bvec when testing gaps for merging.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2fc2b430f5 upstream.
Typically, the cryptographic APIs that fscrypt uses take keys as byte
arrays, which avoids endianness issues. However, siphash_key_t is an
exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is
expected to be given directly as two 64-bit words in CPU endianness.
fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key()
forgot to take this into account. Therefore, the SipHash keys used to
index encrypted+casefolded directories differ on big endian vs. little
endian platforms, as do the SipHash keys used to hash inode numbers for
IV_INO_LBLK_32-encrypted directories. This makes such directories
non-portable between these platforms.
Fix this by always using the little endian order. This is a breaking
change for big endian platforms, but this should be fine in practice
since these features (encrypt+casefold support, and the IV_INO_LBLK_32
flag) aren't known to actually be used on any big endian platforms yet.
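A sketch of an endian-safe key load using get_unaligned_le64(), with hypothetical function and variable names:

#include <asm/unaligned.h>
#include <linux/siphash.h>

/* Build a siphash_key_t from 16 raw key bytes independent of CPU
 * endianness: both 64-bit halves are read as little endian. */
static void example_load_siphash_key(siphash_key_t *key, const u8 *raw)
{
        key->key[0] = get_unaligned_le64(raw);
        key->key[1] = get_unaligned_le64(raw + 8);
}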
Fixes: aa408f835d ("fscrypt: derive dirhash key for casefolded directories")
Fixes: e3b1078bed ("fscrypt: add support for IV_INO_LBLK_32 policies")
Cc: <stable@vger.kernel.org> # v5.6+
Link: https://lore.kernel.org/r/20210605075033.54424-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77f30bfcfc upstream.
When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the
minor_hash to 0 if the (major) hash is 0.
This doesn't make sense because 0 is a valid hash code, so we shouldn't
ignore the filesystem-provided minor_hash in that case. Fix this by
removing the special case for 'hash == 0'.
This is an old bug that appears to have originated when the encryption
code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and
f2fs code passed the hash by pointer instead of by value. So
'if (hash)' actually made sense then, as it was checking whether a
pointer was NULL. But now the hashes are passed by value, and
filesystems just pass 0 for any hashes they don't have. There is no
need to handle this any differently from the hashes actually being 0.
It is difficult to reproduce this bug, as it only made a difference in
the case where a filename's 32-bit major hash happened to be 0.
However, it probably had the largest chance of causing problems on
ubifs, since ubifs uses minor_hash to do lookups of no-key names, in
addition to using it as a readdir cookie. ext4 only uses minor_hash as
a readdir cookie, and f2fs doesn't use minor_hash at all.
Fixes: 0b81d07790 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto")
Cc: <stable@vger.kernel.org> # v4.6+
Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4012baeab6 upstream.
When lpfc is handling a solicited and unsolicited PLOGI with another
initiator, the remote initiator is never recovered. The node for the
initiator is erroneously removed and all resources released.
In lpfc_cmpl_els_plogi(), when lpfc_els_retry() returns a failure code, the
driver is calling the state machine with a device remove event because the
remote port is not currently registered with the SCSI or NVMe
transports. The issue is that on a PLOGI "collision" the driver correctly
aborts the solicited PLOGI and allows the unsolicited PLOGI to complete the
process, but this process is interrupted with a device_rm event.
Introduce logic in the PLOGI completion to capture the PLOGI collision
event and jump out of the routine. This will avoid removal of the node.
If there is no collision, the normal node removal will occur.
Fixes: 52edb2caf6 ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails")
Cc: <stable@vger.kernel.org> # v5.11+
Link: https://lore.kernel.org/r/20210514195559.119853-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01131e7aae upstream.
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a lot of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move the location so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Fixes: a70e63eee1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e5654de0f upstream.
The compatibility issue between linux exfat and the exfat of some camera
company was reported by Florian. In their exfat, if the number of files
exceeds a certain limit, the DataLength in the stream entry of the directory
is no longer updated. So some files created by the camera do not show up in
linux exfat, because linux exfat doesn't allow cpos to become larger
than the DataLength of the stream entry. This patch checks DataLength in the
stream entry only if the type is ALLOC_NO_FAT_CHAIN, and adds a check to
ensure that the dentry offset does not exceed the max dentries size (256 MB)
to avoid the circular FAT chain issue.
Fixes: ca06197382 ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.9
Reported-by: Florian Cramer <flrncrmr@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 763778cd79 ]
Prior to commit 1538d82f46 ("i2c: mpc: Interrupt driven transfer") the
old interrupt handler would reread MPC_I2C_SR after checking the CSR_MIF
bit. When the driver was re-written this was removed as it seemed
unnecessary. However, as it turns out, this is necessary for i2c devices
which do clock stretching, otherwise we end up thinking the bus is still
busy when processing the interrupt.
Fixes: 1538d82f46 ("i2c: mpc: Interrupt driven transfer")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6039ca2549 ]
The pkey test code keeps a "shadow" of the pkey register around. This
ensures that any bugs which might write to the register can be caught more
quickly.
Generally, userspace has a good idea when the kernel is going to write to
the register. For instance, alloc_pkey() is passed a permission mask.
The caller of alloc_pkey() can update the shadow based on the return value
and the mask.
But, the kernel can also modify the pkey register in a more sneaky way.
For mprotect(PROT_EXEC) mappings, the kernel will allocate a pkey and
write the pkey register to create an execute-only mapping. The kernel
never tells userspace what key it uses for this.
This can cause the test to fail with messages like:
protection_keys_64.2: pkey-helpers.h:132: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
because the shadow was not updated with the new kernel-set value.
Forcibly update the shadow value immediately after an mprotect().
Link: https://lkml.kernel.org/r/20210611164200.EF76AB73@viggo.jf.intel.com
Fixes: 6af17cf89e ("x86/pkeys/selftests: Add PROT_EXEC test")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f36ef40762 ]
Patch series "selftests/vm/pkeys: Bug fixes and a new test".
There has been a lot of activity on the x86 front around the XSAVE
architecture which is used to context-switch processor state (among other
things). In addition, AMD has recently joined the protection keys club by
adding processor support for PKU.
The AMD implementation helped uncover a kernel bug around the PKRU "init
state", which actually applied to Intel's implementation but was just
harder to hit. This series adds a test which is expected to help find
this class of bug both on AMD and Intel. All the work around pkeys on x86
also uncovered a few bugs in the selftest.
This patch (of 4):
The "random" pkey allocation code currently does the good old:
srand((unsigned int)time(NULL));
*But*, it unfortunately does this on every random pkey allocation.
There may be thousands of these a second. time() has a one second
resolution. So, each time alloc_random_pkey() is called, the PRNG is
*RESET* to time(). This is nasty. Normally, if you do:
srand(<ANYTHING>);
foo = rand();
bar = rand();
You'll be quite guaranteed that 'foo' and 'bar' are different. But, if
you do:
srand(1);
foo = rand();
srand(1);
bar = rand();
You are quite guaranteed that 'foo' and 'bar' are the *SAME*. The recent
"fix" effectively forced the test case to use the same "random" pkey for
the whole test, unless the test run crossed a second boundary.
Only run srand() once at program startup.
This explains some very odd and persistent test failures I've been seeing.
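A tiny self-contained illustration of the point; seed once, then draw values:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
        /* Seed exactly once. Re-seeding with time(NULL) before every
         * rand() call returns identical values within the same second. */
        srand((unsigned int)time(NULL));

        printf("%d %d\n", rand(), rand());      /* almost surely different */
        return 0;
}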
Link: https://lkml.kernel.org/r/20210611164153.91B76FB8@viggo.jf.intel.com
Link: https://lkml.kernel.org/r/20210611164155.192D00FF@viggo.jf.intel.com
Fixes: 6e373263ce ("selftests/vm/pkeys: fix alloc_random_pkey() to make it really random")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65a0d3c146 ]
If the input is out of the range of the allowed values, either larger than
the largest value or closer to zero than the smallest non-zero allowed
value, then a division by zero would occur.
In the case of input too large, the division by zero will occur on the
first iteration. The best result (largest allowed value) will be found by
always choosing the semi-convergent and excluding the denominator based
limit when finding it.
In the case of the input too small, the division by zero will occur on the
second iteration. The numerator based semi-convergent should not be
calculated to avoid the division by zero. But the semi-convergent vs
previous convergent test is still needed, which effectively chooses
between 0 (the previous convergent) vs the smallest allowed fraction (best
semi-convergent) as the result.
Link: https://lkml.kernel.org/r/20210525144250.214670-1-tpiepho@gmail.com
Fixes: 323dd2c3ed ("lib/math/rational.c: fix possible incorrect result from rational fractions helper")
Signed-off-by: Trent Piepho <tpiepho@gmail.com>
Reported-by: Yiyuan Guo <yguoaz@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Oskar Schirmer <oskar@scara.com>
Cc: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6acfb5ba15 ]
Commit d6995da311 ("hugetlb: use page.private for hugetlb specific
page flags") converted page.private to hold hugetlb specific page flags,
so we should use hugetlb_page_subpool() to get the subpool pointer instead
of page_private().
This 'could' prevent the migration of hugetlb pages. page_private(hpage)
is now used for hugetlb page specific flags. At migration time, the only
flag which could be set is HPageVmemmapOptimized. This flag will only be
set if the new vmemmap reduction feature is enabled. In addition,
!page_mapping() implies an anonymous mapping. So, this will prevent
migration of hugetlb pages in anonymous mappings if the vmemmap reduction
feature is enabled.
In addition, that if statement checked for the rare race condition of a
page being migrated while in the process of being freed. Since that check
is now wrong, we could leak hugetlb subpool usage counts.
The commit forgot to update it in the page migration routine. So fix it.
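A hedged sketch of the substitution itself; the surrounding condition is illustrative, not the exact migration-path code:

        struct hugepage_subpool *spool;

        /* page_private() now carries hugetlb page flags, so ask the
         * dedicated accessor for the subpool instead. */
        spool = hugetlb_page_subpool(hpage);
        if (!spool && !page_mapping(hpage)) {
                /* anonymous hugetlb page without a subpool ... */
        }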
[songmuchun@bytedance.com: fix compiler error when !CONFIG_HUGETLB_PAGE reported by Randy]
Link: https://lkml.kernel.org/r/20210521022747.35736-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20210520025949.1866-1-songmuchun@bytedance.com
Fixes: d6995da311 ("hugetlb: use page.private for hugetlb specific page flags")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reported-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Anshuman Khandual <anshuman.khandual@arm.com> [arm64]
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b0482229c ]
If an NMI interrupt hits in an implicit soft-masked region, regs->softe
is modified to reflect that. This may not be necessary for correctness
at the moment, but it is less surprising and it's unhelpful when
debugging or adding checks.
Make sure this is changed back to how it was found before returning.
Fixes: 4ec5feec1a ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-6-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5567b1ee29 ]
The early bad fault or key fault test in do_hash_fault() ends up calling
into ___do_page_fault without having gone through an interrupt handler
wrapper (except the initial _RAW one). This can end up calling local irq
functions while the interrupt has not been reconciled, which will likely
cause crashes and it trips up on a later patch that adds more assertions.
pkey_exec_prot from selftests causes this path to be executed.
There is no real reason why the in_nmi() test should be performed
before the key fault check. In fact if a perf interrupt in the hash
fault code did a stack walk that was made to take a key fault somehow
then running ___do_page_fault could possibly cause another hash fault
causing problems. Move the in_nmi() test first, and then do everything
else inside the regular interrupt handler function.
Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit deeaf96356 ]
For the default (x16) scheme which is currently used by the mvebu-uart.c
driver, the maximal divisor of the UART base clock is 1023*16. Therefore
there is a limit on the minimal supported baudrate. This change calculates
it correctly and prevents setting an invalid divisor of 0 in the hardware
registers.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 68a0db1d7d ("serial: mvebu-uart: add function to change baudrate")
Link: https://lore.kernel.org/r/20210624224909.6350-4-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ecd6b010d8 ]
Testing mvuart->clk for non-error is not enough, as mvuart->clk may contain
a valid clk pointer, but when clk_prepare_enable(mvuart->clk) fails then
port->uartclk is zero.
When mvuart->clk is not available, port->uartclk is zero too.
The parent clock rate port->uartclk is needed to calculate the UART clock
divisor, and without it, it is not possible to change the baudrate.
So fix the test condition for when it is possible to change the baudrate.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 68a0db1d7d ("serial: mvebu-uart: add function to change baudrate")
Link: https://lore.kernel.org/r/20210624224909.6350-3-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9583922563 ]
Commit a21d1becaa ("powerpc: Reintroduce is_kvm_guest() as a fast-path
check") added is_kvm_guest() and changed kvm_para_available() to use it.
is_kvm_guest() checks a static key, kvm_guest, and that static key is
set in check_kvm_guest().
The problem is check_kvm_guest() is only called on pseries, and even
then only in some configurations. That means is_kvm_guest() always
returns false on all non-pseries and some pseries depending on
configuration. That's a bug.
For PR KVM guests this is noticeable because they no longer do live
patching of themselves, which can be detected by the omission of a
message in dmesg such as:
KVM: Live patching for a fast VM worked
To fix it, make check_kvm_guest() an initcall, to ensure it's always
called at boot. It needs to be a core initcall so that it runs before
kvm_guest_init(), which is postcore. To be an initcall it needs to return
int, where 0 means success, so update that.
We still call it manually in pSeries_smp_probe(), because that runs
before init calls are run.
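A simplified sketch of the initcall conversion, assuming the existing kvm_guest static key; the detection logic is illustrative:

static int __init check_kvm_guest(void)
{
        struct device_node *hyper_node;

        hyper_node = of_find_node_by_path("/hypervisor");
        if (!hyper_node)
                return 0;

        if (of_device_is_compatible(hyper_node, "linux,kvm"))
                static_branch_enable(&kvm_guest);

        of_node_put(hyper_node);
        return 0;       /* initcalls report success with 0 */
}
core_initcall(check_kvm_guest);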
Fixes: a21d1becaa ("powerpc: Reintroduce is_kvm_guest() as a fast-path check")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210623130514.2543232-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed78f56e12 ]
In case performance stats for an nvdimm are not available, reading the
'perf_stats' sysfs file returns an -ENOENT error. A better approach is
to make the 'perf_stats' file entirely invisible to indicate that
performance stats for an nvdimm are unavailable.
So this patch updates 'papr_nd_attribute_group' to add an 'is_visible'
callback implemented as the newly introduced 'papr_nd_attribute_visible()'
that returns an appropriate mode in case performance stats aren't
supported in a given nvdimm.
Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved
from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
available when 'papr_nd_attribute_visible()' is called during nvdimm
initialization.
Even though the 'perf_stats' attribute has been available since v5.9,
there are no known user-space tools/scripts that depend on the presence
of its sysfs file. Hence I don't expect any user-space breakage with
this patch.
Fixes: 2d02bf835e ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513092349.285021-1-vaibhav@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f35d2f249e ]
copy-paste contains implicit "copy buffer" state that can contain
arbitrary user data (if the user process executes a copy instruction).
This could be snooped by another process if a context switch hits while
the state is live. So cp_abort is executed on context switch to clear
out possible sensitive data and prevent the leak.
cp_abort is done after the low level _switch(), which means it is never
reached by newly created tasks, so they could snoop on this buffer
between their first and second context switch.
Fix this by doing the cp_abort before calling _switch. Add some
comments which should make the issue harder to miss.
Fixes: 07d2a628bc ("powerpc/64s: Avoid cpabort in context switch when possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210622053036.474678-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bab26238bb ]
printk_safe_flush_on_panic() has special lock breaking code for the case
where we panic()ed with the console lock held. It relies on panic IPI
causing other CPUs to mark themselves offline.
Do as most other architectures do.
This effectively reverts commit de6e5d3841 ("powerpc: smp_send_stop do
not offline stopped CPUs"), unfortunately it may result in some false
positive warnings, but the alternative is more situations where we can
crash without getting messages out.
Fixes: de6e5d3841 ("powerpc: smp_send_stop do not offline stopped CPUs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210623041245.865134-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b67e830d38 ]
On K3 family of SoCs (which includes AM654 SoC), it is observed that RX
TIMEOUT is signalled after RX FIFO has been drained, in which case a
dummy read of RX FIFO is required to clear RX TIMEOUT condition.
Otherwise, this would lead to an interrupt storm.
Fix this by introducing UART_RX_TIMEOUT_QUIRK flag and doing a dummy
read in IRQ handler when RX TIMEOUT is reported with no data in RX FIFO.
Fixes: be70874498 ("serial: 8250_omap: Add support for AM654 UART controller")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Link: https://lore.kernel.org/r/20210622145704.11168-1-vigneshr@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07b60713b5 ]
When running event-no-pid test on small machines (e.g. cloud 1-core
instance), other events might not happen:
+ cat trace
+ cnt=0
+ [ 0 -eq 0 ]
+ fail No other events were recorded
[15] event tracing - restricts events based on pid notrace filtering [FAIL]
Schedule a simple sleep task to be sure that some other process events
get recorded.
Fixes: ebed9628f5 ("selftests/ftrace: Add test to test new set_event_notrace_pid file")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee78b9360e ]
In 'ktd2692_parse_dt()', if an error occurs after a successful
'regulator_enable()' call, we should call 'regulator_disable()'.
The same applies in 'ktd2692_probe()', if an error occurs after a
successful 'ktd2692_parse_dt()' call.
Instead of adding 'regulator_disable()' calls in several places, implement
a resource-managed solution and simplify the remove function accordingly.
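A sketch of the resource-managed approach, using devm_add_action_or_reset() with hypothetical names (not necessarily how the driver spells it):

static void example_regulator_disable_action(void *data)
{
        struct regulator *vin = data;

        regulator_disable(vin);
}

/* In probe/parse_dt, right after a successful regulator_enable(): */
ret = regulator_enable(led->regulator);
if (ret)
        return ret;

ret = devm_add_action_or_reset(dev, example_regulator_disable_action,
                               led->regulator);
if (ret)
        return ret;     /* the action has already disabled the regulator */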
Fixes: b7da8c5c72 ("leds: Add ktd2692 flash LED driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7a0a2feb9 ]
When the system enters suspend, the machine driver's suspend callback
function will be called, then the cpu dai driver's trigger callback
(SNDRV_PCM_TRIGGER_SUSPEND) will be called, which disables the
interrupt.
But the order of the machine driver suspend and the cpu dai driver
suspend may be changed: if the cpu dai driver's suspend callback is
called before the machine driver's suspend callback, then the interrupt
is not cleared successfully in the trigger callback.
So we need to clear interrupts in the cpu dai driver's suspend callback
to avoid such an issue.
Fixes: 9cb2b3796e ("ASoC: fsl_spdif: Add pm runtime function")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/1624365084-7934-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c252b087d ]
While reading binary attributes is in progress, buffer->bin_buffer is set
up in configfs_read_bin_file() but never freed.
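A sketch of the intended release-time cleanup (placement is hypothetical; the upstream fix moves an existing vfree() rather than adding a new one):

        /* Drop the staging buffer filled by configfs_read_bin_file() so
         * read-only users no longer leak it. */
        vfree(buffer->bin_buffer);
        buffer->bin_buffer = NULL;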
Fixes: 03607ace80 ("configfs: implement binary attributes")
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
[hch: move the vfree rather than duplicating it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3729e0ec59 ]
POWER9 and POWER10 asynchronous machine checks due to stores have their
cause reported in SRR1 but SRR1[42] is set, which in other cases
indicates DSISR cause.
Check for these cases and clear SRR1[42], so the cause matching uses
the i-side (SRR1) table.
Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
Fixes: 201220bb0e ("powerpc/powernv: Machine check handler for POWER10")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210517140355.2325406-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc11fc2991 ]
The platform device driver name is "max8997-muic", so advertise it
properly in the modalias string. This fixes automated module loading when
this driver is compiled as a module.
Fixes: b76668ba8a ("Extcon: add MAX8997 extcon driver")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d25b224f8e ]
When sm5502_init_dev_type() iterates over sm5502_reg_data to
initialize the registers it is limited by ARRAY_SIZE(sm5502_reg_data).
There is no need to add another empty element to sm5502_reg_data.
Having the additional empty element in sm5502_reg_data will just
result in writing 0xff to register 0x00, which does not really
make sense.
Fixes: 914b881f94 ("extcon: sm5502: Add support new SM5502 extcon device driver")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a90bbb478 ]
The current driver uses a value from register TEST_O as the original
value for register TEST_I. However, the value is overwritten by "param",
so there is a bug where the original value is no longer used.
The value of TEST_O[7:0] should be masked with "mask", replaced with
"param", and placed in the bitfield TESTI_DAT_MASK as new TEST_I value.
Fixes: c6d9b13241 ("phy: socionext: add PCIe PHY driver support")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/1623037842-19363-1-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d3e88e336 ]
In sdw_prep_deprep_slave_ports(), after the wait_for_completion()
the DP prepare status register is read. If this indicates that the
port is now prepared, the code should continue with the port setup.
It is irrelevant whether the wait_for_completion() timed out if the
port is now ready.
The previous implementation would always fail if the
wait_for_completion() timed out, even if the port was reporting
successful prepare.
This patch also fixes a minor bug where the return from sdw_read()
was not checked for error - any error code with LSBits clear could
be misinterpreted as a successful port prepare.
Fixes: 79df15b7d3 ("soundwire: Add helpers for ports operations")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210618144745.30629-1-rf@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3002f467a0 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
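A generic sketch of the balanced error path, with a hypothetical later init step and label:

        pci_enable_pcie_error_reporting(pdev);

        rc = example_later_init_step(hdev);     /* hypothetical */
        if (rc)
                goto disable_aer;

        return 0;

disable_aer:
        /* Undo AER enablement on failure, as the remove path already does. */
        pci_disable_pcie_error_reporting(pdev);
        return rc;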
Fixes: 2e5eda4681 ("habanalabs: PCIe Advanced Error Reporting support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2892d8a00d ]
Variable "size" has type "phys_addr_t", which can be either 32-bit or
64-bit on 32-bit systems, while "unsigned long" is always 32-bit on
32-bit systems. Hence the cast in
(unsigned long)size / SZ_1M
may truncate a 64-bit size to 32-bit, as casts have a higher operator
precedence than divisions.
Fix this by inverting the order of the cast and division, which should
be safe for memory blocks smaller than 4 PiB. Note that the division is
actually a shift, as SZ_1M is a power-of-two constant, hence there is no
need to use div_u64().
While at it, use "%lu" to format "unsigned long".
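A minimal sketch of the corrected expression, assuming the reserved_mem descriptor's base/size fields:

        /* Divide the full-width phys_addr_t first, then narrow the (now
         * small) result for printing; casting before the division would
         * truncate sizes above 4 GiB on 32-bit systems. */
        pr_info("Reserved memory: region at %pa, size %lu MiB\n",
                &rmem->base, (unsigned long)(rmem->size / SZ_1M));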
Fixes: e8d9d1f548 ("drivers: of: add initialization code for static reserved memory")
Fixes: 3f0c820664 ("drivers: of: add initialization code for dynamic reserved memory")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/4a1117e72d13d26126f57be034c20dac02f1e915.1623835273.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92babc9938 ]
Add __aligned(8) to ensure the buffer passed to
iio_push_to_buffers_with_timestamp() is suitable for the naturally
aligned timestamp that will be inserted.
Here an explicit structure is not used, because the holes would
necessitate the addition of an explicit memset(), to avoid a kernel
data leak, making for a less minimal fix.
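A hedged sketch of the pattern shared by this and the following buffer-alignment fixes (hypothetical driver state; sizes are illustrative):

struct example_iio_state {
        /* ... other driver fields ... */

        /*
         * Raw scan buffer handed to iio_push_to_buffers_with_timestamp().
         * __aligned(8) guarantees the s64 timestamp appended at the end
         * lands on a naturally aligned boundary.
         */
        u16 buffer[8] __aligned(8);
};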
Fixes: 1c28799257 ("iio: light: isl29501: Add support for the ISL29501 ToF sensor.")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Mathieu Othacehe <m.othacehe@gmail.com>
Reviewed-by: Nuno Sá <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20210613152301.571002-9-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec90b52c07 ]
Add __aligned(8) to ensure the buffer passed to
iio_push_to_buffers_with_timestamp() is suitable for the naturally
aligned timestamp that will be inserted.
Here an explicit structure is not used, because the holes would
necessitate the addition of an explicit memset(), to avoid a potential
kernel data leak, making for a less minimal fix.
Fixes: 55707294c4 ("iio: light: Add support for vishay vcnl4035")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Parthiban Nallathambi <pn@denx.de>
Reviewed-by: Nuno Sá <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20210613152301.571002-8-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78bcae8616 ]
Support for magic baud rate divisors of 32770 and 32769 used with SMSC
Super I/O chips for extra baud rates of 230400 and 460800 respectively
where base rate is 115200[1] has been added around Linux 2.5.64, which
predates our repo history, but the origin could be identified as commit
2a717aad772f ("Merge with Linux 2.5.64.") with the old MIPS/Linux repo
also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>.
Code that is now in `serial8250_do_get_divisor' was added back then to
`serial8250_get_divisor', but that code would only ever trigger if one
of the higher baud rates was actually requested, and that cannot ever
happen, because the earlier call to `serial8250_get_baud_rate' never
returns them. This is because it calls `uart_get_baud_rate' with the
maximum requested being the base rate, that is clk/16 or 115200 for SMSC
chips at their nominal clock rate.
Fix it then and allow UPF_MAGIC_MULTIPLIER baud rates to be selected, by
requesting the maximum baud rate of clk/4 rather than clk/16 if the flag
has been set. Also correct the minimum baud rate, observing that these
ports only support actual (non-magic) divisors of up to 32767 only.
References:
[1] "FDC37M81x, PC98/99 Compliant Enhanced Super I/O Controller with
Keyboard/Mouse Wake-Up", Standard Microsystems Corporation, Rev.
03/27/2000, Table 31 - "Baud Rates", p. 77
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2105190412280.29169@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7917e90667 ]
The B_SESS_VLD_WAKEUP_EN bit 6 was added by mistake in a previous
commit. This bit corresponds to B_SESS_END_WAKEUP_EN, which we don't use.
The B_VBUS_VLD_WAKEUP_EN doesn't exist at all and B_SESS_VLD_WAKEUP_EN
needs to be in place of it. We don't utilize B-sensors in the driver,
so it never was a problem, nevertheless let's correct the definition of
the bits.
Fixes: 35192007d2 ("usb: phy: tegra: Support waking up from a low power mode")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20210613145936.9902-2-digetx@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f8d39a8ef ]
Some devices need an extra delay after losing VBUS, otherwise VBUS may
be detected as active at suspend time, preventing the PHY's suspension
by the VBUS detection sensor. This problem was found on Asus Transformer
TF700T (Tegra30) tablet device, where the USB PHY wakes up immediately
from suspend because VBUS sensor continues to detect VBUS as active after
disconnection. We need to poll the PHY's VBUS wakeup status until it's
deasserted before suspending PHY in order to fix this minor trouble.
Fixes: 35192007d2 ("usb: phy: tegra: Support waking up from a low power mode")
Reported-by: Maxim Schwalm <maxim.schwalm@gmail.com> # Asus TF700T
Tested-by: Maxim Schwalm <maxim.schwalm@gmail.com> # Asus TF700T
Reviewed-by: Peter Chen <peter.chen@kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20210613145936.9902-1-digetx@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b4f167ef3 ]
The driver code calls 'devm_of_pci_get_host_bridge_resources'
to get resources and properly fill 'bridge->windows' and
'bridge->dma_ranges'. After parsing the ranges and storing
them as resources, at the end it calls the pci function
'pci_add_resource_offset' to set the offset for the
memory resource. To calculate the offset, the resource start address
subtracts the pci address of the range. MT7621 does not need
any offset for the memory resource. Moreover, setting an
offset triggered 'WARN_ON' calls from pci device driver code.
Until now the memory range pci_addr was '0x00000000' and
res->start was '0x60000000', but because the pci controller driver
was manually setting resources and adding them using the pci function
'pci_add_resource', where a zero is passed as the offset, things
were working properly. Since PCI_IOBASE is now defined for
ralink we don't set anything manually anymore, so we have to
properly fix the PCI address for this range to make things work,
and the new pci address must be set to '0x60000000'. This way
the subtraction yields zero as the offset and the pci device
driver code works properly.
Fixes: d59578da2b ("staging: mt7621-dts: add dts files")
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210614100617.28753-4-sergio.paracuellos@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e02a3b9458 ]
There is a leak in the rtl8712 driver.
The problem was non-freed adapter data when the
firmware load failed.
This leak can be reproduced with this code:
https://syzkaller.appspot.com/text?tag=ReproC&x=16612f02d00000,
Autoload must fail (to not hit memory leak reported by syzkaller)
There are 2 possible ways how rtl871x_load_fw_cb() and
r871xu_dev_remove() can be called (in case of fw load error).
1st case:
r871xu_dev_remove() then rtl871x_load_fw_cb()
In this case r871xu_dev_remove() will wait for
completion and then will jump to the end, because
rtl871x_load_fw_cb() set intfdata to NULL:
if (pnetdev) {
struct _adapter *padapter = netdev_priv(pnetdev);
/* never exit with a firmware callback pending */
wait_for_completion(&padapter->rtl8712_fw_ready);
pnetdev = usb_get_intfdata(pusb_intf);
usb_set_intfdata(pusb_intf, NULL);
if (!pnetdev)
goto firmware_load_fail;
... clean up code here ...
}
2nd case:
rtl871x_load_fw_cb() then r871xu_dev_remove()
In this case pnetdev (from the code snippet above) will
be zero (because rtl871x_load_fw_cb() set it to NULL),
and the clean up code won't be executed again.
So, in all cases we need to free the adapter data in rtl871x_load_fw_cb(),
because the disconnect function cannot take care of it. And there won't be
any race conditions, because the complete() call happens after setting
intfdata to NULL.
In a previous patch I moved free_netdev() out of r8712_free_drv_sw(),
and that's why it's now possible to free the adapter data and then call
complete().
Fixes: 8c213fa591 ("staging: r8712u: Use asynchronous firmware loading")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/81e68fe0194499cc2e7692d35bc4dcf167827d8f.1623620630.git.paskripkin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dce793c0ab ]
Add __aligned(8) to ensure the buffer passed to
iio_push_to_buffers_with_timestamp() is suitable for the naturally
aligned timestamp that will be inserted.
Here an explicit structure is not used, because the holes would
necessitate the addition of an explicit memset() to avoid a kernel
data leak, making for a less minimal fix.
Found during an audit of all callers of iio_push_to_buffers_with_timestamp().
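The resulting pattern is roughly the following (a sketch; the buffer size is
illustrative, not the VCNL4010/20 driver's exact layout):

  /* __aligned(8) guarantees the space used for the appended s64 timestamp
   * is naturally aligned. */
  u8 buffer[16] __aligned(8);

  iio_push_to_buffers_with_timestamp(indio_dev, buffer,
                                     iio_get_time_ns(indio_dev));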
Fixes: 8fe78d5261 ("iio: vcnl4000: Add buffer support for VCNL4010/20.")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Mathieu Othacehe <m.othacehe@gmail.com>
Reviewed-by: Nuno Sá <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20210613152301.571002-7-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94588c1bf1 ]
Gcc reports a build error when CONFIG_IIO_TRIGGERED_BUFFER is not set:
riscv64-linux-gnu-ld: drivers/iio/dummy/iio_simple_dummy_buffer.o: in function `iio_simple_dummy_configure_buffer':
iio_simple_dummy_buffer.c:(.text+0x2b0): undefined reference to `iio_triggered_buffer_setup_ext'
riscv64-linux-gnu-ld: drivers/iio/dummy/iio_simple_dummy_buffer.o: in function `.L0 ':
iio_simple_dummy_buffer.c:(.text+0x2fc): undefined reference to `iio_triggered_buffer_cleanup'
Fix it by selecting IIO_TRIGGERED_BUFFER for the IIO_SIMPLE_DUMMY_BUFFER config.
Fixes: 738f6ba118 ("iio: dummy: iio_simple_dummy_buffer: use triggered buffer core calls")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 384426bd10 ]
When one parameter of a parameterised test failed, its failure would be
propagated to the overall test, but not to the suite result (unless it
was the last parameter).
This is because test_case->success was being reset to the test->success
result after each parameter was used, so a failing test's result would
be overwritten by a non-failing result. The overall test result was
handled in a third variable, test_result, but this was discarded after
the status line was printed.
Instead, just propagate the result after each parameter run.
Signed-off-by: David Gow <davidgow@google.com>
Fixes: fadb08e7c7 ("kunit: Support for Parameterized Testing")
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c93081b265 ]
In the raw NAND world, ECC engines increment ecc_stats and the final
caller is responsible for returning -EBADMSG if the verification
failed.
In the SPI-NAND world it was a bit different until now because there was
only one possible ECC engine: the on-die one. Indeed, the
spinand_mtd_read() call was incrementing the ecc_stats counters
depending on the outcome of spinand_check_ecc_status() directly.
So now let's split the logic like this:
- spinand_check_ecc_status() is specific to the SPI-NAND on-die engine
and is kept very simple: it just returns the ECC status (bonus point:
the content of this helper can be overloaded).
- spinand_ondie_ecc_finish_io_req() is the caller of
spinand_check_ecc_status() and will increment the counters and
eventually return -EBADMSG.
- spinand_mtd_read() is not tied to the on-die ECC implementation and
should be able to handle results coming from other ECC engines: it has
the responsibility of returning the maximum number of bitflips which
happened during the entire operation as this is the only helper that
is aware that several pages may be read in a row.
Fixes: 945845b54c ("mtd: spinand: Instantiate a SPI-NAND on-die ECC engine")
Reported-by: YouChing Lin <ycllin@mxic.com.tw>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: YouChing Lin <ycllin@mxic.com.tw>
Link: https://lore.kernel.org/linux-mtd/20210527084345.208215-1-miquel.raynal@bootlin.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0d36fa106 ]
We should not dereference ->dual_link_port if it is NULL and lane bonding
is requested. For this reason, move the lane bonding configuration inside
the block where ->dual_link_port != NULL.
Fixes: 54509f5005 ("thunderbolt: Add KUnit tests for path walking")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 388161ca45 ]
SPI NOR flashes will just ignore program commands if the OTP region is
locked. Thus, a user might not notice that the intended write didn't end
up in the flash. Return -EROFS to the user in this case. From what I can
tell, chips/cfi_cmdset_0001.c also returns this error code.
One could optimize spi_nor_mtd_otp_range_is_locked() to read the status
register only once and not for every OTP region, but for that we would
need some more invasive changes. Given that this is
one-time-programmable memory and the normal access mode is reading, we
just live with the small overhead.
By moving the code around a bit, we can just check the length before
calling spi_nor_mtd_otp_range_is_locked() and avoid an underflow there
if len is 0. This way we don't need to take the lock either. We also
skip the "*retlen = 0" assignment, mtdcore already takes care of that
for us.
Fixes: 069089acf8 ("mtd: spi-nor: add OTP support")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Pratyush Yadav <p.yadav@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b97b1a7698 ]
The security registers either take a 3 byte or a 4 byte address offset,
depending on the address mode of the flash. Thus just leave the
nor->addr_width as is.
Fixes: cad3193fe9 ("mtd: spi-nor: implement OTP support for Winbond and similar flashes")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Pratyush Yadav <p.yadav@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aafe93516b ]
Every time the hub signals a reset while we (device) are hsotg->connected,
dwc2_hsotg_core_init_disconnected() is called, which in turn calls
dwc2_hs_phy_init().
GUSBCFG.USBTrdTim is cleared upon Core Soft Reset, so if
hsotg->params.phy_utmi_width is 8-bit, the value of GUSBCFG.USBTrdTim (the
default one: 0x5, corresponding to 16-bit) is always different from
hsotg->params.phy_utmi_width, thus dwc2_core_reset() is called every
time (usbcfg != usbcfg_old), which causes 2 issues:
1) The call to dwc2_core_reset() does another reset 300us after the initial
Chirp K of the first reset (which should last at least Tuch = 1ms), and
messes up the High-speed Detection Handshake: both hub and device drive
current into the D+ and D- lines at the same time.
2) GUSBCFG.USBTrdTim is cleared by the second reset, so its value is always
the default one (0x5).
Setting GUSBCFG.USBTrdTim after the potential call to dwc2_core_reset()
fixes both issues. It is now set even when select_phy is false because the
cost of the Core Soft Reset is removed.
Fixes: 1e868545f2 ("usb: dwc2: gadget: Move gadget phy init into core phy init")
Signed-off-by: Clément Lassieur <clement@lassieur.org>
Link: https://lore.kernel.org/r/20210603155921.940651-1-clement@lassieur.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d460bb6c64 ]
In commit b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with
MIO") we implemented both ioremap_wc() and ioremap_prot() however until
now we had not set HAVE_IOREMAP_PROT in Kconfig, do so now.
This also requires implementing pte_pgprot() as this is used in the
generic_access_phys() code enabled by CONFIG_HAVE_IOREMAP_PROT. As with
ioremap_wc() we need to take the MMIO Write Back bit index into account.
Moreover since the pgprot value returned from pte_pgprot() is to be used
for mappings into kernel address space we must make sure that it uses
appropriate kernel page table protection bits. In particular a pgprot
value originally coming from userspace could have the _PAGE_PROTECT
bit set to enable fault based dirty bit accounting which would then make
the mapping inaccessible when used in kernel address space.
Fixes: b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with MIO")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be227f8e99 ]
Now that DMA ops are part of the core API via iommu-dma, fold the
vestigial remains of the IOMMU_DMA_OPS init state into the IOMMU API
phase, and clean up a few other leftovers. This should also close the
race window wherein bus_set_iommu() effectively makes the DMA ops state
visible before its nominal initialisation - it seems this was previously
fairly benign, but since commit a250c23f15 ("iommu: remove
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE") it can now lead to the strict flush
queue policy inadvertently being picked for default domains allocated
during that window, with a corresponding unexpected performance impact.
Reported-by: Jussi Maki <joamaki@gmail.com>
Tested-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Fixes: a250c23f15 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE")
Link: https://lore.kernel.org/r/665db61e23ff8d54ac5eb391bef520b3a803fcb9.1622727974.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b21a503ad ]
print_iommu_info prints the EFR register and then the decoded list of
features on a separate line:
pci 0000:00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
PPR X2APIC NX GT IA GA PC GA_vAPIC
The second line is emitted via 'pr_cont', which causes it to have a
different ('warn') loglevel compared to the previous line ('info').
Commit 9a295ff0ff attempted to rectify this by removing the newline
from the pci_info format string, but this doesn't work, as pci_info
calls implicitly append a newline anyway.
Printing the decoded features on the same line would make it quite long.
Instead, change pci_info() to pr_info() to omit PCI bus location info,
which is also shown in the preceding message. This results in:
pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC
AMD-Vi: Interrupt remapping enabled
Fixes: 9a295ff0ff ("iommu/amd: Print extended features in one line to fix divergent log levels")
Link: https://lore.kernel.org/lkml/alpine.LNX.2.20.13.2104112326460.11104@monopod.intra.ispras.ru
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20210504102220.1793-1-amonakov@ispras.ru
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce52ec5bee ]
Commit 1366a3db3d ("staging: unisys: visorbus: visorchipset_init clean
up gotos") assigns the initial value -ENODEV to the local variable 'err',
and the first several error branches will return this value after "goto
error". But commit f1f537c2e7 ("staging: unisys: visorbus: Consolidate
controlvm channel creation.") overwrites 'err' in the middle of the way.
As a result, some error branches do not successfully return the initial
value -ENODEV of 'err', but return 0.
In addition, when kzalloc() fails, -ENOMEM should be returned instead of
-ENODEV.
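The intended error handling pattern looks roughly like this (a sketch with
hypothetical names, not the driver's exact code):

  err = -ENODEV;
  if (!controlvm_channel_present)       /* hypothetical early check */
          goto error;                   /* still returns -ENODEV */

  chipset_dev = kzalloc(sizeof(*chipset_dev), GFP_KERNEL);
  if (!chipset_dev) {
          err = -ENOMEM;                /* not -ENODEV */
          goto error;
  }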
Fixes: f1f537c2e7 ("staging: unisys: visorbus: Consolidate controlvm channel creation.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210528082614.9337-1-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9108109457 ]
Currently the cfam_read and cfam_write functions return the provided
number of bytes given in the count parameter and not the error return
code in variable rc, hence all failures of read/writes are being
silently ignored. Fix this by returning the error code in rc.
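In other words, roughly (a sketch, not the exact diff):

  rc = fsi_slave_read(slave, off, buf, count);  /* or the write variant */
  if (rc)
          return rc;    /* previously the error was masked by returning count */
  return count;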
Addresses-Coverity: ("Unused value")
Fixes: d1dcd67825 ("fsi: Add cfam char devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jeremy Kerr <jk@ozlabs.org>
Link: https://lore.kernel.org/r/20210603122812.83587-1-colin.king@canonical.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1649a5260 ]
During heavier interrupt generation, the IRQ got stuck
and the subdevices did not see any new interrupts. That happens
especially with a wonky USB supply in combination with ADC reads.
To fix that, trigger the IRQ on level low instead of falling edge.
Fixes: 0c81604516 ("mfd: rn5t618: Add IRQ support")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a23e8b0fd ]
Currently the software node is removed in error case and at ->remove()
stage unconditionally, that ruins the symmetry. Besides, in some cases,
when mfd_add_device() fails, the device_remove_software_node() call
may lead to NULL pointer dereference:
BUG: kernel NULL pointer dereference, address: 00000000
...
EIP: strlen+0x12/0x20
...
kernfs_name_hash+0x13/0x70
kernfs_find_ns+0x32/0xc0
kernfs_remove_by_name_ns+0x2a/0x90
sysfs_remove_link+0x16/0x30
software_node_notify.cold+0x34/0x6b
device_remove_software_node+0x5a/0x90
mfd_add_device.cold+0x30a/0x427
Fix all these by guarding device_remove_software_node() with a conditional
and locating it at the right place.
Fixes: 42e5998291 ("mfd: core: Add support for software nodes")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ce9fc5ecd ]
We set the max_active iSCSI EH works to 1, so all work is going to execute
in order by default. However, userspace can now override this in sysfs. If
max_active > 1, we can end up with the block_work on CPU1 and
iscsi_unblock_session running the unblock_work on CPU2 and the session and
target/device state will end up out of sync with each other.
This adds a flush of the block_work in iscsi_unblock_session.
Link: https://lore.kernel.org/r/20210525181821.7617-17-michael.christie@oracle.com
Fixes: 1d726aa6ef ("scsi: iscsi: Optimize work queue flush use")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23d6fefbb3 ]
Commit 0ab710458d ("scsi: iscsi: Perform connection failure entirely in
kernel space") has the following regressions/bugs that this patch fixes:
1. It can return cmds to upper layers like dm-multipath which can
retry them. After they are successful the fs/app can send new I/O to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect if userspace is not up.
This patch only fixes the issue for offload drivers. iscsi_tcp will be
fixed in separate commit because it doesn't have a ep_disconnect call.
2. The drivers that implement ep_disconnect expect that it's called before
conn_stop. Besides crashes, if the cleanup_task callout is called before
ep_disconnect it might free up driver/card resources for session1 then they
could be allocated for session2. But because the driver's ep_disconnect is
not called it has not cleaned up the firmware so the card is still using
the resources for the original cmd.
3. The stop_conn_work_fn can run after userspace has done its recovery and
we are happily using the session. We will then end up with various bugs
depending on what is going on at the time.
We may also run stop_conn_work_fn late after userspace has called stop_conn
and ep_disconnect and is now going to call start/bind conn. If
stop_conn_work_fn runs after bind but before start, we would leave the conn
in an unbound but sort-of-started state where IO might be allowed even
though the drivers have been set in a state where they no longer expect
I/O.
4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in
kernel stop_conn function is breaking userspace. We should have been doing
this for the caller.
Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com
Fixes: 0ab710458d ("scsi: iscsi: Perform connection failure entirely in kernel space")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06c203a556 ]
If the system is not up, we can just fail immediately since iscsid is not
going to ever answer our netlink events. We are already setting the
recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will
block the session and start the recovery timer, because for that flag
userspace will do the unbind and destroy events which would remove the
devices and wake up and kill the eh.
Since the conn is dead and the system is going down, this just has us use
STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately. However, if
the user has set the recovery_tmo=-1 we let the system hang like they
requested since they might have used that setting for specific reasons
(one known reason is for buggy cluster software).
Link: https://lore.kernel.org/r/20210525181821.7617-5-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 891e2639de ]
During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new
I/O but every driver except cxgbi and iscsi_tcp can still get I/O from
__iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before
ep_disconnect. This could happen if we were terminating the session, and
the logout timed out before it was even sent to libiscsi.
Fix the issue by adding a helper which reverses the bind_conn call that
allows new I/O to be queued. Drivers implementing ep_disconnect can use this
to make sure new I/O is not queued to them when handling the disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d431153e7 ]
The BusLogic driver has build errors on ia64 due to a name collision (in
the #included FlashPoint.c file). Rename the struct field in struct
sccb_mgr_info from si_flags to si_mflags (manager flags) to mend the build.
This is the first problem. There are 50+ others after this one:
In file included from ../include/uapi/linux/signal.h:6,
from ../include/linux/signal_types.h:10,
from ../include/linux/sched.h:29,
from ../include/linux/hardirq.h:9,
from ../include/linux/interrupt.h:11,
from ../drivers/scsi/BusLogic.c:27:
../arch/ia64/include/uapi/asm/siginfo.h:15:27: error: expected ':', ',', ';', '}' or '__attribute__' before '.' token
15 | #define si_flags _sifields._sigfault._flags
| ^
../drivers/scsi/FlashPoint.c:43:6: note: in expansion of macro 'si_flags'
43 | u16 si_flags;
| ^~~~~~~~
In file included from ../drivers/scsi/BusLogic.c:51:
../drivers/scsi/FlashPoint.c: In function 'FlashPoint_ProbeHostAdapter':
../drivers/scsi/FlashPoint.c:1076:11: error: 'struct sccb_mgr_info' has no member named '_sifields'
1076 | pCardInfo->si_flags = 0x0000;
| ^~
../drivers/scsi/FlashPoint.c:1079:12: error: 'struct sccb_mgr_info' has no member named '_sifields'
Link: https://lore.kernel.org/r/20210529234857.6870-1-rdunlap@infradead.org
Fixes: 391e2f2560 ("[SCSI] BusLogic: Port driver to 64-bit.")
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1e1d532da ]
fwnode_for_each_child_node() bumps the reference count of the returned variable.
We have to balance it whenever we return to the caller.
OTOH, a successful iteration drops the reference count under the hood, so there
is no need to do it twice.
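The fix therefore follows the usual pattern (a sketch with a hypothetical
error path):

  fwnode_for_each_child_node(dev_fwnode(dev), child) {
          ret = parse_child(child);            /* hypothetical helper */
          if (ret) {
                  fwnode_handle_put(child);    /* balance the ref on early return */
                  return ret;
          }
  }
  /* no put needed here: a completed iteration drops the reference itself */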
Fixes: 242b81170f ("leds: lp50xx: Add the LP50XX family of the RGB LED driver")
Cc: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 807553f8bf ]
When requesting GPIO line the probe can be deferred.
In such case don't spam logs with an error message.
This can be achieved by switching to dev_err_probe().
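For example, something along these lines (a sketch; the GPIO name and message
are illustrative):

  gpiod = devm_gpiod_get_optional(dev, "enable", GPIOD_OUT_LOW);
  if (IS_ERR(gpiod))
          /* silent for -EPROBE_DEFER, prints a real error otherwise */
          return dev_err_probe(dev, PTR_ERR(gpiod),
                               "Failed to get enable GPIO\n");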
Fixes: 5c1d824cda ("leds: lm3697: Introduce the lm3697 driver")
Cc: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f55db1c7fa ]
device_get_next_child_node() bumps the reference count of the returned variable.
We have to balance it whenever we return to the caller.
Fixes: 9a5c1c64ac ("leds: lm3692x: Change DT calls to fwnode calls")
Cc: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c5f655c44 ]
device_get_next_child_node() bumps the reference count of the returned variable.
We have to balance it whenever we return to the caller.
In the older code the same is implied by device_for_each_child_node().
Fixes: 11e1bbc116 ("leds: lm36274: Introduce the TI LM36274 LED driver")
Fixes: a448fcf19c ("leds: lm36274: don't iterate through children since there is only one")
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Marek Behún <marek.behun@nic.cz>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99be74f61c ]
Regmap APIs should be selected, otherwise the link can fail:
ERROR: modpost: "__devm_regmap_init_i2c" [drivers/leds/leds-lm3532.ko] undefined!
Fixes: bc1b8492c7 ("leds: lm3532: Introduce the lm3532 LED driver")
Cc: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fba8a6f226 ]
The clock handling has a few issues:
- when getting the second clock fails, the first one is left prepared and enabled
- on ->remove() clocks are unprepared and disabled twice
Fix all these by converting to use bulk clock operations since both clocks
are mandatory.
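With the bulk API both mandatory clocks are handled together, roughly like
this (a sketch; the clock names are illustrative):

  struct clk_bulk_data clks[] = {
          { .id = "sclk" },
          { .id = "pclk" },
  };

  ret = devm_clk_bulk_get(dev, ARRAY_SIZE(clks), clks);
  if (ret)
          return ret;

  ret = clk_bulk_prepare_enable(ARRAY_SIZE(clks), clks);
  if (ret)
          return ret;

  /* and a single call on the teardown path / ->remove(): */
  clk_bulk_disable_unprepare(ARRAY_SIZE(clks), clks);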
Fixes: c3987cd2bc ("leds: lgm: Add LED controller driver for LGM SoC")
Cc: Amireddy Mallikarjuna reddy <mallikarjunax.reddy@linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ac40af860 ]
Drop the bogus error code and let of_led_get() take care of an absent
of_node.
Fixes: e389240ad9 ("leds: Add managed API to get a LED from a device driver")
Cc: Jean-Jacques Hiblot <jjhiblot@ti.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afc023da53 ]
For some reason lost in history, the function vchiq_mmal_init used
a static variable to store the vchiq_instance.
This value is retrieved from vchiq per instance, so it worked fine
until vchiq_mmal_init was called multiple times concurrently, at
which point things went wrong. This seemed to happen quite frequently
when using the cutdown firmware (no MMAL or VCSM services running),
as vchiq_connect then failed and one or other vchiq_shutdown
was working on an invalid handle.
Remove the static so that each caller gets a unique vchiq_instance.
Fixes: 7b3ad5abf0 ("staging: Import the BCM2835 MMAL-based V4L2 camera driver.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/1621979857-26754-1-git-send-email-stefan.wahren@i2se.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5437c7b68 ]
Since the controller always asserts one CS or the other, there is no
need to actually select the right target before doing a page read/write.
However, the anfc_select_target() helper also changes the timing
configuration and clock in the case where two different NAND chips
with different timing requirements are used. In this situation, we
must ensure proper configuration of the controller by calling it.
As a consequence of this change, the anfc_select_target() helper is
being moved earlier in the driver.
Fixes: 88ffef1b65 ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210526093242.183847-4-miquel.raynal@bootlin.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8dea228b17 ]
The samples buffer is passed to iio_push_to_buffers_with_timestamp()
which requires a buffer aligned to 8 bytes as it is assumed that
the timestamp will be naturally aligned if present.
The Fixes tag is inaccurate, but prior to that point manual backporting is
likely needed (for anything before 4.18). Earlier than that, the include file
to fix is drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h:
commit 974e6f02e2 ("iio: cros_ec_sensors_core: Add common functions
for the ChromeOS EC Sensor Hub."), present since stable kernel 4.10.
(Thanks to Gwendal for tracking this down)
Fixes: 5a0b8cb466 ("iio: cros_ec: Move cros_ec_sensors_core.h in /include")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>
Link: https://lore.kernel.org/r/20210501171352.512953-7-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df2f37cffd ]
To make code more readable, use a structure to express the channel
layout and ensure the timestamp is 8 byte aligned.
Found during an audit of all users of
iio_push_to_buffers_with_timestamp().
Fixes tag is not strictly accurate as prior to that patch there was
potentially an unaligned write. However, any backport past there will
need to be done manually.
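The resulting layout is roughly (a sketch; the channel count is illustrative):

  struct {
          u16 chans[4];
          s64 timestamp __aligned(8);
  } scan;

  iio_push_to_buffers_with_timestamp(indio_dev, &scan,
                                     iio_get_time_ns(indio_dev));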
Fixes: 0624bf847d ("iio:tcs3472: Use iio_push_to_buffers_with_timestamp()")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20210501170121.512209-20-jic23@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00a72db718 ]
On an IRQ handler we should not return normal error codes as 'irqreturn_t'
is expected.
This is done by jumping to the 'check_burst32' label where we return
'IRQ_HANDLED'. Note that it is fine to do the burst32 check in this
error path. If we have proper settings to apply burst32, we might just
do the setup now so that the next sample already uses it.
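Conceptually (a sketch; the names are illustrative, not the driver's exact
code):

  static irqreturn_t my_trigger_handler(int irq, void *p)
  {
          if (read_burst_sample())        /* hypothetical helper */
                  goto check_burst32;     /* never return a -Exxx value here */

          push_sample_to_buffer();        /* hypothetical helper */

  check_burst32:
          /* possibly enable burst32 so the next sample already uses it */
          return IRQ_HANDLED;
  }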
Fixes: fff7352bf7 ("iio: imu: Add support for adis16475")
Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20210427085454.30616-2-nuno.sa@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f4e3d48bb ]
A patch from 2017 changed some accesses to DMA memory to use
get_unaligned_le32() and similar interfaces, to avoid problems
with doing unaligned accesses on uncached memory.
However, the change in the mwifiex_pcie_alloc_sleep_cookie_buf()
function ended up changing the size of the access instead,
as it operates on a pointer to u8.
Change this function back to actually access the entire 32 bits.
Note that the pointer is aligned by definition because it came
from dma_alloc_coherent().
Fixes: 92c70a958b ("mwifiex: fix for unaligned reads")
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 652a6a2e38 ]
The way the pointer address for 'port_base' was printed caused
compile warnings on some architectures [-Wpointer-to-int-cast].
Instead of using '%08x' and casting to an 'unsigned int', just use
'%px' and avoid the cast. To avoid unnecessary driver verbosity in
normal operation, also change 'dev_info' to 'dev_dbg'.
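That is, roughly (a sketch, not the exact diff):

  /* before: dev_info(dev, "port_base: 0x%08x\n", (unsigned int)port_base); */
  dev_dbg(dev, "port_base: %px\n", port_base);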
Fixes: d87da32372 ("phy: ralink: Add PHY driver for MT7621 PCIe PHY")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210508070930.5290-7-sergio.paracuellos@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e60c2991f1 ]
The wrong code in set_mctrl() was already removed in commit 2b30efe2e8
("tty: serial: lpuart: Remove unnecessary code from set_mctrl"), but the
code in get_mctrl() wasn't removed. It will not return the state of the
RTS or CTS line but whether automatic flow control is enabled, which is
wrong for the get_mctrl(). Thus remove it.
Fixes: 2b30efe2e8 ("tty: serial: lpuart: Remove unnecessary code from set_mctrl")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20210512141255.18277-7-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e4b1d2bc1 ]
Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(),
which works, except that rcu_spawn_gp_kthread() is also invoked via an
early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to
kthread_prio that are carried out by rcu_spawn_gp_kthread(). There is
no guarantee of ordering among early_initcall() handlers, and thus no
guarantee that kthread_prio will be properly checked and range-limited
at the time that rcu_spawn_core_kthreads() needs it.
In most cases, this bug is harmless. After all, the only reason that
rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user
specified a nonsensical value for this boot parameter, which experience
indicates is rare.
Nevertheless, a bug is a bug. This commit therefore causes the
rcu_spawn_core_kthreads() function to be invoked directly from
rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have
been carried out.
Fixes: 48d07c04b4 ("rcu: Enable elimination of Tree-RCU softirq processing")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec03c21043 ]
The infamous commit c440eee1a7 ("Staging: staging: fbtft: Switch to
the GPIO descriptor interface") broke GPIO handling completely.
It already has four commits trying to rectify it, and it seems that is not enough.
In order to fix the mess here we:
1) Set default to "inactive" for all requested pins
2) Fix CS#, RD#, and WR# pins polarity since it's active low
and GPIO descriptor interface takes it into consideration
from the Device Tree or ACPI
3) Consolidate chip activation (CS# assertion) under default
->reset() callback
To summarize the expectations about polarity for GPIOs:
RD# Low
WR# Low
CS# Low
RESET# Low
DC or RS High
RW High
Data 0 .. 15 High
See also Adafruit learning course [1] for the example of the schematics.
While at it, drop unneeded NULL checks, since GPIO API is tolerant to that.
[1]: https://learn.adafruit.com/adafruit-2-8-and-3-2-color-tft-touchscreen-breakout-v2/downloads
Fixes: 92e3e88488 ("Staging: fbtft: Fix GPIO handling")
Fixes: b918d1c270 ("Staging: fbtft: Fix reset assertion when using gpio descriptor")
Fixes: dbc4f989c8 ("Staging: fbtft: Fix probing of gpio descriptor")
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Cc: Jan Sebastian Götte <linux@jaseg.net>
Cc: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210503172114.27891-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 116138c3bd ]
Commit 6e2baa44c6 ("staging: rtl8712: remove enum WIFI_FRAME_SUBTYPE")
was wrong because:
WIFI_DATA_NULL != IEEE80211_STYPE_NULLFUNC
WIFI_DATA_CFACK != IEEE80211_STYPE_DATA_CFACK
WIFI_DATA_CFPOLL != IEEE80211_STYPE_DATA_CFPOLL
WIFI_DATA_CFACKPOLL != IEEE80211_STYPE_DATA_CFACKPOLL
the WIFI_DATA_xxx definitions include WIFI_DATA_TYPE, which is 'BIT(3)'.
Restore the previous behavior by adding the missing
'IEEE80211_FTYPE_DATA |' (0x0008, that is to say BIT(3)) when these values
are used.
Hopefully, the wrong commit was small enough and hand review is possible.
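For instance, for the NULL-function subtype the replacement is equivalent to
the following (constant values as assumed from the mainline ieee80211
definitions, shown only as an illustration):

  /* WIFI_DATA_NULL (0x0048) == IEEE80211_FTYPE_DATA (0x0008) |
   *                            IEEE80211_STYPE_NULLFUNC (0x0040) */
  fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_NULLFUNC;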
Fixes: 6e2baa44c6 ("staging: rtl8712: remove enum WIFI_FRAME_SUBTYPE")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/44aebfa3c5ce8f45ae05369c73e9ff77c6d271f9.1619939806.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf02ce742f ]
When 32-bit MIPS huge page support is enabled, we halve the number of
pointers a PTE page holds, making its last half go to waste.
Correspondingly, we should halve the number of kmap entries, as we just
initialized only a single pte table for that in pagetable_init().
Fixes: 35476311e5 ("MIPS: Add partial 32-bit huge page support")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c6de0c943 ]
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory
allocation fails, which can avoid unnecessary sending failures.
The value of FB_MTU now is 3744, and the data size will be:
(3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))
which is larger than one page(4096), and two pages will be allocated.
To avoid it, replace '3744' with a calculation:
(PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size,
and it's not necessary to make alignment for buf_size in
tipc_buf_acquire(). So, just remove it.
Fixes: 4c94cc2d3d ("tipc: fall back to smaller MTU if allocation of local send skb fails")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fade56410c ]
Commit 14972cbd34 ("net: lwtunnel: Handle fragmentation") moved
fragmentation logic away from lwtunnel by carrying encap headroom and
using it in the output MTU calculation. But the forwarding part was not
covered and created difference in MTU for output and forwarding and
further to silent drops on ipv4 forwarding path. Fix it by taking
into account lwtunnel encap headroom.
The same commit also introduced difference in how to treat RTAX_MTU
in IPv4 and IPv6 where latter explicitly removes lwtunnel encap
headroom from route MTU. Make IPv4 version do the same.
Fixes: 14972cbd34 ("net: lwtunnel: Handle fragmentation")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b0ed8443a ]
The caller of wb_get_create() should pin the memcg, because
wb_get_create() relies on this guarantee. The rcu read lock
only can guarantee that the memcg css returned by css_from_id()
cannot be released, but the reference of the memcg can be zero.
rcu_read_lock()
memcg_css = css_from_id()
wb_get_create(memcg_css)
        cgwb_create(memcg_css)
                // css_get can change the ref counter from 0 back to 1
                css_get(memcg_css)
rcu_read_unlock()
Fix it by holding a reference to the css before calling
wb_get_create(). This is not a problem I encountered in the
real world. Just the result of a code review.
Fixes: 682aa8e1a6 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Link: https://lore.kernel.org/r/20210402091145.80635-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78f6f40602 ]
If the Si5341 is being initially programmed and has no stored NVM
configuration, some of the register contents may contain unexpected
values, such as zeros, which could cause divide by zero errors during
driver initialization. Trap errors caused by zero registers or zero clock
rates which could result in divide errors later in the code.
Fixes: 3044a860fd ("clk: Add Si5341/Si5340 driver")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20210325192643.2190069-4-robert.hancock@calian.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e7d2de1e0 ]
The Si5341 datasheet warns that before accessing any other registers,
including the PAGE register, we need to wait for the DEVICE_READY register
to indicate the device is ready, or the process of the device loading its
state from NVM can be corrupted. Wait for DEVICE_READY on startup before
continuing initialization. This is done using a raw I2C register read
prior to setting up regmap to avoid any potential unwanted automatic PAGE
register accesses from regmap at this stage.
Fixes: 3044a860fd ("clk: Add Si5341/Si5340 driver")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20210325192643.2190069-3-robert.hancock@calian.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd90b5b904 ]
There are a few issues with the setup of the Actions Semi Owl S500 SoC's
clock chain involving AHPPREDIV, H and AHB clocks:
* AHBPREDIV clock is defined as a muxer only, although it also acts as
a divider.
* H clock is using a wrong divider register offset
* AHB is defined as a multi-rate factor clock, but it is actually just
a fixed pass clock.
Let's provide the following fixes:
* Change AHBPREDIV clock to an ungated OWL_COMP_DIV definition.
* Use the correct register shift value in the OWL_DIVIDER definition
for H clock
* Drop the unneeded 'ahb_factor_table[]' and change AHB clock to an
ungated OWL_COMP_FIXED_FACTOR definition.
Fixes: ed6b4795ec ("clk: actions: Add clock driver for S500 SoC")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Link: https://lore.kernel.org/r/21c1abd19a7089b65a34852ac6513961be88cbe1.1623354574.git.cristian.ciocaltea@gmail.com
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8f1f03caa ]
The following clocks of the Actions Semi Owl S500 SoC have been defined
to use a shared clock factor table 'bisp_factor_table[]': DE[1-2], VCE,
VDE, BISP, SENSOR[0-1]
There are several issues involved in this approach:
* 'bisp_factor_table[]' describes the configuration of a regular 8-rates
divider, so its usage is redundant. Additionally, judging by the BISP
clock context, it is incomplete since it maps only 8 out of 12
possible entries.
* The clocks mentioned above are not identical in terms of the available
rates, therefore cannot rely on the same factor table. Specifically,
BISP and SENSOR* are standard 12-rate dividers so their configuration
should rely on a proper clock div table, while VCE and VDE require a
factor table that is actually a subset of the one needed for DE[1-2]
clocks.
Let's fix this by implementing the following:
* Add new factor tables 'de_factor_table' and 'hde_factor_table' to
properly handle DE[1-2], VCE and VDE clocks.
* Add a common div table 'std12rate_div_table' for BISP and SENSOR[0-1]
clocks converted to OWL_COMP_DIV.
* Drop the now unused 'bisp_factor_table[]'.
Additionally, drop the CLK_IGNORE_UNUSED flag for SENSOR[0-1] since
there is no reason to always keep ON those clocks.
Fixes: ed6b4795ec ("clk: actions: Add clock driver for S500 SoC")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/e675820a46cd9930d8d576c6cae61d41c1a8416f.1623354574.git.cristian.ciocaltea@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23837a6d7a ]
An error status for this event means that advertising has ended due to
reasons other than a connection:
'If advertising has terminated as a result of the advertising duration
elapsing, the Status parameter shall be set to the error code
Advertising Timeout (0x3C).'
'If advertising has terminated because the
Max_Extended_Advertising_Events was reached, the Status parameter
shall be set to the error code Limit Reached (0x43).'
Fixes: acf0aeae43 ("Bluetooth: Handle ADv set terminated event")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9ed0a7077 ]
These commands have a variable length, which can go up to 251, so this
changes the struct to not use a fixed size; then, when creating the PDU,
only the actual length of the data is sent to the controller.
Fixes: a0fb3726ba ("Bluetooth: Use Set ext adv/scan rsp data if controller supports")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 799acb9347 ]
This fixes parsing of LTV entries when the length is 0.
Found with:
tools/mgmt-tester -s "Add Advertising - Success (ScRsp only)"
Add Advertising - Success (ScRsp only) - run
Sending Add Advertising (0x003e)
Test condition added, total 1
[ 11.004577] ==================================================================
[ 11.005292] BUG: KASAN: slab-out-of-bounds in tlv_data_is_valid+0x87/0xe0
[ 11.005984] Read of size 1 at addr ffff888002c695b0 by task mgmt-tester/87
[ 11.006711]
[ 11.007176]
[ 11.007429] Allocated by task 87:
[ 11.008151]
[ 11.008438] The buggy address belongs to the object at ffff888002c69580
[ 11.008438] which belongs to the cache kmalloc-64 of size 64
[ 11.010526] The buggy address is located 48 bytes inside of
[ 11.010526] 64-byte region [ffff888002c69580, ffff888002c695c0)
[ 11.012423] The buggy address belongs to the page:
[ 11.013291]
[ 11.013544] Memory state around the buggy address:
[ 11.014359] ffff888002c69480: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 11.015453] ffff888002c69500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 11.016232] >ffff888002c69580: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
[ 11.017010] ^
[ 11.017547] ffff888002c69600: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
[ 11.018296] ffff888002c69680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 11.019116] ==================================================================
Fixes: 2bb36870e8 ("Bluetooth: Unify advertising instance flags check")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cb027f2f8 ]
The call to alloc_skb with the GFP_KERNEL flag can return a null sk_buff
pointer, so add a null check to avoid any null pointer dereference issues.
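That is, roughly (sketch):

  skb = alloc_skb(skb_len, GFP_KERNEL);   /* 'skb_len' is illustrative */
  if (!skb)
          return;   /* bail out rather than dereference a NULL pointer */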
Addresses-Coverity: ("Dereference null return value")
Fixes: afd2daa26c ("Bluetooth: Add support for virtio transport driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c9feabc2c ]
When the firmware code is disabled, the incomplete error handling
in the clk driver causes compile-time warnings:
drivers/clk/zynqmp/pll.c: In function 'zynqmp_pll_recalc_rate':
drivers/clk/zynqmp/pll.c:147:29: error: 'fbdiv' is used uninitialized [-Werror=uninitialized]
147 | rate = parent_rate * fbdiv;
| ~~~~~~~~~~~~^~~~~~~
In function 'zynqmp_pll_get_mode',
inlined from 'zynqmp_pll_recalc_rate' at drivers/clk/zynqmp/pll.c:148:6:
drivers/clk/zynqmp/pll.c:61:27: error: 'ret_payload' is used uninitialized [-Werror=uninitialized]
61 | return ret_payload[1];
| ~~~~~~~~~~~^~~
drivers/clk/zynqmp/pll.c: In function 'zynqmp_pll_recalc_rate':
drivers/clk/zynqmp/pll.c:53:13: note: 'ret_payload' declared here
53 | u32 ret_payload[PAYLOAD_ARG_CNT];
| ^~~~~~~~~~~
drivers/clk/zynqmp/clk-mux-zynqmp.c: In function 'zynqmp_clk_mux_get_parent':
drivers/clk/zynqmp/clk-mux-zynqmp.c:57:16: error: 'val' is used uninitialized [-Werror=uninitialized]
57 | return val;
| ^~~
As it was apparently intentional to support this for compile testing
purposes, change the code to have just enough error handling for the
compiler to not notice the remaining bugs.
Fixes: 21f2375346 ("clk: zynqmp: Drop dependency on ARCH_ZYNQMP")
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/f1c4e8c903fe2d5df5413421920a56890a46387a.1624356908.git.michal.simek@xilinx.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e84045eab6 ]
An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers is logarithmic.
If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.
Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).
Fixed by explicitly testing for timeout being zero.
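Since the local ACK timeout is encoded as 4.096 us * 2^timeout, halving it
corresponds to subtracting one from the exponent. A sketch of the intended
logic (not necessarily the exact code):

  if (timeout_set && timeout)                 /* timer running */
          packet_life_time = timeout - 1;     /* half of 2^timeout */
  else                                        /* unset, or set but zero (disabled) */
          packet_life_time = CMA_IBOE_PACKET_LIFETIME;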
Fixes: e1ee1e62be ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be7f62eeba ]
priv->cbs is an array of priv->info->num_cbs_shapers elements of type
struct sja1105_cbs_entry which only get allocated if CONFIG_NET_SCH_CBS
is enabled.
However, sja1105_reload_cbs() is called from sja1105_static_config_reload()
which in turn is called for any of the items in sja1105_reset_reasons,
therefore during the normal runtime of the driver and not just from a
code path which can be triggered by the tc-cbs offload.
The sja1105_reload_cbs() function does not contain a check whether the
priv->cbs array is NULL or not, it just assumes it isn't and proceeds to
iterate through the credit-based shaper elements. This leads to a NULL
pointer dereference.
The solution is to return success if the priv->cbs array has not been
allocated, since sja1105_reload_cbs() has nothing to do.
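The fix boils down to an early return (a sketch of the guard added at the top
of sja1105_reload_cbs()):

  if (!priv->cbs)         /* only allocated when CONFIG_NET_SCH_CBS is set */
          return 0;       /* nothing to reload */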
Fixes: 4d7525085a ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e7256f12c ]
This completes commit def4ec6dce ("e1000e: PCIm function state support").
Check the PCIm state only on CSME systems. There is no point to do this
check on non CSME systems.
This patch fixes the generation of a false-positive warning:
"Error in exiting dmoff"
Fixes: def4ec6dce ("e1000e: PCIm function state support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 624085a31c ]
First problem is that optlen is fetched without checking
there is more than one byte to parse.
Fix this by taking care of IPV6_TLV_PAD1 before
fetching optlen (under appropriate sanity checks against len).
Second problem is that IPV6_TLV_PADN checks of zero
padding are performed before the check of remaining length.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: c1412fce7e ("net/ipv6/exthdrs.c: Strict PadN option checking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d67fb4772d ]
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 27736563ce ("net: atlantic: MACSec egress offload implementation")
Fixes: 9ff40a751a ("net: atlantic: MACSec ingress offload implementation")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c309217f91 ]
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 28c5107aa9 ("net: phy: mscc: macsec support")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f7fe51211 ]
The key length used when offloading macsec to Ethernet or PHY drivers
was set to MACSEC_KEYID_LEN (16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length
accepted in uAPI) and secy->key_len to copy it.
Fixes: 3cf3227a21 ("net: macsec: hardware offloading infrastructure")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca0c448d2b ]
The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.
Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.
The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.
This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
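The accessors now take qp_mutex around the read-modify-write, roughly
(sketch):

  mutex_lock(&id_priv->qp_mutex);
  id_priv->tos = (u8)tos;
  id_priv->tos_set = true;
  mutex_unlock(&id_priv->qp_mutex);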
Fixes: 2c1619edef ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46af ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6ebca8efa ]
Free tx_pool and clear it, if allocation of tso_pool fails.
release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is
non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption
will not be true and we would end up dereferencing ->tx_buff, ->free_map
fields from a NULL pointer.
Fixes: 3205306c6b ("ibmvnic: Update TX pool initialization routine")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 552a33729f ]
free_long_term_buff() checks ltb->buff to decide whether we have a long
term buffer to free. So set ltb->buff to NULL after freeing. While here,
also clear ->map_id, fix up some coding style and log an error.
Fixes: 9c4eaabd1b ("Check CRQ command return codes")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72368f8b2b ]
This fixes a crash in replenish_rx_pool() when called from ibmvnic_poll()
after a previous call to replenish_rx_pool() encountered an error when
allocating a socket buffer.
Thanks to Rick Lindsley and Dany Madden for helping debug the crash.
Fixes: 4f0b6812e9 ("ibmvnic: Introduce batched RX buffer descriptor transmission")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65d6470d13 ]
We batch subordinate command response queue (scrq) descriptors that we
need to send to the VIOS using an "indirect" buffer. If, after we queue
one or more scrqs in the indirect buffer, we encounter an error (say we
fail to allocate an skb), we leave the queued scrq descriptors in the
indirect buffer until the next call to ibmvnic_xmit().
On the next call to ibmvnic_xmit(), it is possible that the adapter is
going through a reset and it is possible that the long term buffers
have been unmapped on the VIOS side. If we proceed to flush (send) the
packets that are in the indirect buffer, we will end up using the old
map ids and this can cause the VIOS to trigger an unnecessary FATAL
error reset.
Instead of flushing packets remaining on the indirect_buff, discard
(clean) them instead.
Fixes: 0d97338818 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ca220f928 ]
This reverts commit 7c451f3ef6.
When a vnic interface is taken down and then up, connectivity is not
restored. We bisected it to this commit. Reverting this commit until
we can fully investigate the issue/benefit of the change.
Fixes: 7c451f3ef6 ("ibmvnic: remove duplicate napi_schedule call in open function")
Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ec13aff05 ]
This reverts commit 1c7d45e7b2.
We tried to optimize the number of hcalls we send and skipped sending
the REQUEST_MAP calls for some maps. However during resets, we need to
resend all the maps to the VIOS since the VIOS does not remember the
old values. In fact we may have failed over to a new VIOS which will
not have any of the mappings.
When we send packets with map ids the VIOS does not know about, it
triggers a FATAL reset. While the client does recover from the FATAL
error reset, we are seeing a large number of such resets. Handling
FATAL resets is a lot more unnecessary work than issuing a few more
hcalls, so revert the commit and resend the maps to the VIOS.
Fixes: 1c7d45e7b2 ("ibmvnic: simplify reset_long_term_buff function")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe331da0f2 ]
GCC may report a compile-time assert error when a value calculated from
ib_mtu_enum_to_int() is used as 'val' in FIELD_PREP():
include/linux/compiler_types.h:328:38: error: call to
'__compiletime_assert_1524' declared with attribute error: FIELD_PREP:
value too large for the field
So a check is added for whether the integer mtu from ib_mtu_enum_to_int() is
negative, to avoid this warning.
Link: https://lore.kernel.org/r/1624262443-24528-3-git-send-email-liweihang@huawei.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 956e759d5f ]
A recent change that made i40e use new udp_tunnel infrastructure
uses a method that expects to be called under rtnl lock.
However, not all codepaths take the lock prior to calling
i40e_setup_pf_switch.
Fix that by adding additional rtnl locking and unlocking.
Fixes: 40a98cb6f0 ("i40e: convert to new udp_tunnel infrastructure")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c04cfcd4a ]
When vsi->type == I40E_VSI_FDIR, we capture the return value of
i40e_vsi_request_irq() but do not handle it further. Check it and clean up
memory on failure, just like the other i40e_vsi_request_irq() call site.
Fixes: 8a9eb7d3cb ("i40e: rework fdir setup and teardown")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 364745fbe9 ]
This is technically a backwards incompatible change in behaviour, but I'm
going to argue that it is very unlikely to break things, and likely to fix
*far* more than it breaks.
In no particular order, various reasons follow:
(a) I've long had a bug assigned to myself to debug a super rare kernel crash
on Android Pixel phones which can (per stacktrace) be traced back to BPF clat
IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later
on during transmit deep in the GSO engine, AFAICT precisely because of this
change to gso_size, though I've never been able to manually reproduce it. I
believe it may be related to the particular network offload support of attached
USB ethernet dongle being used for tethering off of an IPv6-only cellular
connection. The reason might be we end up with more segments than max permitted,
or with a GSO packet with only one segment... (either way we break some
assumption and hit a BUG_ON)
(b) There is no check that the gso_size is > 20 when reducing it by 20, so we
might end up with a negative (or underflowing) gso_size or a gso_size of 0.
This can't possibly be good. Indeed this is probably somehow exploitable (or
at least can result in a kernel crash) by delivering crafted packets and perhaps
triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS)
is related to MTU, but not directly derived from it: gso_size/MSS may be
significantly smaller than one would get by deriving from local MTU. And on
some NICs (which do loose MTU checking on receive, it may even potentially be
larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and
sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just
going from 21 to 1 is potentially problematic because it increases the number
of segments by a factor of 21 (think DoS, or some other crash due to too many
segments).
(c) It's always safe to not increase the gso_size, because it doesn't result in
the max packet size increasing. So the skb_increase_gso_size() call was always
unnecessary for correctness (and outright undesirable, see later). As such the
only part which is potentially dangerous (ie. could cause backwards compatibility
issues) is the removal of the skb_decrease_gso_size() call.
(d) If the packets are ultimately destined to the local device, then there is
absolutely no benefit to playing around with gso_size. It only matters if the
packets will egress the device. ie. we're either forwarding, or transmitting
from the device.
(e) This logic only triggers for packets which are GSO. It does not trigger for
skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a
GSO packet (and you don't even know what the MTU is, so you can't even fix it).
As such your transmit path must *already* be able to handle an MTU 20 bytes
larger than your receive path (for IPv4 to IPv6 translation) - and indeed 28
bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size()
call doesn't actually increase the size of the packets your transmit side must
be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/
route MTUs must already be set correctly. Since for example with an IPv4 egress
MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames,
so you need a 1520 byte device MTU. This means if your IPv6 device's egress
MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the
need to handle fragments). This is to handle normal non-GSO packets. Thus the
reduction is simply not needed for GSO packets, because when they're correctly
built, they will already be the right size.
(f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP
segments) should not be modified, so that TCP's MSS counting works correctly
(this matters for congestion control). If protocol conversion changes the
gso_size, then the number of TCP segments may increase or decrease. Packet loss
after protocol conversion can result in partial loss of MSS segments that the
sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs
in the middle of the segments it sent?
(g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS
case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee
that gso_size (and thus resulting packet MTU) is changed. ie. you must assume
it won't be changed.
(h) changing gso_size is outright buggy for UDP GSO packets, where framing
matters (I believe that's also the case for SCTP, but it's already excluded
by [g]). So the only remaining case is TCP, which also doesn't want it
(see [f]).
(i) see also the reasoning on the previous attempt at fixing this
(commit fa7b83bf3b) which shows that the current
behaviour causes TCP packet loss:
In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.
bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
length. After then tcp_gso_segment checks for the payload length if it
is <= MSS. The condition is causing the packet to be dropped.
tcp_gso_segment():
[...]
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss)) goto out;
[...]
Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary
and buggy, and decreasing can go negative.
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dongseok Yi <dseok.yi@samsung.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aaf473d010 ]
If optval != NULL and optlen == 0 are specified for SO_J1939_FILTER in
j1939_sk_setsockopt(), memdup_sockptr() will return ZERO_SIZE_PTR for the
zero-size allocation. The new filter will be mistakenly assigned ZERO_SIZE_PTR.
This patch checks for optlen != 0 and assigns NULL to the filter in
case of optlen == 0.
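A minimal sketch of the guard, with the surrounding locking and the exact
j1939 bookkeeping omitted; only the optlen handling is shown:

	struct j1939_filter *f = NULL;

	if (optlen) {
		f = memdup_sockptr(optval, optlen);
		if (IS_ERR(f))
			return PTR_ERR(f);
	}
	/* optlen == 0: the filter stays NULL instead of ZERO_SIZE_PTR */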
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/r/20210620123842.117975-1-nslusarek@gmx.net
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bcc3f2a829 ]
I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt
are fetched from the initial net namespace.
The other sysctls (max_dst_opts_len & max_hbh_opts_len)
are in fact already using the current ns.
Note: it is not clear why ipv6_destopt_rcv() uses two ways to
get to the netns:
1) dev_net(dst->dev)
Originally used to increment IPSTATS_MIB_INHDRERRORS
2) dev_net(skb->dev)
Tom used this variant in his patch.
Maybe this calls for using ipv6_skb_net() instead?
Fixes: 47d3d7ac65 ("ipv6: Implement limits on Hop-by-Hop and Destination options")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2ac9800cf ]
The Broadcom UniMAC MDIO bus from the mdio-bcm-unimac module comes up too late.
So, GENET cannot find the ethernet PHY on the UniMAC MDIO bus. This causes
GENET to fail to attach the PHY, as shown in the following log:
bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
could not attach to PHY
bcmgenet fd580000.ethernet eth0: failed to connect to PHY
uart-pl011 fe201000.serial: no DMA platform data
libphy: bcmgenet MII bus: probed
...
unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus
This patch adds a soft dependency to load the mdio-bcm-unimac module
before the genet module to avoid the issue.
Fixes: 9a4e796970 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e020ac961c ]
The error code returned by platform_get_irq() is stored in 'irq', but it is
never copied to 'ret' before being returned. As a result, the
value 0 held in 'ret' is returned incorrectly.
With that fixed, initializing the local variable 'ret'
to 0 is no longer needed, so remove it.
In addition, when dpu_mdss_init() returns successfully, the value of
'ret' is always 0. Therefore, replace "return ret" with "return 0" to make
the code clearer.
Fixes: 070e64dc1b ("drm/msm/dpu: Convert to a chained irq chip")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210510063805.3262-2-thunder.leizhen@huawei.com
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f21c8a276c ]
The irq_hpd interrupt should be handled after the dongle is plugged in and
before it is unplugged. Hence the irq_hpd interrupt is enabled at
the end of the plugin handler and disabled at the beginning of the
unplug handler. Currently, irq_hpd with sink_count = 0 is wrongly
handled the same as a dongle unplug, which tears down the mainlink
and disables the phy. This patch fixes the problem by only tearing
down the mainlink but keeping the phy enabled when handling irq_hpd with
sink_count = 0, so that the next irq_hpd with sink_count = 1 can be
handled by setting up the mainlink only. This patch also sets the dongle into
the D3 (power off) state at the end of handling irq_hpd with sink_count = 0.
Changes in v2:
-- add ctrl->phy_Power_count
Changes in v3:
-- del ctrl->phy_Power_count
-- add phy_power_off to dp_ctrl_off_link_stream()
Changes in v4:
-- return immediately if clock disable failed at dp_ctrl_off_link_stream()
Changes in v5:
-- set dongle to D3 (power off) state at dp_ctrl_off_link_stream()
Changes in v6:
-- add Fixes tag
Fixes: ea9f337ce8 ("drm/msm/dp: reset dp controller only at boot up and pm_resume")
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1621635930-30161-1-git-send-email-khsieh@codeaurora.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7506d211b9 ]
The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and
then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
individual JITs. The poke_tab[] to use is stored in the insn->imm by
the code adding it to that array slot. The JIT then uses imm to find the
right entry for an individual instruction. In the x86 bpf_jit_comp.c
this is done by calling emit_bpf_tail_call_direct with the poke_tab[]
of the imm value.
However, we observed the below null-ptr-deref when mixing tail call
programs with subprog programs. For this to happen we just need to
mix bpf-2-bpf calls and tailcalls with some extra calls or instructions
that would be patched later by one of the fixup routines. So what's
happening?
Before the fixup_call_args() -- where the jit op is done -- various
code patching is done by do_misc_fixups(). This may increase the
insn count, for example when we patch map_lookup_elem() using the map_gen_lookup
hook. This does two things. First, it means the instruction index,
insn_idx field, of a tail call instruction will move by a 'delta'.
In verifier code,
struct bpf_jit_poke_descriptor desc = {
.reason = BPF_POKE_REASON_TAIL_CALL,
.tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
.tail_call.key = bpf_map_key_immediate(aux),
.insn_idx = i + delta,
};
Then subprog start values subprog_info[i].start will be updated
with the delta and any poke descriptor index will also be updated
with the delta in adjust_poke_desc(). If we look at the adjust
subprog starts though we see it's only adjusted when the delta
occurs before the new instructions,
/* NOTE: fake 'exit' subprog should be updated as well. */
for (i = 0; i <= env->subprog_cnt; i++) {
if (env->subprog_info[i].start <= off)
continue;
Earlier subprograms are not changed because their start values
are not moved. But adjust_poke_desc() applies the offset + delta
indiscriminately. The result is that poke descriptors are potentially
corrupted.
Then in jit_subprogs() we only populate the poke_tab[]
when the above insn_idx is less than the next subprogram start. Because we
corrupted our insn_idx, we might incorrectly assume a
poke descriptor is not used in a subprogram, omitting it from that
subprogram. And finally, when the jit runs, it dereferences poke_tab
when emitting the instruction and crashes as shown below, because the
earlier step omitted the poke descriptor.
The fix is straightforward given the above context. Simply move the same logic
from adjust_subprog_starts() into adjust_poke_descs() and only adjust
insn_idx when needed.
[ 82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
[ 82.623001] loop10: detected capacity change from 0 to 8
[ 88.487424] ==================================================================
[ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
[ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
[ 88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G I 5.12.0+ #386
[ 88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[ 88.487490] Call Trace:
[ 88.487498] dump_stack+0x93/0xc2
[ 88.487515] kasan_report.cold+0x5f/0xd8
[ 88.487530] ? do_jit+0x184a/0x3290
[ 88.487542] do_jit+0x184a/0x3290
...
[ 88.487709] bpf_int_jit_compile+0x248/0x810
...
[ 88.487765] bpf_check+0x3718/0x5140
...
[ 88.487920] bpf_prog_load+0xa22/0xf10
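A minimal sketch of the adjusted helper, mirroring the logic quoted above
from adjust_subprog_starts(); field names follow the commit text and the
function name is illustrative:

static void adjust_poke_descs_sketch(struct bpf_prog *prog, u32 off, u32 delta)
{
	int i;

	for (i = 0; i < prog->aux->size_poke_tab; i++) {
		struct bpf_jit_poke_descriptor *desc = &prog->aux->poke_tab[i];

		/* descriptors before the patched region keep their index */
		if (desc->insn_idx <= off)
			continue;
		desc->insn_idx += delta;
	}
}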
Fixes: a748c6975d ("bpf: propagate poke descriptors to subprograms")
Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce8eb4c728 ]
When changing number of TX queues using ethtool:
# ethtool -L eth0 tx 1
[ 135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
[...]
[ 135.525128] Call trace:
[ 135.525142] dma_release_from_dev_coherent+0x2c/0xb0
[ 135.525148] dma_free_attrs+0x54/0xe0
[ 135.525156] k3_cppi_desc_pool_destroy+0x50/0xa0
[ 135.525164] am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
[ 135.525171] am65_cpsw_set_channels+0x3c/0x70
[...]
This is because k3_cppi_desc_pool_destroy(), which is called after
k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns(),
references a struct device that is unregistered at the end of
k3_udma_glue_release_tx_chn().
Therefore the right order is to call k3_cppi_desc_pool_destroy() and
destroy the desc pool before calling k3_udma_glue_release_tx_chn().
Fix this throughout the driver.
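A minimal sketch of the corrected ordering (the field names here are
illustrative): destroy the descriptor pool while the DMA channel's struct
device is still registered, then release the channel:

	k3_cppi_desc_pool_destroy(tx_chn->desc_pool);
	k3_udma_glue_release_tx_chn(tx_chn->tx_chn);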
Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddeacc4f64 ]
Resetting software indexes in bcm4908_dma_alloc_buf_descs() is not
enough as it's called during device probe only. Driver resets DMA on
every .ndo_open callback and it's required to reset indexes then.
This fixes inconsistent rings state and stalled traffic after interface
down & up sequence.
Fixes: 4feffeadbc ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c69f114d09 ]
When doing source address validation, the flowi4 struct used for
fib_lookup should be in the reverse direction to the given skb.
fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
should thus be swapped.
Fixes: 5a847a6e14 ("net/ipv4: Initialize proto and ports in flow struct")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6e3f2985a ]
Commit 6c11fbf97e ("ip6_tunnel: add MPLS transmit support")
moved assigning inner_ipproto down from ipxip6_tnl_xmit() to
its callee ip6_tnl_xmit(). The latter is also used by GRE.
Since commit 3872035241 ("gre: Use inner_proto to obtain inner
header protocol") GRE had been depending on skb->inner_protocol
during segmentation. It sets it in gre_build_header() and reads
it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
the protocol, resulting in GSO skbs getting dropped.
Note that inner_protocol is a union with inner_ipproto,
GRE uses the former while the change switched it to the latter
(always setting it to just IPPROTO_GRE).
Restore the original location of skb_set_inner_ipproto(),
it is unclear why it was moved in the first place.
Fixes: 6c11fbf97e ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 490274b474 ]
The msk socket state is currently updated in a few spots without
owning the msk socket lock itself.
Some of these operations are safe, as they happen before exposing
the msk socket to user-space and can't race with other changes.
A couple of them, at connect time, can actually race with close()
or shutdown(), breaking the socket state machine.
This change addresses the issue by moving such updates under the msk
socket lock with the usual:
<acquire spinlock>
<check sk lock owners>
<ev defer to release_cb>
scheme.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/56
Fixes: 8fd738049a ("mptcp: fallback in case of simultaneous connect")
Fixes: c3c123d16c ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7dd5d437c2 ]
On a 32-bit architecture, the result of sizeof() is a 32-bit integer, so
the expression becomes a multiplication between two 32-bit integers, which
can potentially lead to integer overflow. As a result,
bpf_map_area_alloc() allocates less memory than needed.
Fix this by casting one operand to u64.
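A small userspace illustration of the overflow class (not the kernel code):
both operands below are explicitly 32 bits wide, so the product wraps before
it ever reaches the allocator unless one operand is widened first:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t num = 0x20000000;	/* ~536 million entries */
	uint32_t entry_size = 16;	/* stand-in for sizeof(entry) */
	uint32_t wrapped = num * entry_size;		/* wraps to 0 */
	uint64_t widened = (uint64_t)num * entry_size;	/* 8 GiB, as intended */

	printf("wrapped=%u widened=%llu\n",
	       wrapped, (unsigned long long)widened);
	return 0;
}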
Fixes: 0d2c4f9640 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps")
Fixes: 99c51064fb ("devmap: Use bpf_map_area_alloc() for allocating hash buckets")
Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 534799097a ]
nft_table_lookup() allows us to obtain the table object by the name and
the family. The netlink portID validation needs to be skipped for the
dump path, since the ownership only applies to commands to update the
given table. Skip validation if the specified netlink PortID is zero
when calling nft_table_lookup().
Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd72fadf21 ]
In case of xfrm offload, if the driver's xdo_dev_state_add() returns
-EOPNOTSUPP, the xfrm offload fallback fails.
In xfrm state_add() both xso->dev and xso->real_dev are initialized to
dev, and when the error (-EOPNOTSUPP) is returned only xso->dev is set to NULL.
So in this scenario the condition in validate_xmit_xfrm(),
if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
return skb;
evaluates to true, due to which the skb is returned without calling esp_xmit()
below, which has the fallback code. Hence the CRYPTO_FALLBACK is failing.
So fix this by keeping x->xso.real_dev as NULL when the error is
returned in xfrm_dev_state_add().
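A minimal sketch of that error path, with the existing reference dropping and
other cleanup omitted; clearing real_dev as well makes the quoted
validate_xmit_xfrm() condition false, so the skb proceeds to the software
fallback:

	if (err) {
		xso->dev = NULL;
		xso->real_dev = NULL;	/* previously left pointing at dev */
		return err;
	}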
Fixes: bdfd2d1fa7 ("bonding/xfrm: use real_dev instead of slave_dev")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8b79c414e ]
8021q module adds vlan 0 to all interfaces when it starts.
When the 8021q module is loaded it isn't possible to create a bond
with mv88e6xxx interfaces; the bonding module displays the error
"Couldn't add bond vlan ids", because it tries to add vlan 0
to slave interfaces.
There is unexpected behavior in the switch. When a PVID
is assigned to a port, the switch changes the VID to the PVID
in ingress frames with VID 0 on that port. It is expected
that the switch doesn't assign the PVID to tagged frames
with VID 0, but there isn't a way to change this behavior
in the switch.
Fixes: 57e661aae6 ("net: dsa: mv88e6xxx: Link aggregation support")
Signed-off-by: Eldar Gasanov <eldargasanov2@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b90788459c ]
If this test fails we must free some resources as in all the other error
handling paths of this function.
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1502328f17 ]
When receiving a 32-bit DSS ack from the peer, MPTCP needs
to expand it to a 64-bit value. The current code is buggy
WRT detecting the 32-bit ack wrap-around: when the wrap-around
happens, the current unsigned 32-bit ack value is lower than
the previous one.
Additionally check for possible reverse wrap and make the helper
visible, so that we could re-use it for the next patch.
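A self-contained sketch of a wrap-aware expansion (userspace C, not the mptcp
helper itself): old64 is the last known 64-bit ack, cur32 the newly received
32-bit ack:

#include <stdint.h>

static uint64_t expand_ack32(uint64_t old64, uint32_t cur32)
{
	uint32_t old32 = (uint32_t)old64;
	uint64_t new64 = (old64 & ~(uint64_t)0xffffffff) | cur32;

	if (cur32 < old32 && old32 - cur32 > 0x80000000u)
		new64 += 0x100000000ULL;	/* forward wrap-around */
	else if (cur32 > old32 && cur32 - old32 > 0x80000000u)
		new64 -= 0x100000000ULL;	/* reverse wrap */

	return new64;
}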
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/204
Fixes: cc9d256698 ("mptcp: update per unacked sequence on pkt reception")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea45fdf82c ]
The VLAN transfer logic should actually check for
FLOW_DISSECTOR_KEY_BASIC, not FLOW_DISSECTOR_KEY_CONTROL. Moreover, do
not fall back to case 2), where .n_proto is set to 802.1q or 802.1ad, if
FLOW_DISSECTOR_KEY_BASIC is unset.
Fixes: 783003f3bb ("netfilter: nftables_offload: special ethertype handling for VLAN")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c5e446220 ]
Release the flow from the abort path; this is easy to reproduce since
b72920f6e4 ("netfilter: nftables: counter hardware offload support").
If the preparation phase fails, then the abort path is exercised without
releasing the flow rule object.
unreferenced object 0xffff8881f0fa7700 (size 128):
comm "nft", pid 1335, jiffies 4294931120 (age 4163.740s)
hex dump (first 32 bytes):
08 e4 de 13 82 88 ff ff 98 e4 de 13 82 88 ff ff ................
48 e4 de 13 82 88 ff ff 01 00 00 00 00 00 00 00 H...............
backtrace:
[<00000000634547e7>] flow_rule_alloc+0x26/0x80
[<00000000c8426156>] nft_flow_rule_create+0xc9/0x3f0 [nf_tables]
[<0000000075ff8e46>] nf_tables_newrule+0xc79/0x10a0 [nf_tables]
[<00000000ba65e40e>] nfnetlink_rcv_batch+0xaac/0xf90 [nfnetlink]
[<00000000505c614a>] nfnetlink_rcv+0x1bb/0x1f0 [nfnetlink]
[<00000000eb78e1fe>] netlink_unicast+0x34b/0x480
[<00000000a8f72c94>] netlink_sendmsg+0x3af/0x690
[<000000009cb1ddf4>] sock_sendmsg+0x96/0xa0
[<0000000039d06e44>] ____sys_sendmsg+0x3fe/0x440
[<00000000137e82ca>] ___sys_sendmsg+0xd8/0x140
[<000000000c6bf6a6>] __sys_sendmsg+0xb3/0x130
[<0000000043bd6268>] do_syscall_64+0x40/0xb0
[<00000000afdebc2d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Remove flow rule release from the offload commit path, otherwise error
from the offload commit phase might trigger a double-free due to the
execution of the abort_offload -> abort. After this patch, the abort
path takes care of releasing the flow rule.
This fix also needs to move the nft_flow_rule_create() call before the
transaction object is added otherwise the abort path might find a NULL
pointer to the flow rule object for the NFT_CHAIN_HW_OFFLOAD case.
While at it, rename BASIC-like goto tags to slightly more meaningful
names rather than adding a new "err3" tag.
Fixes: 63b48c73ff ("netfilter: nf_tables_offload: undo updates if transaction fails")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d452d48b9f ]
We got multiple reports that multi_chunk_sendfile test
case from tls selftest fails. This was sort of expected,
as the original fix was never applied (see it in the first
Link:). The test in question uses sendfile() with count
larger than the size of the underlying file. This will
make splice set MSG_MORE on all sendpage calls, meaning
TLS will never close and flush the last partial record.
Eric seems to have addressed a similar problem in
commit 35f9c09fe9 ("tcp: tcp_sendpages() should call tcp_push() once")
by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE,
MSG_SENDPAGE_NOTLAST is not set on the last call
of a "pipefull" of data (PIPE_DEF_BUFFERS == 16,
so every 16 pages or whenever we run out of data).
Having a break every 16 pages should be fine, TLS
can pack exactly 4 pages into a record, so for
aligned reads there should be no difference,
unaligned may see one extra record per sendpage().
Sticking to TCP semantics seems preferable to modifying
splice, but we can revisit it if real life scenarios
show a regression.
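A minimal sketch of the flag-handling idea (not the exact tls code): decide
record boundaries from MSG_SENDPAGE_NOTLAST, which splice clears on the last
page of each batch, rather than from MSG_MORE, which splice sets on every call:

#include <linux/socket.h>

static inline bool sendpage_ends_record(int flags)
{
	/* ignore MSG_MORE; NOTLAST is cleared on the last page of a batch */
	return !(flags & MSG_SENDPAGE_NOTLAST);
}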
Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 291c53e4da ]
ChaCha support did not adjust the bidirectional test.
We need to set up KTLS in reverse direction correctly,
otherwise these two cases will fail:
tls.12_chacha.bidir
tls.13_chacha.bidir
Fixes: 4f336e88a8 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit baa00119d6 ]
A bunch of tests uses uninitialized stack memory as random
data to send. This is harmless but generates compiler warnings.
Explicitly init the buffers with random data.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89837eb4b2 ]
The spin_trylock() was assumed to contain the implicit
barrier needed to ensure the correct ordering between
STATE_MISSED setting/clearing and STATE_MISSED checking
in commit a90c57f2ce ("net: sched: fix packet stuck
problem for lockless qdisc").
But it turns out that spin_trylock() only has load-acquire
semantics. For strongly-ordered systems (like x86), the compiler
barrier implicitly contained in spin_trylock() seems enough
to ensure the correct ordering. But for weakly-ordered systems
(like arm64), store-release semantics are needed to ensure
the correct ordering, as clear_bit() and test_bit() are store
operations; see queued_spin_lock().
So add the explicit barrier to ensure the correct ordering
for the above case.
Fixes: a90c57f2ce ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 603113c514 ]
Non-ND strict packets with a source LLA go through the packet taps
again, while non-ND strict packets with other source addresses do not,
and we can see a clone of those packets on the vrf interface (we should
not). This is due to a series of changes:
Commit 6f12fa775530[1] made non-ND strict packets not being pushed again
in the packet taps. This changed with commit 205704c618af[2] for those
packets having a source LLA, as they need a lookup with the orig_iif.
The issue now is those packets do not skip the 'vrf_ip6_rcv' function to
the end (as the ones without a source LLA) and go through the check to
call packet taps again. This check was changed by commit 6f12fa775530[1]
and does not exclude non-strict packets anymore. Packets matching
'need_strict && !is_ndisc && is_ll_src' are now being sent through the
packet taps again. This can be seen by dumping packets on the vrf
interface.
Fix this by having the same code path for all non-ND strict packets and
selectively lookup with the orig_iif for those with a source LLA. This
has the effect to revert to the pre-205704c618af[2] condition, which
should also be easier to maintain.
[1] 6f12fa7755 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
[2] 205704c618 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
Fixes: 205704c618 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7e65fe7d8 ]
We use non-blocking sockets for testing sockmap redirections,
and got some random EAGAIN errors from UDP tests.
There is no guarantee the packet would be immediately available
to receive as soon as it is sent out, even on the local host.
For UDP, this is especially true because it does not lock the
sock during BH (unlike the TCP path). This is probably why we
only saw this error in UDP cases.
No matter how hard we try to make the queue empty check accurate,
it is always possible for recvmsg() to beat ->sk_data_ready().
Therefore, we should just retry in case of EAGAIN.
Fixes: d6378af615 ("selftests/bpf: Add a test case for udp sockmap")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210615021342.7416-3-xiyou.wangcong@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0de449d599 ]
As documented at drivers/base/platform.c for platform_get_irq:
* Gets an IRQ for a platform device and prints an error message if finding the
* IRQ fails. Device drivers should check the return value for errors so as to
* not pass a negative integer value to the request_irq() APIs.
So, the driver should check that the platform_get_irq() return value
is _negative_, not that it's equal to zero, because -ENXIO (the return
value from platform_get_irq() if the irq was not found) will
pass this check and lead to passing a negative irq to request_irq().
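A minimal sketch of the corrected probe-time check (the handler and cookie
names are placeholders): platform_get_irq() reports failure with a negative
errno, so test for < 0 before the value ever reaches request_irq():

	int irq, err;

	irq = platform_get_irq(pdev, 0);
	if (irq < 0)
		return irq;		/* propagate -ENXIO and friends */

	err = request_irq(irq, sample_irq_handler, 0,
			  dev_name(&pdev->dev), priv);
	if (err)
		return err;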
Fixes: 0dd0770936 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4b8700e07 ]
priv is netdev private data, but it is used
after free_netdev(). It can cause a use-after-free when accessing the priv
pointer. So, fix it by moving free_netdev() after the netif_napi_del()
call.
Fixes: 0dd0770936 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3a5de6d81 ]
static int greth_of_remove(struct platform_device *of_dev)
{
...
struct greth_private *greth = netdev_priv(ndev);
...
unregister_netdev(ndev);
free_netdev(ndev);
of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0]));
...
}
greth is netdev private data, but it is used
after free_netdev(). It can cause a use-after-free when accessing the greth
pointer. So, fix it by moving free_netdev() after the of_iounmap()
call.
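A minimal sketch of the corrected ordering, with unrelated teardown elided;
the point is that free_netdev() must come after the last use of the
netdev_priv() data (greth->regs):

static int greth_of_remove_fixed(struct platform_device *of_dev)
{
	struct net_device *ndev = platform_get_drvdata(of_dev);
	struct greth_private *greth = netdev_priv(ndev);

	/* ... */
	unregister_netdev(ndev);
	of_iounmap(&of_dev->resource[0], greth->regs,
		   resource_size(&of_dev->resource[0]));
	free_netdev(ndev);	/* moved after the last greth dereference */
	return 0;
}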
Fixes: d4c41139df ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 723885a675 ]
Fix the maximum size of the coredump generated with current mt7921
firmware. Otherwise, a truncated coredump would be reported to userland
via dev_coredumpv.
Also, additional error handling is added in the patch to avoid
possible invalid buffer access when the system fails to create the
buffer that holds the coredump.
Fixes: 0da3c795d0 ("mt76: mt7921: add coredump support")
Co-developed-by: YN Chen <YN.Chen@mediatek.com>
Signed-off-by: YN Chen <YN.Chen@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89043529c8 ]
FCS error packets are filtered by default and won't be reported to the
driver, so RX fcs error and PER in testmode always show zero.
Fix this issue by reading the fcs error count from the hw counter.
We didn't fix this issue by disabling the fcs error rx filter since that may
make the HW suffer some SER errors.
Fixes: 5d8a83f099 ("mt76: mt7915: implement testmode rx support")
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 271fa68536 ]
Always wake the device up before dumping the single_sku power table
otherwise the device can hang.
Fixes: ea29acc97c ("mt76: mt7921: add dumping Tx power table")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 223cea6d3c ]
Get rid of undefined behaviour in the mt76_testmode_alloc_skb routine when
allocating skb frames.
Fixes: 2601dda8fa ("mt76: testmode: add support to send larger packet")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe2c3b1fc6 ]
Free all pending frames in case of failure in the mt76_testmode_alloc_skb
routine.
Fixes: 2601dda8fa ("mt76: testmode: add support to send larger packet")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10de032a31 ]
The hw beacon cmd to the mt7921 firmware doesn't only filter out the beacon,
but also performs its own connection monitoring, including periodic
keep-alives to the AP and probing the AP on beacon loss. The firmware will
notify the host with an event when it detects that the connection is lost.
Fixes: 1d8efc741d ("mt76: mt7921: introduce Runtime PM support")
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: YN Chen <yn.chen@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edb5aebc1c ]
It is possible that the RCPI from a certain antenna is an invalid value,
especially when packets are received while the system is frequently entering
deep sleep mode, so calculate the RSSI with a reasonable upper
bound to avoid reporting a wrong value to the mac80211 layer.
Fixes: 163f4d22c1 ("mt76: mt7921: add MAC support")
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 193e5f22ee ]
Update MCU command usage to fix WoW configuration with disconnection
and bitmap pattern and to avoid magic number.
Fixes: ffa1bf9742 ("mt76: mt7921: introduce PM support")
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: YN Chen <yn.chen@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bfa291251 ]
If the device is MMIO-based, we must ensure all TxD/TxP in host
memory have been consumed by the device prior to safely switching to
the fw_own state.
Fixes: ec7bd7b4a9 ("mt76: connac: check wake refcount in mcu_fw_pmctrl")
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20eb83c749 ]
WiFi subsystem reset should control MT_WFSYS_SW_RST_B and then poll the
same register until the WFSYS_SW_INIT_DONE bit is set.
Fixes: 0c1ce98846 ("mt76: mt7921: add wifi reset support")
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d3cdc1bbb ]
Fix theoretical NULL pointer dereference in mt7615_tx_prepare_skb and
mt7663_usb_sdio_tx_prepare_skb routines. This issue has been identified
by code analysis.
Fixes: 6aa4ed7927 ("mt76: mt7615: implement DMA support for MT7622")
Fixes: 4bb586bc33 ("mt76: mt7663u: sync probe sampling with rate configuration")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7400a2f3e ]
Even if this is not a real issue since mt76_tx is never run with wcid set
to NULL, fix a theoretical NULL pointer dereference in mt76_tx routine
Fixes: db9f11d343 ("mt76: store wcid tx rate info in one u32 reduce locking")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 861fad474e ]
"idx" needs to be signed for the error handling to work.
Fixes: 495184ac91 ("mt76: mt7915: add support for applying pre-calibration data")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd70957438 ]
Commit 0571a753cb cancelled the delayed work too late, keeping a small
race. Cancel the work sooner to close it completely.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes: 0571a753cb ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85102ba58b ]
A Segmentation fault error is caused when the following command
is executed.
$ sudo ./samples/bpf/xdp_redirect lo
Segmentation fault
This command is missing a device <IFNAME|IFINDEX> as an argument, resulting
in out-of-bounds access from argv.
If the number of devices for the xdp_redirect parameter is not 2,
we should report an error and exit.
Fixes: 24251c2647 ("samples/bpf: add option for native and skb mode for redirect apps")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616042324.314832-1-wanghai38@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e91eabf66 ]
Currently rtrs uses coarse numbers (bigger in general) when creating a qp,
which leads the hardware to create more resources, and that only wastes
memory with no benefit.
For max_send_wr, we don't really always need the max_qp_wr size when creating
the qp; reduce it to cq_size.
For max_recv_wr, cq_size is enough.
With the patch, when sess_queue_depth=128, per-session (2 paths) memory
consumption is reduced from 188 MB to 65 MB.
When always_invalidate is enabled, we need to send more wrs, so treat it
specially.
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61e8aeda93 ]
The vmlinux ".BTF_ids" ELF section is declared in btf_ids.h to hold a list
of zero-filled BTF IDs, which is then patched at link-time with correct
values by resolv_btfids. The section is flagged as "allocable" to preclude
compression, but notably the section contents (BTF IDs) are untyped.
When patching the BTF IDs, resolve_btfids writes in host-native endianness
and relies on libelf for any required translation on reading and updating
vmlinux. However, since the type of the .BTF_ids section content defaults
to ELF_T_BYTE (i.e. unsigned char), no translation occurs. This results in
incorrect patched values when cross-compiling to non-native endianness,
and can manifest as kernel Oops and test failures which are difficult to
troubleshoot [1].
Explicitly set the type of patched data to ELF_T_WORD, the architecture-
neutral ELF type corresponding to the u32 BTF IDs. This enables libelf to
transparently perform any needed endian conversions.
Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Mark Wielaard <mark@klomp.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/CAPGftE_eY-Zdi3wBcgDfkz_iOr1KF10n=9mJHm1_a_PykcsoeA@mail.gmail.com [1]
Link: https://lore.kernel.org/bpf/20210618061404.818569-1-Tony.Ambardar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f654fae47e ]
Fix broken Tx ring validation for AF_XDP. The commit under the Fixes
tag fixed an off-by-one error in the validation but introduced
another error. Descriptors are now let through even if they straddle a
chunk boundary which they are not allowed to do in aligned mode. Worse
is that they are let through even if they straddle the end of the umem
itself, tricking the kernel to read data outside the allowed umem
region which might or might not be mapped at all.
Fix this by reintroducing the old code, but subtract the length by one
to fix the off-by-one error that the original patch was
addressing. The test chunk != chunk_end makes sure packets do not
straddle chunk boundaries. Note that packets of zero length are
allowed in the interface, hence the test that the length is
non-zero.
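A self-contained sketch of the aligned-mode check described above (not the
exact xsk helper); the length is reduced by one so a packet that ends exactly
on a chunk boundary still passes, and zero-length packets are allowed:

#include <stdbool.h>
#include <stdint.h>

static bool desc_in_one_chunk(uint64_t addr, uint32_t len,
			      uint64_t chunk_size, uint64_t umem_size)
{
	uint64_t chunk = addr / chunk_size;
	uint64_t chunk_end = len ? (addr + len - 1) / chunk_size : chunk;

	if (chunk != chunk_end)			/* straddles a chunk boundary */
		return false;
	return addr + len <= umem_size;		/* stays inside the umem */
}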
Fixes: ac31565c21 ("xsk: Fix for xp_aligned_validate_desc() when len == chunk_size")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210618075805.14412-1-magnus.karlsson@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f99619820 ]
Fix a missing validation of a Tx descriptor when executing in skb mode
and the umem is in unaligned mode. A descriptor could point to a
buffer straddling the end of the umem, thus effectively tricking the
kernel to read outside the allowed umem region. This could lead to a
kernel crash if that part of memory is not mapped.
In zero-copy mode, the descriptor validation code rejects such
descriptors by checking a bit in the DMA address that tells us if the
next page is physically contiguous or not. For the last page in the
umem, this bit is not set, therefore any descriptor pointing to a
packet straddling this last page boundary will be rejected. However,
the skb path does not use this bit since it copies out data and can do
so to two different pages. (It also does not have the array of DMA
address, so it cannot even store this bit.) The code just returned
that the packet is always physically contiguous. But this is
unfortunately also returned for the last page in the umem, which means
that packets that cross the end of the umem are being allowed, which
they should not be.
Fix this by introducing a check for this in the SKB path only, not
penalizing the zero-copy path.
Fixes: 2b43470add ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c38740c08 ]
Seems like 4d1b629861 ("selftests/bpf: Convert few tests to light skeleton.")
and 704e2beba2 ("selftests/bpf: Test ringbuf mmap read-only and read-write
restrictions") were done independently on bpf and bpf-next trees and are in
conflict with each other, despite a clean merge. Fix fetching of ringbuf's
map_fd to use light skeleton properly.
Fixes: 704e2beba2 ("selftests/bpf: Test ringbuf mmap read-only and read-write restrictions")
Fixes: 4d1b629861 ("selftests/bpf: Convert few tests to light skeleton.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210618002824.2081922-1-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15ae1375ea ]
Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference once
an atomic operation has been executed which prevents cleaning up the qp
and associated pd and cqs when the qp is destroyed.
This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
destroyed while a peer is retrying an atomic op it will cause the
operation to fail which is acceptable.
Link: https://lore.kernel.org/r/20210604230558.4812-1-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Fixes: 86af617641 ("IB/rxe: remove unnecessary skb_clone")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f518d43f8 ]
The osf expression only supports TCP packets; add an upfront sanity
check to skip packet parsing if this is not a TCP packet.
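A minimal sketch of the upfront check as it would sit at the top of the
expression's eval hook; the surrounding code is simplified:

	if (pkt->tprot != IPPROTO_TCP) {
		regs->verdict.code = NFT_BREAK;
		return;
	}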
Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cdd73cc545 ]
ipv6_find_hdr() does not validate that this is an IPv6 packet. Add a
sanity check before calling ipv6_find_hdr() to make sure an IPv6 packet
is passed for parsing.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 979ebc54cf ]
Firmware has added an assert if a beacon template is received after
vdev_down. Firmware expects the beacon template after vdev_start
and before vdev_up. This change is needed to support MBSSID EMA
cases in firmware.
Hence, change the sequence in ath11k as expected by the firmware.
This new change does not cause any issues with older
firmware.
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.5.0.1.r3-00011-QCAHKSWPL_SILICONZ-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.5.0.1.r4-00008-QCAHKSWPL_SILICONZ-1
Fixes: d5c65159f2 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Signed-off-by: Seevalamuthu Mariappan <seevalam@codeaurora.org>
[sven@narfation.org: added tested-on/fixes information]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210525133028.2805615-1-sven@narfation.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f60f46856 ]
A previous commit 4f68ef64cd ("cw1200: Fix concurrency
use-after-free bugs in cw1200_hw_scan()") tried to fix a seemingly
use-after-free bug between cw1200_bss_info_changed() and
cw1200_hw_scan(), where the former frees a sk_buff pointed
to by frame.skb, and the latter accesses the sk_buff
pointed to by frame.skb. However, this issue should be a
false alarm because:
(1) "frame.skb" is not a shared variable between the above
two functions, because "frame" is a local function variable,
each of the two functions has its own local "frame" - they
just happen to have the same variable name.
(2) the sk_buff(s) pointed to by these two "frame.skb" are
also two different object instances, they are individually
allocated by different dev_alloc_skb() within the two above
functions. Freeing one object instance will not invalidate
access to the other.
Based on these facts, the previous commit should be unnecessary.
Moreover, it also introduced a missing unlock which was
addressed in a subsequent commit 51c8d24101 ("cw1200: fix missing
unlock on error in cw1200_hw_scan()"). Now that the
original use-after-free is not real, these two commits should
be reverted. This patch performs the reversion.
Fixes: 4f68ef64cd ("cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()")
Fixes: 51c8d24101 ("cw1200: fix missing unlock on error in cw1200_hw_scan()")
Signed-off-by: Hang Zhang <zh.nvgt@gmail.com>
Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210521223238.25020-1-zh.nvgt@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a1590934d ]
The rx_lastpkt_rssi field provided by the firmware is suitable for
NL80211_STA_INFO_{SIGNAL,CHAIN_SIGNAL}, while the rssi field is an
average. Fix up the assignments and set the correct STA_INFO bits. This
lets userspace know that the average RSSI is part of the station info.
Fixes: cae355dc90 ("brcmfmac: Add RSSI information to get_station.")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210506132010.3964484-2-alsi@bang-olufsen.dk
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit feb4564376 ]
The sinfo->chains field is a bitmask for filled values in chain_signal
and chain_signal_avg, not a count. Treat it as such so that the driver
can properly report per-chain RSSI information.
Before (MIMO mode):
$ iw dev wlan0 station dump
...
signal: -51 [-51] dBm
After (MIMO mode):
$ iw dev wlan0 station dump
...
signal: -53 [-53, -54] dBm
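A minimal sketch of the bitmask semantics, assuming a hypothetical
per_chain_rssi[] array with count_chains valid entries; each filled
chain_signal[] slot gets its own bit in sinfo->chains:

	int i;

	sinfo->chains = 0;
	for (i = 0; i < count_chains; i++) {
		sinfo->chain_signal[i] = per_chain_rssi[i];
		sinfo->chains |= BIT(i);
	}
	sinfo->filled |= BIT_ULL(NL80211_STA_INFO_CHAIN_SIGNAL);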
Fixes: cae355dc90 ("brcmfmac: Add RSSI information to get_station.")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210506132010.3964484-1-alsi@bang-olufsen.dk
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89212e160b ]
There is no strong reason to have both WWAN and WWAN_CORE symbols.
Let's build the WWAN core framework when WWAN is selected, in the
same way as for other subsystems.
This fixes an issue with mhi_net selecting WWAN_CORE without WWAN,
reported by the kernel test robot:
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for WWAN_CORE
Depends on NETDEVICES && WWAN
Selected by
- MHI_NET && NETDEVICES && NET_CORE && MHI_BUS
Fixes: 9a44c1cc63 ("net: Add a WWAN subsystem")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef48667557 ]
Right now wcn->hal_buf is allocated in wcn36xx_start(). This is a problem
since we should have set up all of the buffers we require by the time
ieee80211_register_hw() is called.
struct ieee80211_ops callbacks may run prior to mac_start() and therefore
wcn->hal_buf must be initialized.
This is easily remediated by moving the allocation to probe(), taking the
opportunity to tidy up freeing memory by using devm_kmalloc().
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210605173347.2266003-1-bryan.odonoghue@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 272fdc0c45 ]
kernel test robot reports over 200 build errors and warnings
that are due to this Kconfig problem when CARL9170=m,
MAC80211=y, and LEDS_CLASS=m.
WARNING: unmet direct dependencies detected for MAC80211_LEDS
Depends on [n]: NET [=y] && WIRELESS [=y] && MAC80211 [=y] && (LEDS_CLASS [=m]=y || LEDS_CLASS [=m]=MAC80211 [=y])
Selected by [m]:
- CARL9170_LEDS [=y] && NETDEVICES [=y] && WLAN [=y] && WLAN_VENDOR_ATH [=y] && CARL9170 [=m]
CARL9170_LEDS selects MAC80211_LEDS even though its kconfig
dependencies are not met. This happens because 'select' does not follow
any Kconfig dependency chains.
Fix this by making CARL9170_LEDS depend on MAC80211_LEDS, where
the latter supplies any needed dependencies on LEDS_CLASS.
Fixes: 1d7e1e6b1b ("carl9170: Makefile, Kconfig files and MAINTAINERS")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Cc: linux-wireless@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Christian Lamparter <chunkeey@googlemail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210530031134.23274-1-rdunlap@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52e597d3e2 ]
The commit d96febedfd ("net: stmmac: arrange Tx tail pointer update
to stmmac_flush_tx_descriptors") introduced the following coverity
warning:-
1. Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression
'tx_q->cur_tx * desc_size' with type 'unsigned int' (32 bits,
unsigned) is evaluated using 32-bit arithmetic, and then used in a
context that expects an expression of type dma_addr_t (64 bits,
unsigned).
Fixed this by assigning tx_tail_addr to dma_addr_t type, as dma_addr_t
datatype is decided by CONFIG_ARCH_DMA_ADDR_T_64_BIT.
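In isolation, the hazard and the fix look like this (illustrative variables, not the driver's exact code):
u32 cur_tx = 0x00400000, desc_size = 1024;
dma_addr_t bad  = cur_tx * desc_size;			/* 32-bit product wraps, then widens */
dma_addr_t good = (dma_addr_t)cur_tx * desc_size;	/* arithmetic done in dma_addr_t */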
Fixes: d96febedfd ("net: stmmac: arrange Tx tail pointer update to stmmac_flush_tx_descriptors")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4744bf072b ]
To support XDP, a headroom is prepended to the packet data.
Consider this offset when doing a prefetch.
Fixes: da5ec7f22a ("net: stmmac: refactor stmmac_init_rx_buffers for stmmac_reinit_rx_buffers")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc336ae622 ]
On 5P49V6965, when an output is enabled we enable the corresponding
FOD. When this happens for the first time, and specifically when writing
register VC5_OUT_DIV_CONTROL in vc5_clk_out_prepare(), all other outputs
are stopped for a short time and then restarted.
According to Renesas support this is intended: "The reason for that is VC6E
has synced up all output function".
This behaviour can be disabled at least on VersaClock 6E devices, of which
only the 5P49V6965 is currently implemented by this driver. This requires
writing bit 7 (bypass_sync{1..4}) in register 0x20..0x50. Those registers
are named "Unused Factory Reserved Register", and the bits are documented
as "Skip VDDO<N> verification", which does not clearly explain the relation
to FOD sync. However according to Renesas support as well as my testing
setting this bit does prevent disabling of all clock outputs when enabling
a FOD.
See "VersaClock ® 6E Family Register Descriptions and Programming Guide"
(August 30, 2018), Table 116 "Power Up VDD check", page 58:
https://www.renesas.com/us/en/document/mau/versaclock-6e-family-register-descriptions-and-programming-guide
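A hedged sketch of setting such a bit via regmap (the register macro is hypothetical; bit and offsets per the description above):
/* illustrative only: set bypass_sync (bit 7) in one of the factory
 * reserved registers in the 0x20..0x50 range */
ret = regmap_update_bits(vc5->regmap, VC5_RESERVED_REG(idx) /* hypothetical */,
			 BIT(7), BIT(7));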
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Reviewed-by: Adam Ford <aford173@gmail.com>
Link: https://lore.kernel.org/r/20210527211647.1520720-1-luca@lucaceresoli.net
Fixes: 2bda748e6a ("clk: vc5: Add support for IDT VersaClock 5P49V6965")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e075a78119 ]
If the of_get_named_gpio_flags call fails in vc4_hdmi_bind, we jump to
the err_unprepare_hsm label. That label will then call
pm_runtime_disable and put_device on the DDC device.
We just retrieved the DDC device, so the latter is definitely justified.
However at that point we still haven't called pm_runtime_enable, so the
call to pm_runtime_disable is not supposed to be there.
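A sketch of the intended error-path shape (labels are illustrative and the DT property name is assumed; the point is that failures before pm_runtime_enable() must skip pm_runtime_disable()):
	ret = of_get_named_gpio_flags(dev->of_node, "hpd-gpios", 0,
				      &hpd_gpio_flags);
	if (ret < 0)
		goto err_put_ddc;	/* runtime PM not enabled yet */
	/* ... rest of bind ... */
err_unprepare_hsm:
	pm_runtime_disable(dev);
err_put_ddc:
	put_device(&vc4_hdmi->ddc->dev);
	return ret;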
Fixes: 10ee275cb1 ("drm/vc4: prepare for CEC support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210524131852.263883-1-maxime@cerno.tech
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32a25f2ea6 ]
To avoid the following failure when trying to load the rdma_rxe module
while IPv6 is disabled, add a check for EAFNOSUPPORT and ignore the
failure, also delete the needless debug print from rxe_setup_udp_tunnel().
$ modprobe rdma_rxe
modprobe: ERROR: could not insert 'rdma_rxe': Operation not permitted
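A minimal sketch of the tolerance pattern when creating the IPv6 tunnel socket (generic shape only, not the exact rxe helper):
err = udp_sock_create(net, &udp_cfg, &sock);
if (err < 0) {
	/* IPv6 disabled or not built in: skip the v6 tunnel, not fatal */
	if (err == -EAFNOSUPPORT)
		return NULL;
	return ERR_PTR(err);
}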
Fixes: dfdd6158ca ("IB/rxe: Fix kernel panic in udp_setup_tunnel")
Link: https://lore.kernel.org/r/20210603090112.36341-1-kamalheib1@gmail.com
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7345201c39 ]
The cm_init_av_for_lap() and cm_init_av_by_path() function calls have the
following issues:
1. Both of them might sleep and should not be called under spinlock.
2. The access of cm_id_priv->av should be under cm_id_priv->lock, which
means it can't be initialized directly.
This patch splits the calling of 2 functions into two parts: first one
initializes an AV outside of the spinlock, the second one copies AV to
cm_id_priv->av under spinlock.
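Conceptually the split looks like this (the helper name and temporary are illustrative, not the exact cm API):
struct cm_av tmp_av;
int ret;
/* step 1: may sleep, so done without holding any spinlock */
ret = init_av_from_path(path, &tmp_av);	/* hypothetical helper */
if (ret)
	return ret;
/* step 2: publish the result under the lock guarding ->av */
spin_lock_irq(&cm_id_priv->lock);
cm_id_priv->av = tmp_av;
spin_unlock_irq(&cm_id_priv->lock);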
Fixes: e1444b5a16 ("IB/cm: Fix automatic path migration support")
Link: https://lore.kernel.org/r/038fb8ad932869b4548b0c7708cab7f76af06f18.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3595c398f6 ]
This reverts commit 9db0ff53cb, which wasn't
a full fix and still causes to the following panic:
panic @ time 1605623870.843, thread 0xfffffeb63b552000: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe811a94e000
time = 1605623870
cpuid = 9, TSC = 0xb7937acc1b6
Panic occurred in module kernel loaded at 0xffffffff80200000:Stack: --------------------------------------------------
kernel:vm_fault+0x19da
kernel:vm_fault_trap+0x6e
kernel:trap_pfault+0x1f1
kernel:trap+0x31e
kernel:cm_destroy_id+0x38c
kernel:rdma_destroy_id+0x127
kernel:sdp_shutdown_task+0x3ae
kernel:taskqueue_run_locked+0x10b
kernel:taskqueue_thread_loop+0x87
kernel:fork_exit+0x83
Link: https://lore.kernel.org/r/4346449a7cdacc7a4eedc89cb1b42d8434ec9814.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b4e586ebe ]
This is being used with two quite different flows, one attaches the
message to the priv and the other does not.
Ensure the message attach is consistently done under the spinlock and
ensure that the free on error always detaches the message from the
cm_id_priv, also always under lock.
This makes read/write to the cm_id_priv->msg consistently locked and
consistently NULL'd when the message is freed, even in all error paths.
Link: https://lore.kernel.org/r/f692b8c89eecb34fd82244f317e478bea6c97688.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c5eee0afc ]
Currently vlan modification action checks existence of vlan priority by
comparing it to 0. Therefore it is impossible to modify existing vlan
tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
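A sketch of the flag-based approach (attribute and field names as referenced above; surrounding parse code abbreviated):
/* parse time: remember whether a priority attribute was supplied */
p->tcfv_push_prio_exists = !!tb[TCA_VLAN_PUSH_VLAN_PRIORITY];
if (p->tcfv_push_prio_exists)
	p->tcfv_push_prio = nla_get_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
/* act time, TCA_VLAN_ACT_MODIFY: only touch the priority if it was set */
if (p->tcfv_push_prio_exists)
	prio = p->tcfv_push_prio;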
Fixes: 45a497f2d1 (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action)
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eebd49a4ff ]
In commit 68dc022d04 ("xfrm: BEET mode doesn't support fragments
for inner packets"), it tried to fix the issue that in TX side the
packet is fragmented before the ESP encapping while in the RX side
the fragments always get reassembled before decapping with ESP.
This is not true for IPv6. IPv6 is different, and it's using exthdr
to save fragment info, as well as the ESP info. Exthdrs are added
in TX and processed in RX both in order. So in the above case, the
ESP decapping will be done earlier than the fragment reassembling
in TX side.
Here just remove the fragment check for the IPv6 inner packets to
recover the fragments support for BEET mode.
Fixes: 68dc022d04 ("xfrm: BEET mode doesn't support fragments for inner packets")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56bb7c28ad ]
A 600MHz clock rate is too high for the video decoder hardware on some
SoC versions and may cause stability issues. Use 300MHz for the video
decoder by default, which is supported by all hardware versions.
Fixes: ed1a2459e2 ("clk: tegra: Add Tegra20/30 EMC clock implementation")
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ecd7e290b ]
sess->stats and sess->stats->pcpu_stats objects are freed
when sysfs entry is removed. If something wrong happens and
session is closed before sysfs entry is created,
sess->stats and sess->stats->pcpu_stats objects are not freed.
This patch adds freeing of them at three places:
1. When client uses wrong address and session creation fails.
2. When client fails to create a sysfs entry.
3. When client adds wrong address via sysfs add_path.
Fixes: 215378b838 ("RDMA/rtrs: client: sysfs interface functions")
Link: https://lore.kernel.org/r/20210528113018.52290-21-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b73b799c2 ]
The queue_depth is a module parameter for rtrs_server. It is used on the
client side to determine the queue_depth of the request queue for the RNBD
virtual block device.
During a reconnection event for an already mapped device, in case the
rtrs_server module queue_depth has changed, fail the reconnect attempt.
Also stop further auto reconnection attempts. A manual reconnect via
sysfs has to be triggered.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-20-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2371c40354 ]
When closing a session, currently the rtrs_srv_stats object in the
closing session is freed by the kobject release. But if session creation
failed for various reasons, the rtrs_srv_stats object must be freed
directly because the kobject has not been created yet.
This problem is found by kmemleak as below:
1. One client machine maps /dev/nullb0 with session name 'bla':
root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device
2. Another machine failed to create a session with the same name 'bla':
root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device
-bash: echo: write error: Connection reset by peer
3. The kmemleak on server machine reported an error:
unreferenced object 0xffff888033cdc800 (size 128):
comm "kworker/2:1", pid 83, jiffies 4295086585 (age 2508.680s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a72903b2>] __alloc_sess+0x1d4/0x1250 [rtrs_server]
[<00000000d1e5321e>] rtrs_srv_rdma_cm_handler+0xc31/0xde0 [rtrs_server]
[<00000000bb2f6e7e>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm]
[<00000000e896235d>] cm_process_work+0x2d/0x100 [ib_cm]
[<00000000b6866c5f>] cm_req_handler+0x11bc/0x1c40 [ib_cm]
[<000000005f5dd9aa>] cm_work_handler+0xe65/0x3cf2 [ib_cm]
[<00000000610151e7>] process_one_work+0x4bc/0x980
[<00000000541e0f77>] worker_thread+0x78/0x5c0
[<00000000423898ca>] kthread+0x191/0x1e0
[<000000005a24b239>] ret_from_fork+0x3a/0x50
Fixes: 39c2d639ca ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init")
Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64bce1ee97 ]
When re-connecting, hb_missed_max is reset to 0.
Before the first reconnection, the client triggers a reconnection only
after it has missed the hb-ack more than 5 times. But after the first
reconnection, the client reconnects whenever it misses a single hb-ack,
because hb_missed_max is now 0.
There is no need to reset hb_missed_max when re-connecting;
hb_missed_max should be kept until the session is closed.
Fixes: c0894b3ea6 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20210528113018.52290-16-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41db63a7ef ]
When get_next_path_min_inflight is called to select the next path, it
iterates over the list of available rtrs_clt_sess (paths). It then reads
the number of inflight IOs for that path to select one which has the least
inflight IO.
But it may so happen that the rtrs_clt_sess (path) is no longer in the
connected state, because the closing or error recovery paths can change
the status of the rtrs_clt_sess. For example, if the client sent a
heartbeat and did not get a response, it would change the session status
and stop IO processing.
The check added by this patch prevents accessing a broken path and
generating duplicate error messages.
It is ok if the status changes after the check, because the error
recovery path does not free memory and only tries to reconnect. It is
also ok if the session is closed after the check, because closing the
session changes the session status and flushes all IO before freeing
memory. If the session is being accessed for IO processing, the closing
session will wait.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-13-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a4d8e96e4 ]
For an outgoing subflow join, when receiving the SYNACK in
subflow_finish_connect(), mptcp_finish_join() may return false in some
cases and send a RESET to the remote, in which case no local hmac is
required.
So generate subflow hmac after mptcp_finish_join().
Fixes: ec3edaa7ca ("mptcp: Add handling of outgoing MP_JOIN requests")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f1af441fd ]
After commit 2c5ebd001d ("mptcp: refactor token container"),
pr_debug() is called before mptcp_crypto_key_gen_sha() in
mptcp_token_new_connect(), so the output local_key, token and
idsn are 0, like:
MPTCP: ssk=00000000f6b3c4a2, local_key=0, token=0, idsn=0
Move pr_debug() after mptcp_crypto_key_gen_sha().
Fixes: 2c5ebd001d ("mptcp: refactor token container")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce0cb93a5a ]
The variable bit_per_pix is a u8 and is promoted in the multiplication
to an int type and then sign extended to a u64. If the result of the
int multiplication is greater than 0x7fffffff then the upper 32 bits will
be set to 1 as a result of the sign extension. Avoid this by casting
tu_size_reg to u64 to avoid sign extension and also a potential overflow.
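In isolation, the promotion hazard and the cast look like this (illustrative values and types):
int tu_size_reg = 0x20000;
u8  bit_per_pix = 30;
u64 bad  = tu_size_reg * bit_per_pix * 1000;		/* int math overflows, then sign-extends */
u64 good = (u64)tu_size_reg * bit_per_pix * 1000;	/* 64-bit math throughout */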
Fixes: 1a0f7ed3ab ("drm/rockchip: cdn-dp: add cdn DP support for rk3399")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915162049.36434-1-colin.king@canonical.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43c2de1002 ]
When we first enable the DSI encoder, we currently program some per-chip
configuration that we look up in rk3399_chip_data based on the device
tree compatible we match. This data configures various parameters of the
MIPI lanes, including on RK3399 whether DSI1 is slaved to DSI0 in a
dual-mode configuration. It also selects which LCDC (i.e. VOP) to scan
out from.
This causes a problem in RK3399 dual-mode configurations, though: panel
prepare() callbacks run before the encoder gets enabled and expect to be
able to write commands to the DSI bus, but the bus isn't fully
functional until the lane and master/slave configuration have been
programmed. As a result, dual-mode panels (and possibly others too) fail
to turn on when the rockchipdrm driver is initially loaded.
Because the LCDC mux is the only thing we don't know until enable time
(and is the only thing that can ever change), we can actually move most
of the initialization to bind() and get it out of the way early. That's
what this change does. (Rockchip's 4.4 BSP kernel does it in mode_set(),
which also avoids the issue, but bind() seems like the more correct
place to me.)
Tested on a Google Scarlet board (Acer Chromebook Tab 10), which has a
Kingdisplay KD097D04 dual-mode panel. Prior to this change, the panel's
backlight would turn on but no image would appear when initially loading
rockchipdrm. If I kept rockchipdrm loaded and reloaded the panel driver,
it would come on. With this change, the panel successfully turns on
during initial rockchipdrm load as expected.
Fixes: 2d4f7bdafd ("drm/rockchip: dsi: migrate to use dw-mipi-dsi bridge driver")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/55fe7f3454d8c91dc3837ba5aa741d4a0e67378f.1618797813.git.tommyhebb@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b601c16b7b ]
At boot, we can't rely on the vc4_get_crtc_encoder since we don't have a
state yet and thus will not be able to figure out which connector is
attached to our CRTC.
However, we have a muxing bit in the CRTC register we can use to get the
encoder currently connected to the pixelvalve. We can thus read that
register, lookup the associated register through the vc4_pv_data
structure, and then pass it to vc4_crtc_disable so that we can perform
the proper operations.
Fixes: 875a4d5368 ("drm/vc4: drv: Disable the CRTC at boot time")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210507150515.257424-6-maxime@cerno.tech
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a184d959d ]
The vc4_get_crtc_encoder function currently only works when the
connector->state->crtc pointer is set, which is only true when the
connector is currently enabled.
However, we use it as part of the disable path as well, and our lookup
will fail in that case, resulting in it returning a null pointer we
can't act on.
We can access the connector that used to be connected to that crtc
though using the old connector state in the disable path.
Since we want to support both the enable and disable path, we can
support it by passing the state accessor variant as a function pointer,
together with the atomic state.
Fixes: 792c3132bc ("drm/vc4: encoder: Add finer-grained encoder callbacks")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210507150515.257424-5-maxime@cerno.tech
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47e4dc4e63 ]
Commit 46060be6d8 ("clk: sunxi-ng: v3s: use sigma-delta modulation for audio-pll")
changed the audio pll on the Allwinner V3s and V3 SoCs to use
sigma-delta modulation. In the process the declaration of fixed postdivider
providing "pll-audio" was adjusted to provide the desired clock rates from
the now sigma-delta modulated pll.
However, while the divider used for calculations by the clock framework
was adjusted, the actual divider programmed into the hardware in
sun8i_v3_v3s_ccu_init was left at "divide by four". This broke the
"pll-audio" clock, which now only provides a quarter of the expected clock rate.
It would in general be desirable to program the postdivider for
"pll-audio" to four, such that a broader range of frequencies were
available on the pll outputs. But the clock for the integrated codec
"ac-dig" does not feature a mux that allows to select from all pll outputs
as it is just a simple clock gate connected to "pll-audio". Thus we need
to set the postdivider to one to be able to provide the 22.5792MHz and
24.576MHz rates required by the internal sun4i codec.
This patch fixes the incorrect clock rate by forcing the postdivider to
one in sun8i_v3_v3s_ccu_init.
Fixes: 46060be6d8 ("clk: sunxi-ng: v3s: use sigma-delta modulation for audio-pll")
Signed-off-by: Tobias Schramm <t.schramm@manjaro.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210513131315.2059451-1-t.schramm@manjaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52af13a414 ]
The variables are freed on the err_phy_connect error path, so an error
code must be returned there; otherwise a later call to ftgmac100_remove()
would cause a double free.
Fixes: bd466c3fb5 ("net/faraday: Support NCSI mode")
Fixes: 39bfab8844 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d2aa135683 ]
[Why]
Conditions that end up modifying the global dc state must be locked.
However, during mst allocate payload sequence, lock is already taken.
With StarTech 1.2 DP hub, we get an HPD RX interrupt for a reason other
than to indicate down reply availability right after sending payload
allocation. The handler again takes dc lock before calling the
dc's HPD RX handler. Due to this contention, the DRM thread which waits
for MST down reply never gets a chance to finish its waiting
successfully and ends up timing out. Once the lock is released, the hpd
rx handler fires and goes ahead to read from the MST hub, but now it's
too late and the hub doesn't light up all displays since DRM lacks error
handling when payload allocation fails.
[How]
Take lock only if there is a change in link status or if automated test
pattern bit is set. The latter fixes the null pointer dereference when
running certain DP Link Layer Compliance test.
Fixes: c8ea79a8a2 ("drm/amd/display: NULL pointer error during compliance test")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 509b9a5b48 ]
[Why]
If GPU is in reset state, force enabling link will cause
unexpected behaviour.
[How]
Avoid handling HPD IRQ when GPU is in reset state.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nikola Cornij <nikola.cornij@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf8b92a756 ]
[Why]
During GPU reset, dc_lock is acquired in dm_suspend().
Asynchronously, handle_hpd_rx_irq() can also be called
through amdgpu_dm_irq_suspend()->flush_work(), which also
tries to acquire dc_lock. That causes a deadlock.
[How]
Check whether amdgpu is executing a reset before acquiring dc_lock.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc794f8c56 ]
While some SoC samples are able to lock with a PLL factor of 55, other
samples can't. ATM, a minimum of 60 appears to work on all the samples
I have tried.
Even with 60, it sometimes takes a long time for the PLL to eventually
lock. The documentation says that the minimum rate of these PLLs DCO
should be 3GHz, a factor of 125. Let's use that to be on the safe side.
With factor range changed, the PLL seems to lock quickly (enough) so far.
It is still unclear if the range was the only reason for the delay.
Fixes: 085a4ea93d ("clk: meson: g12a: add peripheral clock controller")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lore.kernel.org/r/20210429090325.60970-1-jbrunet@baylibre.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88509f698c ]
In cases where the dirty linear memory range spans multiple sample sheets
in a surface, the dirty surface region is incorrectly computed.
To do this correctly and in an optimized fashion we would have to compute
the dirty region of each sample sheet and compute the union of those
regions.
But assuming that cpu writing to a multisample surface is rather a corner
case than a common case, just set the dirty region to the full surface.
This fixes OpenGL piglit errors with SVGA_FORCE_COHERENT=1
and the piglit test:
fbo-depthstencil blit default_fb -samples=2 -auto
Fixes: 9ca7d19ff8 ("drm/vmwgfx: Add surface dirty-tracking callbacks")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-4-zackr@vmware.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75156a887b ]
The SVGA3dCmdDXGenMips command uses a shader-resource view to access
the underlying surface. Normally accesses using that view-type are not
dirtying the underlying surface, but that particular command is an
exception.
Mark the surface gpu-dirty after a SVGA3dCmdDXGenMips command has been
submitted.
This fixes the piglit getteximage-formats test run with
SVGA_FORCE_COHERENT=1
Fixes: a9f58c456e ("drm/vmwgfx: Be more restrictive when dirtying resources")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-3-zackr@vmware.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e3617a7b8 ]
If the GPIO controller is not available yet, we need to defer
the probe of GBE until the provider becomes available.
While here, drop GPIOF_EXPORT because it's deprecated and
may not be available.
Fixes: f1a26fdf59 ("pch_gbe: Add MinnowBoard support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Flavio Suligoi <f.suligoi@asem.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71f0891c84 ]
On each iteration, fwnode_for_each_available_child_node() bumps the
reference count of the loop variable and drops it on the next iteration.
Since the loop is broken out of in the error case, we have to drop the
reference count ourselves. Do it for port_fwnode in the error path of
->probe(), as sketched below.
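Roughly (the probe helper call is abbreviated; the key point is the fwnode_handle_put() before breaking out):
fwnode_for_each_available_child_node(fwnode, port_fwnode) {
	err = mvpp2_port_probe(pdev, port_fwnode, priv);
	if (err < 0) {
		/* the iterator would drop this reference on the next
		 * iteration; breaking out early means we must put it */
		fwnode_handle_put(port_fwnode);
		goto err_port_probe;
	}
}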
Fixes: 248122212f ("net: mvpp2: use device_*/fwnode_* APIs instead of of_*")
Cc: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a100243d95 ]
When we call af_ops->set_link_af() we hold a RCU read lock
as we retrieve af_ops from the RCU protected list, but this
is unnecessary because we already hold RTNL lock, which is
the writer lock for protecting rtnl_af_ops, so it is safer
than RCU read lock. Similar for af_ops->validate_link_af().
This was not a problem until we began to take a mutex lock
down the path of ->set_link_af() in __ipv6_dev_mc_dec()
recently. We can just drop the RCU read lock there and
assert RTNL lock.
Reported-and-tested-by: syzbot+7d941e89dd48bcf42573@syzkaller.appspotmail.com
Fixes: 63ed8de4be ("mld: add mc_lock for protecting per-interface mld data")
Tested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17b9a94656 ]
The conversion to drm managed resources introduced two bugs: the plane is now
always initialized with the linear-only list, while the list with the Vivante
GPU modifiers should have been used when the PRG/PRE engines are present. This
masked another issue, as ipu_plane_format_mod_supported() is now called before
the private plane data is set up, so if a non-linear modifier is supplied in
the plane modifier list, we run into a NULL pointer dereference checking for
the PRG presence. To fix this just remove the check from this function, as we
know that it will only be called with a non-linear modifier, if the plane init
code has already determined that the PRG/PRE is present.
Fixes: 699e7e543f ("drm/imx: ipuv3-plane: use drm managed resources")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Link: https://lore.kernel.org/r/20210510145927.988661-1-l.stach@pengutronix.de
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06841148c5 ]
Only planes that are displayed via the Display Processor (DP) path
support color space conversion. Limit formats on planes that are
shown via the direct Display Controller (DC) path to RGB.
Reported-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 655c0ed197 ]
In dm_dp_mst_detect(), we should check whether or not @connector
has been unregistered from userspace. If the connector is unregistered,
we should return a disconnected status.
Fixes: 4562236b3b ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: Yingjie Wang <wangyingjie55@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b515d26372 ]
Jianwen reported that IPv6 Interoperability tests are failing in an
IPsec case where one of the links between the IPsec peers has an MTU
of 1280. The peer generates a packet larger than this MTU, the router
replies with a "Packet too big" message indicating an MTU of 1280.
When the peer tries to send another large packet, xfrm_state_mtu
returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail
with EINVAL.
We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when
IPv6 is used. After going through IPsec, the packet will then be
fragmented to obey the actual network's PMTU, just before leaving the
host.
Currently, TFC padding is capped to PMTU - overhead to avoid
fragmentation: after padding and encapsulation, we still fit within
the PMTU. That behavior is preserved in this patch.
Fixes: 91657eafb6 ("xfrm: take net hdr len into account for esp payload size calculation")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41eb5df1cb ]
Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.
Since the merging of the new slab memory controller in v5.9, the page
structure stores a pointer to objcg pointer array for slab pages. When
the slab has no used objects, it can be freed in free_slab() which will
call kfree() to free the objcg pointer array in
memcg_alloc_page_obj_cgroups(). If it happens that the objcg pointer
array is the last used object in its slab, that slab may then be freed
which may cause kfree() to be called again.
With the right workload, the slab cache may be set up in a way that allows
the recursive kfree() calling loop to nest deep enough to cause a kernel
stack overflow and panic the system. In fact, we have a reproducer that
can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
and kmalloc-rcl-128 slabs with the following kfree() loop recursively
called 74 times:
[ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560
[ 285.520740] [<000000000ec43466>] __free_slab+0xc6/0x228
[ 285.520741] [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0
[ 285.520742] [<000000000ec432fc>] kfree+0x4bc/0x560
:
While investigating this issue, I also found an issue on the allocation
side. If the objcg pointer array happens to come from the same slab or a
circular dependency linkage is formed with multiple slabs, those affected
slabs can never be freed again.
This patch series addresses these two issues by introducing a new set of
kmalloc-cg-<n> caches split from kmalloc-<n> caches. The new set will
only contain non-reclaimable and non-dma objects that are accounted in
memory cgroups whereas the old set are now for unaccounted objects only.
By making this split, all the objcg pointer arrays will come from the
kmalloc-<n> caches, but those caches will never hold any objcg pointer
array. As a result, deeply nested kfree() call and the unfreeable slab
problems are now gone.
This patch (of 4):
Since the merging of the new slab memory controller in v5.9, the page
structure may store a pointer to obj_cgroup pointer array for slab pages.
Currently, only the __GFP_ACCOUNT bit is masked off. However, the array
is not readily reclaimable and doesn't need to come from the DMA buffer.
So those GFP bits should be masked off as well.
Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
that it is consistently applied no matter where it is called.
Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
Fixes: 286e04b8ed ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 832b507253 ]
mmap_lock will explicitly disable/enable preemption upon manipulating its
local CPU variables. This is to be expected, but in this case, it doesn't
play well with PREEMPT_RT. The preemption disabled code section also
takes a spin-lock. Spin-locks in RT systems will try to schedule, which
is exactly what we're trying to avoid.
To mitigate this, convert the explicit preemption handling to local_locks.
Which are RT aware, and will disable migration instead of preemption when
PREEMPT_RT=y.
The faulty call trace looks like the following:
__mmap_lock_do_trace_*()
preempt_disable()
get_mm_memcg_path()
cgroup_path()
kernfs_path_from_node()
spin_lock_irqsave() /* Scheduling while atomic! */
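A sketch of the local_lock-based shape (struct layout and buffer size are illustrative): on PREEMPT_RT, local_lock() disables migration rather than preemption, so the sleeping lock inside kernfs_path_from_node() is fine:
#include <linux/local_lock.h>
#include <linux/percpu.h>
struct memcg_path {
	local_lock_t	lock;
	char		buf[256];	/* illustrative size */
};
static DEFINE_PER_CPU(struct memcg_path, memcg_paths) = {
	.lock = INIT_LOCAL_LOCK(lock),
};
static void fill_memcg_path_example(void)
{
	local_lock(&memcg_paths.lock);
	/* safe to use this CPU's buf here */
	local_unlock(&memcg_paths.lock);
}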
Link: https://lkml.kernel.org/r/20210604163506.2103900-1-nsaenzju@redhat.com
Fixes: 2b5067a814 ("mm: mmap_lock: add tracepoints around lock acquisition ")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a14e3779d ]
grab_mapping_entry() has a bug in handling of ENOMEM condition. Suppose
we have a PMD entry at index i which we are downgrading to a PTE entry.
grab_mapping_entry() will set pmd_downgrade to true, lock the entry, clear
the entry in xarray, and decrement mapping->nrpages. Then it will call:
entry = dax_make_entry(pfn_to_pfn_t(0), flags);
dax_lock_entry(xas, entry);
which inserts new PTE entry into xarray. However this may fail allocating
the new node. We handle this by:
if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
goto retry;
however pmd_downgrade stays set to true even though 'entry' returned from
get_unlocked_entry() will be NULL now. And we will go again through the
downgrade branch. This is mostly harmless except that mapping->nrpages is
decremented again and we temporarily have an invalid entry stored in
xarray. Fix the problem by setting pmd_downgrade to false each time we
lookup the entry we work with so that it matches the entry we found.
Link: https://lkml.kernel.org/r/20210622160015.18004-1-jack@suse.cz
Fixes: b15cd80068 ("dax: Convert page fault handlers to XArray")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54e948c60c ]
The snprintf() function returns the number of bytes which would have been
printed if the buffer was large enough. In other words it can return ">=
remain" but this code assumes it returns "== remain".
The run time impact of this bug is not very severe. The next iteration
through the loop would trigger a WARN() when we pass a negative limit to
snprintf(). We would then return success instead of -E2BIG.
The kernel implementation of snprintf() will never return negatives so
there is no need to check and I have deleted that dead code.
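The corrected pattern, in a generic form (variable names illustrative):
written = snprintf(buf + total, remain, "%s\n", item);
/* snprintf() returns the would-be length, so >= remain means truncation */
if (written >= remain) {
	ret = -E2BIG;
	goto out;
}
total  += written;
remain -= written;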
Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
Fixes: a860f6eb4c ("ocfs2: sysfile interfaces for online file check")
Fixes: 74ae4e104d ("ocfs2: Create stack glue sysfs files.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb9516be77 ]
Commit 6e6fcbc27e ("blk-mq: support batching dispatch in case of io")
starts to support io batching submission by using hctx->dispatch_busy.
However, blk_mq_update_dispatch_busy() isn't changed to update hctx->dispatch_busy
in that commit, so fix the issue by updating hctx->dispatch_busy in case
of real scheduler.
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Fixes: 6e6fcbc27e ("blk-mq: support batching dispatch in case of io")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60b6a7e6a0 ]
For chained bio, trace_block_bio_complete in bio_endio is currently called
only by the parent bio once upon all chained bio completed.
However, the sector and size for the parent bio are modified in bio_split.
Therefore, the size and sector of the complete events might not match the
queue events in blktrace.
The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
completion of all bios.") wants multiple complete events to correspond
to one queue event but missed this.
The issue can be reproduced by md/raid5 read with bio cross chunks.
To fix, move trace completion into the loop for every chained bio to call.
Fixes: fbbaf700e7 ("block: trace completion of all bios.")
Reviewed-by: Wade Liang <wadel@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Edward Hsieh <edwardh@synology.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210624123030.27014-1-edwardh@synology.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c37d01e1a ]
The 86ad9a24f2 ("PM / devfreq: Add required OPPs support to passive governor")
supported the required-opp property for using devfreq passive governor.
But, 86ad9a24f2 has caused the problem on use-case when required-opp
is not used such as exynos-bus.c devfreq driver. So that fix the
get_target_freq of passive governor for supporting the case of when
required-opp is not used.
Fixes: 86ad9a24f2 ("PM / devfreq: Add required OPPs support to passive governor")
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b7180573c ]
In the CPU removal path the ->offline() callback provided by the
driver is always invoked before ->exit(), but in the cpufreq_online()
error path it is not, so ->exit() is expected to somehow know the
context in which it has been called and act accordingly.
That is less than straightforward, so make cpufreq_online() invoke
the driver's ->offline() callback, if present, on errors before
->exit() too.
This only potentially affects intel_pstate.
Fixes: 91a12e91dc ("cpufreq: Allow light-weight tear down and bring up of CPUs")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f37ccf8fce ]
clang's Control Flow Integrity requires that every indirect call has a
valid target, which is based on the type of the function pointer. The
*_show() functions in this file are written as if they will be called
from dev_attr_show(); however, they will be called from
sysfs_kf_seq_show() because the files were created by
sysfs_create_group() and the sysfs ops are based on kobj_sysfs_ops
because of kobject_add_and_create(). Because the *_show() functions do
not match the type of the show() member in struct kobj_attribute, there
is a CFI violation.
$ cat /sys/firmware/acpi/bgrt/{status,type,version,{x,y}offset}}
1
0
1
522
307
$ dmesg | grep "CFI failure"
[ 267.761825] CFI failure (target: type_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8):
[ 267.762246] CFI failure (target: xoffset_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8):
[ 267.762584] CFI failure (target: status_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8):
[ 267.762973] CFI failure (target: yoffset_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8):
[ 267.763330] CFI failure (target: version_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8):
Convert these functions to the type of the show() member in struct
kobj_attribute so that there is no more CFI violation. Because these
functions are all so similar, combine them into a macro.
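A sketch of what such a macro can look like (table member names are illustrative; sysfs_emit() used for the output):
#define BGRT_SHOW(_name, _member)					\
	static ssize_t _name##_show(struct kobject *kobj,		\
				    struct kobj_attribute *attr,	\
				    char *buf)				\
	{								\
		return sysfs_emit(buf, "%d\n", bgrt_tab._member);	\
	}								\
	static struct kobj_attribute bgrt_attr_##_name = __ATTR_RO(_name)
BGRT_SHOW(version, version);	/* bgrt_tab member names assumed here */
BGRT_SHOW(status, status);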
Fixes: d1ff4b1cdb ("ACPI: Add support for exposing BGRT data")
Link: https://github.com/ClangBuiltLinux/linux/issues/1406
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a2ac41b13 ]
Commit 85686d0dc1 ("block, bfq: keep shared queues out of the waker
mechanism") leaves shared bfq_queues out of the waker-detection
mechanism. It attains this goal by not updating the pointer
last_completed_rq_bfqq, if the last request completed belongs to a
shared bfq_queue (so that the pointer will not point to the shared
bfq_queue).
Yet this has a side effect: the pointer last_completed_rq_bfqq keeps
pointing, deceptively, to a bfq_queue that actually is not the last
one to have had a request completed. As a consequence, such a
bfq_queue may deceptively be considered as a waker of some bfq_queue,
even of some shared bfq_queue.
To address this issue, reset last_completed_rq_bfqq if the last
request completed belongs to a shared queue.
Fixes: 85686d0dc1 ("block, bfq: keep shared queues out of the waker mechanism")
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20210619140948.98712-8-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd3664b362 ]
Since commit 430a67f9d6 ("block, bfq: merge bursts of newly-created
queues"), BFQ may schedule a merge between a newly created sync
bfq_queue, say Q2, and the last sync bfq_queue created, say Q1. To this
goal, BFQ stores the address of Q1 in the field bic->stable_merge_bfqq
of the bic associated with Q2. So, when the time for the possible merge
arrives, BFQ knows which bfq_queue to merge Q2 with. In particular,
BFQ checks for possible merges on request arrivals.
Yet the same bic may also be associated with an async bfq_queue, say
Q3. So, if a request for Q3 arrives, then the above check may happen
to be executed while the bfq_queue at hand is Q3, instead of Q2. In
this case, Q1 happens to be merged with an async bfq_queue. This is
not only a conceptual mistake, because async queues are to be kept out
of queue merging, but also a bug that leads to inconsistent states.
This commit simply filters async queues out of delayed merges.
Fixes: 430a67f9d6 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20210619140948.98712-6-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d0903d61e ]
We currently disable wbt by simply zeroing out rwb->wb_normal in
wbt_disable_default() when switching the elevator to bfq, but this is not
safe because the check can become a false positive if we change the queue
depth. If it becomes a false positive between wbt_wait() and wbt_track()
when submitting a write request, rqw->inflight will be dropped to -1 in
wbt_done(), which ends up triggering an IO hang. Fix this issue by
introducing a new state which means that wbt was disabled.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a9ece9ba1 ]
igen6_edac needs mce_register()/unregister() functions,
so it should depend on X86_MCE (or X86_MCE_INTEL).
That change prevents these build errors:
ld: drivers/edac/igen6_edac.o: in function `igen6_remove':
igen6_edac.c:(.text+0x494): undefined reference to `mce_unregister_decode_chain'
ld: drivers/edac/igen6_edac.o: in function `igen6_probe':
igen6_edac.c:(.text+0xf5b): undefined reference to `mce_register_decode_chain'
Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210619160203.2026-1-rdunlap@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccb5ecdc2d ]
Before commit 8fcc4ae6fa ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), do_sea() would unconditionally
signal the affected task from the arch code. Since that change,
the GHES driver sends the signals.
This exposes a problem: errors the GHES driver doesn't understand or
doesn't handle effectively are silently ignored. This causes the errors
to be taken again and again, circulating endlessly, and the user-space
task gets stuck in this loop.
Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records.
Do memory failure handling for ARM Processor Error Section just like
for Memory Error Section.
Fixes: 8fcc4ae6fa ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b2bcc76e2 ]
Getting a report (e.g. feature report) from a device requires us to send
a request indicating which report we want to retrieve and then waiting
for the corresponding response containing that report. We already
provide the response structure to the request call, but the request
isn't marked as a request that expects a response. Thus the request
returns before we receive the response and the response buffer indicates
a zero length response due to that.
This essentially means that the get-report calls are broken and will
always indicate that a report of length zero has been read.
Fix this by appropriately marking the request.
Fixes: b05ff1002a ("HID: Add support for Surface Aggregator Module HID transport")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d07f3b081e ]
pstore-blk just pokes directly into the page cache for the block
device without going through the file operations, by faking
up its own file operations that do not match the block device ones.
As this breaks the block layer's control of its page cache, and it
currently only works by accident, the best thing is to simply
disable this driver.
Fixes: 17639f67c1 ("pstore/blk: Introduce backend for block devices")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 888be6067b ]
Currently, a device description can be obtained using ACPI, if the _STR
method exists for a particular device, and then exposed to the userspace
via a sysfs object as a string value.
If the _STR method is available for a given device then the data
(usually a Unicode string) is read and stored in a buffer (of the
ACPI_TYPE_BUFFER type) with a pointer to said buffer cached in the
struct acpi_device_pnp for later access.
The description_show() function is responsible for exposing the device
description to the userspace via a corresponding sysfs object and
internally calls the utf16s_to_utf8s() function with a pointer to the
buffer that contains the Unicode string so that it can be converted from
UTF16 encoding to UTF8 and thus allowing for the value to be safely
stored and later displayed.
When invoking the utf16s_to_utf8s() function, the description_show()
function also sets a limit of the data that can be saved into a provided
buffer as a result of the character conversion to be a total of
PAGE_SIZE, and upon completion, the utf16s_to_utf8s() function returns
an integer value denoting the number of bytes that have been written
into the provided buffer.
Following the execution of the utf16s_to_utf8s() a newline character
will be added at the end of the resulting buffer so that when the value
is read in the userspace through the sysfs object then it would include
newline making it more accessible when working with the sysfs file
system in the shell, etc. Normally, this wouldn't be a problem, but if
the function utf16s_to_utf8s() happens to return the number of bytes
written to be precisely PAGE_SIZE, then we would overrun the buffer and
write the newline character outside the allotted space which can have
undefined consequences or result in a failure.
To fix this buffer overrun, ensure that there always is enough space
left for the newline character to be safely appended.
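A hedged sketch of the conversion with room reserved for the newline (buffer object field names as described above):
result = utf16s_to_utf8s((wchar_t *)str_obj->buffer.pointer,
			 str_obj->buffer.length,
			 UTF16_LITTLE_ENDIAN, buf,
			 PAGE_SIZE - 1);	/* leave room for '\n' */
buf[result++] = '\n';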
Fixes: d1efe3c324 ("ACPI: Add new sysfs interface to export device description")
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 450605c28d ]
Microsoft Hypervisor expects the logical processor index to be the same
as CPU's index during logical processor creation. Using cpu_physical_id
confuses hypervisor's scheduler. That causes the root partition not boot
when core scheduler is used.
This patch removes the call to cpu_physical_id and uses the CPU index
directly for bringing up logical processor. This scheme works for both
classic scheduler and core scheduler.
Fixes: 333abaf5ab (x86/hyperv: implement and use hv_smp_prepare_cpus)
Signed-off-by: Praveen Kumar <kumarpraveen@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210531074046.113452-1-kumarpraveen@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9370dceab ]
The ACPI fan device IDs are shared between the fan driver and the
device power management code. The former is modular, so it needs
to include the table of device IDs for module autoloading and the
latter needs that list to avoid attaching the generic ACPI PM domain
to fan devices (which doesn't make sense) possibly before the fan
driver module is loaded.
Unfortunately, that requires the list of fan device IDs to be
updated in two places which is prone to mistakes, so put it into
a symbol definition in a separate header file so there is only one
copy of it in case it needs to be updated again in the future.
Fixes: b9ea0bae26 ("ACPI: PM: Avoid attaching ACPI PM domain to certain devices")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18b380ed61 ]
Set err code in the error path before jumping to the end of the function.
Fixes: 4dc3bab868 ("PM / devfreq: Add support delayed timer for polling mode")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e2f16d5cd ]
On ARMv7, resource_size_t can be 64-bit, which breaks printing
it as %x:
drivers/edac/aspeed_edac.c: In function 'init_csrows':
drivers/edac/aspeed_edac.c:257:28: error: format '%x' expects argument of \
type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long \
long unsigned int'} [-Werror=format=]
257 | dev_dbg(mci->pdev, "dt: /memory node resources: first page \
r.start=0x%x, resource_size=0x%x, PAGE_SHIFT macro=0x%x\n",
Use the special %pR format string to pretty-print the entire resource
instead.
Fixes: edfc2d73ca ("EDAC/aspeed: Add support for AST2400 and AST2600")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lkml.kernel.org/r/20210421135500.3518661-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
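For reference, %pR pretty-prints a struct resource portably no matter whether
resource_size_t is 32- or 64-bit; a sketch of the corrected debug print (the
message text is abbreviated):

  struct resource r;      /* the /memory node range, filled elsewhere */

  dev_dbg(mci->pdev, "dt: /memory node resources: %pR, PAGE_SHIFT=0x%x\n",
          &r, PAGE_SHIFT);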
[ Upstream commit 61c6f04a98 ]
v4l2_ctrl_new_std() fails if the caller provides no 'step' parameter for
an integer control, so define it to fix the following error:
s5p_mfc_dec_ctrls_setup:1166: Adding control (1) failed
Fixes: c3042bff91 ("media: s5p-mfc: Use display delay and display enable std controls")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
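v4l2_ctrl_new_std() takes min/max/step/default for standard integer controls,
and the step must be non-zero. A sketch of such a call (the handler/ops names
and the range are illustrative):

  v4l2_ctrl_new_std(&ctx->ctrl_handler, &s5p_mfc_dec_ctrl_ops,
                    V4L2_CID_MPEG_VIDEO_DEC_DISPLAY_DELAY,
                    0, 16383, 1 /* step */, 0);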
[ Upstream commit 95778c2d09 ]
i.MX6 device tree include files contain dangling endpoints for the
board device tree writers' convenience. These are still included in
many existing device trees.
Treat dangling endpoints as non-existent to support them.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 612b385efb ("media: video-mux: Create media links in bound notifier")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ecc3a92c6f ]
Drop an unnecessary include of asm/barrier.h from dirty_log_test.c to
allow the test to build on arm64. arm64, s390, and x86 all build cleanly
without the include (PPC and MIPS aren't supported in KVM's selftests).
arm64's barrier.h includes linux/kasan-checks.h, which is not copied
into tools/.
In file included from ../../../../tools/include/asm/barrier.h:8,
from dirty_log_test.c:19:
.../arm64/include/asm/barrier.h:12:10: fatal error: linux/kasan-checks.h: No such file or directory
12 | #include <linux/kasan-checks.h>
| ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Fixes: 84292e5659 ("KVM: selftests: Add dirty ring buffer test")
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5830fb13b ]
Commit 22f232d134 ("KVM: selftests: x86: Set supported CPUIDs on
default VM") moved vcpu_set_cpuid into vm_create_with_vcpus, but
dirty_log_test doesn't use it to create its VM. So the vCPU's CPUIDs are
not set, and the guest's pa_bits in KVM would be smaller than the
value queried by userspace.
However, the dirty tracking memory slot is at the highest GPA, so the
reserved bits in the GPTE would be set with the wrong pa_bits.
For shadow paging, the page fault would fail in permission_fault and
be injected into the guest. Since the guest doesn't have an IDT, this
finally leads to a VM exit due to a triple fault.
Move vcpu_set_cpuid into vm_vcpu_add_default to set supported
CPUIDs on default vcpu, since almost all tests need it.
Fixes: 22f232d134 ("KVM: selftests: x86: Set supported CPUIDs on default VM")
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <411ea2173f89abce56fc1fca5af913ed9c5a89c9.1624351343.git.houwenlong93@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f91efd870 ]
A race was detected between psi_trigger_destroy() and psi_trigger_create(),
as shown below, which causes a panic by accessing an invalid
psi_system->poll_wait->wait_queue_entry and
psi_system->poll_timer->entry->next. With this modification, the race
window is removed by initialising poll_wait and poll_timer in
group_init(), which is executed only once at the beginning.
psi_trigger_destroy()                    psi_trigger_create()

mutex_lock(trigger_lock);
rcu_assign_pointer(poll_task, NULL);
mutex_unlock(trigger_lock);
                                         mutex_lock(trigger_lock);
                                         if (!rcu_access_pointer(group->poll_task)) {
                                                 timer_setup(poll_timer, poll_timer_fn, 0);
                                                 rcu_assign_pointer(poll_task, task);
                                         }
                                         mutex_unlock(trigger_lock);
synchronize_rcu();
del_timer_sync(poll_timer);  <-- poll_timer has been reinitialized by
                                 psi_trigger_create()
So, trigger_lock/RCU correctly protects destruction of
group->poll_task but misses this race affecting poll_timer and
poll_wait.
Fixes: 461daba06b ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
Co-developed-by: ziwei.dai <ziwei.dai@unisoc.com>
Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
Co-developed-by: ke.wang <ke.wang@unisoc.com>
Signed-off-by: ke.wang <ke.wang@unisoc.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/1623371374-15664-1-git-send-email-huangzhaoyang@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
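A sketch of the one-time initialisation the fix moves into group_init() (the
surrounding per-group setup is omitted; field and callback names follow the
commit text):

  static void group_init(struct psi_group *group)
  {
          /* ... existing per-group setup ... */

          /* Initialise once here, not on every psi_trigger_create(). */
          init_waitqueue_head(&group->poll_wait);
          timer_setup(&group->poll_timer, poll_timer_fn, 0);
  }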
[ Upstream commit e31694e0a7 ]
When objtool creates the .altinstructions section, it sets the SHF_WRITE
flag to make the section writable -- unless the section had already been
previously created by the kernel. The mismatch between kernel-created
and objtool-created section flags can cause failures with external
tooling (kpatch-build). And the section doesn't need to be writable
anyway.
Make the section flags consistent with the kernel's.
Fixes: 9bc0bb5072 ("objtool/x86: Rewrite retpoline thunk calls")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/6c284ae89717889ea136f9f0064d914cd8329d31.1624462939.git.jpoimboe@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a96726bd0 ]
The function nx842_OF_upd_status triggers a sparse RCU warning when
it directly dereferences the RCU-protected devdata. This appears
to be an accident as there was another variable of the same name
that was passed in from the caller.
After it was removed (because the main purpose of using it, updating
the status member, was itself removed) the global variable
unintentionally stood in as its replacement.
This patch restores the devdata parameter.
Fixes: 90fd73f912 ("crypto: nx - remove pSeries NX 'status' field")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 012669c740 ]
On HETEROGENEOUS hardware (ARM big.LITTLE, Intel Alder Lake etc.) each
CPU might have a different hardware PMU. Each such PMU is
represented by a different struct pmu, but we only have a single HW
task context.
That means that the task context needs to switch PMU type when it
switches CPUs.
Not doing this means that ctx->pmu calls (pmu_{dis,en}able(),
{start,commit,cancel}_txn() etc.) are called against the wrong PMU and
things will go wobbly.
Fixes: f83d2f91d2 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/YMsy7BuGT8nBTspT@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d7993b234 ]
The current sun6i SPI implementation initializes the transfer too early,
resulting in SCK going high before the transfer. When using an additional
(gpio) chipselect with sun6i, the chipselect is asserted at a time when
clock is high, making the SPI transfer fail.
This is due to SUN6I_GBL_CTL_BUS_ENABLE being written into
SUN6I_GBL_CTL_REG at an early stage. Moving that to the transfer
function, hence, right before the transfer starts, mitigates that
problem.
Fixes: 3558fe900e (spi: sunxi: Add Allwinner A31 SPI controller driver)
Signed-off-by: Mirko Vogt <mirko-dev|linux@nanl.de>
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Link: https://lore.kernel.org/r/20210614144507.y3udezjfbko7eavv@runtux.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0c2c0dad6 ]
When PROVE_RAW_LOCK_NESTING=y many of the selftests FAILED because
HARDIRQ context is out-of-bounds for spinlocks. Instead make the
default hardware context the threaded hardirq context, which preserves
the old locking rules.
The wait-type specific locking selftests will have a non-threaded
HARDIRQ variant.
Fixes: de8f5e4f2d ("lockdep: Introduce wait-type checks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210617190313.322096283@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0213b7083e ]
Now that cpu.uclamp.min acts as a protection, we need to make sure that the
uclamp request of the task is within the allowed range of the cgroup, that
is, it is clamp()'ed correctly by tg->uclamp[UCLAMP_MIN] and
tg->uclamp[UCLAMP_MAX].
As reported by Xuewen [1] we can have some corner cases where there's
inversion between uclamp requested by task (p) and the uclamp values of
the taskgroup it's attached to (tg). Following table demonstrates
2 corner cases:
| p | tg | effective
-----------+-----+------+-----------
CASE 1
-----------+-----+------+-----------
uclamp_min | 60% | 0% | 60%
-----------+-----+------+-----------
uclamp_max | 80% | 50% | 50%
-----------+-----+------+-----------
CASE 2
-----------+-----+------+-----------
uclamp_min | 0% | 30% | 30%
-----------+-----+------+-----------
uclamp_max | 20% | 50% | 20%
-----------+-----+------+-----------
With this fix we get:
| p | tg | effective
-----------+-----+------+-----------
CASE 1
-----------+-----+------+-----------
uclamp_min | 60% | 0% | 50%
-----------+-----+------+-----------
uclamp_max | 80% | 50% | 50%
-----------+-----+------+-----------
CASE 2
-----------+-----+------+-----------
uclamp_min | 0% | 30% | 30%
-----------+-----+------+-----------
uclamp_max | 20% | 50% | 30%
-----------+-----+------+-----------
Additionally uclamp_update_active_tasks() must now unconditionally
update both UCLAMP_MIN/MAX because changing the tg's UCLAMP_MAX for
instance could have an impact on the effective UCLAMP_MIN of the tasks.
| p | tg | effective
-----------+-----+------+-----------
old
-----------+-----+------+-----------
uclamp_min | 60% | 0% | 50%
-----------+-----+------+-----------
uclamp_max | 80% | 50% | 50%
-----------+-----+------+-----------
*new*
-----------+-----+------+-----------
uclamp_min | 60% | 0% | *60%*
-----------+-----+------+-----------
uclamp_max | 80% |*70%* | *70%*
-----------+-----+------+-----------
[1] https://lore.kernel.org/lkml/CAB8ipk_a6VFNjiEnHRHkUMBKbA+qzPQvhtNjJ_YNzQhqV_o8Zw@mail.gmail.com/
Fixes: 0c18f2ecfc ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
Reported-by: Xuewen Yan <xuewen.yan94@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210617165155.3774110-1-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
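The effective values in the tables above are just the task's request clamped
into the task group's range; a sketch using the kernel's clamp() macro with
the percentages from the tables:

  /* effective = clamp(task request, tg->uclamp[UCLAMP_MIN], tg->uclamp[UCLAMP_MAX]) */
  eff_min = clamp(p_uclamp_min, tg_uclamp_min, tg_uclamp_max);
  eff_max = clamp(p_uclamp_max, tg_uclamp_min, tg_uclamp_max);

  /* CASE 1: clamp(60, 0, 50) == 50 and clamp(80, 0, 50) == 50 */
  /* CASE 2: clamp(0, 30, 50) == 30 and clamp(20, 30, 50) == 30 */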
[ Upstream commit d7d607096a ]
DL keeps track of the utilization on a per-rq basis with the structure
avg_dl. This utilization is updated during task_tick_dl(),
put_prev_task_dl() and set_next_task_dl(). However, when the current
running task changes its policy, set_next_task_dl(), which would usually
take care of updating the utilization when the rq starts running DL
tasks, will not see such a change, leaving the avg_dl structure outdated.
When that very same task is dequeued later, put_prev_task_dl() will
then update the utilization based on a wrong last_update_time, leading to
a huge spike in the DL utilization signal.
The signal would eventually recover from this issue after a few ms. Even
if no DL tasks are run, avg_dl is also updated in
__update_blocked_others(). But as the CPU capacity depends partly on the
avg_dl, this issue has nonetheless a significant impact on the scheduler.
Fix this issue by ensuring a load update when a running task changes
its policy to DL.
Fixes: 3727e0e ("sched/dl: Add dl_rq utilization tracking")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/1624271872-211872-3-git-send-email-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fecfcbc288 ]
RT keeps track of the utilization on a per-rq basis with the structure
avg_rt. This utilization is updated during task_tick_rt(),
put_prev_task_rt() and set_next_task_rt(). However, when the current
running task changes its policy, set_next_task_rt(), which would usually
take care of updating the utilization when the rq starts running RT tasks,
will not see such a change, leaving the avg_rt structure outdated. When
that very same task is dequeued later, put_prev_task_rt() will then
update the utilization based on a wrong last_update_time, leading to a
huge spike in the RT utilization signal.
The signal would eventually recover from this issue after a few ms. Even if
no RT tasks are run, avg_rt is also updated in __update_blocked_others().
But as the CPU capacity depends partly on the avg_rt, this issue has
nonetheless a significant impact on the scheduler.
Fix this issue by ensuring a load update when a running task changes
its policy to RT.
Fixes: 371bf427 ("sched/rt: Add rt_rq utilization tracking")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/1624271872-211872-2-git-send-email-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
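A hedged sketch of the kind of update both patches add when a task switches to
the policy in question, here for RT (assuming the existing
update_rt_rq_load_avg() helper; the exact hook is the sched_class switched_to
callback and the real patch may differ in detail):

  static void switched_to_rt(struct rq *rq, struct task_struct *p)
  {
          /* refresh avg_rt so last_update_time is current before RT runs */
          update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);

          /* ... existing switched_to_rt() logic ... */
  }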
[ Upstream commit be1a540886 ]
Split up the #VC handler code into a from-user and a from-kernel part.
This allows clean and correct state tracking, as the #VC handler needs
to enter NMI-state when raised from kernel mode and plain IRQ state when
raised from user-mode.
Fixes: 62441a1fb5 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210618115409.22735-3-joro@8bytes.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d187f21733 ]
The #VC handler only cares about IRQs being disabled while the GHCB is
active, as it must not be interrupted by something which could cause
another #VC while it holds the GHCB (NMI is the exception for which the
backup GHCB exists).
Make sure nothing interrupts the code path while the GHCB is active
by making sure that callers of __sev_{get,put}_ghcb() have disabled
interrupts upfront.
[ bp: Massage commit message. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210618115409.22735-2-joro@8bytes.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1aeb6b563a ]
When a log recovery is in progress, lots of operations have to take that
into account, so we keep this status per tree during the operation. A long
time ago, the error handling revamp patch 79787eaab4 ("btrfs: replace many
BUG_ONs with proper error handling") removed clearing of the status in
an error branch. Add it back as was intended in e02119d5a7 ("Btrfs:
Add a write ahead tree log to optimize synchronous operations").
There are probably no visible effects: log replay is done only during
mount, and if it fails all structures are cleared, so the stale status
won't be kept.
Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 57c045bc72 ]
commit db27f8294c changed eco_mode << (ffs(sreg->eco_mode_mask) - 1)
to sreg->eco_mode_mask << (ffs(sreg->eco_mode_mask) - 1), which is wrong.
Fix it by simply setting val = sreg->eco_mode_mask.
In addition, sreg->eco_mode_mask can be 0 (LDO3, LDO33, LDO34).
Return -EINVAL if idle mode is not supported when sreg->eco_mode_mask is 0.
While at it, also use unsigned int for reg_val/val which is the expected
type for regmap_read and regmap_update_bits.
Fixes: db27f8294c ("staging: regulator: hi6421v600-regulator: use shorter names for OF properties")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20210619123423.4091429-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a71fabf6a ]
According to ARM DDI 0487G.a, page D13-3895, setting the PMCR_EL0.P bit to
1 has the following effect:
"Reset all event counters accessible in the current Exception level, not
including PMCCNTR_EL0, to zero."
Similar behaviour is described for AArch32 on page G8-7022. Make it so.
Fixes: c01d6a1802 ("KVM: arm64: pmu: Only handle supported event counters")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210618105139.83795-1-alexandru.elisei@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 857f84743e ]
Currently pf_fixed is not increased when prefault is true. This is not
correct, since prefault here really means "async page fault completed".
In that case, the original page fault from the guest was morphed into an
async page fault and pf_fixed was not increased. So when prefault
indicates the async page fault is completed, pf_fixed should be increased.
Additionally, pf_fixed is currently also increased even when the page fault
is spurious, while the legacy MMU increases pf_fixed when the page fault
returns RET_PF_EMULATE or RET_PF_FIXED.
To fix the above two issues, change to increase pf_fixed when the return value
is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by
reaching here).
More information:
https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5
Fixes: bb18842e21 ("kvm: x86/mmu: Add TDP MMU PF handler")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <2ea8b7f5d4f03c99b32bc56fc982e1e4e3d3fc6b.1623717884.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
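In code terms, the accounting described above boils down to something like
this sketch at the end of the TDP MMU page fault path:

  /*
   * Count the fault as fixed even when it was a completed async page
   * fault (prefault == true), but not when it was spurious.
   * RET_PF_RETRY cannot reach this point.
   */
  if (ret != RET_PF_SPURIOUS)
          vcpu->stat.pf_fixed++;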
[ Upstream commit 57a3e96d6d ]
Currently tdp_mmu_map_handle_target_level() returns 0, which is
RET_PF_RETRY, when the page fault is actually fixed. This makes
kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
RET_PF_FIXED. Fix this by initializing ret to RET_PF_FIXED.
Note that kvm_mmu_page_fault() resumes the guest on both RET_PF_RETRY and
RET_PF_FIXED, which means in practice returning the two won't make a
difference, so this fix alone won't be necessary for the stable tree.
Fixes: bb18842e21 ("kvm: x86/mmu: Add TDP MMU PF handler")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <f9e8956223a586cd28c090879a8ff40f5eb6d609.1623717884.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 272b0a998d ]
Drop bogus logic that incorrectly clobbers the accessed/dirty enabling
status of the nested MMU on an EPTP switch. When nested EPT is enabled,
walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2.
This is likely a benign bug, as mmu->ept_ad is never consumed (since the
MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will
never propagate into future shadow pages since the nested MMU isn't used
to map anything, just to walk L2's page tables.
Note, KVM also does a full MMU reload, i.e. the guest_mmu will be
recreated using the new EPTP, and thus any change in A/D enabling will be
properly recognized in the relevant MMU.
Fixes: 41ab937274 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e75225dfa ]
Use BIT_ULL() instead of an open-coded shift to check whether or not a
function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM
supports only bit 0, and will fail VM-Enter if any other bits are set,
i.e. bits 63:32 are guaranteed to be zero.
Note, "function" is bounded by hardware as VMFUNC will #UD before taking
a VM-Exit if the function is greater than 63.
Before:
if ((vmcs12->vm_function_control & (1 << function)) == 0)
0x000000000001a916 <+118>: mov $0x1,%eax
0x000000000001a91b <+123>: shl %cl,%eax
0x000000000001a91d <+125>: cltq
0x000000000001a91f <+127>: and 0x128(%rbx),%rax
After:
if (!(vmcs12->vm_function_control & BIT_ULL(function & 63)))
0x000000000001a955 <+117>: mov 0x128(%rbx),%rdx
0x000000000001a95c <+124>: bt %rax,%rdx
Fixes: 27c42a1bb8 ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07ffaf343e ]
Trigger a full TLB flush on behalf of the guest on nested VM-Enter and
VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the
current PGD, which can theoretically leave stale, unsync'd entries in a
previous guest PGD, which could be consumed if L2 is allowed to load CR3
with PCID_NOFLUSH=1.
Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can
be utilized for its obvious purpose of emulating a guest TLB flush.
Note, there is no change to the actual TLB flush executed by KVM, even
though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is
disabled for L2, vpid02 is guaranteed to be '0', and thus
nested_get_vpid02() will return the VPID that is shared by L1 and L2.
Generate the request outside of kvm_mmu_new_pgd(), as getting the common
helper to correctly identify which request is needed is quite painful.
E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as
a TLB flush from the L1 kernel's perspective does not invalidate EPT
mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future
simplification by moving the logic into nested_vmx_transition_tlb_flush().
Fixes: 41fab65e7c ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cbbf244f05 ]
Fans 7..12 do not have their own set of configuration registers.
So far the code ignored that and read beyond the end of the configuration
register range to get the tachometer period. This resulted in more or less
random fan speed values for those fans.
The datasheet is quite vague when it comes to defining the tachometer
period for fans 7..12. Experiments confirm that the period is the same
for both fans associated with a given set of configuration registers.
Fixes: 54187ff9d7 ("hwmon: (max31790) Convert to use new hwmon registration API")
Fixes: 195a4b4298 ("hwmon: Driver for Maxim MAX31790")
Cc: Jan Kundrát <jan.kundrat@cesnet.cz>
Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Cc: Václav Kubernát <kubernat@cesnet.cz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210526154022.3223012-2-linux@roeck-us.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97387c2f06 ]
Valid Maxim Integrated ACPI device IDs would start with MXIM,
not with MAX1. On top of that, ACPI device IDs reflecting chip names
are almost always invalid.
Remove the invalid ACPI IDs.
Fixes: 04e1e70afe ("hwmon: (max31722) Add support for MAX31722/MAX31723 temperature sensors")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac61c8aae4 ]
This reverts commit b58bd4c6df.
None of the ACPI IDs introduced with the reverted patch is a valid ACPI
device ID. Any ACPI users of this driver are advised to use PRP0001 and
a devicetree-compatible device identification.
Fixes: b58bd4c6df ("hwmon: (lm70) Add support for ACPI")
Cc: Andrej Picej <andpicej@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b821698dc ]
The current RDACM21 initialization routine powers up the OV10640 image
sensor after the OV490 ISP. The ISP is programmed with a firmware loaded
from an embedded serial flash that (most probably) tries to interact and
program also the image sensor connected to the ISP.
As described in commit "media: i2c: rdacm21: Fix OV10640 powerup" the
image sensor powerdown signal is kept high by an internal pull up
resistor and occasionally fails to start up correctly if the powerdown
line is not asserted explicitly. Failures in the OV10640 startup cause
the OV490 firmware to fail to boot correctly, resulting in the camera
module initialization failing as a consequence.
Fix this by powering up the OV10640 image sensor before testing the
OV490 firmware boot completion, by splitting the ov10640_initialize()
function into an ov10640_power_up() one and an ov10640_check_id() one.
Also make sure the OV10640 identification procedure gives enough time to
the image sensor to resume after the programming phase performed by the
OV490 firmware by repeating the ID read procedure.
This commit fixes a sporadic start-up error triggered by a failure to
detect the OV490 firmware boot completion:
rdacm21 8-0054: Timeout waiting for firmware boot
[hverkuil: fixed two typos in commit log]
Fixes: a59f853b3b ("media: i2c: Add driver for RDACM21 camera module")
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff75332b26 ]
The OV10640 image sensor powerdown signal is controlled by the first
line of the OV490 GPIO pad #1, but the pad #0 identifier
OV490_GPIO_OUTPUT_VALUE0 was erroneously used. As a result the image
sensor powerdown signal was never asserted but was left floating and
kept high by an internal pull-up resistor, causing sporadic failures
during the image sensor startup phase.
Fix this by using the correct GPIO pad identifier and wait the mandatory
1.5 millisecond delay after the powerup lane is asserted. The reset
delay is not characterized in the chip manual other than as "255 XVCLK +
initialization". Wait for at least 3 milliseconds to guarantee the SCCB
bus is available.
While at it also fix the reset sequence, as the reset line was released
before the powerdown one, and the line was not cycled.
This commit fixes a sporadic start-up error triggered by a failure to
read the OV10640 chip ID:
rdacm21 8-0054: OV10640 ID mismatch: (0x01)
Fixes: a59f853b3b ("media: i2c: Add driver for RDACM21 camera module")
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d11e6aad1 ]
The m2m_ctx resources allocated by v4l2_m2m_ctx_init() in g2d_open()
should be freed from g2d_release() when they are no longer used.
Fix it.
Fixes: 918847341a ("[media] v4l: add G2D driver for s5p device family")
Signed-off-by: Dillon Min <dillon.minfei@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 765ba251d2 ]
Converting the VIDIOC_DQEVENT_TIME32/VIDIOC_DQEVENT32/
VIDIOC_DQEVENT32_TIME32 arguments to the canonical form is done in common
code, but for some reason I ended up adding another conversion helper to
subdev_do_ioctl() as well. I must have concluded that this does not go
through the common conversion, but it has done that since the ioctl
handler was first added.
I assume this one is harmless as there should be no way to arrive here
from user space if CONFIG_COMPAT_32BIT_TIME is set, but since it is dead
code, it should just get removed.
On a 64-bit architecture, as well as a 32-bit architecture without
CONFIG_COMPAT_32BIT_TIME, handling this command is a mistake,
and the kernel should return an error.
Fixes: 1a6c0b36dd ("media: v4l2-core: fix VIDIOC_DQEVENT for time64 ABI")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c344f07aa1 ]
Syzbot found that passing ioctl command 0xc0505609 into a 64-bit
kernel from a 32-bit process causes uninitialized kernel memory to
get passed to drivers instead of the user space data:
BUG: KMSAN: uninit-value in check_array_args drivers/media/v4l2-core/v4l2-ioctl.c:3041 [inline]
BUG: KMSAN: uninit-value in video_usercopy+0x1631/0x3d30 drivers/media/v4l2-core/v4l2-ioctl.c:3315
CPU: 0 PID: 19595 Comm: syz-executor.4 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:120
kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
check_array_args drivers/media/v4l2-core/v4l2-ioctl.c:3041 [inline]
video_usercopy+0x1631/0x3d30 drivers/media/v4l2-core/v4l2-ioctl.c:3315
video_ioctl2+0x9f/0xb0 drivers/media/v4l2-core/v4l2-ioctl.c:3391
v4l2_ioctl+0x255/0x290 drivers/media/v4l2-core/v4l2-dev.c:360
v4l2_compat_ioctl32+0x2c6/0x370 drivers/media/v4l2-core/v4l2-compat-ioctl32.c:1248
__do_compat_sys_ioctl fs/ioctl.c:842 [inline]
__se_compat_sys_ioctl+0x53d/0x1100 fs/ioctl.c:793
__ia32_compat_sys_ioctl+0x4a/0x70 fs/ioctl.c:793
do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline]
__do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141
do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166
do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
The time32 commands are defined but were never meant to be called on
64-bit machines, as those have always used time64 interfaces. I missed
this in my patch that introduced the time64 handling on 32-bit platforms.
The problem in this case is the mismatch of one function checking for
the numeric value of the command and another function checking for the
type of process (native vs compat) instead, with the result being that
for this combination, nothing gets copied into the buffer at all.
Avoid this by only trying to convert the time32 commands when running
on a 32-bit kernel where these are defined in a meaningful way.
[hverkuil: fix 3 warnings: switch with no cases]
Fixes: 577c89b0ce ("media: v4l2-core: fix v4l2_buffer handling for time64 ABI")
Reported-by: syzbot+142888ffec98ab194028@syzkaller.appspotmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cd259ca5d ]
The SM2 module allocates ec->Q in sm2_set_pub_key(). When doing the alg test
in test_akcipher_one(), the public key is set for every test vector,
and ec->Q is not freed. This will cause a memory leak.
This patch allocates ec->Q in sm2_ec_ctx_init() instead.
Fixes: ea7ecb6644 ("crypto: sm2 - introduce OSCCA SM2 asymmetric cipher algorithm")
Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Reviewed-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b82435d17 ]
In curve25519_mod_init() the curve25519_alg will be registered only when
(X86_FEATURE_BMI2 && X86_FEATURE_ADX). But in curve25519_mod_exit()
it still checks (X86_FEATURE_BMI2 || X86_FEATURE_ADX) when doing the crypto
unregister. This will trigger a BUG_ON in crypto_unregister_alg() as
alg->cra_refcnt is 0 if the CPU only supports one of X86_FEATURE_BMI2
and X86_FEATURE_ADX.
Fixes: 07b586fe06 ("crypto: x86/curve25519 - replace with formally verified implementation")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
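A sketch of the corrected exit-path condition, mirroring the init-time check
(the unregister call is illustrative of the module's kpp registration):

  static void __exit curve25519_mod_exit(void)
  {
          /* must mirror the init condition: both BMI2 and ADX are required */
          if (boot_cpu_has(X86_FEATURE_BMI2) && boot_cpu_has(X86_FEATURE_ADX))
                  crypto_unregister_kpp(&curve25519_alg);
  }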
[ Upstream commit ca323b2c61 ]
pm_runtime_get_sync() will increment the PM usage counter
even when it fails. Forgetting the put operation will
result in a reference leak here. Fix it by replacing
it with pm_runtime_resume_and_get() to keep the usage counter
balanced.
Fixes: 604c31039d ("crypto: omap-sham - Check for return value from pm_runtime_get_sync")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
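The replacement pattern, as a sketch ('dd' is an illustrative driver context):
pm_runtime_resume_and_get() already drops the usage count on failure, so the
error path needs no explicit put.

  err = pm_runtime_resume_and_get(dd->dev);
  if (err < 0) {
          dev_err(dd->dev, "failed to get sync: %d\n", err);
          return err;             /* usage counter already balanced */
  }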
[ Upstream commit 57c126661f ]
The function nitrox_register_interrupts() leaves the variable 'nr_vecs'
unchecked, which would be used as a kcalloc() parameter later.
Fixes: 5155e118dd ("crypto: cavium/nitrox - use pci_alloc_irq_vectors() while enabling MSI-X.")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13dfead49d ]
Rename struct sms_msg_data4 to sms_msg_data5 and increase the size of
its msg_data array from 4 to 5 elements. Notice that at some point
the 5th element of msg_data is being accessed in function
smscore_load_firmware_family2():
1006 trigger_msg->msg_data[4] = 4; /* Task ID */
Also, there is no need for the object _trigger_msg_ of type struct
sms_msg_data *, when _msg_ can be used, directly. Notice that msg_data
in struct sms_msg_data is a one-element array, which causes multiple
out-of-bounds warnings when accessing beyond its first element
in function smscore_load_firmware_family2():
992 struct sms_msg_data *trigger_msg =
993 (struct sms_msg_data *) msg;
994
995 pr_debug("sending MSG_SMS_SWDOWNLOAD_TRIGGER_REQ\n");
996 SMS_INIT_MSG(&msg->x_msg_header,
997 MSG_SMS_SWDOWNLOAD_TRIGGER_REQ,
998 sizeof(struct sms_msg_hdr) +
999 sizeof(u32) * 5);
1000
1001 trigger_msg->msg_data[0] = firmware->start_address;
1002 /* Entry point */
1003 trigger_msg->msg_data[1] = 6; /* Priority */
1004 trigger_msg->msg_data[2] = 0x200; /* Stack size */
1005 trigger_msg->msg_data[3] = 0; /* Parameter */
1006 trigger_msg->msg_data[4] = 4; /* Task ID */
even when enough dynamic memory is allocated for _msg_:
929 /* PAGE_SIZE buffer shall be enough and dma aligned */
930 msg = kmalloc(PAGE_SIZE, GFP_KERNEL | coredev->gfp_buf_flags);
but as _msg_ is cast to (struct sms_msg_data *):
992 struct sms_msg_data *trigger_msg =
993 (struct sms_msg_data *) msg;
the out-of-bounds warnings are actually valid and should be addressed.
Fix this by declaring object _msg_ of type struct sms_msg_data5 *,
which contains a 5-elements array, instead of just 4. And use
_msg_ directly, instead of creating object trigger_msg.
This helps with the ongoing efforts to enable -Warray-bounds by fixing
the following warnings:
CC [M] drivers/media/common/siano/smscoreapi.o
drivers/media/common/siano/smscoreapi.c: In function ‘smscore_load_firmware_family2’:
drivers/media/common/siano/smscoreapi.c:1003:24: warning: array subscript 1 is above array bounds of ‘u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
1003 | trigger_msg->msg_data[1] = 6; /* Priority */
| ~~~~~~~~~~~~~~~~~~~~~^~~
In file included from drivers/media/common/siano/smscoreapi.c:12:
drivers/media/common/siano/smscoreapi.h:619:6: note: while referencing ‘msg_data’
619 | u32 msg_data[1];
| ^~~~~~~~
drivers/media/common/siano/smscoreapi.c:1004:24: warning: array subscript 2 is above array bounds of ‘u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
1004 | trigger_msg->msg_data[2] = 0x200; /* Stack size */
| ~~~~~~~~~~~~~~~~~~~~~^~~
In file included from drivers/media/common/siano/smscoreapi.c:12:
drivers/media/common/siano/smscoreapi.h:619:6: note: while referencing ‘msg_data’
619 | u32 msg_data[1];
| ^~~~~~~~
drivers/media/common/siano/smscoreapi.c:1005:24: warning: array subscript 3 is above array bounds of ‘u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
1005 | trigger_msg->msg_data[3] = 0; /* Parameter */
| ~~~~~~~~~~~~~~~~~~~~~^~~
In file included from drivers/media/common/siano/smscoreapi.c:12:
drivers/media/common/siano/smscoreapi.h:619:6: note: while referencing ‘msg_data’
619 | u32 msg_data[1];
| ^~~~~~~~
drivers/media/common/siano/smscoreapi.c:1006:24: warning: array subscript 4 is above array bounds of ‘u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
1006 | trigger_msg->msg_data[4] = 4; /* Task ID */
| ~~~~~~~~~~~~~~~~~~~~~^~~
In file included from drivers/media/common/siano/smscoreapi.c:12:
drivers/media/common/siano/smscoreapi.h:619:6: note: while referencing ‘msg_data’
619 | u32 msg_data[1];
| ^~~~~~~~
Fixes: 018b0c6f8a ("[media] siano: make load firmware logic to work with newer firmwares")
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
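A sketch of the resulting layout: the renamed struct carries all five words,
so the trigger message can be built through 'msg' directly, without the cast
(structure and field names follow the commit text):

  struct sms_msg_data5 {
          struct sms_msg_hdr x_msg_header;
          u32 msg_data[5];
  };

  /* in smscore_load_firmware_family2(), using msg directly */
  msg->msg_data[0] = firmware->start_address;     /* Entry point */
  msg->msg_data[1] = 6;                           /* Priority */
  msg->msg_data[2] = 0x200;                       /* Stack size */
  msg->msg_data[3] = 0;                           /* Parameter */
  msg->msg_data[4] = 4;                           /* Task ID */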
[ Upstream commit 8ed339f23d ]
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.
Fix the gl860_RTx() helper so that zero-length control reads fail with
an error message instead. Note that there are no current callers that
would trigger this.
Fixes: 4f7cb8837c ("V4L/DVB (12954): gspca - gl860: Addition of GL860 based webcams")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
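The underlying rule, sketched with the core USB helpers (variable names are
illustrative): the pipe direction must match the USB_DIR_* bit of the request
type, and a zero-length read is rejected rather than sent as an IN transfer.

  if ((req_type & USB_DIR_IN) && !len) {
          dev_err(&udev->dev, "zero-length control read\n");
          return -EINVAL;
  }

  pipe = (req_type & USB_DIR_IN) ? usb_rcvctrlpipe(udev, 0)
                                 : usb_sndctrlpipe(udev, 0);
  ret = usb_control_msg(udev, pipe, req, req_type, value, index,
                        data, len, USB_CTRL_GET_TIMEOUT);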
[ Upstream commit ce67eaca95 ]
Replace BIT() in v4l2's UAPI header with _BITUL(). BIT() is not defined
in the UAPI headers and its usage may cause userspace build errors.
Fixes: 206bc0f6fb ("media: vicodec: mark the stateless FWHT API as stable")
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
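In UAPI headers the bit macros from <linux/const.h> have to be used instead;
a sketch with a made-up flag name:

  #include <linux/const.h>

  /* BIT(7) is kernel-internal; _BITUL(7) is part of the UAPI */
  #define V4L2_FWHT_FL_EXAMPLE    _BITUL(7)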
[ Upstream commit a6b1e7093f ]
When the CSI bps per lane is not in the valid range, an appropriate error
code -EINVAL should be returned. However, we currently do not explicitly
assign this error code to 'ret'. As a result, 0 was incorrectly returned.
Fixes: 2561482468 ("[media] tc358743: support probe from device tree")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f2e452730 ]
The media_device_usb_allocate() function returns error pointers when
it's enabled and something goes wrong. It can return NULL as well, but
only if CONFIG_MEDIA_CONTROLLER is disabled so that doesn't apply here.
Fixes: 812658d88d ("media: change au0828 to use Media Device Allocator API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
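Callers therefore have to use the IS_ERR()/PTR_ERR() convention instead of a
NULL check; a sketch of the calling pattern (the exact argument list of
media_device_usb_allocate() is assumed here, not quoted from the driver):

  mdev = media_device_usb_allocate(udev, KBUILD_MODNAME, THIS_MODULE);
  if (IS_ERR(mdev))               /* error pointer, not NULL, on failure */
          return PTR_ERR(mdev);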
[ Upstream commit 01fe904c9a ]
In isp_video_release, file->private_data is freed via
_vb2_fop_release()->v4l2_fh_release(). But the freed
file->private_data is still used in v4l2_fh_is_singular_file()
->v4l2_fh_is_singular(file->private_data), which is a use
after free bug.
My patch uses a variable 'is_singular_file' to avoid the uaf.
v3: https://lore.kernel.org/patchwork/patch/1419058/
Fixes: 34947b8aeb ("[media] exynos4-is: Add the FIMC-IS ISP capture DMA driver")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba1ed4ae76 ]
The driver should only set the payload on .buf_prepare if the
buffer is CAPTURE type. If an OUTPUT buffer has a zero bytesused
set by userspace then v4l2-core will set it to buffer length.
If we overwrite bytesused for OUTPUT buffers, too, then
vb2_get_plane_payload() will return incorrect value which might be then
written to hw registers by the driver in rkvdec-h264.c.
[Changed the comment and used V4L2_TYPE_IS_CAPTURE macro]
Fixes: cd33c83044 ("media: rkvdec: Add the rkvdec driver")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
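A sketch of the guarded payload assignment in .buf_prepare, simplified to a
single plane (the function name is illustrative):

  static int rkvdec_buf_prepare_sketch(struct vb2_buffer *vb)
  {
          struct vb2_queue *vq = vb->vb2_queue;

          /*
           * Only CAPTURE buffers get bytesused forced to the full plane
           * size; OUTPUT buffers keep the payload userspace provided.
           */
          if (V4L2_TYPE_IS_CAPTURE(vq->type))
                  vb2_set_plane_payload(vb, 0, vb2_plane_size(vb, 0));

          return 0;
  }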
[ Upstream commit 6e2202ca1e ]
The conceal color property used for Venus v4 and v6 has the same
payload structure. But currently v4 falls back to the payload
structure for v1. Correct this by moving set_property to v4.
Fixes: 4ef6039fad ("media: venus: vdec: Add support for conceal control")
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cb2705cf7 ]
When we continue the loop because the device is disabled, we have to drop
the reference count. When we have an array full of devices we also have to
drop the reference count. Note, in this case
cio2_bridge_unregister_sensors() is called by the caller.
Fixes: 803abec64e ("media: ipu3-cio2: Add cio2-bridge to ipu3-cio2 driver")
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 475ea6c602 ]
Will reported that the 'XXX __migrate_task() can fail' in migration_cpu_stop()
can happen, and it *is* sort of a big deal. Looking at it some more, one
will note there is a glaring hole in the deferred CPU selection:
(w/ CONFIG_CPUSET=n, so that the affinity mask passed via taskset doesn't
get AND'd with cpu_online_mask)
$ taskset -pc 0-2 $PID
# offline CPUs 3-4
$ taskset -pc 3-5 $PID
`\
$PID may stay on 0-2 due to the cpumask_any_distribute() picking an
offline CPU and __migrate_task() refusing to do anything due to
cpu_is_allowed().
set_cpus_allowed_ptr() goes to some length to pick a dest_cpu that matches
the right constraints vs affinity and the online/active state of the
CPUs. Reuse that instead of discarding it in the affine_move_task() case.
Fixes: 6d337eab04 ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210526205751.842360-2-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8e00abe7d ]
Some arches (um, sparc64, riscv, xtensa) cause a Kconfig warning for
LOCKDEP.
These arch-es select LOCKDEP_SUPPORT but they are not listed as one
of the arch-es that LOCKDEP depends on.
Since (16) arch-es define the Kconfig symbol LOCKDEP_SUPPORT if they
intend to have LOCKDEP support, replace the awkward list of
arch-es that LOCKDEP depends on with the LOCKDEP_SUPPORT symbol.
But wait. LOCKDEP_SUPPORT is included in LOCK_DEBUGGING_SUPPORT,
which is already a dependency here, so LOCKDEP_SUPPORT is redundant
and not needed.
That leaves the FRAME_POINTER dependency, but it is part of an
expression like this:
depends on (A && B) && (FRAME_POINTER || B')
where B' is a dependency of B so if B is true then B' is true
and the value of FRAME_POINTER does not matter.
Thus we can also delete the FRAME_POINTER dependency.
Fixes this kconfig warning: (for um, sparc64, riscv, xtensa)
WARNING: unmet direct dependencies detected for LOCKDEP
Depends on [n]: DEBUG_KERNEL [=y] && LOCK_DEBUGGING_SUPPORT [=y] && (FRAME_POINTER [=n] || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86)
Selected by [y]:
- PROVE_LOCKING [=y] && DEBUG_KERNEL [=y] && LOCK_DEBUGGING_SUPPORT [=y]
- LOCK_STAT [=y] && DEBUG_KERNEL [=y] && LOCK_DEBUGGING_SUPPORT [=y]
- DEBUG_LOCK_ALLOC [=y] && DEBUG_KERNEL [=y] && LOCK_DEBUGGING_SUPPORT [=y]
Fixes: 7d37cb2c91 ("lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20210524224150.8009-1-rdunlap@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c87ae1a0d ]
'ret' is known to be 1 here. In fact 'i' is expected instead.
Store the return value of 'i2c_master_recv()' in 'ret' so that the error
message prints the correct error code.
Fixes: acaa34bf06 ("media: rc: implement zilog transmitter")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
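The shape of the fix, as a sketch (client and buffer names are illustrative):

  ret = i2c_master_recv(client, buf, 1);
  if (ret != 1) {
          dev_err(&client->dev, "i2c_master_recv failed with %d\n", ret);
          return ret < 0 ? ret : -EIO;
  }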
[ Upstream commit d699c5d0bd ]
Simplify the probe function by using the of_device_get_match_data()
helper instead of open coding. The logic is also moved up to fix the
missing pm_runtime cleanup in case of a match failure.
Fixes: 0bc42311cd ("crypto: sa2ul - Add support for AM64")
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Tero Kristo <kristo@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5c8552325e ]
The pm_runtime APIs added first in commit 7694b6ca64 ("crypto: sa2ul -
Add crypto driver") are not unwound properly and was fixed up partially
in commit 13343badae ("crypto: sa2ul - Fix PM reference leak in
sa_ul_probe()"). This fixed up the pm_runtime usage count but not the
state. Fix this properly.
Fixes: 13343badae ("crypto: sa2ul - Fix PM reference leak in sa_ul_probe()")
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Tero Kristo <kristo@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c0716ee1d ]
The sa_dma_init() function doesn't release the requested dma channels
on all failure paths. Any failure in this function also ends up
leaking the dma pool created in sa_init_mem() in the sa_ul_probe()
function. Fix all of these issues.
Fixes: 7694b6ca64 ("crypto: sa2ul - Add crypto driver")
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Tero Kristo <kristo@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b0553b701 ]
Currently, an invalid dma address may be unmapped when calling
'xx_data_clr_all' in error path, so check dma address of sqe in/out
if initialized before calling 'dma_free_coherent' or 'dma_unmap_single'.
Fixes: a9214b0b6e ("crypto: hisilicon - fix the check on dma address")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d01667252 ]
Actual data length of the 'secret' is not equal to the 'secret_size'.
Since the 'curve_id' has removed in the 'secret', the 'secret_size'
should subtract the length of the 'curve_id'.
Fixes: 6763f5ea2d ("crypto: ecdh - move curve_id of ECDH from ...")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49219d9b87 ]
EVM_SETUP_COMPLETE is defined as 0x80000000, which is larger than INT_MAX.
The "-fno-strict-overflow" compiler option properly prevents signaling
EVM that the EVM policy setup is complete. Define and read an unsigned
int.
Fixes: f00d797507 ("EVM: Allow userspace to signal an RSA key has been loaded")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
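A sketch of the change (the surrounding securityfs write handler is omitted;
the previous signed parsing is an assumption): read the value into an unsigned
int so that EVM_SETUP_COMPLETE (0x80000000) is representable.

  unsigned int i;

  if (kstrtouint(buf, 0, &i))     /* previously parsed into a signed int */
          return -EINVAL;
  /* EVM_SETUP_COMPLETE now compares without signed overflow */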
[ Upstream commit bfc1f378c8 ]
If platform_get_irq() fails (or returns IRQ0) and thus the polling mode
has to be used, ata_host_activate() hits the WARN_ON() due to 'irq_handler'
parameter being non-NULL if the polling mode is selected. Let's only set
the pointer to the driver's IRQ handler if platform_get_irq() returns a
valid IRQ # -- this should avoid the unnecessary WARN_ON()...
Fixes: 43f01da0f2 ("MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/3a241167-f84d-1d25-5b9b-be910afbe666@omp.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8852c55240 ]
"OBJECT_FILES_NON_STANDARD_vma.o := n" has a dependency bug. When
objtool source is updated, the affected object doesn't get re-analyzed
by objtool.
Peter's new variable-sized jump label feature relies on objtool
rewriting the object file. Otherwise the system can fail to boot. That
effectively upgrades this minor dependency issue to a major bug.
The problem is that variables in prerequisites are expanded early,
during the read-in phase. The '$(objtool_dep)' variable indirectly uses
'$@', which isn't yet available when the target prerequisites are
evaluated.
Use '.SECONDEXPANSION:' which causes '$(objtool_dep)' to be expanded in
a later phase, after the target-specific '$@' variable has been defined.
Fixes: b9ab5ebb14 ("objtool: Add CONFIG_STACK_VALIDATION option")
Fixes: ab3257042c ("jump_label, x86: Allow short NOPs")
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93b7385870 ]
cpu_cgroup_css_online() calls cpu_util_update_eff() without holding the
uclamp_mutex or rcu_read_lock() like other call sites, which is
a mistake.
The uclamp_mutex is required to protect against concurrent reads and
writes that could update the cgroup hierarchy.
The rcu_read_lock() is required to traverse the cgroup data structures
in cpu_util_update_eff().
Surround the caller with the required locks and add some asserts to
better document the dependency in cpu_util_update_eff().
Fixes: 7226017ad3 ("sched/uclamp: Fix a bug in propagating uclamp value in new cgroups")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-3-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
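In code form, the fix wraps the call in cpu_cgroup_css_online() with the same
locks the other call sites hold (sketch, surrounding code omitted):

  mutex_lock(&uclamp_mutex);
  rcu_read_lock();
  cpu_util_update_eff(css);       /* walks cgroup data under RCU */
  rcu_read_unlock();
  mutex_unlock(&uclamp_mutex);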
[ Upstream commit 0c18f2ecfc ]
cpu.uclamp.min is a protection as described in cgroup-v2 Resource
Distribution Model
Documentation/admin-guide/cgroup-v2.rst
which means we try our best to preserve the minimum performance point of
tasks in this group. See full description of cpu.uclamp.min in the
cgroup-v2.rst.
But the current implementation makes it a limit, which is not what was
intended.
For example:
tg->cpu.uclamp.min = 20%
p0->uclamp[UCLAMP_MIN] = 0
p1->uclamp[UCLAMP_MIN] = 50%
Previous Behavior (limit):
p0->effective_uclamp = 0
p1->effective_uclamp = 20%
New Behavior (Protection):
p0->effective_uclamp = 20%
p1->effective_uclamp = 50%
Which is in line with how protections should work.
With this change the cgroup and per-task behaviors are the same, as
expected.
Additionally, we remove the confusing relationship between cgroup and
!user_defined flag.
We don't want for example RT tasks that are boosted by default to max to
change their boost value when they attach to a cgroup. If a cgroup wants
to limit the max performance point of tasks attached to it, then
cpu.uclamp.max must be set accordingly.
Or if they want to set different boost value based on cgroup, then
sysctl_sched_util_clamp_min_rt_default must be used to NOT boost to max
and set the right cpu.uclamp.min for each group to let the RT tasks
obtain the desired boost value when attached to that group.
As it stands, the dependency on the !user_defined flag adds an extra layer of
complexity that is not required now that cpu.uclamp.min behaves properly as
a protection.
The propagation model of effective cpu.uclamp.min in child cgroups as
implemented by cpu_util_update_eff() is still correct. The parent
protection sets an upper limit of what the child cgroups will
effectively get.
Fixes: 3eac870a32 (sched/uclamp: Use TG's clamps to restrict TASK's clamps)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8edcb5049a ]
The use of an enum named 'RST' conflicts with a #define macro
named 'RST' in arch/mips/include/asm/mach-rc32434/rb.h.
The MIPS use of RST was there first (AFAICT), so change the
media/i2c/ uses of RST to be named 'RSET'.
'git grep -w RSET' does not report any naming conflicts with the
new name.
This fixes multiple build errors:
arch/mips/include/asm/mach-rc32434/rb.h:15:14: error: expected identifier before '(' token
15 | #define RST (1 << 15)
| ^
drivers/media/i2c/s5c73m3/s5c73m3.h:356:2: note: in expansion of macro 'RST'
356 | RST,
| ^~~
../arch/mips/include/asm/mach-rc32434/rb.h:15:14: error: expected identifier before '(' token
15 | #define RST (1 << 15)
| ^
../drivers/media/i2c/s5k6aa.c:180:2: note: in expansion of macro 'RST'
180 | RST,
| ^~~
../arch/mips/include/asm/mach-rc32434/rb.h:15:14: error: expected identifier before '(' token
15 | #define RST (1 << 15)
| ^
../drivers/media/i2c/s5k5baf.c:238:2: note: in expansion of macro 'RST'
238 | RST,
| ^~~
and some others that I have trimmed.
Fixes: cac47f1822 ("[media] V4L: Add S5C73M3 camera driver")
Fixes: 8b99312b72 ("[media] Add v4l2 subdev driver for S5K4ECGX sensor")
Fixes: 7d459937dc ("[media] Add driver for Samsung S5K5BAF camera sensor")
Fixes: bfa8dd3a05 ("[media] v4l: Add v4l2 subdev driver for S5K6AAFX sensor")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Sangwook Lee <sangwook.lee@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d3a62fbae ]
The driver overrides the error codes returned by platform_get_irq() to
-ENOENT, so if it returns -EPROBE_DEFER, the driver would fail the probe
permanently instead of the deferred probing. Switch to propagating the
error code upstream, still checking/overriding IRQ0 as libata regards it
as "no IRQ" (thus polling) anyway...
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Link: https://lore.kernel.org/r/771ced55-3efb-21f5-f21c-b99920aae611@omprussia.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a24efa16e ]
The driver overrides the error codes returned by platform_get_irq() to
-EINVAL, so if it returns -EPROBE_DEFER, the driver would fail the probe
permanently instead of the deferred probing. Switch to propagating the
error code upstream, still checking/overriding IRQ0 as libata regards it
as "no IRQ" (thus polling) anyway...
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Link: https://lore.kernel.org/r/105b456d-1199-f6e9-ceb7-ffc5ba551d1a@omprussia.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8acf011f2 ]
Crypto selftests fail on ixp4xx since it does not update the IV after
skcipher requests.
Fixes: 81bef01500 ("crypto: ixp4xx - Hardware crypto support for IXP4xx CPUs")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9395c58fdd ]
Testing ixp4xx_crypto with CONFIG_DMA_API_DEBUG lead to the following error:
DMA-API: platform ixp4xx_crypto.0: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=24 bytes]
This is due to dma_unmap using the wrong address.
Fixes: 0d44dc59b2 ("crypto: ixp4xx - Fix handling of chained sg buffers")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 892bb6ecea ]
device_run() first enables the clock and then tries to resume PM runtime,
checking for errors. If for some reason pm_runtime cannot resume, it is
better to detect that beforehand. So, change the order inside device_run().
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Fixes: 775fec6900 ("media: add Rockchip VPU JPEG encoder driver")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6005a8e955 ]
If the pm_runtime resume logic fails, return the error code provided by it,
instead of -EAGAIN, as, depending on what caused it to fail, it may not be
something that can be recovered from.
Fixes: cbba45d436 ("[media] smiapp: Use runtime PM")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 747bad54a6 ]
There's a bug at s5p_cec_adap_enable(): if called to
disable the device, it should call pm_runtime_put()
instead of pm_runtime_disable(), as the goal here is to
decrement the usage_count and not to disable PM runtime.
Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fixes: 1bcbf6f4b6 ("[media] cec: s5p-cec: Add s5p-cec driver")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4cba5473c5 ]
The Venus code has a sort of watchdog that attempts to recover
from IP errors, implemented as a delayed work job, which
calls venus_sys_error_handler().
Right now, it has several issues:
1. It assumes that PM runtime resume never fails
2. It internally runs two while() loops that also assume that
PM runtime will never fail to go idle:
while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
msleep(10);
...
while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
usleep_range(1000, 1500);
3. It uses an OR to merge all return codes and then report to the user
4. If the hardware never recovers, the handler keeps re-running every 10ms,
flooding the syslog with two messages each time (so, up to 200 messages
per second).
Rework the code, in order to prevent that, by:
1. check the return code from PM runtime resume;
2. don't let the while() loops run forever;
3. store the failed event;
4. use warn ratelimited when it fails to recover.
Fixes: af2c3834c8 ("[media] media: venus: adding core part and helper functions")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbaca8e56e ]
The commit 7d93aecdb5 ("spi: Add generic support for unused native cs
with cs-gpios") excludes the valid case for the controllers that doesn't
need to switch native CS in order to perform the transfer, i.e. when
0 native
... ...
<n> - 1 native
<n> GPIO
<n> + 1 GPIO
... ...
where <n> is the maximum number of native CSs supported by the controller.
To allow this, bail out from spi_get_gpio_descs() conditionally for
controllers which are explicitly marked with SPI_MASTER_GPIO_SS.
Fixes: 7d93aecdb5 ("spi: Add generic support for unused native cs with cs-gpios")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210420164425.40287-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0b8200d06 ]
Commit "mm/page_alloc: convert per-cpu list protection to local_lock" will
introduce a zero-sized per-CPU variable, which causes pahole to generate
invalid BTF. Only pahole versions 1.18 through 1.21 are impacted, as
before 1.18 pahole doesn't know anything about per-CPU variables, and 1.22
contains the proper fix for the issue.
Luckily, pahole 1.18 got --skip_encoding_btf_vars option disabling BTF
generation for per-CPU variables in anticipation of some unanticipated
problems. So use this escape hatch to disable per-CPU var BTF info on
those problematic pahole versions. Users relying on availability of
per-CPU var BTFs would need to upgrade to pahole 1.22+, but everyone won't
notice any regressions.
Link: https://lkml.kernel.org/r/20210530002536.3193829-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0f8aa4fa8 ]
Commit c65e774fb3 ("x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable")
made PTRS_PER_P4D variable on x86 and introduced MAX_PTRS_PER_P4D as a
constant for cases which need a compile-time constant (e.g. fixed-size
arrays).
powerpc likewise has boot-time selectable MMU features which can cause
other mm "constants" to vary. For KASAN, we have some static
PTE/PMD/PUD/P4D arrays so we need compile-time maximums for all these
constants. Extend the MAX_PTRS_PER_ idiom, and place default definitions
in include/pgtable.h. These define MAX_PTRS_PER_x to be PTRS_PER_x unless
an architecture has defined MAX_PTRS_PER_x in its arch headers.
Clean up pgtable-nop4d.h and s390's MAX_PTRS_PER_P4D definitions while
we're at it: both can just pick up the default now.
Link: https://lkml.kernel.org/r/20210624034050.511391-4-dja@axtens.net
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8826ee4fe7 ]
isw_nr_in_flight is used to determine whether the inode switch queue
should be flushed from the umount path. Currently it's increased after
grabbing an inode and even scheduling the switch work. It means the
umount path can walk past cleanup_offline_cgwb() with active inode
references, which can result in a "Busy inodes after unmount." message and
use-after-free issues (with inode->i_sb which gets freed).
Fix it by incrementing isw_nr_in_flight before doing anything with the
inode and decrementing in the case when switching wasn't scheduled.
The problem hasn't yet been seen in real life and was discovered by
Jan Kara by looking into the code.
Link: https://lkml.kernel.org/r/20210608230225.2078447-4-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Jan Kara <jack@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5f320ff8a ]
gcc points out a mistake in the mca driver that goes back to before the
git history:
arch/ia64/kernel/mca_drv.c: In function 'init_record_index_pools':
arch/ia64/kernel/mca_drv.c:346:54: error: expression does not compute the number of elements in this array; element type is 'int', not 'size_t' {aka 'long unsigned int'} [-Werror=sizeof-array-div]
346 | for (i = 1; i < sizeof sal_log_sect_min_sizes/sizeof(size_t); i++)
| ^
This is the same as sizeof(size_t), which is two shorter than the actual
array. Use the ARRAY_SIZE() macro to get the correct calculation instead.
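As a hedged illustration (the array contents below are placeholders, not the real SAL table), a small standalone program shows why dividing by sizeof(size_t) undercounts an int array while ARRAY_SIZE() does not:
~~~
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const int sal_log_sect_min_sizes[] = { 1, 2, 3, 4, 5, 6 };

int main(void)
{
	/* On a 64-bit build sizeof(size_t) is 8 while the element size is 4,
	 * so the old expression reports only half of the real element count. */
	printf("sizeof array / sizeof(size_t) = %zu\n",
	       sizeof sal_log_sect_min_sizes / sizeof(size_t));
	printf("ARRAY_SIZE()                  = %zu\n",
	       ARRAY_SIZE(sal_log_sect_min_sizes));
	return 0;
}
~~~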
Link: https://lkml.kernel.org/r/20210514214123.875971-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2705dfb209 ]
ll_new_hw_segment() is reached only in case of single range discard
merge, and we don't have max discard segment size limit actually, so
it is wrong to run the following check:
if (req->nr_phys_segments + nr_phys_segs > blk_rq_get_max_segments(req))
it may always be false since req->nr_phys_segments is initialized to
one, the bio's segment count is still 1, and blk_rq_get_max_segments(req)
is 1 too.
Fix the issue by not doing the check and bypassing the calculation of
discard request's nr_phys_segments.
Based on analysis from Wang Shanker.
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Wang Shanker <shankerwangmiao@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210628023312.1903255-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96e39e95c0 ]
In adding APCS clock support for MSM8939, the second clock registration
fails due to duplicate device name like below.
[ 0.519657] sysfs: cannot create duplicate filename '/bus/platform/devices/qcom-apcs-msm8916-clk'
...
[ 0.661158] qcom_apcs_ipc b111000.mailbox: failed to register APCS clk
This is because MSM8939 has 3 APCS instances for Cluster0 (little cores),
Cluster1 (big cores) and CCI (Cache Coherent Interconnect). Although
only APCS of Cluster0 and Cluster1 have IPC bits, each of 3 APCS has
A53PLL clock control bits. That said, 3 'qcom-apcs-msm8916-clk' devices
need to be registered to instantiate all 3 clocks. Use PLATFORM_DEVID_AUTO
rather than PLATFORM_DEVID_NONE for platform_device_register_data() call
to fix the issue above.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0060a4f28a ]
In the other places where we update ses->status we protect the
updates via GlobalMid_Lock. So to be consistent add the same
locking around it in cifs_put_smb_ses where it was missing.
Addresses-Coverity: 1268904 ("Data race condition")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 424d823794 ]
The capacitive status of ExpressKeys is reported with usages beginning
at 0x940, not 0x950. Bring our driver into alignment with reality.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5628263a9 ]
dacl_ptr can be null so we must check for it everywhere it is
used in build_sec_desc.
Addresses-Coverity: 1475598 ("Explicit null dereference")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1059c1b11 ]
A custom DSDT file is mostly used during development or debugging,
and in that case it is quite likely to want to rebuild the kernel
after changing ONLY the content of the DSDT.
This patch adds the custom DSDT as a prerequisite to tables.o
to ensure a rebuild if the DSDT file is updated. Make will merge
the prerequisites from multiple rules for the same target.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c46ed2281b ]
The SPI core always reports a "MODALIAS=spi:<foo>", even if the device was
registered via OF. This means that this module won't auto-load if a DT
has, for example, a node with a compatible "infineon,slb9670" string.
In that case kmod will expect a "MODALIAS=of:N*T*Cinfineon,slb9670" uevent
but instead will get a "MODALIAS=spi:slb9670", which is not present in the
kernel module aliases:
$ modinfo drivers/char/tpm/tpm_tis_spi.ko | grep alias
alias: of:N*T*Cgoogle,cr50C*
alias: of:N*T*Cgoogle,cr50
alias: of:N*T*Ctcg,tpm_tis-spiC*
alias: of:N*T*Ctcg,tpm_tis-spi
alias: of:N*T*Cinfineon,slb9670C*
alias: of:N*T*Cinfineon,slb9670
alias: of:N*T*Cst,st33htpm-spiC*
alias: of:N*T*Cst,st33htpm-spi
alias: spi:cr50
alias: spi:tpm_tis_spi
alias: acpi*:SMO0768:*
To workaround this issue, add in the SPI device ID table all the entries
that are present in the OF device ID table.
Reported-by: Alexander Wellbrock <a.wellbrock@mailbox.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7560c02bdf ]
Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other. However, this problem has purportedly been
solved in the past ten years. Except that it is all too possible that the
problem has instead simply been made less likely, which might mean that
some of the occasional "Marking clocksource 'tsc' as unstable" messages
might be due to desynchronization. How would anyone know?
Therefore apply CPU-to-CPU synchronization checking to newly unstable
clocksources that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong. Just like in real life.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db3a34e174 ]
When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks. Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can delay
interrupts-disabled regions of code ranging from SMI handlers to vCPU
preemption. It would be good to have some indication as to why the clock
was marked unstable.
Therefore, re-read the watchdog clock on either side of the read from the
clock under test. If the watchdog clock shows an excessive time delta
between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot parameter
clocksource.max_cswd_read_retries, which defaults to three, that is, up to
four reads, one initial and up to three retries. If more than one retry
was required, a message is printed on the console (the occasional single
retry is expected behavior, especially in guest OSes). If the maximum
number of retries is exceeded, the clock under test will be marked
unstable. However, the probability of this happening due to various sorts
of delays is quite small. In addition, the reason (clock-read delays) for
the unstable marking will be apparent.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e03f2ab78a ]
When attempting to schedule a merge of a given bfq_queue with the currently
in-service bfq_queue or with a cooperating bfq_queue among the scheduled
bfq_queues, delayed stable merge is checked for rotational or non-queueing
devs. For this stable merge to be performed, some conditions must be met.
If the current bfq_queue underwent a split from some merged bfq_queue,
one of these conditions is that two hundred milliseconds must have elapsed
since the split; otherwise this condition is always met.
Unfortunately, by mistake, time_is_after_jiffies() was written instead of
time_is_before_jiffies() for this check, verifying that less than two
hundred milliseconds have elapsed instead of verifying that at least two
hundred milliseconds have elapsed.
Fix this issue by replacing time_is_after_jiffies() with
time_is_before_jiffies().
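To make the difference concrete, here is a small userspace sketch using simplified copies of the jiffies helpers; the values and the "1 tick per ms" assumption are illustrative only, not taken from the patch:
~~~
#include <stdio.h>

static unsigned long jiffies;  /* stand-in for the kernel's jiffies counter */

#define time_after(a, b)          ((long)((b) - (a)) < 0)
#define time_before(a, b)         time_after(b, a)
#define time_is_before_jiffies(a) time_after(jiffies, a)   /* a lies in the past   */
#define time_is_after_jiffies(a)  time_before(jiffies, a)  /* a lies in the future */

int main(void)
{
	unsigned long split = 1000;            /* when the bfq_queue split happened */
	unsigned long threshold = split + 200; /* split time plus 200 "ms" */

	jiffies = 1100;  /* only 100 ticks elapsed since the split */
	printf("100 elapsed: before=%d after=%d\n",
	       time_is_before_jiffies(threshold), time_is_after_jiffies(threshold));

	jiffies = 1300;  /* 300 ticks elapsed since the split */
	printf("300 elapsed: before=%d after=%d\n",
	       time_is_before_jiffies(threshold), time_is_after_jiffies(threshold));
	return 0;
}
~~~
time_is_before_jiffies(threshold) becomes true only once the two hundred milliseconds have actually elapsed, which is the condition the fix intends to check.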
Signed-off-by: Luca Mariotti <mariottiluca1@hotmail.it>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Pietro Pedroni <pedroni.pietro.96@gmail.com>
Link: https://lore.kernel.org/r/20210619140948.98712-3-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4370cbf350 ]
On HP Pavilion Gaming Laptop 15-cx0xxx, the ECDT EC and DSDT EC share
the same port addresses but different GPEs. And the DSDT GPE is the
right one to use.
The current code duplicates DSDT EC with ECDT EC if the port addresses
are the same, and uses ECDT GPE as a result, which breaks this machine.
Introduce a new quirk for the HP laptop to trust the DSDT GPE,
and avoid duplicating even if the port addresses are the same.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209989
Reported-and-tested-by: Shao Fu, Chen <leo881003@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e39df24169 ]
We were trying to fill in uninitialized file attributes in the error case.
Addresses-Coverity: 139689 ("Uninitialized variables")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff93b71a3e ]
Although in practice this can not occur (since IPv4 and IPv6 are the
only two cases currently supported), it is cleaner to avoid uninitialized
variable warnings.
Addresses smatch warning:
fs/cifs/cifs_swn.c:468 cifs_swn_store_swn_addr() error: uninitialized symbol 'port'.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Samuel Cabrero <scabrero@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6efa994e35 ]
According to the investigation performed by Jacob Shivers at Red Hat,
cifs_lookup and cifs_readdir leak EAGAIN when the user session is
deleted on the server. Fix this issue by implementing a retry with
limits, as is implemented in cifs_revalidate_dentry_attr.
Reproducer based on the work by Jacob Shivers:
~~~
$ cat readdir-cifs-test.sh
#!/bin/bash
# Install and configure powershell and sshd on the windows
# server as described in
# https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview
# This script uses expect(1)
USER=dude
SERVER=192.168.0.2
RPATH=root
PASS='password'
function debug_funcs {
for line in $@ ; do
echo "func $line +p" > /sys/kernel/debug/dynamic_debug/control
done
}
function setup {
echo 1 > /proc/fs/cifs/cifsFYI
debug_funcs wait_for_compound_request \
smb2_query_dir_first cifs_readdir \
compound_send_recv cifs_reconnect_tcon \
generic_ip_connect cifs_reconnect \
smb2_reconnect_server smb2_reconnect \
cifs_readv_from_socket cifs_readv_receive
tcpdump -i eth0 -w cifs.pcap host 192.168.2.182 & sleep 5
dmesg -C
}
function test_call {
if [[ $1 == 1 ]] ; then
tracer="strace -tt -f -s 4096 -o trace-$(date -Iseconds).txt"
fi
# Change the command here to anything appropriate
$tracer ls $2 > /dev/null
res=$?
if [[ $1 == 1 ]] ; then
if [[ $res == 0 ]] ; then
1>&2 echo success
else
1>&2 echo "failure ($res)"
fi
fi
}
mountpoint /mnt > /dev/null || mount -t cifs -o username=$USER,pass=$PASS //$SERVER/$RPATH /mnt
test_call 0 /mnt/
/usr/bin/expect << EOF
set timeout 60
spawn ssh $USER@$SERVER
expect "yes/no" {
send "yes\r"
expect "*?assword" { send "$PASS\r" }
} "*?assword" { send "$PASS\r" }
expect ">" { send "powershell close-smbsession -force\r" }
expect ">" { send "exit\r" }
expect eof
EOF
sysctl -w vm.drop_caches=2 > /dev/null
sysctl -w vm.drop_caches=2 > /dev/null
setup
test_call 1 /mnt/
~~~
Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 889c2a7007 ]
Interlink is a special type of DFS link that resolves to a different
DFS domain-based namespace. To determine whether it is an interlink
or not, check if ReferralServers and StorageServers bits are set to 1
and 0 respectively in ReferralHeaderFlags, as specified in MS-DFSC
3.1.5.4.5 Determining Whether a Referral Response is an Interlink.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 966a3cb7c7 ]
RHBZ: 1866684
We don't have a real fallocate in the SMB2 protocol so we used to emulate fallocate
by simply switching the file to become non-sparse. But as that could potentially consume
a lot more data than we intended to fallocate (large sparse file and fallocating a thin
slice in the middle) we would only do this IFF the fallocate request was for virtually
the entire file.
This patch improves this and starts allowing us to fallocate smaller chunks of a file by
overwriting the region with 0, for the parts that are unallocated.
The method used is to first query the server for FSCTL_QUERY_ALLOCATED_RANGES to find what
is unallocated in the fallocate range and then to only overwrite-with-zero the unallocated
ranges to fill in the holes.
As overwriting-with-zero is different from just allocating blocks, and potentially much
more expensive, we limit this to only allow fallocate ranges up to 1Mb in size.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 159f130f60 ]
The uncore memory frequency value from the mailbox command
CONFIG_TDP_GET_MEM_FREQ needs to be scaled based on the platform for
display. There is no single constant multiplier.
This change introduces CPU model specific memory frequency multiplier.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0a029fff4 ]
There's little to no point in loading an EDAC driver running in a guest:
1) The CPU model reported by CPUID may not represent actual h/w
2) The hypervisor likely does not pass in access to memory controller devices
3) Hypervisors generally do not pass corrected error details to guests
Add a check in each of the Intel EDAC drivers for X86_FEATURE_HYPERVISOR
and simply return -ENODEV in the init routine.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210615174419.GA1087688@agluck-desk2.amr.corp.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a4a910aa4 ]
When parsing a request in nvmet_fc_handle_fcp_rqst() we should not
check for invalid target ports; if we do the command is aborted
from the fcp layer, causing the host to assume a transport error.
Rather we should still forward this request to the nvmet layer, which
will then correctly fail the command with an appropriate error status.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0aac973a2 ]
nvmeq->cq_head is compared with nvmeq->q_depth, and the value and cq_phase
are updated to handle the next cq doorbell.
However, nvmeq->q_depth's type is u32 and its maximum value is 0x10000,
when CQP.MSQE is 0xffff and io_queue_depth is 0x10000.
The temporary variable used for the comparison with nvmeq->q_depth
overflows when the previous nvmeq->cq_head is 0xffff, and in that case
nvmeq->cq_phase is not updated.
So, fix the data type of the temporary variable to u32.
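A minimal standalone demonstration of the wrap-around (variable names and values follow the description above; this is not the driver code itself):
~~~
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t q_depth = 0x10000;   /* CQP.MSQE == 0xffff -> queue depth 0x10000 */
	uint16_t cq_head = 0xffff;    /* previous head sitting at the last slot */

	uint16_t next16 = cq_head + 1;            /* stored result wraps to 0 in a u16 */
	uint32_t next32 = (uint32_t)cq_head + 1;  /* 0x10000 when kept in a u32 */

	printf("u16 temp: 0x%x, equals q_depth? %d\n", next16, next16 == q_depth);
	printf("u32 temp: 0x%x, equals q_depth? %d\n", next32, next32 == q_depth);
	return 0;
}
~~~
With the u16 temporary the "reached q_depth, flip cq_phase" comparison never fires, matching the missed phase update described above.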
Signed-off-by: JK Kim <jongkang.kim2@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28117f3a5c ]
The quirks added to asus-nb-wmi for the ASUS ROG Zephyrus G14 and G15 are
wrong, they tell the asus-wmi code to use the vendor specific WMI backlight
interface. But there is no such interface on these laptops.
As a side effect, these quirks stop the acpi_video driver from registering,
since they make acpi_video_get_backlight_type() return acpi_backlight_vendor,
leaving only the native AMD backlight driver in place, which is the one we
want. This happy coincidence is being replaced with a new quirk in
drivers/acpi/video_detect.c which actually sets the backlight_type to
acpi_backlight_native, fixing this properly. This reverts
commit 13bceda68f ("platform/x86: asus-nb-wmi: add support for ASUS ROG
Zephyrus G14 and G15").
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20210419074915.393433-3-luke@ljones.dev
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cafe29a8d ]
Yi reported several kernel panics on:
[16687.001777] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
...
[16687.163549] pc : __rq_qos_track+0x38/0x60
or
[ 997.690455] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
[ 997.850347] pc : __rq_qos_done+0x2c/0x50
It turns out to be caused by a race between adding rq qos (wbt) and normal IO,
because rq_qos_add() can run while IO is being submitted. Fix this issue
by freezing the queue before adding/deleting rq qos to/from the queue.
rq_qos_exit() needn't freeze the queue because it is called after the queue
has been frozen.
iolatency calls rq_qos_add() during allocating queue, so freezing won't
add delay because queue usage refcount works at atomic mode at that
time.
iocost calls rq_qos_add() when writing cgroup attribute file, that is
fine to freeze queue at that time since we usually freeze queue when
storing to queue sysfs attribute, meantime iocost only exists on the
root cgroup.
wbt_init calls it in blk_register_queue() and in the queue sysfs attribute
store (queue_wb_lat_store(), when written for the first time in the
!BLK_WBT_MQ case); the following patch will speed up the queue freezing in
wbt_init.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb1a79a6b6 ]
This commit fixes a freeze on insertion of a Guitar Hero Live PS3/WiiU
USB dongle. Indeed, with the current implementation, inserting one of
those USB dongles will lead to a hard freeze. I apologize for not
catching this earlier, it didn't occur on my old laptop.
While the issue was isolated to memory alloc/free, I could not figure
out why it causes a freeze. So this patch fixes this issue by
simplifying memory allocation and usage.
Recall that for the dongle to work properly, a control URB needs to be sent
periodically. We used to allocate and free this URB each time it needed to
be sent.
With this patch, the memory for the URB is allocated on the probe, reused
for as long as the dongle is plugged in, and freed once the dongle is
unplugged.
Signed-off-by: Pascal Giard <pascal.giard@etsmtl.ca>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5539722eb ]
The Elantech touchscreen/digitizer in the Surface Go mistakenly reports
having a battery. This results in a low battery message every time you
try to use the pen.
This patch adds a quirk to ignore the non-existent battery and
gets rid of the false low battery messages.
Signed-off-by: Zoltan Tamas Vajda <zoltan.tamas.vajda@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ec4e55e9f ]
The laptop keyboard doesn't work on many MEDION notebooks, but the
keyboard works well under Windows and Unix.
Through debugging, we found this log in the dmesg:
ACPI: IRQ 1 override to edge, high
pnp 00:03: Plug and Play ACPI device, IDs PNP0303 (active)
And we checked the IRQ definition in the DSDT, it is:
IRQ (Level, ActiveLow, Exclusive, )
{1}
So the BIOS defines the keyboard IRQ as Level_Low, but the Linux
kernel overrides it to Edge_High. If the Linux kernel is modified
to skip the IRQ override, the keyboard works normally.
From the existing comment in acpi_dev_get_irqresource(), the override
function only needs to be called when IRQ() or IRQNoFlags() is used
to populate the resource descriptor, and according to Section 6.4.2.1
of ACPI 6.4 [1], if IRQ() is empty or IRQNoFlags() is used, the IRQ
is High true, edge sensitive and non-shareable. ACPICA also assumes
that to be the case (see acpi_rs_set_irq[] in rsirq.c).
In accordance with the above, check 3 additional conditions
(EdgeSensitive, ActiveHigh and Exclusive) when deciding whether or
not to treat an ACPI_RESOURCE_TYPE_IRQ resource as "legacy", in which
case the IRQ override is applicable to it.
Link: https://uefi.org/specs/ACPI/6.4/06_Device_Configuration/Device_Configuration.html#irq-descriptor # [1]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213031
BugLink: http://bugs.launchpad.net/bugs/1909814
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
[ rjw: Subject rewrite, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ac7a817f1 ]
Although the system will not be in a good condition or it will not
boot if acpi_bus_init() fails, it is still necessary to put the
kobject in the error path before returning to avoid leaking memory.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c27bac0314 ]
ACPICA commit 180cb53963aa876c782a6f52cc155d951b26051a
According to the ACPI spec, _CID returns a package containing
hardware ID's. Each element of an ASL package contains a reference
count from the parent package as well as the element itself.
Name (TEST, Package() {
"String object" // this package element has a reference count of 2
})
A memory leak was caused in the _CID repair function because it did
not decrement the reference count created by the package. Fix the
memory leak by calling acpi_ut_remove_reference on _CID package elements
that represent a hardware ID (_HID).
Link: https://github.com/acpica/acpica/commit/180cb539
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Erik Kaneda <erik.kaneda@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fcef0e6c27 ]
This patch fixes the error path handling in lowcomms_start(). We need to
clean up some statically allocated data structures and clean up any
workqueues that have already been started.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfe80306a0 ]
The new openat2() syscall verifies that no unknown O-flag values are
set and returns an error to userspace if they are while the older open
syscalls like open() and openat() simply ignore unknown flag values:
#define O_FLAG_CURRENTLY_INVALID (1 << 31)
struct open_how how = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID,
.resolve = 0,
};
/* fails */
fd = openat2(-EBADF, "/dev/null", &how, sizeof(how));
/* succeeds */
fd = openat(-EBADF, "/dev/null", O_RDONLY | O_FLAG_CURRENTLY_INVALID);
However, openat2() silently truncates the upper 32 bits meaning:
#define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31)
#define O_FLAG_CURRENTLY_INVALID_UPPER32 (1 << 40)
struct open_how how_lower32 = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_LOWER32,
};
struct open_how how_upper32 = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_UPPER32,
};
/* fails */
fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32));
/* succeeds */
fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32));
Fix this by preventing the immediate truncation in build_open_flags().
There's a snafu here though stripping FMODE_* directly from flags would
cause the upper 32 bits to be truncated as well due to integer promotion
rules since FMODE_* is unsigned int, O_* are signed ints (yuck).
In addition, struct open_flags currently defines flags to be 32 bit
which is reasonable. If we simply were to bump it to 64 bit we would
need to change a lot of code preemptively which doesn't seem worth it.
So simply add a compile-time check verifying that all currently known
O_* flags are within the 32 bit range and fail to build if they aren't
anymore.
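A userspace sketch of the kind of compile-time guard described (the flag list is a hand-picked subset, not the kernel's actual VALID_OPEN_FLAGS definition; build as C11 or later):
~~~
#include <fcntl.h>
#include <assert.h>

/* All O_* flags we know about, OR'ed together as a 64-bit value. */
#define KNOWN_O_FLAGS ((unsigned long long)(O_RDONLY | O_WRONLY | O_RDWR | \
			O_CREAT | O_EXCL | O_TRUNC | O_APPEND | O_NONBLOCK))

/* Fail the build if any known flag ever needs more than 32 bits. */
static_assert((KNOWN_O_FLAGS & ~0xffffffffULL) == 0,
	      "O_* flags must fit in the low 32 bits of open_how.flags");

int main(void)
{
	return 0;
}
~~~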
This change shouldn't regress old open syscalls since they silently
truncate any unknown values anyway. It is a tiny semantic change for
openat2(), but it is very unlikely that people pass in > 32 bit unknown
flags, and the syscall is relatively new too.
Link: https://lore.kernel.org/r/20210528092417.3942079-3-brauner@kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reported-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d327ea15a3 ]
sparse generates the following warning:
include/linux/prandom.h:114:45: sparse: sparse: cast truncates bits from
constant value
This is because the 64-bit seed value is manipulated and then placed in a
u32, causing an implicit cast and truncation. A forced cast to u32 doesn't
prevent this warning, which is reasonable because a typecast doesn't prove
that truncation was expected.
Logical-AND the value with 0xffffffff to make explicit that truncation to
32-bit is intended.
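A trivial standalone illustration of the implicit truncation versus the explicit mask (the seed value is arbitrary):
~~~
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t seed = 0x123456789abcdef0ULL;

	uint32_t implicit = seed;               /* implicit truncation, the pattern sparse flags */
	uint32_t masked   = seed & 0xffffffff;  /* same bits, but the intent is now explicit */

	printf("implicit: 0x%08x\n", implicit);
	printf("masked:   0x%08x\n", masked);
	return 0;
}
~~~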
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210525122012.6336-3-rf@opensource.cirrus.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8aa31cbf20 ]
This patch fixes the EOF handling for TCP so that if an EOF is received we
will close the socket the next time the write queue runs empty. This is
half-closed socket functionality, which doesn't exist in SCTP. The
midcomms layer will provide half-closed socket functionality on the DLM side
to solve this problem for the SCTP case. The last ack may still be flying
around, but other reset functionality will take care of it if it gets lost.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6aa00e3d2 ]
The rx and tx flag arguments signal to close_connection() which worker it
was called from; obviously the receive worker cannot cancel itself, and
vice versa for swork. For the othercon only the receive worker should be
used; however, to avoid deadlocks we should pass the same flags as the
original close_connection() call.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba868d9dea ]
This patch changes the reconnect handling so that a reconnect is triggered
if an error is reported by the socket error callback. It also handles
reconnects in the non-blocking connect case, which is currently missing. If
ECONNREFUSED is reported, we delay the reconnect by one second.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b38bc9c2b3 ]
This patch holds the srcu connection read lock in cases where we look up
a connection and access it. We don't hold the srcu lock in worker
functions where the scheduled worker is part of the connection itself.
The connection should not be freed if any worker is scheduled or
pending.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd63141d58 ]
refcount_inc_not_zero() in bt_tags_iter() may still read an already freed
request.
Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()
2) clearing stale requests referred to via ->rqs[tag] before freeing the
request pool; the per-tags spinlock is held while clearing stale
->rqs[tag]
So after we have cleared the stale requests, bt_tags_iter() won't observe a
freed request any more, and the clearing will also wait for pending
request references.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e315dc07d ]
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(); this
prevents the request from being re-used while ->fn is running. The approach
is the same as what we do when handling timeouts.
Fix request use-after-free(UAF) related with completion race or queue
releasing:
- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.
- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.
However, one request UAF case is still not covered: refcount_inc_not_zero()
may read an already freed request; this will be handled in the next patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12e0613715 ]
block_dump is an old debugging interface; one of its functions is to print
information about which process writes which file on disk. If we enable
block_dump through /proc/sys/vm/block_dump and turn on the debug log level,
we can gather the writing process name, the target file name and the disk
from kernel messages. This feature is implemented in
block_dump___mark_inode_dirty(), which prints the above information directly
into the kernel log when marking an inode dirty, so it is noisy and can
easily trigger a log storm. At the same time, getting the dentry refcount
there is also not safe; we found it can lead to a deadlock on an ext4 file
system with data=journal mode.
After tracepoints were introduced into the kernel, we got a tracepoint in
__mark_inode_dirty(), which is a better replacement for
block_dump___mark_inode_dirty(). The only downside is that it only traces
the inode number and not a file name, but that probably doesn't matter
because the file name originally printed by block_dump is not accurate in
some cases, and we can still find it through the inode number and device
id. So this patch deletes the inode-dirtying part of the block_dump feature.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210313030146.2882027-2-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6306f04319 ]
More ASUS laptops have the _GPE define in the DSDT table with a
different value than the _GPE number in the ECDT.
This causes the media keys to not work on ASUS X505BA/BP and X542BA/BP.
Add model info to the quirks list.
Signed-off-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fcd8cf0e3e ]
The Bay Trail Glavey TM800A550L tablet, which ships with Android installed
from the factory, uses a GT912 touchscreen controller which needs to have
its firmware uploaded by the OS to work (this is a first for an x86 based
device with a Goodix touchscreen controller).
Add a touchscreen_dmi entry for this which specifies the filenames
to use for the firmware and config files needed for this.
Note this matches on a GDIX1001 ACPI HID, while the original DSDT uses
a HID of GODX0911. For the touchscreen to work on these devices a DSDT
override is necessary to fix a missing IRQ and broken GPIO settings in
the ACPI-resources for the touchscreen. This override also changes the
HID to the standard GDIX1001 id typically used for Goodix touchscreens.
The DSDT override is available here:
https://fedorapeople.org/~jwrdegoede/glavey-tm800a550l-dsdt-override/
Reviewed-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210504185746.175461-5-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a22e3803f2 ]
Teclast X89 tablets come in 2 versions, with Windows pre-installed and with
Android pre-installed. These 2 versions have different DMI strings.
Add a match for the DMI strings used by the Android version BIOS.
Note the Android version BIOS has a bug in the DSDT where no IRQ is
provided, so for the touchscreen to work a DSDT override fixing this
is necessary as well.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210504185746.175461-4-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a6f0dbe62 ]
Move the DMI quirks for upside-down mounted Goodix touchscreens from
drivers/input/touchscreen/goodix.c to
drivers/platform/x86/touchscreen_dmi.c,
where all the other x86 touchscreen quirks live.
Note the touchscreen_dmi.c code attaches standard touchscreen
device-properties to an i2c-client device based on a combination of a
DMI match + a device-name match. I've verified that the: Teclast X98 Pro,
WinBook TW100 and WinBook TW700 uses an ACPI devicename of "GDIX1001:00"
based on acpidumps and/or dmesg output available on the web.
This patch was tested on a Teclast X89 tablet.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210504185746.175461-2-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 900fdc4573 ]
The existing code attempted to handle numbers by doing a strto[u]l(),
ignoring the field width, and then repeatedly dividing to extract the
field out of the full converted value. If the string contains a run of
valid digits longer than will fit in a long or long long, this would
overflow and no amount of dividing can recover the correct value.
This patch fixes vsscanf() to obey number field widths when parsing
the number.
A new _parse_integer_limit() is added that takes a limit for the number
of characters to parse. The number field conversion in vsscanf is changed
to use this new function.
If a number starts with a radix prefix, the field width must be long
enough for at least one digit after the prefix. If not, it will be handled
like this:
sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'
This is consistent with the observed behaviour of userland sscanf.
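The same two cases can be tried against the C library's sscanf() with a short program; per the description above, the converted value is 0 and scanning resumes at the first unconsumed character (exact behaviour may vary slightly between libc implementations):
~~~
#include <stdio.h>

int main(void)
{
	int i;
	char rest[16];

	i = -1; rest[0] = '\0';
	sscanf("0x4", "%1i%15s", &i, rest);          /* width 1: only the '0' fits */
	printf("%%1i: i=%d, remaining=\"%s\"\n", i, rest);

	i = -1; rest[0] = '\0';
	sscanf("0x4", "%2i%15s", &i, rest);          /* width 2: the "0x" prefix fits, no digit */
	printf("%%2i: i=%d, remaining=\"%s\"\n", i, rest);
	return 0;
}
~~~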
Note that this patch does NOT fix the problem of a single field value
overflowing the target type. So for example:
sscanf("123456789abcdef", "%x", &i);
Will not produce the correct result because the value obviously overflows
INT_MAX. But sscanf will report a successful conversion.
Note that where a very large number is used to mean "unlimited", the value
INT_MAX is used for consistency with the behaviour of vsnprintf().
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210514161206.30821-2-rf@opensource.cirrus.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65ea8f2c6e ]
Generally, the C-state latency is provided by the _CST method or
FADT, but some OEM platforms using AMD Picasso, Renoir, Van Gogh,
and Cezanne set the C2 latency greater than C3's which causes the
C2 state to be skipped.
That will block the core entering PC6, which prevents S0ix working
properly on Linux systems.
In other operating systems, the latency values are not validated and
this does not cause problems by skipping states.
To avoid this issue on Linux, detect when latencies are not an
arithmetic progression and sort them.
Link: 026d186e45
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1230#note_712174
Suggested-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d27975851 ]
The dep_unmet field in struct acpi_device is used to store the
number of unresolved _DEP dependencies (that is, operation region
dependencies for which there are no drivers present) for the ACPI
device object represented by it.
That field is initialized to 1 for all ACPI device objects in
acpi_add_single_object(), via acpi_init_device_object(), so as to
avoid evaluating _STA prematurely for battery device objects in
acpi_scan_init_status(), and it is "fixed up" in acpi_bus_check_add()
after the acpi_add_single_object() called by it has returned.
This is not particularly straightforward and causes dep_unmet to
remain 1 for device objects without dependencies created by invoking
acpi_add_single_object() directly, outside acpi_bus_check_add().
For this reason, rearrange acpi_add_single_object() to initialize
dep_unmet completely before calling acpi_scan_init_status(), which
requires passing one extra bool argument to it, and update all of
its callers accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2145f8dc5 ]
Action of unbinding driver from a device is not cancellable and should not
fail, and driver core does not pay attention to the result of "remove"
method, therefore using down_interruptible() in hid_device_remove() does
not make sense.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ae71c7720 ]
Alban Crequy reported a race condition userspace faces when we want to
add some fds and make the syscall return them[1] using seccomp notify.
The problem is that currently two different ioctl() calls are needed by
the process handling the syscalls (agent) for another userspace process
(target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and
SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible
for the agent to do the first ioctl to add a file descriptor but the
target is interrupted (EINTR) before the agent does the second ioctl()
call.
This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program, as suggested by
Kees Cook[2]. This is done by simply allowing
seccomp_do_user_notification() to add the fd and return it in this case.
Therefore, in this case the target wakes up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.
This "allocate an fd and return" functionality is useful for syscalls
that return a file descriptor only, like connect(2). Other syscalls that
return a file descriptor but not as return value (or return more than
one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTs, will not
work with this flag.
This effectively combines SECCOMP_IOCTL_NOTIF_ADDFD and
SECCOMP_IOCTL_NOTIF_SEND into an atomic operation. Neither the notification's
return value nor its error can be set by the user. Upon successful invocation
of the SECCOMP_IOCTL_NOTIF_ADDFD ioctl with the SECCOMP_ADDFD_FLAG_SEND
flag, the notifying process's errno will be 0, and the return value will
be the file descriptor number that was installed.
[1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210517193908.3113-4-sargun@sargun.me
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50e7a31d30 ]
Smatch static checker warns that "mdev" can be null:
sound/usb/media.c:287 snd_media_device_create()
warn: 'mdev' can also be NULL
If CONFIG_MEDIA_CONTROLLER is disabled, this file should not be included
in the build.
The below conditions in the sound/usb/Makefile are in place to ensure that
media.c isn't included in the build.
sound/usb/Makefile:
snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o
select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER &&
(MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO)
The following config check in include/media/media-dev-allocator.h is
in place to enable the API only when CONFIG_MEDIA_CONTROLLER and
CONFIG_USB are enabled.
#if defined(CONFIG_MEDIA_CONTROLLER) && defined(CONFIG_USB)
This check doesn't work as intended when CONFIG_USB=m. When CONFIG_USB=m,
CONFIG_USB_MODULE is defined and CONFIG_USB is not. The above config check
doesn't catch that CONFIG_USB is defined as a module and disables the API.
This results in sound/usb enabling Media Controller specific ALSA driver
code, while Media disables the Media Controller API.
Fixing the problem requires two changes (see the sketch after this list):
1. Change the check to use IS_ENABLED to detect when CONFIG_USB is enabled
as a module or static. Since CONFIG_MEDIA_CONTROLLER is a bool, leave
the check unchanged to be consistent with drivers/media/Makefile.
2. Change the drivers/media/mc/Makefile to include mc-dev-allocator.o
in mc-objs when CONFIG_USB is enabled.
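As a simplified stand-in for the kernel's IS_ENABLED() machinery (the CONFIG_ defines below are hand-set for the demonstration, not generated by Kconfig), this shows why a plain defined(CONFIG_USB) check misses the CONFIG_USB=m case:
~~~
#include <stdio.h>

/* With CONFIG_USB=m, Kconfig emits CONFIG_USB_MODULE, not CONFIG_USB. */
#define CONFIG_MEDIA_CONTROLLER 1
#define CONFIG_USB_MODULE 1

int main(void)
{
#if defined(CONFIG_MEDIA_CONTROLLER) && defined(CONFIG_USB)
	puts("old check: media-dev-allocator API enabled");
#else
	puts("old check: API disabled, even though USB is built as a module");
#endif

#if defined(CONFIG_MEDIA_CONTROLLER) && \
	(defined(CONFIG_USB) || defined(CONFIG_USB_MODULE))
	puts("module-aware check (what IS_ENABLED() effectively does): API enabled");
#endif
	return 0;
}
~~~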
Link: https://lore.kernel.org/alsa-devel/YLeAvT+R22FQ%2FEyw@mwanda/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fa3b91bde ]
Using register asm statements has been proven to be very error prone,
especially when using code instrumentation where gcc may add function
calls, which clobbers register contents in an unexpected way.
Therefore get rid of register asm statements in kvm code, even though
there is currently nothing wrong with them. This way we know for sure
that this bug class won't be introduced here.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20210621140356.1210771-1-hca@linux.ibm.com
[borntraeger@de.ibm.com: checkpatch strict fix]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b1f8c6179 ]
In step #3 of check_irq_usage(), we search backwards to find a lock
whose usage conflicts with the usage of @target_entry1 on safe/unsafe.
However, we should only take the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but softirq-safe, and in check_irq_usage() we find it because its
hardirq-unsafe usage could result in a hardirq-safe-unsafe deadlock. But
since we currently don't filter out the other usage bits, we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. This can produce misleading lockdep splats.
Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-4-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69c7a5fb24 ]
We use the same code to print backwards lock dependency path as the
forwards lock dependency path, and this could result in incorrect
printing because for a backwards lock_list ->trace is not the call trace
where the lock of ->class is acquired.
Fix this by introducing a separate function on printing the backwards
dependency path. Also add a few comments about the printing while we are
at it.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-2-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b05fbcc36b ]
With a config having PAGE_SIZE set to 256K, BTRFS build fails
with the following message
include/linux/compiler_types.h:326:38: error: call to
'__compiletime_assert_791' declared with attribute error:
BUILD_BUG_ON failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0
BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
256K pages for the time being.
There are two platforms that can select 256K pages:
- hexagon
- powerpc
Disable BTRFS when 256K page size is selected. Supporting this would
require changes to the subpage mode that's currently being developed.
Given that 256K is many times larger than the commonly used page sizes for
which the algorithms and structures have been tuned, it's out of scope, and
disabling the build is a reasonable option.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bcd77455d5 ]
[BUG]
With current btrfs subpage rw support, the following script can lead to
fs hang:
$ mkfs.btrfs -f -s 4k $dev
$ mount $dev -o nospace_cache $mnt
$ fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
The fs will hang at btrfs_start_ordered_extent().
[CAUSE]
In the above test case, btrfs_invalidatepage() will be called with the
following parameters:
offset = 0 length = 53248 page dirty = 1 subpage dirty bitmap = 0x2000
Since @offset is 0, btrfs_invalidatepage() will try to invalidate the full
page, and finally call clear_page_extent_mapped(), which will detach the
subpage structure from the page.
And since the page no longer has subpage structure, the subpage dirty
bitmap will be cleared, preventing the dirty range from being written
back, thus no way to wake up the ordered extent.
[FIX]
Just follow other filesystems and only invalidate the page if the range
covers the full page.
There are cases like truncate_setsize() which can call
btrfs_invalidatepage() with offset == 0 and length != 0 for the last
page of an inode.
Although the old code will still try to invalidate the full page, we are
still safe to just wait for the ordered extent to finish, so it shouldn't
cause extra problems.
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64]
Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c5ec99561 ]
The type of discard_bitmap_bytes and discard_extent_bytes is u64 so the
format should be %llu, though the actual values would hardly ever
overflow to negative values.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5963ffcaf3 ]
While stress testing our error handling I noticed that sometimes we
would still commit the transaction even though we had aborted the
transaction.
Currently we track if a trans handle has dirtied any metadata, and if it
hasn't we mark the filesystem as having an error (so no new transactions
can be started), but we will allow the current transaction to complete
as we do not mark the transaction itself as having been aborted.
This sounds good in theory, but we were not properly tracking IO errors
in btrfs_finish_ordered_io, and thus committing the transaction with
bogus free space data. This isn't necessarily a problem per se with the
free space cache, as the other guards in place would have kept us from
accepting the free space cache as valid, but it highlights a real world
case where we had a bug and could have corrupted the filesystem because
of it.
This "skip abort on empty trans handle" is nice in theory, but assumes
we have perfect error handling everywhere, which we clearly do not.
Also we do not allow further transactions to be started, so all this
does is save the last transaction that was happening, which doesn't
necessarily gain us anything other than the potential for real
corruption.
Remove this particular bit of code: if we decide we need to abort the
transaction, then abort the current one and keep us from doing real harm
to the file system, regardless of whether this specific trans handle
dirtied anything or not.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 04587ad9be ]
If we fail to update the delayed inode we need to abort the transaction,
because we could leave an inode with the improper counts or some other
such corruption behind.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb385bedde ]
If we get an error while looking up the inode item we'll simply bail
without cleaning up the delayed node. This results in this style of
warning happening on commit:
WARNING: CPU: 0 PID: 76403 at fs/btrfs/delayed-inode.c:1365 btrfs_assert_delayed_root_empty+0x5b/0x90
CPU: 0 PID: 76403 Comm: fsstress Tainted: G W 5.13.0-rc1+ #373
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:btrfs_assert_delayed_root_empty+0x5b/0x90
RSP: 0018:ffffb8bb815a7e50 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff95d6d07e1888 RCX: ffff95d6c0fa3000
RDX: 0000000000000002 RSI: 000000000029e91c RDI: ffff95d6c0fc8060
RBP: ffff95d6c0fc8060 R08: 00008d6d701a2c1d R09: 0000000000000000
R10: ffff95d6d1760ea0 R11: 0000000000000001 R12: ffff95d6c15a4d00
R13: ffff95d6c0fa3000 R14: 0000000000000000 R15: ffffb8bb815a7e90
FS: 00007f490e8dbb80(0000) GS:ffff95d73bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6e75555cb0 CR3: 00000001101ce001 CR4: 0000000000370ef0
Call Trace:
btrfs_commit_transaction+0x43c/0xb00
? finish_wait+0x80/0x80
? vfs_fsync_range+0x90/0x90
iterate_supers+0x8c/0x100
ksys_sync+0x50/0x90
__do_sys_sync+0xa/0x10
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Because the iref isn't dropped, this leaves an elevated node->count, so
any release just re-queues it onto the delayed inodes list. Fix this
by going to the out label to handle the proper cleanup of the delayed
node.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77bbbc0cf8 ]
The POWER9 vCPU TLB management code assumes all threads in a core share
a TLB, and that TLBIEL executed by one thread will invalidate TLBs for
all threads. This is not the case for SMT8 capable POWER9 and POWER10
(big core) processors, where the TLB is split between groups of threads.
This results in TLB multi-hits, random data corruption, etc.
Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine
which siblings share TLBs, and use that in the guest TLB flushing code.
[npiggin@gmail.com: add changelog and comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5471eea5d3 ]
The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.
perf stat -e 'branches,cycles' -- taskset -c 0 ./workload
In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but reads four GP counters.)
$./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0
index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0
It is a potential security issue. Once an attacker knows what the other
thread is counting, the PerfMon counter can be used as a side channel to
attack cryptosystems.
The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.
Three methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
scheduled in via sched_task(). It fails for the below case.
  Thread A                          Thread B
  clone(CLONE_THREAD) --->
  set_affine(0)
                                    set_affine(1)
                                    while (!event-enabled)
                                            ;
  event = perf_event_open()
  mmap(event)
  ioctl(event, IOC_ENABLE); --->
                                    RDPMC
Counters are still leaked to the thread B.
- Only reset the un-assigned dirty counters before updating the CR4.PCE
bit. The method is implemented here.
The dirty counter is a counter, on which the assigned event has been
deleted, but the counter is not reset. To track the dirty counters,
add a 'dirty' variable in the struct cpu_hw_events.
The security issue can only be found with an RDPMC task. To enable
RDPMC, the CR4.PCE bit has to be updated. Add a
perf_clear_dirty_counters() right before updating the CR4.PCE bit to
clear the existing dirty counters. Only the current un-assigned dirty
counters are reset, because the RDPMC assigned dirty counters will be
updated soon.
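A minimal sketch of the clearing step, using kernel-internal names from
this description (cpuc->dirty, cpuc->assign, x86_pmu_event_addr()); the
real code also has to handle the fixed counters separately:

  /* Keep the counters that are still assigned to events. */
  for (i = 0; i < cpuc->n_events; i++)
          __clear_bit(cpuc->assign[i], cpuc->dirty);

  /* Zero the remaining dirty counters so stale values can't leak via RDPMC. */
  for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX)
          wrmsrl(x86_pmu_event_addr(i), 0);

  bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);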
After applying the patch,
$ ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0
Performance
The performance of a context switch is only impacted when there are two
or more perf users and one of the users must be an RDPMC user. In other
cases, there is no performance impact.
The worst-case occurs when there are two users: the RDPMC user only
uses one counter; while the other user uses all available counters.
When the RDPMC task is scheduled in, all the counters, other than the
RDPMC assigned one, have to be reset.
Test results for the worst-case, using a modified lat_ctx as measured
on an Ice Lake platform, which has 8 GP and 3 FP counters (ignoring
SLOTS).
lat_ctx -s 128K -N 1000 processes 2
Without the patch:
The context switch time is 4.97 us
With the patch:
The context switch time is 5.16 us
There is ~4% performance drop for the context switching time in the
worst-case.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1623693582-187370-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 489f16459e ]
Energy Aware Scheduling (EAS) needs to be able to predict the frequency
requests made by the SchedUtil governor to properly estimate energy used
in the future. It has to take into account CPUs utilization and forecast
Performance Domain (PD) frequency. There is a corner case when the max
allowed frequency might be reduced due to thermal. SchedUtil is aware of
that reduced frequency, so it should be taken into account also in EAS
estimations.
SchedUtil, as a CPUFreq governor, knows the maximum allowed frequency of
a CPU, thanks to cpufreq_driver_resolve_freq() and internal clamping
to 'policy::max'. SchedUtil is responsible for respecting that upper limit
while setting the frequency through CPUFreq drivers. This effective
frequency is stored internally in 'sugov_policy::next_freq' and EAS has
to predict that value.
In the existing code the raw value of arch_scale_cpu_capacity() is used
for clamping the returned CPU utilization from effective_cpu_util().
This patch fixes an issue with a too big single CPU utilization, by
introducing clamping to the allowed CPU capacity. The allowed CPU capacity
is the CPU capacity reduced by the thermal pressure raw value.
Thanks to the knowledge of the allowed CPU capacity, we don't get a too big
value for a single CPU's utilization, which is then added to the util sum.
The util sum is used as a source of information for estimating the whole
PD energy.
To avoid wrong energy estimation in EAS (due to capped frequency), make
sure that the calculation of util sum is aware of allowed CPU capacity.
This thermal pressure might be visible in scenarios where the CPUs are not
heavily loaded, but some other component (like the GPU) drastically reduced
the available power budget and increased the SoC temperature. Thus, we still
use EAS for task placement and CPUs are not over-utilized.
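A minimal sketch of the clamping, assuming the thermal pressure raw value
is what arch_scale_thermal_pressure() reports:

  unsigned long max_cap = arch_scale_cpu_capacity(cpu);
  /* Capacity actually allowed once the thermal cap is applied. */
  unsigned long allowed_cap = max_cap - arch_scale_thermal_pressure(cpu);

  /* Clamp the per-CPU utilization before it is added to the PD util sum. */
  util = min(util, allowed_cap);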
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20210614191128.22735-1-lukasz.luba@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 148c847c9e ]
pwmX_enable supports three possible values:
0: Fan control disabled. Duty cycle is fixed to 0%
1: Fan control enabled, pwm mode. Duty cycle is determined by
values written into Target Duty Cycle registers.
2: Fan control enabled, rpm mode
Duty cycle is adjusted such that fan speed matches
the values in Target Count registers
The current code does not do this; instead, it mixes pwm control
configuration with fan speed monitoring configuration. Worse, it
reports that pwm control would be disabled (pwmX_enable==0) when
it is in fact enabled in pwm mode. Part of the problem may be that
the chip sets the "TACH input enable" bit on its own whenever the
mode bit is set to RPM mode, but that doesn't mean that "TACH input
enable" accurately reflects the pwm mode.
Fix it up and only handle pwm control with the pwmX_enable attributes.
In the documentation, clarify that disabling pwm control (pwmX_enable=0)
sets the pwm duty cycle to 0%. In the code, explain why TACH_INPUT_EN
is set together with RPM_MODE.
While at it, only update the configuration register if the configuration
has changed, and only update the cached configuration if updating the
chip configuration was successful.
Cc: Jan Kundrát <jan.kundrat@cesnet.cz>
Cc: Václav Kubernát <kubernat@cesnet.cz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Václav Kubernát <kubernat@cesnet.cz>
Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Link: https://lore.kernel.org/r/20210526154022.3223012-4-linux@roeck-us.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 897f633989 ]
The MAX31790 has two sets of registers for pwm duty cycles, one to request
a duty cycle and one to read the actual current duty cycle. Both do not
have to be the same.
When reporting the pwm duty cycle to the user, the actual pwm duty cycle
from pwm duty cycle registers needs to be reported. When setting it, the
pwm target duty cycle needs to be written. Since we don't know the actual
pwm duty cycle after a target pwm duty cycle has been written, set the
valid flag to false to indicate that actual pwm duty cycle should be read
from the chip instead of using cached values.
Cc: Jan Kundrát <jan.kundrat@cesnet.cz>
Cc: Václav Kubernát <kubernat@cesnet.cz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Václav Kubernát <kubernat@cesnet.cz>
Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Link: https://lore.kernel.org/r/20210526154022.3223012-3-linux@roeck-us.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e198be37e5 ]
Some BT.656 sensors (e.g. ADV718x) transmit frames with unstable BT.656
sync codes after initial power on. This confuses the imx CSI, resulting
in vertical and/or horizontal sync issues. Skip the first 20 frames
to avoid the unstable sync codes.
[fabio: fixed checkpatch warning and increased the frame skipping to 20]
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5368b1ee29 ]
As reported by smatch:
drivers/media/common/siano/smsdvb-main.c:1231 smsdvb_hotplug() warn: '&client->entry' not removed from list
If an error occurs at the end of the registration logic, it won't
drop the device from the list.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1fec2ecc25 ]
As reported by smatch:
drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:510 dvb_register_device() warn: '&dvbdev->list_head' not removed from list
drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:530 dvb_register_device() warn: '&dvbdev->list_head' not removed from list
drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:545 dvb_register_device() warn: '&dvbdev->list_head' not removed from list
The error logic inside dvb_register_device() doesn't remove
devices from the dvb_adapter_list in case of errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abc0226df6 ]
The risk of speculation is actually almost non-existent here, as there
are very few users of TCP/IP over the DVB stack, since this is mainly
used with DVB-S/S2 cards, and only by people who receive TCP/IP from
satellite connections, which greatly limits the number of users of such
a feature (*).
(*) In theory, DVB-C cards could also benefit from it, but I have yet
to see hardware that supports it.
Yet, fixing it is trivial.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22ca9f4aaf ]
crypto_shash_alg_has_setkey() is implemented by testing whether the
.setkey() member of a struct shash_alg points to the default version,
called shash_no_setkey(). As crypto_shash_alg_has_setkey() is a static
inline, this requires shash_no_setkey() to be exported to modules.
Unfortunately, when building with CFI, function pointers are routed
via CFI stubs which are private to each module (or to the kernel proper)
and so this function pointer comparison may fail spuriously.
Let's fix this by turning crypto_shash_alg_has_setkey() into an out of
line function.
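A rough sketch of the out-of-line variant (placement and export details
are assumptions):

  /* Compare against the default .setkey inside crypto core code, so the
   * comparison is never routed through a per-module CFI stub. */
  bool crypto_shash_alg_has_setkey(struct shash_alg *alg)
  {
          return alg->setkey != shash_no_setkey;
  }
  EXPORT_SYMBOL_GPL(crypto_shash_alg_has_setkey);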
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7ab186f60 ]
The card timing and the bus frequency are not changed atomically with
respect to calls to the set_clock() callback in the driver. The result
is the driver sees a transient state where there's a mismatch between
the two and thus the inputs to the phase correction calculation
formula are garbage.
Switch from dev_warn() to dev_dbg() to avoid noise in the normal case,
though the change does make bad configurations less likely to be
noticed.
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20210607013020.85885-1-andrew@aj.id.au
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42933c8aa1 ]
This patch fixes the following issues:
1. memstick_free_host() will free the host, so using ms_dev(host) after
it is a problem. To fix this, move memstick_free_host() to after we are
done with ms_dev(host).
2. In rtsx_usb_ms_drv_remove(), PM needs to be disabled before we remove
and free the host, otherwise memstick_check() will be called and a
use-after-free will happen.
[ 11.351173] BUG: KASAN: use-after-free in rtsx_usb_ms_drv_remove+0x94/0x140 [rtsx_usb_ms]
[ 11.357077] rtsx_usb_ms_drv_remove+0x94/0x140 [rtsx_usb_ms]
[ 11.357376] platform_remove+0x2a/0x50
[ 11.367531] Freed by task 298:
[ 11.368537] kfree+0xa4/0x2a0
[ 11.368711] device_release+0x51/0xe0
[ 11.368905] kobject_put+0xa2/0x120
[ 11.369090] rtsx_usb_ms_drv_remove+0x8c/0x140 [rtsx_usb_ms]
[ 11.369386] platform_remove+0x2a/0x50
[ 12.038408] BUG: KASAN: use-after-free in __mutex_lock.isra.0+0x3ec/0x7c0
[ 12.045432] mutex_lock+0xc9/0xd0
[ 12.046080] memstick_check+0x6a/0x578 [memstick]
[ 12.046509] process_one_work+0x46d/0x750
[ 12.052107] Freed by task 297:
[ 12.053115] kfree+0xa4/0x2a0
[ 12.053272] device_release+0x51/0xe0
[ 12.053463] kobject_put+0xa2/0x120
[ 12.053647] rtsx_usb_ms_drv_remove+0xc4/0x140 [rtsx_usb_ms]
[ 12.053939] platform_remove+0x2a/0x50
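A hedged sketch of the resulting remove-path ordering (the function body
is illustrative, not the exact driver code):

  struct memstick_host *msh = platform_get_drvdata(pdev);

  /* Stop runtime PM first so memstick_check() can no longer run against
   * a host that is about to be freed. */
  pm_runtime_disable(ms_dev(msh));
  pm_runtime_put_noidle(ms_dev(msh));

  memstick_remove_host(msh);
  /* Free the host only after the last use of ms_dev(msh). */
  memstick_free_host(msh);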
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210511163944.1233295-1-ztong0001@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ad1efee08 ]
When the driver fails to talk with the hardware with dvb_usb_generic_rw,
it will return an error to dvb_usb_adapter_frontend_init. However, the
driver forgets to free the resource (e.g., struct cinergyt2_fe_state),
which leads to a memory leak.
Fix this by freeing struct cinergyt2_fe_state when dvb_usb_generic_rw
fails in cinergyt2_frontend_attach.
backtrace:
[<0000000056e17b1a>] kmalloc include/linux/slab.h:552 [inline]
[<0000000056e17b1a>] kzalloc include/linux/slab.h:682 [inline]
[<0000000056e17b1a>] cinergyt2_fe_attach+0x21/0x80 drivers/media/usb/dvb-usb/cinergyT2-fe.c:271
[<00000000ae0b1711>] cinergyt2_frontend_attach+0x21/0x70 drivers/media/usb/dvb-usb/cinergyT2-core.c:74
[<00000000d0254861>] dvb_usb_adapter_frontend_init+0x11b/0x1b0 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:290
[<0000000002e08ac6>] dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:84 [inline]
[<0000000002e08ac6>] dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:173 [inline]
[<0000000002e08ac6>] dvb_usb_device_init.cold+0x4d0/0x6ae drivers/media/usb/dvb-usb/dvb-usb-init.c:287
Reported-by: syzbot+e1de8986786b3722050e@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27f2a4db76 ]
GDB produces the following warning when debugging kernels built with
CONFIG_RELR:
BFD: /android0/linux-next/vmlinux: unknown type [0x13] section `.relr.dyn'
when loading a kernel built with CONFIG_RELR into GDB. It can also
prevent debugging symbols using such relocations.
Peter suggests:
[That flag] means that lld will use dynamic tags and section type
numbers in the OS-specific range rather than the generic range. The
kernel itself doesn't care about these numbers; it determines the
location of the RELR section using symbols defined by a linker script.
Link: https://github.com/ClangBuiltLinux/linux/issues/1057
Suggested-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20210522012626.2811297-1-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf6fa2c0dd ]
The code in entry-common.c runs at exception entry and return
boundaries, where portions of the kernel environment aren't available.
For example, RCU may not be watching, and lockdep state may be
out-of-sync with the hardware. Due to this, it is not sound to
instrument this code.
We generally avoid instrumentation by marking the entry functions as
`noinstr`, but currently this doesn't inhibit KCOV instrumentation.
Prevent this by disabling KCOV for the entire compilation unit.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210607094624.34689-20-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7fdd20868 ]
When ctx_id >= HVA_MAX_INSTANCES in hva_hw_its_irq_thread() it tries to
access fields of ctx, which is NULL at that point. The patch gets rid of
these accesses.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d84b9202d7 ]
The driver should only set the payload on .buf_prepare if the
buffer is of CAPTURE type. If an OUTPUT buffer has a zero bytesused
set by userspace then the v4l2 core will set it to the buffer length.
If we overwrite bytesused for OUTPUT buffers, too, then
vb2_get_plane_payload() will return an incorrect value which might then
be written to hw registers by the driver in cedrus_h264.c or cedrus_vp8.c.
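A hedged sketch of the intended .buf_prepare behaviour (the ctx/sizeimage
names are placeholders):

  static int cedrus_buf_prepare(struct vb2_buffer *vb)
  {
          struct vb2_queue *vq = vb->vb2_queue;

          /* Only CAPTURE buffers get a driver-set payload; OUTPUT buffers keep
           * the bytesused provided (or defaulted) by the v4l2 core. */
          if (V4L2_TYPE_IS_CAPTURE(vq->type))
                  vb2_set_plane_payload(vb, 0, ctx->dst_fmt.sizeimage);

          return 0;
  }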
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 082aaecff3 ]
The driver should only set the payload on .buf_prepare if the
buffer is of CAPTURE type. If an OUTPUT buffer has a zero bytesused
set by userspace then the v4l2 core will set it to the buffer length.
If we overwrite bytesused for OUTPUT buffers, too, then
vb2_get_plane_payload() will return an incorrect value which might then
be written to hw registers by the driver in hantro_g1_h264_dec.c.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac56886371 ]
The em28xx struct kref isn't being decreased after an error in
em28xx_ir_init(), leading to a possible memory leak.
A kref_put() and em28xx_shutdown_buttons() call are added to the error
handling code.
Signed-off-by: Igor Matheus Andrade Torrente <igormtorrente@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a3a54bf4bd ]
There is a problem with the tasklet in bt878: bt->tasklet is set by
dvb-bt8xx.ko, but bt878.ko can be loaded independently.
In that case, if an interrupt comes in, it may cause a NULL pointer
dereference.
To solve this issue, we check that the tasklet is actually set before
calling tasklet_schedule().
[ 1.750438] bt878(0): irq FDSR FBUS risc_pc=
[ 1.750728] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.752969] RIP: 0010:0x0
[ 1.757526] Call Trace:
[ 1.757659] <IRQ>
[ 1.757770] tasklet_action_common.isra.0+0x107/0x110
[ 1.758041] tasklet_action+0x22/0x30
[ 1.758237] __do_softirq+0xe0/0x29b
[ 1.758430] irq_exit_rcu+0xa4/0xb0
[ 1.758618] common_interrupt+0x8d/0xa0
[ 1.758824] </IRQ>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24786ccd9c ]
On some platforms (imx6q), xvclk might not be switched on in advance;
also, for power saving purposes, xvclk should not always be on.
So, add clk_prepare_enable() and clk_disable_unprepare() calls in the
driver to switch xvclk on/off at the proper stages.
Add following changes:
- add 'struct clk *clk;' in 'struct ov2659 {}'
- enable xvclk in ov2659_power_on()
- disable xvclk in ov2659_power_off()
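A hedged sketch of the two halves (error handling trimmed, structure
layout assumed):

  static int ov2659_power_on(struct ov2659 *ov2659)
  {
          /* Keep the external clock on only while the sensor is in use. */
          return clk_prepare_enable(ov2659->clk);
  }

  static void ov2659_power_off(struct ov2659 *ov2659)
  {
          clk_disable_unprepare(ov2659->clk);
  }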
Signed-off-by: Dillon Min <dillon.minfei@gmail.com>
Acked-by: Lad Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed1b472fc1 ]
Files might come from a remote source and might have xattrs, including
security.ima. It should not be IMA's task to decide whether security.ima
should be kept or not. This patch removes the removexattr() system
call in ima_inode_post_setattr().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8fd28fa504 ]
NIST P192 is not unregistered if registering NIST P256 fails; we
actually need to unregister the algorithms that were already registered.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6889fc2104 ]
Add a comment that p192 will fail to register in FIPS mode.
Fix ecdh-nist-p192's entry in testmgr by removing the ifdefs
and not setting fips_allowed.
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1339a7c3ba ]
Use the sg count returned by dma_map_sg to call into
dmaengine_prep_slave_sg rather than using the original sg count. dma_map_sg
can merge consecutive sglist entries, thus making the original sg count
wrong. This is a fix for memory corruption issues observed while testing
encryption/decryption of large messages using the libkcapi framework.
Patch has been tested further by running full suite of tcrypt.ko tests
including fuzz tests.
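A hedged sketch of the pattern (request, channel and descriptor variables
are placeholders):

  struct dma_async_tx_descriptor *desc;
  int mapped;

  mapped = dma_map_sg(dev, req->src, src_nents, DMA_TO_DEVICE);
  if (!mapped)
          return -ENOMEM;

  /* Use the mapped count: dma_map_sg() may merge adjacent entries, so it
   * can be smaller than the original src_nents. */
  desc = dmaengine_prep_slave_sg(chan, req->src, mapped,
                                 DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);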
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9afe77cf25 ]
Remove the unused macro ICP_DH895XCC_PESRAM_BAR_SIZE in the firmware
loader.
This is to fix the following warning when compiling the driver using the
clang compiler with CC=clang W=2:
drivers/crypto/qat/qat_common/qat_uclo.c:345:9: warning: macro is not used [-Wunused-macros]
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Zhehui Xiang <zhehui.xiang@intel.com>
Signed-off-by: Zhehui Xiang <zhehui.xiang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96b5722920 ]
Check the return code of the function qat_hal_rd_rel_reg() and return it
to the caller.
This is to fix the following warning when compiling the driver with
clang scan-build:
drivers/crypto/qat/qat_common/qat_hal.c:1436:2: warning: 6th function call argument is an uninitialized value
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Zhehui Xiang <zhehui.xiang@intel.com>
Signed-off-by: Zhehui Xiang <zhehui.xiang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8194e5e63 ]
syzbot has reported the following warning in pvr2_i2c_done:
sysfs group 'power' not found for kobject '1-0043'
When the device is disconnected (pvr_hdw_disconnect), the i2c adapter is
not unregistered along with the USB and v4l2 teardown. As part of the USB
device disconnect, the sysfs files of the subdevices are also deleted.
So, by the time pvr_i2c_core_done is called by pvr_context_destroy, the
sysfs files have been deleted.
To fix this, unregister the i2c adapter too in pvr_hdw_disconnect. Make
the device deregistration code shared by calling pvr_hdw_disconnect from
pvr2_hdw_destroy.
Reported-by: syzbot+e74a998ca8f1df9cc332@syzkaller.appspotmail.com
Tested-by: syzbot+e74a998ca8f1df9cc332@syzkaller.appspotmail.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67a7e53d5b ]
The dependent slice segment flag for the PPS control is misnamed. It should
have "enabled" at the end. It only tells whether this flag is present in the
slice header or not, not the actual value.
Fix this by renaming the PPS flag and introducing another flag for the slice
control which carries the actual value.
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d37ef41be ]
cobalt_s_bit_sysctrl() reads the old register value over PCI,
then changes a bit and writes the new value back to the register.
This is used among other things for setting the HPD output pin.
But if the HPD is changed for multiple inputs at the same time,
then this causes a race condition where a stale value is read.
Serialize this function with a mutex.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00b89fe019 ]
For all intents and purposes, the idle task is a per-CPU kthread. It isn't
created via the same route as other pcpu kthreads however, and as a result
it is missing a few bells and whistles: it fails kthread_is_per_cpu() and
it doesn't have PF_NO_SETAFFINITY set.
Fix the former by giving the idle task a kthread struct along with the
KTHREAD_IS_PER_CPU flag. This requires some extra iffery as init_idle()
can be called more than once on the same idle task.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510151024.2448573-2-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56c1f08762 ]
The right thing to do is to add a new object to the build
system when a certain config option is selected, and *not*
to override it.
So, fix the obj-$(config) logic in the sti makefiles, using "+="
instead of ":=".
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06676aa1f4 ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1a0a376ca ]
As pointed out by commit
de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59087b66ea ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
As a bonus, the new function always returns 0 on
success, so the logic can be simplified.
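The conversion boils down to the following pattern (driver context is
hypothetical); the same applies to the similar patches below:

  ret = pm_runtime_resume_and_get(dev);
  if (ret < 0)
          return ret;     /* usage count is already balanced on failure */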
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59f96244af ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
In some places, this is ok, but in others the usage count
ended up being unbalanced on failures.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
As a bonus, such a function always returns zero on success, so
some code can be simplified.
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c44eac5b72 ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
bdisp_start_streaming() doesn't take that into account, which
would unbalance the PM usage counter at bdisp_stop_streaming().
The logic at bdisp_probe() is correct, but it is best to use
the same call throughout the driver.
So, replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c298f82d8 ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10343de268 ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
As a plus, pm_runtime_resume_and_get() doesn't return
positive numbers, so the return code validation can
be removed.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Andrzej Pietrasiewicz <andrzejtp2010@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 908711f542 ]
Currently, the driver just assumes that the PM runtime logic
succeeded in resuming the device.
That may not be the case, as pm_runtime_get_sync()
can fail (while keeping the usage count incremented).
Replace the code to use pm_runtime_resume_and_get(),
and let it return the error code.
This way, if mtk_vcodec_dec_pw_on() fails, the logic
under fops_vcodec_open() will do the right thing and
return an error, instead of just assuming that the
device is ready to be used.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e8b1526db ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
While here, check if the PM runtime error was caught at open time.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c41e024933 ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
While here, ensure that the driver will check if PM runtime
resumed at vpfe_initialize_device().
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdc34e82c0 ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
While here, check if the PM runtime error was caught at
s5p_cec_adap_enable().
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d07bb9702c ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
While here, fix the return condition of mtk_mdp_m2m_start_streaming(),
as it doesn't make any sense to return 0 if the PM runtime failed
to resume.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7c617cab7 ]
Calling pm_runtime_get_sync() is bad, since even when it
returns an error, pm_runtime_put*() should be called.
So, use pm_runtime_resume_and_get() instead.
While here, ensure that the error condition will be checked
during clock enable and media open() calls.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62c9044686 ]
The PM runtime get logic is currently broken, as it checks if
ret is zero instead of checking if it is an error code,
as reported by Dan Carpenter.
While here, use pm_runtime_resume_and_get(), added by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to automatically handle the dev->power.usage_count decrement
on errors. As a bonus, such a function always returns zero on success.
It should also be noted that a failure of pm_runtime_get_sync() could
potentially result in a spurious runtime_suspend(), since the code was
not using pm_runtime_put_noidle().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e90812c47b ]
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
Replace it by the new pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a8 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter, avoiding
a potential PM usage counter leak.
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 905ae01c4a ]
For RLIMIT_NPROC and some other rlimits the user_struct that holds the
global limit is kept alive for the lifetime of a process by keeping it
in struct cred. Adding a pointer to ucounts in the struct cred will
allow tracking RLIMIT_NPROC not only per user in the system, but per
user in the user_namespace.
Updating ucounts may require memory allocation which may fail. So, we
cannot change cred.ucounts in the commit_creds() because this function
cannot fail and it should always return 0. For this reason, we modify
cred.ucounts before calling the commit_creds().
Changelog
v6:
* Fix null-ptr-deref in is_ucounts_overlimit() detected by trinity. This
error was caused by the fact that cred_alloc_blank() left the ucounts
pointer empty.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/b37aaef28d8b9b0d757e07ba6dd27281bbe39259.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e793ba77c ]
Currently, the SPI core doesn't set the struct device fwnode pointer
when it creates a new SPI device. This means when the device is
registered the fwnode is NULL and the check in device_add which sets
the fwnode->dev pointer is skipped. This wasn't previously an issue,
however these two patches:
commit 4731210c09 ("gpiolib: Bind gpio_device to a driver to enable
fw_devlink=on by default")
commit ced2af4195 ("gpiolib: Don't probe gpio_device if it's not the
primary device")
Added some code to the GPIO core which relies on using that
fwnode->dev pointer to determine if a driver is bound to the fwnode
and if not bind a stub GPIO driver. This means the GPIO providers
behind SPI will get both the expected driver and this stub driver
causing the stub driver to fail if it attempts to request any pin
configuration. For example on my system:
madera-pinctrl madera-pinctrl: pin gpio5 already requested by madera-pinctrl; cannot claim for gpiochip3
madera-pinctrl madera-pinctrl: pin-4 (gpiochip3) status -22
madera-pinctrl madera-pinctrl: could not request pin 4 (gpio5) from group aif1 on device madera-pinctrl
gpio_stub_drv gpiochip3: Error applying setting, reverse things back
gpio_stub_drv: probe of gpiochip3 failed with error -22
The firmware node on the device created by the GPIO framework is set
through the of_node pointer, hence things generally actually work;
however, that fwnode->dev is never set, as the check was skipped at
device_add time. This fix appears to match how the I2C subsystem
handles the same situation.
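A hedged sketch of the kind of assignment involved when the SPI core
creates a device from a DT child node ('nc' is used here as a placeholder
for that node):

  /* Keep the fwnode in sync with the of_node so device_add() can set
   * fwnode->dev when the SPI device is registered. */
  spi->dev.of_node = nc;
  spi->dev.fwnode = of_fwnode_handle(nc);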
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210421101402.8468-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ad8ccc17d ]
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline ones won't. Unfortunately, when a CPU
is brought back online, the value read from the per-cpu variable might be
wrong (stale data). This might affect the scheduler's decisions, since it
sees the CPU capacity differently from what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when thermal framework sets
capping.
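A hedged sketch of the idea (variable and mask names follow this
description rather than the exact code):

  /* Update every CPU in the policy mask, online or offline, so a CPU coming
   * back online never reads a stale thermal pressure value. */
  for_each_cpu(cpu, policy_cpus)
          WRITE_ONCE(per_cpu(thermal_pressure, cpu), capped_capacity);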
Fixes: f12e4f66ab ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 80ef08670d upstream.
A request could end up on the fpq->io list after fuse_abort_conn() has
reset fpq->connected and aborted requests on that list:
  Thread-1                          Thread-2
  ========                          ========
  ->fuse_simple_request()           ->shutdown
    ->__fuse_request_send()
      ->queue_request()             ->fuse_abort_conn()
  ->fuse_dev_do_read()                ->acquire(fpq->lock)
    ->wait_for(fpq->lock)             ->set err to all req's in fpq->io
                                      ->release(fpq->lock)
    ->acquire(fpq->lock)
    ->add req to fpq->io
After the userspace copy is done the request will be ended, but
req->out.h.error will remain uninitialized. Also the copy might block
despite being already aborted.
Fix both issues by not allowing the request to be queued on the fpq->io
list after fuse_abort_conn() has processed this list.
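A hedged sketch of the queueing side after the fix (the error value and
label are illustrative):

  spin_lock(&fpq->lock);
  /* Re-check connected under fpq->lock: once fuse_abort_conn() has swept
   * fpq->io, no new request may be added to that list. */
  if (!fpq->connected) {
          spin_unlock(&fpq->lock);
          req->out.h.error = -ECONNABORTED;
          goto out_end;
  }
  list_add(&req->list, &fpq->io);
  spin_unlock(&fpq->lock);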
Reported-by: Pradeep P V K <pragalla@codeaurora.org>
Fixes: fd22d62ed0 ("fuse: no fc->lock for iqueue parts")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b89ecd60d3 upstream.
Fix the "fuse: trying to steal weird page" warning.
Description from Johannes Weiner:
"Think of it as similar to PG_active. It's just another usage/heat
indicator of file and anon pages on the reclaim LRU that, unlike
PG_active, persists across deactivation and even reclaim (we store it in
the page cache / swapper cache tree until the page refaults).
So if fuse accepts pages that can legally have PG_active set,
PG_workingset is fine too."
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Fixes: 1899ad18c6 ("mm: workingset: tell cache transitions from workingset thrashing")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4a9ccdd1c upstream.
We don't set the SB_BORN flag on submounts. This is wrong as these
superblocks are then considered as partially constructed or dying
in the rest of the code and can break some assumptions.
One such case is when you have a virtiofs filesystem with submounts
and you try to mount it again: virtio_fs_get_tree() tries to obtain
a superblock with sget_fc(). The logic in sget_fc() is to loop until
it has either found an existing matching superblock with SB_BORN set
or to create a brand new one. It is assumed that a superblock without
SB_BORN is transient and the loop is restarted. Forgetting to set
SB_BORN on submounts hence causes sget_fc() to retry forever.
Setting SB_BORN requires special care, i.e. a write barrier for
super_cache_count() which can check SB_BORN without taking any lock.
We should call vfs_get_tree() to deal with that, but this requires
having a proper ->get_tree() implementation for submounts, which
is a bigger piece of work. Go for a simple bug fix in the meantime.
Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3a43f2a95 upstream.
As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the
superblock can theoretically be killed. If this happens before the
submount was added to the &fc->mounts list, fuse_mount_remove() later
crashes in list_del_init() because it assumes the submount to be
already there.
Add the submount before dropping sb->s_umount to fix the inconsistency.
It is okay to nest fc->killsb under sb->s_umount; we already do this
on the ->kill_sb() path.
Signed-off-by: Greg Kurz <groug@kaod.org>
Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d92d88f056 upstream.
If fuse_fill_super_submount() returns an error, the error path
triggers a crash:
[ 26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
[ 26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
[...]
[ 26.247938] Call Trace:
[ 26.248300] fuse_mount_remove+0x2c/0x70 [fuse]
[ 26.248892] virtio_kill_sb+0x22/0x160 [virtiofs]
[ 26.249487] deactivate_locked_super+0x36/0xa0
[ 26.250077] fuse_dentry_automount+0x178/0x1a0 [fuse]
The crash happens because fuse_mount_remove() assumes that the FUSE
mount was already added to the list under the FUSE connection, but this
is only done after fuse_fill_super_submount() has returned success.
This means that until fuse_fill_super_submount() has returned success,
the FUSE mount isn't actually owned by the superblock. We should thus
reclaim ownership by clearing sb->s_fs_info, which will skip the call
to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree()
already does for the root sb.
Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9acc89d31f upstream.
EVM_ALLOW_METADATA_WRITES is an EVM initialization flag that can be set to
temporarily disable metadata verification until all xattrs/attrs necessary
to verify an EVM portable signature are copied to the file. This flag is
cleared when EVM is initialized with an HMAC key, to avoid that the HMAC is
calculated on unverified xattrs/attrs.
Currently EVM unnecessarily denies setting this flag if EVM is initialized
with a public key, which is not a concern as it cannot be used to trust
xattrs/attrs updates. This patch removes this limitation.
Fixes: ae1ba1676b ("EVM: Allow userland to permit modification of EVM-protected metadata")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org # 4.16.x
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9eea290429 upstream.
evm_inode_init_security() requires an HMAC key to calculate the HMAC on
initial xattrs provided by LSMs. However, it generically checks whether a
key has been loaded, including public keys, which is not correct, as
public keys are not suitable for calculating the HMAC.
Originally, support for signature verification was introduced to verify a
possibly immutable initial ram disk, when no new files are created, and to
switch to HMAC for the root filesystem. By that time, an HMAC key should
have been loaded and usable to calculate HMACs for new files.
More recently support for requiring an HMAC key was removed from the
kernel, so that signature verification can be used alone. Since this is a
legitimate use case, evm_inode_init_security() should not return an error
when no HMAC key has been loaded.
This patch fixes this problem by replacing the evm_key_loaded() check with
a check of the EVM_INIT_HMAC flag in evm_initialized.
Fixes: 26ddabfe96 ("evm: enable EVM when X509 certificate is loaded")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org # 4.5.x
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d5c788099 upstream.
Perf errors out when sampling instructions:ppp.
$ perf record -e instructions:ppp -- true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (instructions:ppp).
The instruction PDIR is only available on the fixed counter 0. The event
constraint has been updated to fixed0_constraint in
icl_get_event_constraints(). The Sapphire Rapids codes unconditionally
error out for the event which is not available on the GP counter 0.
Make the instructions:ppp an exception.
Fixes: 61b985e3e7 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Reported-by: Yasin, Ahmad <ahmad.yasin@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1624029174-122219-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee72a94ea4 upstream.
For some Alder Lake machine, the below fixed counter check warning may be
triggered.
[ 2.010766] hw perf events fixed 5 > max(4), clipping!
Current perf unconditionally increases the number of the GP counters and
the fixed counters for a big core PMU on an Alder Lake system, because
the number enumerated in the CPUID only reflects the common counters.
The big core may have more counters. However, Alder Lake may have an
alternative configuration. With that configuration,
the X86_FEATURE_HYBRID_CPU is not set. The number of the GP counters and
fixed counters enumerated in the CPUID is accurate. Perf mistakenly
increases the number of counters. The warning is triggered.
Directly use the enumerated value on the system with the alternative
configuration.
Fixes: f83d2f91d2 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1624029174-122219-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c6986ade6 upstream.
In raise_backtrace_ipi() we iterate through the cpumask of CPUs, sending
each an IPI asking them to do a backtrace, but we don't wait for the
backtrace to happen.
We then iterate through the CPU mask again, and if any CPU hasn't done
the backtrace and cleared itself from the mask, we print a trace on its
behalf, noting that the trace may be "stale".
This works well enough when a CPU is not responding, because in that
case it doesn't receive the IPI and the sending CPU is left to print the
trace. But when all CPUs are responding we are left with a race between
the sending and receiving CPUs, if the sending CPU wins the race then it
will erroneously print a trace.
This leads to spurious "stale" traces from the sending CPU, which can
then be interleaved messily with the receiving CPU, note the CPU
numbers, eg:
[ 1658.929157][ C7] rcu: Stack dump where RCU GP kthread last ran:
[ 1658.929223][ C7] Sending NMI from CPU 7 to CPUs 1:
[ 1658.929303][ C1] NMI backtrace for cpu 1
[ 1658.929303][ C7] CPU 1 didn't respond to backtrace IPI, inspecting paca.
[ 1658.929362][ C1] CPU: 1 PID: 325 Comm: kworker/1:1H Tainted: G W E 5.13.0-rc2+ #46
[ 1658.929405][ C7] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 325 (kworker/1:1H)
[ 1658.929465][ C1] Workqueue: events_highpri test_work_fn [test_lockup]
[ 1658.929549][ C7] Back trace of paca->saved_r1 (0xc0000000057fb400) (possibly stale):
[ 1658.929592][ C1] NIP: c00000000002cf50 LR: c008000000820178 CTR: c00000000002cfa0
To fix it, change the logic so that the sending CPU waits 5s for the
receiving CPU to print its trace. If the receiving CPU prints its trace
successfully then the sending CPU just continues, avoiding any spurious
"stale" trace.
This has the added benefit of allowing all CPUs to print their traces in
order and avoids any interleaving of their output.
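A minimal sketch of the sender-side wait described above (simplified and
abbreviated; not the exact raise_backtrace_ipi() code):

  /* give the target CPU up to 5 seconds to print its own trace */
  long remaining = 5 * USEC_PER_SEC;

  while (cpumask_test_cpu(cpu, mask) && remaining-- > 0)
	udelay(1);

  if (cpumask_test_cpu(cpu, mask))
	pr_warn("CPU %d didn't respond to backtrace IPI, inspecting paca.\n", cpu);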
Fixes: 5cc05910f2 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Cc: stable@vger.kernel.org # v4.18+
Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210625140408.3351173-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9913d5745b upstream.
All internal use cases of tracepoint_probe_register() are expected to never
be called with the same function and data. If they are, it is considered a
bug, as that means the accounting of handling tracepoints is corrupted.
If the function and data for a tracepoint are already registered when
tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
return with -EEXIST.
The BPF system call can end up calling tracepoint_probe_register() with
the same data, which now means that this can trigger the warning because
of a user space process. As WARN_ON_ONCE() should not be called because
user space called a system call with bad data, there needs to be a way to
register a tracepoint without triggering a warning.
Enter tracepoint_probe_register_may_exist(), which can be called, but will
not cause a WARN_ON() if the probe already exists. It will still error out
with EEXIST, which will then be sent to the user space that performed the
BPF system call.
This keeps the previous testing for issues with other users of the
tracepoint code, while letting BPF call it with duplicated data and not
warn about it.
Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153
Cc: stable@vger.kernel.org
Fixes: c4f6699dfc ("bpf: introduce BPF_RAW_TRACEPOINT")
Reported-by: syzbot <syzbot+721aa903751db87aa244@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26c5637310 upstream.
With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").
Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.
Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1421ec684a upstream.
Resctrl test suite accepts command line argument "-t" to specify the
unit tests to run in the test list (e.g., -t mbm,mba,cmt,cat) as
documented in the help.
When calling strtok() to parse the option, the incorrect delimiters
argument ":\t" is used. As a result, passing "-t mbm,mba,cmt,cat" throws
an invalid option error.
Fix this by using delimiters argument "," instead of ":\t" for parsing
of unit tests list. At the same time, remove the unnecessary "spaces"
between the unit tests in help documentation to prevent confusion.
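For illustration, a small standalone sketch of the corrected parsing (this is
not the selftest code itself, just the strtok() usage it describes):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
	char arg[] = "mbm,mba,cmt,cat";	/* value passed to -t */
	char *token = strtok(arg, ",");	/* was ":\t", which never matched */

	while (token) {
		printf("unit test: %s\n", token);
		token = strtok(NULL, ",");
	}
	return 0;
  }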
Fixes: 790bf585b0 ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Fixes: 78941183d1 ("selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest")
Fixes: ecdbb911f2 ("selftests/resctrl: Add MBM test")
Fixes: 034c7678dd ("selftests/resctrl: Add README for resctrl tests")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 314538041b upstream.
In AP mode WPA2-PSK connections were not established.
The reason was that the AP was sending the first message
of the 4 way handshake encrypted, even though no pairwise
key had (correctly) yet been set.
Encryption was enabled if the "security_enable" driver flag
was set and encryption was not explicitly disabled by
IEEE80211_TX_INTFL_DONT_ENCRYPT.
However security_enable was set when *any* key, including
the AP GTK key, had been set, which was causing unwanted
encryption even if no key was available for the unicast
packet to be sent.
Fix this by adding a check that we have a key and drop
the old security_enable driver flag which is insufficient
and redundant.
The Redpine downstream out of tree driver does it this way too.
Regarding the Fixes tag the actual code being modified was
introduced earlier, with the original driver submission, in
dad0d04fa7 ("rsi: Add RS9113 wireless driver"), however
at that time AP mode was not yet supported so there was
no bug at that point.
So I have tagged the introduction of AP support instead
which was part of the patch set "rsi: support for AP mode" [1]
It is not clear whether AP WPA has ever worked; I can see nothing
on the kernel side that broke it afterwards, yet the AP support
patch series says "Tests are performed to confirm aggregation,
connections in WEP and WPA/WPA2 security."
One possibility is that the initial tests were done with a modified
userspace (hostapd).
[1] https://www.spinics.net/lists/linux-wireless/msg165302.html
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Fixes: 38ef62353a ("rsi: security enhancements for AP mode")
CC: stable@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1622564459-24430-1-git-send-email-martin.fuzzey@flowbird.group
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb312ac5cc upstream.
I got this crash several times while debugging the PCIe controller; the crash
happens somehow at the time when the PCIe kernel code started link retraining (as
part of the ASPM code) while at the same time the PCIe link went down and ath9k
probably executed its hw reset procedure.
Currently I'm not able to reproduce this issue as it looks to be
some race condition between link training, ASPM, link down and the reset
path. And as always, race conditions which depend on more input
parameters are hard to reproduce as they depend on precise timings.
But it is clear that pointers are zero in this case and should be
properly filled as same code pattern is used in ath9k_stop() function.
Anyway I was able to reproduce this crash by manually triggering the ath
reset worker prior to putting the card up. I created a simple patch to export
the reset functionality via debugfs and used it to "simulate" triggering a
reset. This proved that the NULL-pointer dereference issue is there.
Function ath9k_hw_reset() dereferences the chan structure pointer, so it
needs to be a non-NULL pointer.
Function ath9k_stop() already contains code which sets ah->curchan to valid
non-NULL pointer prior calling ath9k_hw_reset() function.
Add same code pattern also into ath_reset_internal() function to prevent
kernel NULL pointer dereference in ath9k_hw_reset() function.
This change fixes kernel NULL pointer dereference in ath9k_hw_reset() which
is caused by calling ath9k_hw_reset() from ath_reset_internal() with NULL
chan structure.
[ 45.334305] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 45.344417] Mem abort info:
[ 45.347301] ESR = 0x96000005
[ 45.350448] EC = 0x25: DABT (current EL), IL = 32 bits
[ 45.356166] SET = 0, FnV = 0
[ 45.359350] EA = 0, S1PTW = 0
[ 45.362596] Data abort info:
[ 45.365756] ISV = 0, ISS = 0x00000005
[ 45.369735] CM = 0, WnR = 0
[ 45.372814] user pgtable: 4k pages, 39-bit VAs, pgdp=000000000685d000
[ 45.379663] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 45.388856] Internal error: Oops: 96000005 [#1] SMP
[ 45.393897] Modules linked in: ath9k ath9k_common ath9k_hw
[ 45.399574] CPU: 1 PID: 309 Comm: kworker/u4:2 Not tainted 5.12.0-rc2-dirty #785
[ 45.414746] Workqueue: phy0 ath_reset_work [ath9k]
[ 45.419713] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[ 45.425910] pc : ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
[ 45.431234] lr : ath9k_hw_reset+0xc0/0x1c48 [ath9k_hw]
[ 45.436548] sp : ffffffc0118dbca0
[ 45.439961] x29: ffffffc0118dbca0 x28: 0000000000000000
[ 45.445442] x27: ffffff800dee4080 x26: 0000000000000000
[ 45.450923] x25: ffffff800df9b9d8 x24: 0000000000000000
[ 45.456404] x23: ffffffc0115f6000 x22: ffffffc008d0d408
[ 45.461885] x21: ffffff800dee5080 x20: ffffff800df9b9d8
[ 45.467366] x19: 0000000000000000 x18: 0000000000000000
[ 45.472846] x17: 0000000000000000 x16: 0000000000000000
[ 45.478326] x15: 0000000000000010 x14: ffffffffffffffff
[ 45.483807] x13: ffffffc0918db94f x12: ffffffc011498720
[ 45.489289] x11: 0000000000000003 x10: ffffffc0114806e0
[ 45.494770] x9 : ffffffc01014b2ec x8 : 0000000000017fe8
[ 45.500251] x7 : c0000000ffffefff x6 : 0000000000000001
[ 45.505733] x5 : 0000000000000000 x4 : 0000000000000000
[ 45.511213] x3 : 0000000000000000 x2 : ffffff801fece870
[ 45.516693] x1 : ffffffc00eded000 x0 : 000000000000003f
[ 45.522174] Call trace:
[ 45.524695] ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
[ 45.529653] ath_reset_internal+0x1a8/0x2b8 [ath9k]
[ 45.534696] ath_reset_work+0x2c/0x40 [ath9k]
[ 45.539198] process_one_work+0x210/0x480
[ 45.543339] worker_thread+0x5c/0x510
[ 45.547115] kthread+0x12c/0x130
[ 45.550445] ret_from_fork+0x10/0x1c
[ 45.554138] Code: 910922c2 9117e021 95ff0398 b4000294 (b9400a61)
[ 45.560430] ---[ end trace 566410ba90b50e8b ]---
[ 45.565193] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 45.572282] SMP: stopping secondary CPUs
[ 45.576331] Kernel Offset: disabled
[ 45.579924] CPU features: 0x00040002,0000200c
[ 45.584416] Memory Limit: none
[ 45.587564] Rebooting in 3 seconds..
Signed-off-by: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210402122653.24014-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08a84410a0 upstream.
Stop the dmaengine transfer in sci_stop_tx(). Otherwise, the following
message can be output when the system enters suspend while
transferring data, because clearing the TIE bit in SCSCR is not able to
stop any dmaengine transfer.
sh-sci e6550000.serial: ttySC1: Unable to drain transmitter
Note that this driver already uses some #ifdefs in the .c file,
so this patch also uses #ifdef to fix the issue. Otherwise,
build errors happen if CONFIG_SERIAL_SH_SCI_DMA is disabled.
Fixes: 73a19e4c03 ("serial: sh-sci: Add DMA support.")
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20210610110806.277932-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f407e2dca0 upstream.
On machines with dual accelerometers described in a single ACPI fwnode,
the bmc150_accel_probe() instantiates a second i2c-client for the second
accelerometer.
A pointer to this manually instantiated second i2c-client is stored
inside the iio_dev's private-data through bmc150_set_second_device(),
so that the i2c-client can be unregistered from bmc150_accel_remove().
Before this commit bmc150_set_second_device() took only 1 argument so it
would store the pointer in private-data of the iio_dev belonging to the
manually instantiated i2c-client, leading to the bmc150_accel_remove()
call for the second_dev trying to unregister *itself* while it was
being removed, leading to a deadlock and rmmod hanging.
Change bmc150_set_second_device() to take 2 arguments: 1. The i2c-client
which is instantiating the second i2c-client for the 2nd accelerometer and
2. The second-device pointer itself (which also is an i2c-client).
This will store the second_device pointer in the private data of the
iio_dev belonging to the (ACPI instantiated) i2c-client for the first
accelerometer and will make bmc150_accel_remove() unregister the
second_device i2c-client when called for the first client,
avoiding the deadlock.
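A rough sketch of the two-argument helper described above (struct layout
reduced to the relevant member, names illustrative):

  struct ex_accel_data {
	struct i2c_client *second_device;
	/* ... rest of the driver data ... */
  };

  static void ex_set_second_device(struct i2c_client *client,
				   struct i2c_client *second_dev)
  {
	/* drvdata of the first client is its iio_dev ... */
	struct iio_dev *indio_dev = i2c_get_clientdata(client);
	/* ... so the driver data lives behind iio_priv() */
	struct ex_accel_data *data = iio_priv(indio_dev);

	data->second_device = second_dev;
  }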
Fixes: 5bfb3a4bd8 ("iio: accel: bmc150: Check for a second ACPI device for BOSC0200")
Cc: Jeremy Cline <jeremy@jcline.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2bf22dc9e upstream.
The drvdata for iio-parent devices points to the struct iio_dev for
the iio-device. So by directly casting the return from i2c_get_clientdata()
to struct bmc150_accel_data * the code was ending up storing the second_dev
pointer in (and retrieving it from) some semi-random offset inside
struct iio_dev, rather than storing it in the second_dev member of the
bmc150_accel_data struct.
Fix the code to get the struct bmc150_accel_data * pointer by calling
iio_priv() on the struct iio_dev * returned by i2c_get_clientdata(),
so that the correct pointer gets dereferenced.
This fixes the following oops on rmmod, caused by trying to
dereference the wrong return of bmc150_get_second_device():
[ 238.980737] BUG: unable to handle page fault for address: 0000000000004710
[ 238.980755] #PF: supervisor read access in kernel mode
[ 238.980760] #PF: error_code(0x0000) - not-present page
...
[ 238.980841] i2c_unregister_device.part.0+0x19/0x60
[ 238.980856] 0xffffffffc0815016
[ 238.980863] i2c_device_remove+0x25/0xb0
[ 238.980869] __device_release_driver+0x180/0x240
[ 238.980876] driver_detach+0xd4/0x120
[ 238.980882] bus_remove_driver+0x5b/0xd0
[ 238.980888] i2c_del_driver+0x44/0x70
While at it, also remove the no longer sensible checks for data
being NULL; iio_priv() never returns NULL for an iio_dev with non-zero
sized private data.
Fixes: 5bfb3a4bd8 ("iio: accel: bmc150: Check for a second ACPI device for BOSC0200")
Cc: Jeremy Cline <jeremy@jcline.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ac0b029a0 upstream.
The regmap is configured for 8 bit registers, uses an RB-tree cache and
marks several registers as volatile (i.e. do not cache).
The ALS and PS data registers in the chip are 16 bits wide and span
two regmap registers. In the current driver only the base register is
marked as volatile, resulting in the upper register being read only once.
Further the data sheet notes:
| When the I2C read operation starts, all four ALS data registers are
| locked until the I2C read operation of register 0x8B is completed.
This results in the registers never updating after the second read.
This patch fixes the problem by marking the upper 8 bits of the ALS
and PS registers as volatile, too.
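A sketch of what such a regmap configuration looks like; the register
addresses below are illustrative placeholders, not the driver's defines:

  #define EX_ALS_DATA_BASE	0x88
  #define EX_PS_DATA_BASE	0x8d

  static bool ex_is_volatile_reg(struct device *dev, unsigned int reg)
  {
	switch (reg) {
	case EX_ALS_DATA_BASE:
	case EX_ALS_DATA_BASE + 1:	/* upper byte, previously cached */
	case EX_PS_DATA_BASE:
	case EX_PS_DATA_BASE + 1:	/* upper byte, previously cached */
		return true;
	default:
		return false;
	}
  }

  static const struct regmap_config ex_regmap_config = {
	.reg_bits	= 8,
	.val_bits	= 8,
	.cache_type	= REGCACHE_RBTREE,
	.volatile_reg	= ex_is_volatile_reg,
  };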
Fixes: 2f2c96338a ("iio: ltr501: Add regmap support.")
Reported-by: Oliver Lang <Oliver.Lang@gossenmetrawatt.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Nikita Travkin <nikita@trvn.ru> # ltr559
Link: https://lore.kernel.org/r/20210610134619.2101372-2-mkl@pengutronix.de
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7cd04c863f upstream.
Allocating an IRQ is conditional on the IRQ's existence, but freeing it
was not. If no IRQ was allocated, the driver would still try to free
IRQ 0. Add the missing checks.
This fixes the following trace when the driver is removed:
[ 100.667788] Trying to free already-free IRQ 0
[ 100.667793] WARNING: CPU: 0 PID: 2315 at kernel/irq/manage.c:1826 free_irq+0x1fd/0x370
...
[ 100.667914] Call Trace:
[ 100.667920] tcs3472_remove+0x3a/0x90 [tcs3472]
[ 100.667927] i2c_device_remove+0x2b/0xa0
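A minimal sketch of the missing check on the remove path (names simplified;
the probe path applies the same condition before requesting the IRQ):

  static int ex_remove(struct i2c_client *client)
  {
	struct iio_dev *indio_dev = i2c_get_clientdata(client);

	iio_device_unregister(indio_dev);

	/* mirror the probe: an IRQ was only requested if one exists */
	if (client->irq)
		free_irq(client->irq, indio_dev);

	return 0;
  }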
Signed-off-by: frank zago <frank@zago.net>
Link: https://lore.kernel.org/r/20210427022017.19314-2-frank@zago.net
Fixes: 9d2f715d59 ("iio: light: tcs3472: support out-of-threshold events")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d927ccfccb upstream.
The kernel writes to swap files on f2fs directly without the assistance
of the filesystem. This direct write by the kernel can be non-sequential
even when f2fs is in LFS mode. Such a non-sequential write conflicts
with the LFS semantics. Especially when f2fs is set up on zoned block
devices, the non-sequential write causes unaligned write command errors.
To avoid the non-sequential writes to swap files, prevent swap file
activation when the filesystem is in LFS mode.
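A minimal sketch of such an activation-time check (placement and error
message are assumptions, not the exact patch):

  if (f2fs_lfs_mode(F2FS_I_SB(inode))) {
	f2fs_err(F2FS_I_SB(inode),
		 "Swapfile not supported in LFS mode");
	return -EINVAL;
  }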
Fixes: 4969c06a0d ("f2fs: support swap file w/ DIO")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c039d5452 upstream.
Older kernels don't support encryption with casefolding. This adds
the sysfs entry encrypted_casefold to show support for those combined
features. Support for this feature was originally added by
commit 7ad08a58bf ("f2fs: Handle casefolding with Encryption")
Fixes: 7ad08a58bf ("f2fs: Handle casefolding with Encryption")
Cc: stable@vger.kernel.org # v5.11+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85b18d7b5e upstream.
Turns out that bit 61 in the TEID is not always 1, and if that's
the case the address space ID and the address are
unpredictable. Without an address and its address space ID we can't
export memory and hence we can only send a SIGSEGV to the process or
panic the kernel depending on who caused the exception.
Unfortunately bit 61 is only reliable if we have the "misc" UV feature
bit.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 084ea4d611 ("s390/mm: add (non)secure page access exceptions handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c749d8c018 upstream.
Currently css_wait_for_slow_path() gets called inside the chp->lock.
The path-verification-loop of slowpath inside this lock could lead to
deadlock as reported by the lockdep validator.
ccw_device_get_chp_desc(), during a device-set-online, would try to
acquire the same 'chp->lock' to read the chp->desc.
This function can get called from multiple scenarios,
like probing or setting a device online manually. This could, in some
corner cases, lead to the deadlock.
lockdep validator reported this as,
CPU0 CPU1
---- ----
lock(&chp->lock);
lock(kn->active#43);
lock(&chp->lock);
lock((wq_completion)cio);
The chp->lock was introduced to serialize the access of struct
channel_path. This lock is not needed for the css_wait_for_slow_path()
function, so invoke the slow-path function outside this lock.
Fixes: b730f3a933 ("[S390] cio: add lock to struct channel_path")
Cc: <stable@vger.kernel.org>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49c6f8756c upstream.
Invalidate all MMUs' roles after a CPUID update to force reinitialization
of the MMU context/helpers. Despite the efforts of commit de3ccd26fa
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role. E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.
The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.
Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...
Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint. The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.
In other words, resetting the MMU after a CPUID update is largely a
superficial fix. But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported. For cases where KVM botches guest behavior, the damage is
limited to that guest. But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.
Fixes: 7dcd575520 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0aa1837533 upstream.
Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT. Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.
This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX. Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG. VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.
In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c468 ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.
In SVM, commit ebae871a50 ("kvm: svm: reset mmu on VCPU reset") forced
a MMU reset on every INIT to workaround the flaw in common x86. Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.
The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.
Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.
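A minimal sketch of that detection as it might look in the common vCPU
RESET/INIT path (simplified; vendor-specific details omitted):

  unsigned long old_cr0 = kvm_read_cr0(vcpu);

  /* ... INIT/RESET unconditionally clears CR0.PG ... */

  /* paging was enabled before INIT: the cached MMU context and root
   * are stale for the new mode, so rebuild them */
  if (old_cr0 & X86_CR0_PG)
	kvm_mmu_reset_context(vcpu);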
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef318b9edf upstream.
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356a ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 112022bdb5 upstream.
Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
NX bit for huge SPTEs if the iTLB multi-hit mitigation is enabled.
Checking the mitigation itself is not sufficient as it can be toggled on
at any time and KVM doesn't reset MMU contexts when that happens. KVM
could reset the contexts, but that would require purging all SPTEs in all
MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is
disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
for shadow MMUs.
Fixes: b8e8c8303f ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0d4379087 upstream.
Remove a misguided WARN that attempts to detect the scenario where using
a special A/D tracking flag will set reserved bits on a non-MMIO spte.
The WARN triggers false positives when using EPT with 32-bit KVM because
of the !64-bit clause, which is just flat out wrong. The whole A/D
tracking goo is specific to EPT, and one of the big selling points of EPT
is that EPT is decoupled from the host's native paging mode.
Drop the WARN instead of trying to salvage the check. Keeping a check
specific to A/D tracking bits would essentially regurgitate the same code
that led to KVM needing the tracking bits in the first place.
A better approach would be to add a generic WARN on reserved bits being
set, which would naturally cover the A/D tracking bits, work for all
flavors of paging, and be self-documenting to some extent.
Fixes: 8a406c8953 ("KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b33bb78a1f upstream.
Mark #ACs that won't be reinjected to the guest as wanted by L0 so that
KVM handles split-lock #AC from L2 instead of forwarding the exception to
L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a
regular #AC and do the wrong thing, e.g. reinject it into L2.
Fixes: e6f8b6c12f ("KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest")
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622172244.3561540-1-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c24d373225 upstream.
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
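A much-simplified sketch of the re-check idea (single reference, no pin
accounting, helper name hypothetical):

  static struct page *ex_try_get_head(struct page *page)
  {
	struct page *head = compound_head(page);

	if (!get_page_unless_zero(head))
		return NULL;

	/* The compound page may have been split while the reference was
	 * being taken: if page no longer belongs to head, the reference
	 * landed on a now-unrelated page, so back off. */
	if (unlikely(compound_head(page) != head)) {
		put_page(head);
		return NULL;
	}
	return head;
  }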
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c7 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02b49cd117 upstream.
During system resume, MHI host triggers M3->M0 transition and then waits
for target device to enter M0 state. Once done, the device queues a state
change event into ctrl event ring and notifies MHI host by raising an
interrupt, where a tasklet is scheduled to process this event. In most
cases, the tasklet is served timely and wait operation succeeds.
However, there are cases where CPU is busy and cannot serve this tasklet
for some time. Once delay goes long enough, the device moves itself to M1
state and also interrupts MHI host after inserting a new state change
event to ctrl ring. Later when CPU finally has time to process the ring,
there will be two events:
1. For the M3->M0 event, which is queued first and thus processed first:
the tasklet handler serves the event, updates the device state to M0 and
wakes up the waiting task.
2. For M0->M1 event, which is processed later, the tasklet handler
triggers M1->M2 transition and updates device state to M2 directly,
then wakes up the MHI host (if it is still sleeping on this wait queue).
Note that although the MHI host has been woken up while processing the first
event, it may still have had no chance to run before the second event is
processed. In other words, the MHI host has to keep waiting until timeout,
causing the M0 state to be missed.
kernel log here:
...
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi 0000:06:00.0: Entered with PM state: M3, MHI state: M3
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi 0000:06:00.0: State change event to state: M0
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi 0000:06:00.0: State change event to state: M1
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi 0000:06:00.0: Did not enter M0 state, MHI state: M2, PM state: M2
...
Fix this issue by simply adding M2 as a valid state for resume.
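A sketch of the relaxed wait in the resume path; the field and macro names
follow the MHI core, but the exact condition is abbreviated here:

  ret = wait_event_timeout(mhi_cntrl->state_event,
			   mhi_cntrl->dev_state == MHI_STATE_M0 ||
			   mhi_cntrl->dev_state == MHI_STATE_M2 || /* now accepted */
			   MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
			   msecs_to_jiffies(mhi_cntrl->timeout_ms));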
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Cc: stable@vger.kernel.org
Fixes: 0c6b20a1d7 ("bus: mhi: core: Add support for MHI suspend and resume")
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20210524040312.14409-1-bqiang@codeaurora.org
[mani: slightly massaged the commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20210621161616.77524-4-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5483b904bf upstream.
When picking a task from the wait queue to wake up, a non-privileged task may
be found, rather than the privileged one. This may lead to a deadlock,
same as commit dfe1fe75e0 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):
The privileged delegreturn task is queued to the privileged list because all
the slots are assigned. If there are not enough slots to wake up the
non-privileged batch tasks (session with less than 8 slots), then the privileged
delegreturn task may miss being woken up because the task that was found can't
get a slot since the session is draining.
So we should treat the privileged task as an emergency task, and
execute it as soon as we can.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc01 ("NFSv4: Return delegations synchronously in evict_inode")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcb170a9d8 upstream.
The 'queue->nr' will wrap around from 0 to 255 when only the current
priority queue has tasks. This may lead to a deadlock, same as commit
dfe1fe75e0 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):
The privileged delegreturn task is queued to the privileged list because all
the slots are assigned. When a non-privileged task completes and releases
its slot, a non-privileged task may be picked out. It may then fail to
allocate a slot because the session is draining.
If 'queue->nr' has wrapped around to 255, and there are not enough slots to
service it, then the privileged delegreturn misses being woken up.
So we should avoid the wraparound of 'queue->nr'.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc01 ("NFSv4: Return delegations synchronously in evict_inode")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14a4696bc3 upstream.
When closing the isotp socket, the potentially running hrtimers are
canceled before removing the subscription for CAN identifiers via
can_rx_unregister().
This may lead to an unintended (re)start of a hrtimer in
isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is
received by isotp_rcv() while the subscription removal is processed.
However, isotp_rcv() is called under RCU protection, so after calling
can_rx_unregister, we may call synchronize_rcu in order to wait for
any RCU read-side critical sections to finish. This prevents the
reception of CAN frames after hrtimer_cancel() and therefore the
unintended (re)start of the hrtimers.
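A minimal sketch of the teardown ordering described above (the socket fields
and CAN identifiers are placeholders, not the isotp ones):

  /* stop delivery of new CAN frames to this socket's rcv callback */
  can_rx_unregister(net, dev, rx_id, rx_mask, isotp_rcv, sk);

  /* wait for isotp_rcv() callers already running under RCU */
  synchronize_rcu();

  /* nothing can re-arm the timers anymore, cancel them safely */
  hrtimer_cancel(&so->rxtimer);
  hrtimer_cancel(&so->txtimer);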
Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net
Fixes: e057dd3fc2 ("can: add ISO 15765-2:2016 transport protocol")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5f9023fa6 upstream.
can_rx_register() callbacks may be called concurrently to the call to
can_rx_unregister(). The callbacks and callback data, though, are
protected by RCU and the struct sock reference count.
So the callback data is really attached to the life of sk, meaning
that it should be released on sk_destruct. However, bcm_remove_op()
calls tasklet_kill(), and RCU callbacks may be called under RCU
softirq, so that cannot be used on kernels before the introduction of
HRTIMER_MODE_SOFT.
However, bcm_rx_handler() is called under RCU protection, so after
calling can_rx_unregister(), we may call synchronize_rcu() in order to
wait for any RCU read-side critical sections to finish. That is,
bcm_rx_handler() won't be called anymore for those ops. So, we only
free them, after we do that synchronize_rcu().
Fixes: ffd980f976 ("[CAN]: Add broadcast manager (bcm) protocol")
Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: syzbot+0f7e7e5e2f4f40fa89c0@syzkaller.appspotmail.com
Reported-by: Norbert Slusarek <nslusarek@gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9a037b7f3 upstream.
In ext4_orphan_cleanup(), if ext4_truncate() failed to get a transaction
handle, it didn't remove the inode from the in-core orphan list, which
may trigger the error dump below in ext4_destroy_inode() during the
final iput() and could lead to memory corruption on later orphan
list changes.
EXT4-fs (sda): Inode 6291467 (00000000b8247c67): orphan list check failed!
00000000b8247c67: 0001f30a 00000004 00000000 00000023 ............#...
00000000e24cde71: 00000006 014082a3 00000000 00000000 ......@.........
0000000072c6a5ee: 00000000 00000000 00000000 00000000 ................
...
This patch fixes this by cleaning up the in-core orphan list manually if
ext4_truncate() returns an error.
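A condensed sketch of the error handling inside the orphan cleanup loop
(context of ext4_orphan_cleanup() elided):

  ret = ext4_truncate(inode);
  if (ret) {
	/* couldn't start a handle: drop the inode from the in-core
	 * orphan list by hand so the final iput() doesn't trip the
	 * orphan list check */
	ext4_std_error(inode->i_sb, ret);
	ext4_orphan_del(NULL, inode);
  }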
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210507071904.160808-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6819703f5a upstream.
The defrag loop processes leaves in batches, starting a transaction for
each. The whole defragmentation on a given root is protected by a bit,
but in case the transaction fails the bit is not cleared, which would
prevent starting defragmentation again, so make sure it's cleared.
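A sketch of the error path in the defrag loop as described (condensed,
assuming BTRFS_ROOT_DEFRAG_RUNNING is the per-root bit in question):

  trans = btrfs_start_transaction(root, 0);
  if (IS_ERR(trans)) {
	ret = PTR_ERR(trans);
	/* clear the bit so a later defrag attempt isn't refused */
	clear_bit(BTRFS_ROOT_DEFRAG_RUNNING, &root->state);
	break;
  }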
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44365827cc upstream.
qgroup_account_snapshot() is trying to unlock the not-taken
tree_log_mutex in an error path. Since ret != 0 in this case, we can
just return from here.
Fixes: 2a4d84c11a ("btrfs: move delayed ref flushing for qgroup into qgroup helper")
CC: stable@vger.kernel.org # 5.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f216562731 upstream.
The early check if we should attempt compression does not take into
account the number of input pages. It can happen that there's only one
page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.
The single page would be compressed but a later check would drop it
again because the result size must be at least one block shorter than
the input. That can never work with just one page.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8ac76cdd1 upstream.
During an incremental send operation, when processing the new references
for the current inode, we might send an unlink operation for another inode
that has a conflicting path and has more than one hard link. However this
path was computed and cached before we processed previous new references
for the current inode. We may have orphanized a directory of that path
while processing a previous new reference, in which case the path will
be invalid and cause the receiver process to fail.
The following reproducer triggers the problem and explains how/why it
happens in its comments:
$ cat test-send-unlink.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# Create our test files and directory. Inode 259 (file3) has two hard
# links.
touch $MNT/file1
touch $MNT/file2
touch $MNT/file3
mkdir $MNT/A
ln $MNT/file3 $MNT/A/hard_link
# Filesystem looks like:
#
# . (ino 256)
# |----- file1 (ino 257)
# |----- file2 (ino 258)
# |----- file3 (ino 259)
# |----- A/ (ino 260)
# |---- hard_link (ino 259)
#
# Now create the base snapshot, which is going to be the parent snapshot
# for a later incremental send.
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send -f /tmp/snap1.send $MNT/snap1
# Move inode 257 into directory inode 260. This results in computing the
# path for inode 260 as "/A" and caching it.
mv $MNT/file1 $MNT/A/file1
# Move inode 258 (file2) into directory inode 260, with a name of
# "hard_link", moving first inode 259 away since it currently has that
# location and name.
mv $MNT/A/hard_link $MNT/tmp
mv $MNT/file2 $MNT/A/hard_link
# Now rename inode 260 to something else (B for example) and then create
# a hard link for inode 258 that has the old name and location of inode
# 260 ("/A").
mv $MNT/A $MNT/B
ln $MNT/B/hard_link $MNT/A
# Filesystem now looks like:
#
# . (ino 256)
# |----- tmp (ino 259)
# |----- file3 (ino 259)
# |----- B/ (ino 260)
# | |---- file1 (ino 257)
# | |---- hard_link (ino 258)
# |
# |----- A (ino 258)
# Create another snapshot of our subvolume and use it for an incremental
# send.
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
# Now unmount the filesystem, create a new one, mount it and try to
# apply both send streams to recreate both snapshots.
umount $DEV
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# First add the first snapshot to the new filesystem by applying the
# first send stream.
btrfs receive -f /tmp/snap1.send $MNT
# The incremental receive operation below used to fail with the
# following error:
#
# ERROR: unlink A/hard_link failed: No such file or directory
#
# This is because when send is processing inode 257, it generates the
# path for inode 260 as "/A", since that inode is its parent in the send
# snapshot, and caches that path.
#
# Later when processing inode 258, it first processes its new reference
# that has the path of "/A", which results in orphanizing inode 260
# because there is a path collision. This results in issuing a rename
# operation from "/A" to "/o260-6-0".
#
# Finally when processing the new reference "B/hard_link" for inode 258,
# it notices that it collides with inode 259 (not yet processed, because
# it has a higher inode number), since that inode has the name
# "hard_link" under the directory inode 260. It also checks that inode
# 259 has two hardlinks, so it decides to issue a unlink operation for
# the name "hard_link" for inode 259. However the path passed to the
# unlink operation is "/A/hard_link", which is incorrect since currently
# "/A" does not exists, due to the orphanization of inode 260 mentioned
# before. The path is incorrect because it was computed and cached
# before the orphanization. This results in the receiver to fail with
# the above error.
btrfs receive -f /tmp/snap2.send $MNT
umount $MNT
When running the test, it fails like this:
$ ./test-send-unlink.sh
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
At subvol /mnt/sdi/snap1
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
At subvol /mnt/sdi/snap2
At subvol snap1
At snapshot snap2
ERROR: unlink A/hard_link failed: No such file or directory
Fix this by recomputing a path before issuing an unlink operation when
processing the new references for the current inode if we previously
have orphanized a directory.
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06e1e7f422 upstream.
If we can't read a reliable write pointer from a sequential zone fail
creating the block group with an I/O error.
Also if the read write pointer is beyond the end of the respective zone,
fail the creation of the block group on this zone with an I/O error.
While this could also happen in real world scenarios with misbehaving
drives, this issue addresses a problem uncovered by fstests' test case
generic/475.
CC: stable@vger.kernel.org # 5.12+
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8671c7dc7 upstream.
Annotate the firmware files CCP might need using MODULE_FIRMWARE().
This will get them included into an initrd when CCP is also included
there. Otherwise the CCP module will not find its firmware when loaded
before the root-fs is mounted.
This can cause problems when the pre-loaded SEV firmware is too old to
support current SEV and SEV-ES virtualization features.
Fixes: e93720606e ("crypto: ccp - Allow SEV firmware to be chosen based on Family and Model")
Cc: stable@vger.kernel.org # v4.20+
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8f84af5da upstream.
Even though we validate user-provided inputs we then traverse past
validated data when applying the new map. The issue was originally
discovered by Murray McAllister with this simple POC (if the following
is executed by an unprivileged user it will instantly panic the system):
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/joystick.h>

int main(void) {
	int fd, ret;
	unsigned int buffer[10000];

	fd = open("/dev/input/js0", O_RDONLY);
	if (fd == -1)
		printf("Error opening file\n");

	ret = ioctl(fd, JSIOCSBTNMAP & ~IOCSIZE_MASK, &buffer);
	printf("%d\n", ret);
}
The solution is to traverse the internal buffer, which is guaranteed to only
contain valid data, when constructing the map.
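A hypothetical kernel-style helper illustrating that rule (names, limits and
error codes are illustrative, not the joydev code):

  #define EX_KEY_MAX	0x2ff

  static int ex_apply_btn_map(unsigned short *keymap, size_t nbtn,
			      const unsigned short *user_copy)
  {
	unsigned short validated[EX_KEY_MAX + 1];
	size_t i;

	if (nbtn > EX_KEY_MAX + 1)
		return -EINVAL;

	/* copy once and validate the copy ... */
	memcpy(validated, user_copy, nbtn * sizeof(*validated));
	for (i = 0; i < nbtn; i++)
		if (validated[i] > EX_KEY_MAX)
			return -EINVAL;

	/* ... then apply only from the validated internal buffer,
	 * never re-reading the user-supplied memory */
	memcpy(keymap, validated, nbtn * sizeof(*keymap));
	return 0;
  }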
Fixes: 182d679b22 ("Input: joydev - prevent potential read overflow in ioctl")
Fixes: 999b874f4a ("Input: joydev - validate axis/button maps before clobbering current ones")
Reported-by: Murray McAllister <murray.mcallister@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexander Larkin <avlarkin82@gmail.com>
Link: https://lore.kernel.org/r/20210620120030.1513655-1-avlarkin82@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08aa647960 upstream.
In the situation when copy_page_to_iter() gets a compound page, the current
code only works on systems without CONFIG_HIGHMEM. That *is* the majority
of real-world setups, or we would've drowned in bug reports by now. Still needs
fixing.
The current variant works for a solitary page; rename that to
__copy_page_to_iter() and turn the handling of compound pages into a loop over
subpages.
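Roughly how such a loop could look (simplified sketch; short-copy handling and
the highmem details inside the per-subpage helper are elided):

  size_t copied = 0;

  page += offset / PAGE_SIZE;
  offset %= PAGE_SIZE;
  while (bytes) {
	size_t n = min(bytes, (size_t)PAGE_SIZE - offset);

	n = __copy_page_to_iter(page, offset, n, i);
	copied += n;
	bytes -= n;
	if (!n)
		break;		/* short copy: stop */
	page++;			/* advance to the next subpage */
	offset = 0;
  }
  return copied;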
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d49d3508b upstream.
On an error path, init_statfs calls iput(pn) after pn has already been put.
Fix that by setting pn to NULL after the initial iput.
Fixes: 97fd734ba1 ("gfs2: lookup local statfs inodes prior to journal recovery")
Cc: stable@vger.kernel.org # v5.10+
Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3c51c55cb upstream.
On filesystems with a block size smaller than PAGE_SIZE and non-empty
files smaller than PAGE_SIZE, gfs2_page_mkwrite could end up allocating
excess blocks beyond the end of the file, similar to fallocate. This
doesn't make sense; fix it.
Reported-by: Bob Peterson <rpeterso@redhat.com>
Fixes: 184b4e6085 ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 122e093c17 upstream.
On systems with memory nodes sorted in descending order, for instance Dell
Precision WorkStation T5500, the struct pages for higher PFNs and
respectively lower nodes, could be overwritten by the initialization of
struct pages corresponding to the holes in the memory sections.
For example for the below memory layout
[ 0.245624] Early memory node ranges
[ 0.248496] node 1: [mem 0x0000000000001000-0x0000000000090fff]
[ 0.251376] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[ 0.254256] node 1: [mem 0x0000000100000000-0x0000001423ffffff]
[ 0.257144] node 0: [mem 0x0000001424000000-0x0000002023ffffff]
the range 0x1424000000 - 0x1428000000 in the beginning of node 0 starts in
the middle of a section and will be considered as a hole during the
initialization of the last section in node 1.
The wrong initialization of the memory map causes panic on boot when
CONFIG_DEBUG_VM is enabled.
Reorder loop order of the memory map initialization so that the outer loop
will always iterate over populated memory regions in the ascending order
and the inner loop will select the zone corresponding to the PFN range.
This way initialization of the struct pages for the memory holes will be
always done for the ranges that are actually not populated.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/YNXlMqBbL+tBG7yq@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213073
Link: https://lkml.kernel.org/r/20210624062305.10940-1-rppt@kernel.org
Fixes: 0740a50b9b ("mm/page_alloc.c: refactor initialization of struct page for holes in memory layout")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Boris Petkov <bp@alien8.de>
Cc: Robert Shteynfeld <robert.shteynfeld@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b31d9d6d7a upstream.
When the system is doing S4, the process of xhci_resume may be as below:
1. xhci_mem_cleanup
2. xhci_init->xhci_mem_init->xhci_mem_cleanup (when memory is not enough).
xhci_mem_cleanup will be executed twice when the system is out of memory.
xhci->port_caps is freed in xhci_mem_cleanup, but it isn't set to NULL.
It will be freed twice when xhci_mem_cleanup is called the second time.
We got following bug when system resumes from s4:
kernel BUG at mm/slub.c:309!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
CPU: 0 PID: 5929 Tainted: G S W 5.4.96-arm64-desktop #1
pc : __slab_free+0x5c/0x424
lr : kfree+0x30c/0x32c
Call trace:
__slab_free+0x5c/0x424
kfree+0x30c/0x32c
xhci_mem_cleanup+0x394/0x3cc
xhci_mem_init+0x9ac/0x1070
xhci_init+0x8c/0x1d0
xhci_resume+0x1cc/0x5fc
xhci_plat_resume+0x64/0x70
platform_pm_thaw+0x28/0x60
dpm_run_callback+0x54/0x24c
device_resume+0xd0/0x200
async_resume+0x24/0x60
async_run_entry_fn+0x44/0x110
process_one_work+0x1f0/0x490
worker_thread+0x5c/0x450
kthread+0x158/0x160
ret_from_fork+0x10/0x24
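The fix follows the usual pattern for cleanup routines that may run twice; a
minimal sketch:

  kfree(xhci->port_caps);
  xhci->port_caps = NULL;	/* a second cleanup pass now sees NULL,
				 * and kfree(NULL) is a no-op */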
The original patch that caused this issue was backported to 4.4 stable,
so this should be backported to 4.4 stable as well.
Fixes: cf0ee7c60c ("xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Jiantao Zhang <water.zhangjiantao@huawei.com>
Signed-off-by: Tao Xue <xuetao09@huawei.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210617150354.1512157-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b537cf877 upstream.
If the voltage is being decreased in power negotiation, the Source will
set the power supply to operate at the new voltage level before sending
PS_RDY. Relax the threshold before sending the Request Message so that it
will not race with the Source, which begins to adjust the voltage right
after it sends the Accept Message (PPS) or tSrcTransition (25~35 ms) after
it sends the Accept Message (non-PPS).
The real threshold will be set after the Sink receives the PS_RDY Message.
Fixes: f321a02cae ("usb: typec: tcpm: Implement enabling Auto Discharge disconnect support")
Cc: stable <stable@vger.kernel.org>
Cc: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Kyle Tso <kyletso@google.com>
Link: https://lore.kernel.org/r/20210616090102.1897674-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4897807753 upstream.
The device (32a7:0000 Heimann Sensor GmbH USB appset demo) claims to be
a CDC-ACM device in its descriptors but in fact is not. If it is run
with echo disabled it returns garbled data, probably due to something
that happens in the TTY layer. And when run with echo enabled (the
default), it will mess up the calibration data of the sensor the first
time any data is sent to the device.
In short, I had a bad time after connecting the sensor and trying to get
it to work. I hope blacklisting it in the cdc-acm driver will save
someone else a bit of trouble.
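The usual way to do that in cdc-acm is an id-table entry carrying the
IGNORE_DEVICE quirk; a sketch, with the IDs taken from the report above:
/* acm_ids[] in drivers/usb/class/cdc-acm.c: refuse to bind this device. */
{ USB_DEVICE(0x32a7, 0x0000),	/* Heimann Sensor GmbH USB appset demo */
  .driver_info = IGNORE_DEVICE,
},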
Signed-off-by: Hannu Hartikainen <hannu@hrtk.in>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210622141454.337948-1-hannu@hrtk.in
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4249d6fbc1 upstream.
when receive eem echo command, it will send a response,
but queue this response to the usb request which allocate
from gadget device endpoint zero,
and transmit the request to IN endpoint of eem interface.
on dwc3 gadget, it will trigger following warning in function
__dwc3_gadget_ep_queue(),
if (WARN(req->dep != dep, "request %pK belongs to '%s'\n",
&req->request, req->dep->name))
return -EINVAL;
fix it by allocating a usb request from IN endpoint of eem interface,
and transmit the usb request to same IN endpoint of eem interface.
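A minimal sketch of that idea, assuming the usual gether field naming
(eem->port.in_ep) rather than quoting the exact patch:
struct usb_ep *ep = eem->port.in_ep;	/* the EEM interface's IN endpoint */
struct usb_request *req;

/* Allocate the echo response from the same endpoint it will be queued
 * on, instead of cdev->gadget->ep0, so req->dep matches in dwc3. */
req = usb_ep_alloc_request(ep, GFP_ATOMIC);
if (req) {
	/* (copying the echo payload into req->buf/req->length omitted) */
	usb_ep_queue(ep, req, GFP_ATOMIC);
}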
Signed-off-by: Linyu Yuan <linyyuan@codeaurora.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210616115142.34075-1-linyyuan@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41e81022a0 upstream.
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Fix the four control requests which erroneously used usb_rcvctrlpipe().
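Illustrative only (request, value and index are placeholders): an OUT
control transfer has to use the send control pipe so the pipe direction
matches the bRequestType direction bit.
/* was: usb_rcvctrlpipe(dev, 0), mismatched with USB_DIR_OUT */
ret = usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
		      request,
		      USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
		      value, index, NULL, 0, USB_CTRL_SET_TIMEOUT);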
Fixes: 1d3e20236d ("[PATCH] USB: usbtouchscreen: unified USB touchscreen driver")
Fixes: 24ced062a2 ("usbtouchscreen: add support for DMC TSC-10/25 devices")
Fixes: 9e3b25837a ("Input: usbtouchscreen - add support for e2i touchscreen controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: stable@vger.kernel.org # 2.6.17
Link: https://lore.kernel.org/r/20210524092048.4443-1-johan@kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c680ed46e4 upstream.
syzbot reported a WARNING in vmalloc. The problem was a zero size being
passed to vmalloc.
The root cause was a wrong cxusb_bluebird_lgz201_properties definition:
the adapter array has only one entry, but num_adapters was 2.
Call Trace:
__vmalloc_node mm/vmalloc.c:2963 [inline]
vmalloc+0x67/0x80 mm/vmalloc.c:2996
dvb_dmx_init+0xe4/0xb90 drivers/media/dvb-core/dvb_demux.c:1251
dvb_usb_adapter_dvb_init+0x564/0x860 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:184
dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86 [inline]
dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:184 [inline]
dvb_usb_device_init.cold+0xc94/0x146e drivers/media/usb/dvb-usb/dvb-usb-init.c:308
cxusb_probe+0x159/0x5e0 drivers/media/usb/dvb-usb/cxusb.c:1634
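The fix itself boils down to a single field in the properties struct; a
sketch (field name from the dvb-usb core, surrounding members elided):
static struct dvb_usb_device_properties cxusb_bluebird_lgz201_properties = {
	/* ... */
	.num_adapters = 1,	/* was 2, but .adapter[] holds a single entry */
	/* ... */
};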
Fixes: 4d43e13f72 ("V4L/DVB (4643): Multi-input patch for DVB-USB device")
Cc: stable@vger.kernel.org
Reported-by: syzbot+7336195c02c1bd2f64e1@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8fbcdfb06 upstream.
Asus Zenbook 14 UM431D has two speaker pins and a headphone pin, and
the auto-parser ends up assigning the bass to the third DAC 0x06.
Although the tone comes out, it's inconvenient because this DAC, unlike
the two other DACs, has no volume control.
To obtain volume control for the bass speaker, this patch enforces a
mapping that lets the front and bass speaker pins share the same DAC.
It's not ideal, but it's a bit of an improvement.
Since we've already applied the same workaround for another ASUS
machine, we just need to hook the chain to the existing quirk.
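Conceptually that is one more entry in the PCI-SSID quirk table pointing
at the existing dual-speaker fixup; a sketch with a placeholder subsystem
ID and fixup name (not the real ones):
/* patch_realtek.c quirk table; 0xffff and the fixup name are placeholders. */
SND_PCI_QUIRK(0x1043, 0xffff, "ASUS Zenbook UM431D", ALCXXX_FIXUP_ASUS_DUAL_SPEAKERS),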
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212547
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210620065952.18948-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 434591b2a7 upstream.
On the HP Spectre x360 15-df0xxx, after booting the system with a headset
plugged in, the headset mic is not detected.
Moving the pincfg and DAC config into a single fixup function fixes this.
[ The actual bug in the original code was that it used a chain to
ALC286_FIXUP_SPEAKER2_TO_DAC1, and it contains not only the DAC1
route fix but also another chain to ALC269_FIXUP_THINKPAD_ACPI.
I thought the latter one was harmless for non-Thinkpads, but it
doesn't seem so; it in turn contains yet another chain to
ALC269_FIXUP_SKU_IGNORE, and this might be bad for some machines,
including this HP machine. -- tiwai ]
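A sketch of what moving the pincfg and DAC config into one fixup function
looks like in the HDA fixup framework (the function name, pin NID and
default config value are placeholders):
static void alc285_fixup_hp_spectre_x360(struct hda_codec *codec,
					 const struct hda_fixup *fix,
					 int action)
{
	static const struct hda_pintbl pincfgs[] = {
		{ 0x14, 0x90170110 },	/* placeholder speaker pin default */
		{ }
	};

	if (action == HDA_FIXUP_ACT_PRE_PROBE) {
		snd_hda_apply_pincfgs(codec, pincfgs);
		/* plus the DAC routing previously done by the chained fixup */
	}
}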
Signed-off-by: Elia Devito <eliadevito@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210619204105.5682-1-eliadevito@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 785b6f29a7 upstream.
The current way of the scarlett2 mixer code managing the
usb_mixer_elem_info object is wrong in two ways: it passes its
internal index to the head.id field, and the val_type field is
uninitialized. This ended up with wrong execution at resume because a
bogus unit id was passed. Also, in later code extensions we'll have more
mixer elements, and passing the index would overflow the unit id size
(of 256).
This patch corrects those issues. It introduces a new value type,
USB_MIXER_BESPOKEN, which indicates a non-standard mixer element, and
use this type for all scarlett2 mixer elements, as well as
initializing the fixed unit id 0 for avoiding the overflow.
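In terms of the usb_mixer_elem_info fields described above, the
per-element setup becomes roughly (a sketch, not the literal patch):
elem->head.id = 0;			/* fixed unit id, no 256-entry overflow */
elem->val_type = USB_MIXER_BESPOKEN;	/* new type: non-standard, driver-managed */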
Tested-by: Geoffrey D. Bennett <g@b4.vu>
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/49721219f45b7e175e729b0d9d9c142fd8f4342a.1624379707.git.g@b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b6c17fe6f upstream.
The devices below report zero as the number of channels for their
external output plug with MIDI type:
* Yamaha GO44/GO46
* Terratec Phase 24/X24
As a result, the rx packet format is invalid and they produce silent
sound. This is a regression introduced in v5.13.
This commit fixes the bug, addressed at a commit 1bd1b3be86 ("ALSA:
bebob: perform sequence replay for media clock recovery").
Cc: <stable@vger.kernel.org>
Fixes: 5c6ea94f2b ("ALSA: bebob: detect the number of available MIDI ports")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210618040447.113306-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c58e933ab upstream.
Even with rate-limited reporting this is very spammy, and since it is the
remote device that is providing bogus data, there is no need to report
this as an error.
Since the real_len variable was used only to allow the conditional error
message, it is now also removed.
[72454.143336] bt_err_ratelimited: 10 callbacks suppressed
[72454.143337] Bluetooth: hci0: advertising data len corrected
[72454.296314] Bluetooth: hci0: advertising data len corrected
[72454.892329] Bluetooth: hci0: advertising data len corrected
[72455.051319] Bluetooth: hci0: advertising data len corrected
[72455.357326] Bluetooth: hci0: advertising data len corrected
[72455.663295] Bluetooth: hci0: advertising data len corrected
[72455.787278] Bluetooth: hci0: advertising data len corrected
[72455.942278] Bluetooth: hci0: advertising data len corrected
[72456.094276] Bluetooth: hci0: advertising data len corrected
[72456.249137] Bluetooth: hci0: advertising data len corrected
[72459.416333] bt_err_ratelimited: 13 callbacks suppressed
[72459.416334] Bluetooth: hci0: advertising data len corrected
[72459.721334] Bluetooth: hci0: advertising data len corrected
[72460.011317] Bluetooth: hci0: advertising data len corrected
[72460.327171] Bluetooth: hci0: advertising data len corrected
[72460.638294] Bluetooth: hci0: advertising data len corrected
[72460.946350] Bluetooth: hci0: advertising data len corrected
[72461.225320] Bluetooth: hci0: advertising data len corrected
[72461.690322] Bluetooth: hci0: advertising data len corrected
[72462.118318] Bluetooth: hci0: advertising data len corrected
[72462.427319] Bluetooth: hci0: advertising data len corrected
[72464.546319] bt_err_ratelimited: 7 callbacks suppressed
[72464.546319] Bluetooth: hci0: advertising data len corrected
[72464.857318] Bluetooth: hci0: advertising data len corrected
[72465.163332] Bluetooth: hci0: advertising data len corrected
[72465.278331] Bluetooth: hci0: advertising data len corrected
[72465.432323] Bluetooth: hci0: advertising data len corrected
[72465.891334] Bluetooth: hci0: advertising data len corrected
[72466.045334] Bluetooth: hci0: advertising data len corrected
[72466.197321] Bluetooth: hci0: advertising data len corrected
[72466.340318] Bluetooth: hci0: advertising data len corrected
[72466.498335] Bluetooth: hci0: advertising data len corrected
[72469.803299] bt_err_ratelimited: 10 callbacks suppressed
Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203753
Cc: stable@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59f90f1351 upstream.
In qca_power_shutdown() the qcadev local variable is
initialized from hu->serdev's device private data, but
hu->serdev can be NULL and there is a check for it.
Since qcadev is not used before the
if (!hu->serdev)
return;
check, we can move its initialization after this "if" to
prevent a GPF.
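A sketch of the reordering inside qca_power_shutdown(), assuming the
driver data is fetched via serdev_device_get_drvdata() as the
"serdev.dev private data" wording suggests:
struct qca_serdev *qcadev;

if (!hu->serdev)
	return;

/* Dereference hu->serdev only after the NULL check above. */
qcadev = serdev_device_get_drvdata(hu->serdev);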
Fixes: 5559904ccc ("Bluetooth: hci_qca: Add QCA Rome power off support to the qca_power_shutdown()")
Cc: stable@vger.kernel.org # v5.6+
Cc: Rocky Liao <rjliao@codeaurora.org>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f71a53d118 upstream.
Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
virtualization. When KVM (L0) is using TDP, CR4.LA57 is not reflected in
mmu_role.base.level because that tracks the shadow root level, i.e. TDP
level. Normally, this is not an issue because LA57 can't be toggled
while long mode is active, i.e. the guest has to first disable paging,
then toggle LA57, then re-enable paging, thus ensuring an MMU
reinitialization.
But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
without having to bounce through an unpaged section. L1 can also load a
new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
single entry PML5 is sufficient. Such shenanigans are only problematic
if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
Note, in the L2 case with nested TDP, even though L1 can switch between
L2s with different LA57 settings, thus bypassing the paging requirement,
in that case KVM's nested_mmu will track LA57 in base.level.
This reverts commit 8053f924ca.
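In effect the revert brings back an LA57 bit in the extended mmu role so
that toggling CR4.LA57 forces a role mismatch and an MMU reload; roughly
(field placement simplified):
/* in union kvm_mmu_extended_role: */
unsigned int cr4_la57:1;

/* when computing the extended role for the vCPU: */
ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);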
Fixes: 8053f924ca ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>