commit 59ea6d06cf upstream.
When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.
If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.
khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.
collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().
Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.
The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().
So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump... as long as the coredump
can invoke page faults without holding the mmap_sem for reading.
This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().
Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6486199378 upstream.
When the controller supports less queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports less queues than requested.
The rules are:
1. if no write queues are requested, we assign the available queues
to the default queue map. The default and read queue maps share the
existing queues.
2. if write queues are requested:
- first make sure that read queue map gets the requested
nr_io_queues count
- then grant the default queue map the minimum between the requested
nr_write_queues and the remaining queues. If there are no available
queues to dedicate to the default queue map, fallback to (1) and
share all the queues in the existing queue map.
Also, provide a log indication on how we constructed the different
queue maps.
Reported-by: Harris, James R <james.r.harris@intel.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
Cc: <stable@vger.kernel.org> # v5.0+
Suggested-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f34e25898a upstream.
If I/O queue connect times out, we might have freed the queue socket
already, so check for that on the error path in nvme_tcp_start_queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a30df49f6 upstream.
A few new fields were added to mmu_gather to make TLB flush smarter for
huge page by telling what level of page table is changed.
__tlb_reset_range() is used to reset all these page table state to
unchanged, which is called by TLB flush for parallel mapping changes for
the same range under non-exclusive lock (i.e. read mmap_sem).
Before commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update
PTEs in parallel don't remove page tables. But, the forementioned
commit may do munmap() under read mmap_sem and free page tables. This
may result in program hang on aarch64 reported by Jan Stancek. The
problem could be reproduced by his test program with slightly modified
below.
---8<---
static int map_size = 4096;
static int num_iter = 500;
static long threads_total;
static void *distant_area;
void *map_write_unmap(void *ptr)
{
int *fd = ptr;
unsigned char *map_address;
int i, j = 0;
for (i = 0; i < num_iter; i++) {
map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (map_address == MAP_FAILED) {
perror("mmap");
exit(1);
}
for (j = 0; j < map_size; j++)
map_address[j] = 'b';
if (munmap(map_address, map_size) == -1) {
perror("munmap");
exit(1);
}
}
return NULL;
}
void *dummy(void *ptr)
{
return NULL;
}
int main(void)
{
pthread_t thid[2];
/* hint for mmap in map_write_unmap() */
distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
munmap(distant_area, (size_t)DISTANT_MMAP_SIZE);
distant_area += DISTANT_MMAP_SIZE / 2;
while (1) {
pthread_create(&thid[0], NULL, map_write_unmap, NULL);
pthread_create(&thid[1], NULL, dummy, NULL);
pthread_join(thid[0], NULL);
pthread_join(thid[1], NULL);
}
}
---8<---
The program may bring in parallel execution like below:
t1 t2
munmap(map_address)
downgrade_write(&mm->mmap_sem);
unmap_region()
tlb_gather_mmu()
inc_tlb_flush_pending(tlb->mm);
free_pgtables()
tlb->freed_tables = 1
tlb->cleared_pmds = 1
pthread_exit()
madvise(thread_stack, 8M, MADV_DONTNEED)
zap_page_range()
tlb_gather_mmu()
inc_tlb_flush_pending(tlb->mm);
tlb_finish_mmu()
if (mm_tlb_flush_nested(tlb->mm))
__tlb_reset_range()
__tlb_reset_range() would reset freed_tables and cleared_* bits, but this
may cause inconsistency for munmap() which do free page tables. Then it
may result in some architectures, e.g. aarch64, may not flush TLB
completely as expected to have stale TLB entries remained.
Use fullmm flush since it yields much better performance on aarch64 and
non-fullmm doesn't yields significant difference on x86.
The original proposed fix came from Jan Stancek who mainly debugged this
issue, I just wrapped up everything together.
Jan's testing results:
v5.2-rc2-24-gbec7550cca10
--------------------------
mean stddev
real 37.382 2.780
user 1.420 0.078
sys 54.658 1.855
v5.2-rc2-24-gbec7550cca10 + "mm: mmu_gather: remove __tlb_reset_range() for force flush"
---------------------------------------------------------------------------------------_
mean stddev
real 37.119 2.105
user 1.548 0.087
sys 55.698 1.357
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [4.20+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 275e928f19 ]
Force of 56G is not supported by hardware in Ethernet devices. This
configuration fails with a bad parameter error from firmware.
Add check of this case. Instead of trying to set 56G with autoneg off,
return a meaningful error.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12e750bc62 ]
If alloc_workqueue fails in alua_init, it should return -ENOMEM, otherwise
it will trigger null-ptr-deref while unloading module which calls
destroy_workqueue dereference
wq->lock like this:
BUG: KASAN: null-ptr-deref in __lock_acquire+0x6b4/0x1ee0
Read of size 8 at addr 0000000000000080 by task syz-executor.0/7045
CPU: 0 PID: 7045 Comm: syz-executor.0 Tainted: G C 5.1.0+ #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1
Call Trace:
dump_stack+0xa9/0x10e
__kasan_report+0x171/0x18d
? __lock_acquire+0x6b4/0x1ee0
kasan_report+0xe/0x20
__lock_acquire+0x6b4/0x1ee0
lock_acquire+0xb4/0x1b0
__mutex_lock+0xd8/0xb90
drain_workqueue+0x25/0x290
destroy_workqueue+0x1f/0x3f0
__x64_sys_delete_module+0x244/0x330
do_syscall_64+0x72/0x2a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 03197b61c5 ("scsi_dh_alua: Use workqueue for RTPG")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d94f06e7f ]
When SME is enabled, the smartpqi driver won't work on the HP DL385 G10
machine, which causes the failure of kernel boot because it fails to
allocate pqi error buffer. Please refer to the kernel log:
....
[ 9.431749] usbcore: registered new interface driver uas
[ 9.441524] Microsemi PQI Driver (v1.1.4-130)
[ 9.442956] i40e 0000:04:00.0: fw 6.70.48768 api 1.7 nvm 10.2.5
[ 9.447237] smartpqi 0000:23:00.0: Microsemi Smart Family Controller found
Starting dracut initqueue hook...
[ OK ] Started Show Plymouth Boot Scre[ 9.471654] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
en.
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
[[0;[ 9.487108] smartpqi 0000:23:00.0: failed to allocate PQI error buffer
....
[ 139.050544] dracut-initqueue[949]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 139.589779] dracut-initqueue[949]: Warning: dracut-initqueue timeout - starting timeout scripts
Basically, the fact that the coherent DMA mask value wasn't set caused the
driver to fall back to SWIOTLB when SME is active.
For correct operation, lets call the dma_set_mask_and_coherent() to
properly set the mask for both streaming and coherent, in order to inform
the kernel about the devices DMA addressing capabilities.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a97a477e6 ]
After reset SGMII Autoneg timer is set to 2us (bits 6 and 5 are 01).
That is not enough to finalize autonegatiation on some devices.
Increase this timer duration to maximum supported 16ms.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 333061b924 ]
For supporting 10Mps speed in SGMII mode DP83867_10M_SGMII_RATE_ADAPT bit
of DP83867_10M_SGMII_CFG register has to be cleared by software.
That does not affect speeds 100 and 1000 so can be done on init.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c678726305 ]
Ensure that we supply the same phy interface mode to mac_link_down() as
we did for the corresponding mac_link_up() call. This ensures that MAC
drivers that use the phy interface mode in these methods can depend on
mac_link_down() always corresponding to a mac_link_up() call for the
same interface mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 315ca92dd8 ]
The sh_eth_close() resets the MAC and then calls phy_stop()
so that mdio read access result is incorrect without any error
according to kernel trace like below:
ifconfig-216 [003] .n.. 109.133124: mdio_access: ee700000.ethernet-ffffffff read phy:0x01 reg:0x00 val:0xffff
According to the hardware manual, the RMII mode should be set to 1
before operation the Ethernet MAC. However, the previous code was not
set to 1 after the driver issued the soft_reset in sh_eth_dev_exit()
so that the mdio read access result seemed incorrect. To fix the issue,
this patch adds a condition and set the RMII mode register in
sh_eth_dev_exit() for R-Car Gen2 and RZ/A1 SoCs.
Note that when I have tried to move the sh_eth_dev_exit() calling
after phy_stop() on sh_eth_close(), but it gets worse (kernel panic
happened and it seems that a register is accessed while the clock is
off).
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e29ab3186 ]
Calling sys_ni_syscall through a syscall_fn_t pointer trips indirect
call Control-Flow Integrity checking due to a function type
mismatch. Use SYSCALL_DEFINE0 for __arm64_sys_ni_syscall instead and
remove the now unnecessary casts.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e358bd7b7 ]
Although a syscall defined using SYSCALL_DEFINE0 doesn't accept
parameters, use the correct function type to avoid indirect call
type mismatches with Control-Flow Integrity checking.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ef8f368ce ]
Syscall wrappers in <asm/syscall_wrapper.h> use const struct pt_regs *
as the argument type. Use const in syscall_fn_t as well to fix indirect
call type mismatches with Control-Flow Integrity checking.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6954158a16 ]
With gcc 4.1:
sound/firewire/fireface/ff-protocol-latter.c: In function ‘latter_switch_fetching_mode’:
sound/firewire/fireface/ff-protocol-latter.c:97: warning: integer constant is too large for ‘long’ type
sound/firewire/fireface/ff-protocol-latter.c: In function ‘latter_begin_session’:
sound/firewire/fireface/ff-protocol-latter.c:170: warning: integer constant is too large for ‘long’ type
sound/firewire/fireface/ff-protocol-latter.c:197: warning: integer constant is too large for ‘long’ type
sound/firewire/fireface/ff-protocol-latter.c:205: warning: integer constant is too large for ‘long’ type
sound/firewire/fireface/ff-protocol-latter.c: In function ‘latter_finish_session’:
sound/firewire/fireface/ff-protocol-latter.c:214: warning: integer constant is too large for ‘long’ type
Fix this by adding the missing "ULL" suffixes.
Add the same suffix to the last constant, to maintain consistency.
Fixes: fd1cc9de64 ("ALSA: fireface: add support for Fireface UCX")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a3f49364c ]
Currently the HV KVM code takes the kvm->lock around calls to
kvm_for_each_vcpu() and kvm_get_vcpu_by_id() (which can call
kvm_for_each_vcpu() internally). However, that leads to a lock
order inversion problem, because these are called in contexts where
the vcpu mutex is held, but the vcpu mutexes nest within kvm->lock
according to Documentation/virtual/kvm/locking.txt. Hence there
is a possibility of deadlock.
To fix this, we simply don't take the kvm->lock mutex around these
calls. This is safe because the implementations of kvm_for_each_vcpu()
and kvm_get_vcpu_by_id() have been designed to be able to be called
locklessly.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1659e27d2b ]
Currently the Book 3S KVM code uses kvm->lock to synchronize access
to the kvm->arch.rtas_tokens list. Because this list is scanned
inside kvmppc_rtas_hcall(), which is called with the vcpu mutex held,
taking kvm->lock cause a lock inversion problem, which could lead to
a deadlock.
To fix this, we add a new mutex, kvm->arch.rtas_token_lock, which nests
inside the vcpu mutexes, and use that instead of kvm->lock when
accessing the rtas token list.
This removes the lockdep_assert_held() in kvmppc_rtas_tokens_free().
At this point we don't hold the new mutex, but that is OK because
kvmppc_rtas_tokens_free() is only called when the whole VM is being
destroyed, and at that point nothing can be looking up a token in
the list.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d4ee88d92 ]
Currently the HV KVM code uses kvm->lock in conjunction with a flag,
kvm->arch.mmu_ready, to synchronize MMU setup and hold off vcpu
execution until the MMU-related data structures are ready. However,
this means that kvm->lock is being taken inside vcpu->mutex, which
is contrary to Documentation/virtual/kvm/locking.txt and results in
lockdep warnings.
To fix this, we add a new mutex, kvm->arch.mmu_setup_lock, which nests
inside the vcpu mutexes, and is taken in the places where kvm->lock
was taken that are related to MMU setup.
Additionally we take the new mutex in the vcpu creation code at the
point where we are creating a new vcore, in order to provide mutual
exclusion with kvmppc_update_lpcr() and ensure that an update to
kvm->arch.lpcr doesn't get missed, which could otherwise lead to a
stale vcore->lpcr value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d10e0cc113 ]
During a suspend/resume, the xenwatch thread waits for all outstanding
xenstore requests and transactions to complete. This does not work
correctly for transactions started by userspace because it waits for
them to complete after freezing userspace threads which means the
transactions have no way of completing, resulting in a deadlock. This is
trivial to reproduce by running this script and then suspending the VM:
import pyxs, time
c = pyxs.client.Client(xen_bus_path="/dev/xen/xenbus")
c.connect()
c.transaction()
time.sleep(3600)
Even if this deadlock were resolved, misbehaving userspace should not
prevent a VM from being migrated. So, instead of waiting for these
transactions to complete before suspending, store the current generation
id for each transaction when it is started. The global generation id is
incremented during resume. If the caller commits the transaction and the
generation id does not match the current generation id, return EAGAIN so
that they try again. If the transaction was instead discarded, return OK
since no changes were made anyway.
This only affects users of the xenbus file interface. In-kernel users of
xenbus are assumed to be well-behaved and complete all transactions
before freezing.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41349672e3 ]
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/xen/pvcalls-front.c: In function pvcalls_front_sendmsg:
drivers/xen/pvcalls-front.c:543:25: warning: variable bedata set but not used [-Wunused-but-set-variable]
drivers/xen/pvcalls-front.c: In function pvcalls_front_recvmsg:
drivers/xen/pvcalls-front.c:638:25: warning: variable bedata set but not used [-Wunused-but-set-variable]
They are never used since introduction.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7aae703f80 ]
Make sure only the portals for the online CPUs are used.
Without this change, there are issues when someone boots with
maxcpus=n, with n < actual number of cores available as frames
either received or corresponding to the transmit confirmation
path would be offered for dequeue to the offline CPU portals,
getting lost.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6738028dd5 ]
Command 'perf record' and 'perf report' on a system without kernel
debuginfo packages uses /proc/kallsyms and /proc/modules to find
addresses for kernel and module symbols. On x86 this works for root and
non-root users.
On s390, when invoked as non-root user, many of the following warnings
are shown and module symbols are missing:
proc/{kallsyms,modules} inconsistency while looking for
"[sha1_s390]" module!
Command 'perf record' creates a list of module start addresses by
parsing the output of /proc/modules and creates a PERF_RECORD_MMAP
record for the kernel and each module. The following function call
sequence is executed:
machine__create_kernel_maps
machine__create_module
modules__parse
machine__create_module --> for each line in /proc/modules
arch__fix_module_text_start
Function arch__fix_module_text_start() is s390 specific. It opens
file /sys/module/<name>/sections/.text to extract the module's .text
section start address. On s390 the module loader prepends a header
before the first section, whereas on x86 the module's text section
address is identical the the module's load address.
However module section files are root readable only. For non-root the
read operation fails and machine__create_module() returns an error.
Command perf record does not generate any PERF_RECORD_MMAP record
for loaded modules. Later command perf report complains about missing
module maps.
To fix this function arch__fix_module_text_start() always returns
success. For root users there is no change, for non-root users
the module's load address is used as module's text start address
(the prepended header then counts as part of the text section).
This enable non-root users to use module symbols and avoid the
warning when perf report is executed.
Output before:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
Output after:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
0 0x1b8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../autofs4.ko.xz
0 0x250 [0xa8]: PERF_RECORD_MMAP ... x /lib/modules/.../sha_common.ko.xz
0 0x2f8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../des_generic.ko.xz
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: http://lkml.kernel.org/r/20190522144601.50763-4-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7379e65279 ]
The zcrypt device driver does not handle CPRBs which address
a control domain correctly. This fix introduces a workaround:
The domain field of the request CPRB is checked if there is
a valid domain value in there. If this is true and the value
is a control only domain (a domain which is enabled in the
crypto config ADM mask but disabled in the AQM mask) the
CPRB is forwarded to the default usage domain. If there is
no default domain, the request is rejected with an ENODEV.
This fix is important for maintaining crypto adapters. For
example one LPAR can use a crypto adapter domain ('Control
and Usage') but another LPAR needs to be able to maintain
this adapter domain ('Control'). Scenarios like this did
not work properly and the patch enables this.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97acec7df1 ]
This strncat() is safe because the buffer was allocated with zalloc(),
however gcc doesn't know that. Since the string always has 4 non-null
bytes, just use memcpy() here.
CC /home/shawn/linux/tools/perf/util/data-convert-bt.o
In file included from /usr/include/string.h:494,
from /home/shawn/linux/tools/lib/traceevent/event-parse.h:27,
from util/data-convert-bt.c:22:
In function ‘strncat’,
inlined from ‘string_set_value’ at util/data-convert-bt.c:274:4:
/usr/include/powerpc64le-linux-gnu/bits/string_fortified.h:136:10: error: ‘__builtin_strncat’ output may be truncated copying 4 bytes from a string of length 4 [-Werror=stringop-truncation]
136 | return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Shawn Landden <shawn@git.icu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
LPU-Reference: 20190518183238.10954-1-shawn@git.icu
Link: https://lkml.kernel.org/n/tip-289f1jice17ta7tr3tstm9jm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6122ed2a4 ]
In the vfs_statx() context, during path lookup, the dentry gets
added to sd->s_dentry via configfs_attach_attr(). In the end,
vfs_statx() kills the dentry by calling path_put(), which invokes
configfs_d_iput(). Ideally, this dentry must be removed from
sd->s_dentry but it doesn't if the sd->s_count >= 3. As a result,
sd->s_dentry is holding reference to a stale dentry pointer whose
memory is already freed up. This results in use-after-free issue,
when this stale sd->s_dentry is accessed later in
configfs_readdir() path.
This issue can be easily reproduced, by running the LTP test case -
sh fs_racer_file_list.sh /config
(https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/fs/racer/fs_racer_file_list.sh)
Fixes: 76ae281f63 ('configfs: fix race between dentry put and lookup')
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa763f1b28 ]
We observed the same issue as reported by commit a8d7bde23e
("ALSA: hda - Force polling mode on CFL for fixing codec communication")
We don't have a better solution. So apply the same workaround to CNL.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0692f0eef ]
If I2C_M_RECV_LEN check failed, msgs[i].buf allocated by memdup_user
will not be freed. Pump index up so it will be freed.
Fixes: 838bfa6049 ("i2c-dev: Add support for I2C_M_RECV_LEN")
Signed-off-by: Yingjoe Chen <yingjoe.chen@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eaeb3b7494 ]
Driver stops producing skbs on ring if a packet with FCS error
was coalesced into LRO session. Ring gets hang forever.
Thats a logical error in driver processing descriptors:
When rx_stat indicates MAC Error, next pointer and eop flags
are not filled. This confuses driver so it waits for descriptor 0
to be filled by HW.
Solution is fill next pointer and eop flag even for packets with FCS error.
Fixes: bab6de8fd1 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31bafc49a7 ]
In case no other traffic happening on the ring, full tx cleanup
may not be completed. That may cause socket buffer to overflow
and tx traffic to stuck until next activity on the ring happens.
This is due to logic error in budget variable decrementor.
Variable is compared with zero, and then post decremented,
causing it to become MAX_INT. Solution is remove decrementor
from the `for` statement and rewrite it in a clear way.
Fixes: b647d39809 ("net: aquantia: Add tx clean budget and valid budget handling logic")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1396500d67 ]
The devcoredump needs to operate on a stable state of the MMU while
it is writing the MMU state to the coredump. The missing lock
allowed both the userspace submit, as well as the GPU job finish
paths to mutate the MMU state while a coredump is under way.
Fixes: a8c21a5451 (drm/etnaviv: add initial etnaviv DRM driver)
Reported-by: David Jander <david@protonic.nl>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: David Jander <david@protonic.nl>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a51c6b1f9 ]
Both acpi_pci_need_resume() and acpi_dev_needs_resume() check if the
current ACPI wakeup configuration of the device matches what is
expected as far as system wakeup from sleep states is concerned, as
reflected by the device_may_wakeup() return value for the device.
However, they only should do that if wakeup.flags.valid is set for
the device's ACPI companion, because otherwise the wakeup.prepare_count
value for it is meaningless.
Add the missing wakeup.flags.valid checks to these functions.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e66b7cc50 ]
Building with Clang reports the redundant use of MODULE_DEVICE_TABLE():
drivers/net/ethernet/dec/tulip/de4x5.c:2110:1: error: redefinition of '__mod_eisa__de4x5_eisa_ids_device_table'
MODULE_DEVICE_TABLE(eisa, de4x5_eisa_ids);
^
./include/linux/module.h:229:21: note: expanded from macro 'MODULE_DEVICE_TABLE'
extern typeof(name) __mod_##type##__##name##_device_table \
^
<scratch space>:90:1: note: expanded from here
__mod_eisa__de4x5_eisa_ids_device_table
^
drivers/net/ethernet/dec/tulip/de4x5.c:2100:1: note: previous definition is here
MODULE_DEVICE_TABLE(eisa, de4x5_eisa_ids);
^
./include/linux/module.h:229:21: note: expanded from macro 'MODULE_DEVICE_TABLE'
extern typeof(name) __mod_##type##__##name##_device_table \
^
<scratch space>:85:1: note: expanded from here
__mod_eisa__de4x5_eisa_ids_device_table
^
This drops the one further from the table definition to match the common
use of MODULE_DEVICE_TABLE().
Fixes: 07563c711f ("EISA bus MODALIAS attributes support")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a20a093d9 ]
Smatch reports a potential spectre vulnerability in the dpaa2-eth
driver, where the value of rxnfc->fs.location (which is provided
from user-space) is used as index in an array.
Add a call to array_index_nospec() to sanitize the access.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4ca7a9260 ]
1. the frequency of csr clock is 66.5MHz, so the csr_clk value should
be 0 other than 5.
2. the csr_clk can be got from device tree, so remove initialization here.
Fixes: 9992f37e34 ("stmmac: dwmac-mediatek: add support for mt2712")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e7f7fc538 ]
The specific clk_csr value can be zero, and
stmmac_clk is necessary for MDC clock which can be set dynamically.
So, change the condition from plat->clk_csr to plat->stmmac_clk to
fix clk_csr can't be zero issue.
Fixes: cd7201f477 ("stmmac: MDC clock dynamically based on the csr clock input")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4523a56115 ]
Currently we will not update the receive descriptor tail pointer in
stmmac_rx_refill. Rx dma will think no available descriptors and stop
once received packets exceed DMA_RX_SIZE, so that the rx only test will fail.
Update the receive tail pointer in stmmac_rx_refill to add more descriptors
to the rx channel, so packets can be received continually
Fixes: 54139cf3bb ("net: stmmac: adding multiple buffers for rx")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9646f0f5b ]
The gpio-adp5588 driver uses interfaces that are provided by
GPIOLIB_IRQCHIP, so select that symbol in its Kconfig entry.
Fixes these build errors:
../drivers/gpio/gpio-adp5588.c: In function ‘adp5588_irq_handler’:
../drivers/gpio/gpio-adp5588.c:266:26: error: ‘struct gpio_chip’ has no member named ‘irq’
dev->gpio_chip.irq.domain, gpio));
^
../drivers/gpio/gpio-adp5588.c: In function ‘adp5588_irq_setup’:
../drivers/gpio/gpio-adp5588.c:298:2: error: implicit declaration of function ‘gpiochip_irqchip_add_nested’ [-Werror=implicit-function-declaration]
ret = gpiochip_irqchip_add_nested(&dev->gpio_chip,
^
../drivers/gpio/gpio-adp5588.c:307:2: error: implicit declaration of function ‘gpiochip_set_nested_irqchip’ [-Werror=implicit-function-declaration]
gpiochip_set_nested_irqchip(&dev->gpio_chip,
^
Fixes: 459773ae8d ("gpio: adp5588-gpio: support interrupt controller")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-gpio@vger.kernel.org
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b038c6e05 ]
In perf_output_put_handle(), an IRQ/NMI can happen in below location and
write records to the same ring buffer:
...
local_dec_and_test(&rb->nest)
... <-- an IRQ/NMI can happen here
rb->user_page->data_head = head;
...
In this case, a value A is written to data_head in the IRQ, then a value
B is written to data_head after the IRQ. And A > B. As a result,
data_head is temporarily decreased from A to B. And a reader may see
data_head < data_tail if it read the buffer frequently enough, which
creates unexpected behaviors.
This can be fixed by moving dec(&rb->nest) to after updating data_head,
which prevents the IRQ/NMI above from updating data_head.
[ Split up by peterz. ]
Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: mark.rutland@arm.com
Fixes: ef60777c9a ("perf: Optimize the perf_output() path by removing IRQ-disables")
Link: http://lkml.kernel.org/r/20190517115418.224478157@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ac44ab608 ]
For F17h AMD CPUs, the CPB capability ('Core Performance Boost') is forcibly set,
because some versions of that chip incorrectly report that they do not have it.
However, a hypervisor may filter out the CPB capability, for good
reasons. For example, KVM currently does not emulate setting the CPB
bit in MSR_K7_HWCR, and unchecked MSR access errors will be thrown
when trying to set it as a guest:
unchecked MSR access error: WRMSR to 0xc0010015 (tried to write 0x0000000001000011) at rIP: 0xffffffff890638f4 (native_write_msr+0x4/0x20)
Call Trace:
boost_set_msr+0x50/0x80 [acpi_cpufreq]
cpuhp_invoke_callback+0x86/0x560
sort_range+0x20/0x20
cpuhp_thread_fun+0xb0/0x110
smpboot_thread_fn+0xef/0x160
kthread+0x113/0x130
kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
To avoid this issue, don't forcibly set the CPB capability for a CPU
when running under a hypervisor.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: jiaxun.yang@flygoat.com
Fixes: 0237199186 ("x86/CPU/AMD: Set the CPB bit unconditionally on F17h")
Link: http://lkml.kernel.org/r/20190522221745.GA15789@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com
[ Minor edits to the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccfb62f27b ]
The user can change the device_name with the IMSETDEVNAME ioctl, but we
need to ensure that the user's name is NUL terminated. Otherwise it
could result in a buffer overflow when we copy the name back to the user
with IMGETDEVINFO ioctl.
I also changed two strcpy() calls which handle the name to strscpy().
Hopefully, there aren't any other ways to create a too long name, but
it's nice to do this as a kernel hardening measure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5bce256f0b ]
In xhci_debugfs_create_slot(), kzalloc() can fail and
dev->debugfs_private will be NULL.
In xhci_debugfs_create_endpoint(), dev->debugfs_private is used without
any null-pointer check, and can cause a null pointer dereference.
To fix this bug, a null-pointer check is added in
xhci_debugfs_create_endpoint().
This bug is found by a runtime fuzzing tool named FIZZER written by us.
[subjet line change change, add potential -Mathais]
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b59bd3527f ]
Currently init_imc_pmu() can fail either because we try to register an
IMC unit with an invalid domain (i.e an IMC node not supported by the
kernel) or something went wrong while registering a valid IMC unit. In
both the cases kernel provides a 'Register failed' error message.
For example when trace-imc node is not supported by the kernel, but
skiboot advertises a trace-imc node we print:
IMC Unknown Device type
IMC PMU (null) Register failed
To avoid confusion just print the unknown device type message, before
attempting PMU registration, so the second message isn't printed.
Fixes: 8f95faaac5 ("powerpc/powernv: Detect and create IMC device")
Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Reword change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cc54078d1 ]
We need to always call clkdm_clk_enable() and clkdm_clk_disable() even
the clkctrl clock(s) enabled for the domain do not have any gate register
bits. Otherwise clockdomains may never get enabled except when devices get
probed with the legacy "ti,hwmods" devicetree property.
Fixes: 88a172526c ("clk: ti: add support for clkctrl clocks")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 82ce6eb1dd ]
A test for the basic NAT functionality uses ip command which needs veth
device. There is a condition where the kernel support for veth is not
compiled into the kernel and the test script breaks. This patch contains
code for reasonable error display and correct code exit.
Signed-off-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e633508a95 ]
NFTA_FIB_F_PRESENT flag was not always honored since eval functions did
not call nft_fib_store_result in all cases.
Given that in all callsites there is a struct net_device pointer
available which holds the interface data to be stored in destination
register, simplify nft_fib_store_result() to just accept that pointer
instead of the nft_pktinfo pointer and interface index. This also
allows to drop the index to interface lookup previously needed to get
the name associated with given index.
Fixes: 055c4b34b9 ("netfilter: nft_fib: Support existence check")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 946c0d8e6e ]
This patch fixes netfilter hook traversal when there are more than 1 hooks
returning NF_QUEUE verdict. When the first queue reinjects the packet,
'nf_reinject' starts traversing hooks with a proper hook_index. However,
if it again receives a NF_QUEUE verdict (by some other netfilter hook), it
queues the packet with a wrong hook_index. So, when the second queue
reinjects the packet, it re-executes hooks in between.
Fixes: 960632ece6 ("netfilter: convert hook list to an array")
Signed-off-by: Jagdish Motwani <jagdish.motwani@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23e3983a46 ]
This patch fixes an bug revealed by the following commit:
6b89d4c1ae ("perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking")
That patch modified INTEL_FLAGS_EVENT_CONSTRAINT() to only look at the event code
when matching a constraint. If code+umask were needed, then the
INTEL_FLAGS_UEVENT_CONSTRAINT() macro was needed instead.
This broke with some of the constraints for PEBS events.
Several of them, including the one used for cycles:p, cycles:pp, cycles:ppp
fell in that category and caused the event to be rejected in PEBS mode.
In other words, on some platforms a cmdline such as:
$ perf top -e cycles:pp
would fail with -EINVAL.
This patch fixes this bug by properly using INTEL_FLAGS_UEVENT_CONSTRAINT()
when needed in the PEBS constraint tables.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/20190521005246.423-1-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c82c7e724 ]
We can oops in nf_tables_fill_rule_info().
Its not possible to fetch previous element in rcu-protected lists
when deletions are not prevented somehow: list_del_rcu poisons
the ->prev pointer value.
Before rcu-conversion this was safe as dump operations did hold
nfnetlink mutex.
Pass previous rule as argument, obtained by keeping a pointer to
the previous rule during traversal.
Fixes: d9adf22a29 ("netfilter: nf_tables: use call_rcu in netlink dumps")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 670784fb4e ]
Commit a939bb57cd ("pinctrl: intel: implement gpio_irq_enable") was
added because clearing interrupt status bit is required to avoid
unexpected behavior.
Turns out the unmask callback also needs the fix, which can solve weird
IRQ triggering issues on I2C touchpad ELAN1200.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fea6991636 ]
If ->hif_read_reg() or ->hif_write_reg() fail then the code unlocks
and keeps executing. It should just return.
Fixes: c5c77ba18e ("staging: wilc1000: Add SDIO/SPI 802.11 driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca4e4efbef ]
These are accidentally returning positive EINVAL instead of negative
-EINVAL. Some of the callers treat positive values as success.
Fixes: 7b3ad5abf0 ("staging: Import the BCM2835 MMAL-based V4L2 camera driver.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1615fe41a1 ]
The MPU6050 driver has recently gained support for the
ICM20602 IMU, which is very similar to MPU6xxx. However,
the ICM20602's FIFO data specifically includes temperature
readings, which were not present on MPU6xxx parts. As a
result, the driver will under-read the ICM20602's FIFO
register, causing the same (partial) sample to be returned
for all reads, until the FIFO overflows.
Fix this by adding a table of scan elements specifically
for the ICM20602, which takes the extra temperature data
into consideration.
While we're at it, fix the temperature offset and scaling
on ICM20602, since it uses different scale/offset constants
than the rest of the MPU6xxx devices.
Signed-off-by: Steve Moskovchenko <stevemo@skydio.com>
Fixes: 22904bdff9 ("iio: imu: mpu6050: Add support for the ICM 20602 IMU")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
After introducing dedicated uplink representor, the netdev instance
set over the esw manager vport (PF) became no longer in use, so it was
removed in the cited commit once we're on switchdev mode.
However, the mlx5e_detach function was not updated accordingly, and it
still tries to detach a non-existing netdev, causing a kernel crash.
This patch fixes this issue.
Fixes: aec002f6f8 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 522924b583 ]
The below patch fixes an incorrect zerocopy refcnt increment when
appending with MSG_MORE to an existing zerocopy udp skb.
send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt 1
send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt still 1 (bar frags)
But it missed that zerocopy need not be passed at the first send. The
right test whether the uarg is newly allocated and thus has extra
refcnt 1 is not !skb, but !skb_zcopy.
send(.., MSG_MORE); // <no uarg>
send(.., MSG_ZEROCOPY); // refcnt 1
Fixes: 100f6d8e09 ("net: correct zerocopy refcnt with udp MSG_MORE")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stacked devices like bond interface may have a VLAN device on top of
them. Detect lag state correctly under this condition, and return the
correct routed net device, according to it the encap header is built.
Fixes: e32ee6c78e ("net/mlx5e: Support tunnel encap over tagged Ethernet")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to an issue on Spectrum-2, in front-panel ports split four ways, 2 out
of 32 port buffers cannot be used. To work around this, the next FW release
will mark them as unused, and will report correspondingly lower total
shared buffer size. mlxsw will pick up the new value through a query to
cap_total_buffer_size resource. However the initial size for shared buffer
pool 0 is hard-coded and therefore needs to be updated.
Thus reduce the pool size by 2.7 MiB (which corresponds to 2/32 of the
total size of 42 MiB), and round down to the whole number of cells.
Fixes: fe099bf682 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cited commit changed the initialization placement of the eswitch
attributes so it is done prior to parse tc actions function call,
including among others the in_rep and in_mdev fields which are mistakenly
reassigned inside the parse actions function.
This breaks the source port matching criteria of the peer redirect rule.
Fix by removing the now redundant reassignment of the already initialized
fields.
Fixes: 988ab9c736 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After we have a dedicated uplink representor, the new netdev ops
doesn't support ndo_set_feature. Because of that, we can't change
some features, eg. rxvlan. Now add it back.
In this patch, I also do a cleanup for the features flag handling,
eg. remove duplicate NETIF_F_HW_TC flag setting.
Fixes: aec002f6f8 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver tries to periodically refresh neighbours that are used to
reach nexthops. This is done by periodically calling neigh_event_send().
However, if the neighbour becomes dead, there is nothing we can do to
return it to a connected state and the above function call is basically
a NOP.
This results in the nexthop never being written to the device's
adjacency table and therefore never used to forward packets.
Fix this by dropping our reference from the dead neighbour and
associating the nexthop with a new neigbhour which we will try to
refresh.
Fixes: a7ff87acd9 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add missing entries for create/destroy UCTX and UMEM commands.
This could get us wrong "unknown FW command" error in flows
where we unbind the device or reset the driver.
Also the translation of these commands from opcodes to string
was missing.
Fixes: 6e3722baac ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation")
Signed-off-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f0d2ca1531 ]
Using ethtool, users can specify a classification action matching on the
full vlan tag, which includes the DEI bit (also previously called CFI).
However, when converting the ethool_flow_spec to a flow_rule, we use
dissector keys to represent the matching patterns.
Since the vlan dissector key doesn't include the DEI bit, this
information was silently discarded when translating the ethtool
flow spec in to a flow_rule.
This commit adds the DEI bit into the vlan dissector key, and allows
propagating the information to the driver when parsing the ethtool flow
spec.
Fixes: eca4205f9e ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6bb9e376c2 ]
If some of the switch ports were not listed in the device tree, due to
being unused, the ksz_mib_read_work function ended up accessing a NULL
dp->slave pointer and causing an oops. Skip checking statistics for any
unused ports.
Fixes: 7c6ff470aa ("net: dsa: microchip: add MIB counter reading support")
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b7a3430c1 ]
When removing all VID filters, the mvpp2_prs_vid_entry_remove would be
called with the TCAM id incorrectly used as a VID, causing the wrong
TCAM entries to be invalidated.
Fix this by directly invalidating entries in the VID range.
Fixes: 56beda3db6 ("net: mvpp2: Add hardware offloading for VLAN filtering")
Suggested-by: Yuri Chipchev <yuric@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46b0090a66 ]
VID filtering is implemented in the Header Parser, with one range of 11
vids being assigned for each no-loopback port.
Make sure we use the per-port range when looking for existing entries in
the Parser.
Since we used a global range instead of a per-port one, this causes VIDs
to be removed from the whitelist from all ports of the same PPv2
instance.
Fixes: 56beda3db6 ("net: mvpp2: Add hardware offloading for VLAN filtering")
Suggested-by: Yuri Chipchev <yuric@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eccc73a6b2 ]
In commit a07966447f ("geneve: ICMP error lookup handler") I wrongly
assumed buffers from icmp_socket_deliver() would be linear. This is not
the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
data.
Eric fixed this same issue for fou and fou6 in commits 26fc181e6c
("fou, fou6: do not assume linear skbs") and 5355ed6388 ("fou, fou6:
avoid uninit-value in gue_err() and gue6_err()").
Use pskb_may_pull() instead of checking skb->len, and take into account
the fact we later access the GENEVE header with udp_hdr(), so we also
need to sum skb_transport_header() here.
Reported-by: Guillaume Nault <gnault@redhat.com>
Fixes: a07966447f ("geneve: ICMP error lookup handler")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8399a6930d ]
In commit c3a43b9fec ("vxlan: ICMP error lookup handler") I wrongly
assumed buffers from icmp_socket_deliver() would be linear. This is not
the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
data.
Eric fixed this same issue for fou and fou6 in commits 26fc181e6c
("fou, fou6: do not assume linear skbs") and 5355ed6388 ("fou, fou6:
avoid uninit-value in gue_err() and gue6_err()").
Use pskb_may_pull() instead of checking skb->len, and take into account
the fact we later access the VXLAN header with udp_hdr(), so we also
need to sum skb_transport_header() here.
Reported-by: Guillaume Nault <gnault@redhat.com>
Fixes: c3a43b9fec ("vxlan: ICMP error lookup handler")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prior to reloading a device we must first verify that it was not already
removed. Otherwise, the attempt to remove the device will do nothing, and
in that case we will end up proceeding with adding an new device that no
one was expecting to remove, leaving behind used resources such as EQs that
causes a failure to destroy comp EQs and syndrome (0x30f433).
Fix that by making sure that we try to remove and add a device (based on a
protocol) only if the device is already added.
Fixes: c5447c7059 ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 42f5cda5ea ]
Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
shut down and there is nothing left to read.
This fixes the following bug:
1) Peer sends SHUTDOWN(RDWR).
2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
3) read() returns -ENOTCONN until close() is called, then returns 0.
Signed-off-by: Stephen Barber <smbarber@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5cf02612b3 ]
Syzbot reported a memleak caused by grp members' deferredq list not
purged when the grp is be deleted.
The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in
tipc_group_filter_msg() and the skb will stay in deferredq.
So fix it by calling __skb_queue_purge for each member's deferredq
in tipc_group_delete() when a tipc sk leaves the grp.
Fixes: b87a5ea31c ("tipc: guarantee group unicast doesn't bypass group broadcast")
Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 07a6d63eb1 ]
In d5a2aa24, the name in struct console sunhv_console was changed from "ttyS"
to "ttyHV" while the name in struct uart_ops sunhv_pops remained unchanged.
This results in the hypervisor console device to be listed as "ttyHV0" under
/proc/consoles while the device node is still named "ttyS0":
root@osaka:~# cat /proc/consoles
ttyHV0 -W- (EC p ) 4:64
tty0 -WU (E ) 4:1
root@osaka:~# readlink /sys/dev/char/4:64
../../devices/root/f02836f0/f0285690/tty/ttyS0
root@osaka:~#
This means that any userland code which tries to determine the name of the
device file of the hypervisor console device can not rely on the information
provided by /proc/consoles. In particular, booting current versions of debian-
installer inside a SPARC LDOM will fail with the installer unable to determine
the console device.
After renaming the device in struct uart_ops sunhv_pops to "ttyHV" as well,
the inconsistency is fixed and it is possible again to determine the name
of the device file of the hypervisor console device by reading the contents
of /proc/console:
root@osaka:~# cat /proc/consoles
ttyHV0 -W- (EC p ) 4:64
tty0 -WU (E ) 4:1
root@osaka:~# readlink /sys/dev/char/4:64
../../devices/root/f02836f0/f0285690/tty/ttyHV0
root@osaka:~#
With this change, debian-installer works correctly when installing inside
a SPARC LDOM.
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ce950f1050 ]
Based on comments from Xin, even after fixes for our recent syzbot
report of cookie memory leaks, its possible to get a resend of an INIT
chunk which would lead to us leaking cookie memory.
To ensure that we don't leak cookie memory, free any previously
allocated cookie first.
Change notes
v1->v2
update subsystem tag in subject (davem)
repeat kfree check for peer_random and peer_hmacs (xin)
v2->v3
net->sctp
also free peer_chunks
v3->v4
fix subject tags
v4->v5
remove cut line
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: Xin Long <lucien.xin@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 385097a367 ]
Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to
NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to
accessing them. This prevents potential unhandled NULL pointer dereference
exceptions which can be triggered by malicious user-mode programs,
if they omit one or both of these attributes.
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 648ee6cea7 ]
tls_sw_do_sendpage needs to return the total number of bytes sent
regardless of how many sk_msgs are allocated. Unfortunately, copied
(the value we return up the stack) is zero'd before each new sk_msg
is allocated so we only return the copied size of the last sk_msg used.
The caller (splice, etc.) of sendpage will then believe only part
of its data was sent and send the missing chunks again. However,
because the data actually was sent the receiver will get multiple
copies of the same data.
To reproduce this do multiple sendfile calls with a length close to
the max record size. This will in turn call splice/sendpage, sendpage
may use multiple sk_msg in this case and then returns the incorrect
number of bytes. This will cause splice to resend creating duplicate
data on the receiver. Andre created a C program that can easily
generate this case so we will push a similar selftest for this to
bpf-next shortly.
The fix is to _not_ zero the copied field so that the total sent
bytes is returned.
Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no>
Reported-by: Andre Tomt <andre@tomt.net>
Tested-by: Andre Tomt <andre@tomt.net>
Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 760c80b70b ]
We get this regression when using RTL8366RB as part of a bridge
with OpenWrt:
WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
switchdev_port_attr_set_now+0x80/0xa4
lan0: Commit of attribute (id=7) failed.
(...)
realtek-smi switch lan0: failed to initialize vlan filtering on this port
This is because it is trying to disable VLAN filtering
on VLAN0, as we have forgot to add 1 to the port number
to get the right VLAN in rtl8366_vlan_filtering(): when
we initialize the VLAN we associate VLAN1 with port 0,
VLAN2 with port 1 etc, so we need to add 1 to the port
offset.
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6be8e297f9 ]
lapb_register calls lapb_create_cb, which initializes the control-
block's ref-count to one, and __lapb_insert_cb, which increments it when
adding the new block to the list of blocks.
lapb_unregister calls __lapb_remove_cb, which decrements the ref-count
when removing control-block from the list of blocks, and calls lapb_put
itself to decrement the ref-count before returning.
However, lapb_unregister also calls __lapb_devtostruct to look up the
right control-block for the given net_device, and __lapb_devtostruct
also bumps the ref-count, which means that when lapb_unregister returns
the ref-count is still 1 and the control-block is leaked.
Call lapb_put after __lapb_devtostruct to fix leak.
Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 65a3c497c0 ]
Before taking a refcount, make sure the object is not already
scheduled for deletion.
Same fix is needed in ipv6_flowlabel_opt()
Fixes: 18367681a1 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9a33629ba6 ]
For better consistency of synthetic NIC names, we set the probe mode to
PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
channel offer sequence.
Fixes: af0a5646cb ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bac76db1d upstream.
Due to copy&paste error nf_nat_mangle_udp_packet passes IPPROTO_TCP,
resulting in incorrect udp checksum when payload had to be mangled.
Fixes: dac3fe7259 ("netfilter: nat: remove csum_recalc hook")
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Tested-by: Marc Haber <mh+netdev@zugschlus.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33258a1db1 upstream.
Commit 1b2443a547 ("powerpc/book3s64: Avoid multiple endian
conversion in pte helpers") changed the actual bitwise tests in
pte_access_permitted by using pte_write() and pte_present() helpers
rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits.
The pte_present() change now returns true for PTEs which are
!_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by
pmdp_invalidate() to synchronize access from lock-free lookups.
pte_access_permitted() is used by pmd_access_permitted(), so allowing
GUP lock free access to proceed with such PTEs breaks this
synchronisation.
This bug has been observed on a host using the hash page table MMU,
with random crashes and corruption in guests, usually together with
bad PMD messages in the host.
Fix this by adding an explicit check in pmd_access_permitted(), and
documenting the condition explicitly.
The pte_write() change should be okay, and would prevent GUP from
falling back to the slow path when encountering savedwrite PTEs, which
matches what x86 (that does not implement savedwrite) does.
Fixes: 1b2443a547 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c284228eb upstream.
In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
Therefore, allthough __mapin_ram_chunk() was already mapping kernel
text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
memory was executable. Part of the memory (first 512kbytes) was
mapped with BATs instead of page table, but it was also entirely
mapped as executable.
In commit 385e89d5b2 ("powerpc/mm: add exec protection on
powerpc 603"), we started adding exec protection to some 6xx, namely
the 603, for pages mapped via pagetables.
Then, in commit 63b2bc6195 ("powerpc/mm/32s: Use BATs for
STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
memory, so that really only the kernel text could be executed.
The problem here is that kexec is based on copying some code into
upper part of memory then executing it from there in order to install
a fresh new kernel at its definitive location.
However, the code is position independant and first part of it is
just there to deactivate the MMU and jump to the second part. So it
is possible to run this first part inplace instead of running the
copy. Once the MMU is off, there is no protection anymore and the
second part of the code will just run as before.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 63b2bc6195 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48eaeb7664 upstream.
We've moved the override and firmware EDID (simply "override EDID" from
now on) handling to the low level drm_do_get_edid() function in order to
transparently use the override throughout the stack. The idea is that
you get the override EDID via the ->get_modes() hook.
Unfortunately, there are scenarios where the DDC probe in drm_get_edid()
called via ->get_modes() fails, although the preceding ->detect()
succeeds.
In the case reported by Paul Wise, the ->detect() hook,
intel_crt_detect(), relies on hotplug detect, bypassing the DDC. In the
case reported by Ilpo Järvinen, there is no ->detect() hook, which is
interpreted as connected. The subsequent DDC probe reached via
->get_modes() fails, and we don't even look at the override EDID,
resulting in no modes being added.
Because drm_get_edid() is used via ->detect() all over the place, we
can't trivially remove the DDC probe, as it leads to override EDID
effectively meaning connector forcing. The goal is that connector
forcing and override EDID remain orthogonal.
Generally, the underlying problem here is the conflation of ->detect()
and ->get_modes() via drm_get_edid(). The former should just detect, and
the latter should just get the modes, typically via reading the EDID. As
long as drm_get_edid() is used in ->detect(), it needs to retain the DDC
probe. Or such users need to have a separate DDC probe step first.
The EDID caching between ->detect() and ->get_modes() done by some
drivers is a further complication that prevents us from making
drm_do_get_edid() adapt to the two cases.
Work around the regression by falling back to a separate attempt at
getting the override EDID at drm_helper_probe_single_connector_modes()
level. With a working DDC and override EDID, it'll never be called; the
override EDID will come via ->get_modes(). There will still be a failing
DDC probe attempt in the cases that require the fallback.
v2:
- Call drm_connector_update_edid_property (Paul)
- Update commit message about EDID caching (Daniel)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107583
Reported-by: Paul Wise <pabs3@bonedaddy.net>
Cc: Paul Wise <pabs3@bonedaddy.net>
References: http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fi
Reported-by: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Cc: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
References: 15f080f08d ("drm/edid: respect connector force for drm_get_edid ddc probe")
Fixes: 53fd40a90f ("drm: handle override and firmware EDID at drm_do_get_edid() level")
Cc: <stable@vger.kernel.org> # v4.15+ 56a2b7f2a3 drm/edid: abstract override/firmware EDID retrieval
Cc: <stable@vger.kernel.org> # v4.15+
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Tested-by: Paul Wise <pabs3@bonedaddy.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610093054.28445-1-jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7563e62a6 upstream.
Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
a NULL pointer dereference on systems which do not have local MBM support
enabled..
BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
Workqueue: events mbm_handle_overflow
RIP: 0010:mbm_handle_overflow+0x150/0x2b0
Only enter the bandwith update loop if the system has local MBM enabled.
Fixes: de73f38f76 ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190610171544.13474-1-prarit@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00e5a2bbcc upstream.
The size of the vmemmap section is hardcoded to 1 TB to support the
maximum amount of system RAM in 4-level paging mode - 64 TB.
However, 1 TB is not enough for vmemmap in 5-level paging mode. Assuming
the size of struct page is 64 Bytes, to support 4 PB system RAM in 5-level,
64 TB of vmemmap area is needed:
4 * 1000^5 PB / 4096 bytes page size * 64 bytes per page struct / 1000^4 TB = 62.5 TB.
This hardcoding may cause vmemmap to corrupt the following
cpu_entry_area section, if KASLR puts vmemmap very close to it and the
actual vmemmap size is bigger than 1 TB.
So calculate the actual size of the vmemmap region needed and then align
it up to 1 TB boundary.
In 4-level paging mode it is always 1 TB. In 5-level it's adjusted on
demand. The current code reserves 0.5 PB for vmemmap on 5-level. With
this change, the space can be saved and thus used to increase entropy
for the randomization.
[ bp: Spell out how the 64 TB needed for vmemmap is computed and massage commit
message. ]
Fixes: eedb92abb9 ("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Kirill A. Shutemov <kirill@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: kirill.shutemov@linux.intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190523025744.3756-1-bhe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3176ec942 upstream.
Since commit d52888aa27 ("x86/mm: Move LDT remap out of KASLR region on
5-level paging") kernel doesn't boot with KASAN on 5-level paging machines.
The bug is actually in early_p4d_offset() and introduced by commit
12a8cc7fcf ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
early_p4d_offset() tries to convert pgd_val(*pgd) value to a physical
address. This doesn't make sense because pgd_val() already contains the
physical address.
It did work prior to commit d52888aa27 because the result of
"__pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK" was the same as "pgd_val(*pgd)
& PTE_PFN_MASK". __pa_nodebug() just set some high bits which were masked
out by applying PTE_PFN_MASK.
After the change of the PAGE_OFFSET offset in commit d52888aa27
__pa_nodebug(pgd_val(*pgd)) started to return a value with more high bits
set and PTE_PFN_MASK wasn't enough to mask out all of them. So it returns a
wrong not even canonical address and crashes on the attempt to dereference
it.
Switch back to pgd_val() & PTE_PFN_MASK to cure the issue.
Fixes: 12a8cc7fcf ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190614143149.2227-1-aryabinin@virtuozzo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78f4e932f7 upstream.
Adric Blake reported the following warning during suspend-resume:
Enabling non-boot CPUs ...
x86: Booting SMP configuration:
smpboot: Booting Node 0 Processor 1 APIC 0x2
unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
Call Trace:
intel_set_tfa
intel_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? _raw_spin_lock_irqsave
notify_cpu_starting
start_secondary
secondary_startup_64
microcode: sig=0x806ea, pf=0x80, revision=0x96
microcode: updated to revision 0xb4, date = 2019-04-01
CPU1 is up
The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.
Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.
Fixes: 400816f60c ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <promarbler14@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: x86@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd21f0222a upstream.
This patch fixes the chipmunk-like voice that manifets randomly when
using the integrated mic of the Logitech Webcam HD C270.
The issue was solved initially for this device by commit 2394d67e44
("USB: add RESET_RESUME for webcams shown to be quirky") but it was then
reintroduced by e387ef5c47 ("usb: Add USB_QUIRK_RESET_RESUME for all
Logitech UVC webcams"). This patch is to have the fix back.
Signed-off-by: Marco Zatta <marco@zatta.me>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit babd183915 upstream.
In commit abb621844f ("usb: ch9: make usb_endpoint_maxp() return
only packet size") the API to usb_endpoint_maxp() changed. It used to
just return wMaxPacketSize but after that commit it returned
wMaxPacketSize with the high bits (the multiplier) masked off. If you
wanted to get the multiplier it was now up to your code to call the
new usb_endpoint_maxp_mult() which was introduced in
commit 541b6fe630 ("usb: add helper to extract bits 12:11 of
wMaxPacketSize").
Prior to the API change most host drivers were updated, but no update
was made to dwc2. Presumably it was assumed that dwc2 was too
simplistic to use the multiplier and thus just didn't support a
certain class of USB devices. However, it turns out that dwc2 did use
the multiplier and many devices using it were working quite nicely.
That means that many USB devices have been broken since the API
change. One such device is a Logitech HD Pro Webcam C920.
Specifically, though dwc2 didn't directly call usb_endpoint_maxp(), it
did call usb_maxpacket() which in turn called usb_endpoint_maxp().
Let's update dwc2 to work properly with the new API.
Fixes: abb621844f ("usb: ch9: make usb_endpoint_maxp() return only packet size")
Cc: stable@vger.kernel.org
Acked-by: Minas Harutyunyan <hminas@synopsys.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a4863bf2e upstream.
Insert a padding between data and the stored_xfer_buffer pointer to
ensure they are not on the same cache line.
Otherwise, the stored_xfer_buffer gets corrupted for IN URBs on
non-cache-coherent systems. (In my case: Lantiq xRX200 MIPS)
Fixes: 3bc04e28a0 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Fixes: 56406e017a ("usb: dwc2: Fix DMA alignment to start at allocated boundary")
Cc: <stable@vger.kernel.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Minas Harutyunyan <hminas@synopsys.com>
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcd6aa7b6c upstream.
If SVGA_3D_CMD_DX_DEFINE_RENDERTARGET_VIEW is called with a surface
ID of SVGA3D_INVALID_ID, the srf struct will remain NULL after
vmw_cmd_res_check(), leading to a null pointer dereference in
vmw_view_add().
Cc: <stable@vger.kernel.org>
Fixes: d80efd5cb3 ("drm/vmwgfx: Initial DX support")
Signed-off-by: Murray McAllister <murray.mcallister@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ed7f4b5ec upstream.
If SVGA_3D_CMD_DX_SET_SHADER is called with a shader ID
of SVGA3D_INVALID_ID, and a shader type of
SVGA3D_SHADERTYPE_INVALID, the calculated binding.shader_slot
will be 4294967295, leading to an out-of-bounds read in vmw_binding_loc()
when the offset is calculated.
Cc: <stable@vger.kernel.org>
Fixes: d80efd5cb3 ("drm/vmwgfx: Initial DX support")
Signed-off-by: Murray McAllister <murray.mcallister@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 883d25e70b ]
The fields filter would not work with child fields, as the respective
parents would not be included. No parents displayed == no childs displayed.
To reproduce, run on s390 (would work on other platforms, too, but would
require a different filter name):
- Run 'kvm_stat -d'
- Press 'f'
- Enter 'instruct'
Notice that events like instruction_diag_44 or instruction_diag_500 are not
displayed - the output remains empty.
With this patch, we will filter by matching events and their parents.
However, consider the following example where we filter by
instruction_diag_44:
kvm statistics - summary
regex filter: instruction_diag_44
Event Total %Total CurAvg/s
exit_instruction 276 100.0 12
instruction_diag_44 256 92.8 11
Total 276 12
Note that the parent ('exit_instruction') displays the total events, but
the childs listed do not match its total (256 instead of 276). This is
intended (since we're filtering all but one child), but might be confusing
on first sight.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55eda003f0 ]
VM_MODE_P52V48_4K is not a valid mode for AArch64. Replace its
use in vm_create_default() with a mode that works and represents
a good AArch64 default. (We didn't ever see a problem with this
because we don't have any unit tests using vm_create_default(),
but it's good to get it fixed in advance.)
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bffed38d4f ]
The memory slot size must be aligned to the host's page size. When
testing a guest with a 4k page size on a host with a 64k page size,
then 3 guest pages are not host page size aligned. Since we just need
a nearly arbitrary number of extra pages to ensure the memslot is not
aligned to a 64 host-page boundary for this test, then we can use
16, as that's 64k aligned, but not 64 * 64k aligned.
Fixes: 76d58e0f07 ("KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size", 2019-04-17)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 19ec166c3f ]
kselftests exposed a problem in the s390 handling for memory slots.
Right now we only do proper memory slot handling for creation of new
memory slots. Neither MOVE, nor DELETION are handled properly. Let us
implement those.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2924b52117 ]
According to the SDM, for MSR_IA32_PERFCTR0/1 "the lower-order 32 bits of
each MSR may be written with any value, and the high-order 8 bits are
sign-extended according to the value of bit 31", but the fixed counters
in real hardware are limited to the width of the fixed counters ("bits
beyond the width of the fixed-function counter are reserved and must be
written as zeros"). Fix KVM to do the same.
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e6f467ee2 ]
This patch will simplify the changes in the next, by enforcing the
masking of the counters to RDPMC and RDMSR.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f2f84532c ]
Userspace can easily set up invalid processor state in such a way that
dmesg will be filled with VMCS or VMCB dumps. Disable this by default
using a module parameter.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e6edceb8f ]
After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of
timer advancement), '-1' enables adaptive tuning starting from default
advancment of 1000ns. However, we should expose an int instead of an overflow
uint module parameter.
Before patch:
/sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295
After patch:
/sys/module/kvm/parameters/lapic_timer_advance_ns:-1
Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d25996565 ]
We get a warning when build kernel W=1:
arch/x86/kvm/vmx/vmx.c:6365:6: warning: no previous prototype for ‘vmx_update_host_rsp’ [-Wmissing-prototypes]
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
Add the missing declaration to fix this.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be7fcf1d17 ]
The code is trying to check that all the padding is zeroed out and it
does this:
entry->padding[0] == entry->padding[1] == entry->padding[2] == 0
Assume everything is zeroed correctly, then the first comparison is
true, the next comparison is false and false is equal to zero so the
overall condition is true. This bug doesn't affect run time very
badly, but the code should instead just check that all three paddings
are zero individually.
Also the error message was copy and pasted from an earlier error and it
wasn't correct.
Fixes: 7edcb73433 ("KVM: selftests: Add hyperv_cpuid test")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db80927ea1 ]
The offset for reading the shadow VMCS is sizeof(*kvm_state)+VMCS12_SIZE,
so the correct size must be that plus sizeof(*vmcs12). This could lead
to KVM reading garbage data from userspace and not reporting an error,
but is otherwise not sensitive.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 623e1528d4 ]
KVM has helpers to handle the condition codes of trapped aarch32
instructions. These are marked __hyp_text and used from HYP, but they
aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
and KCOV instrumentation.
Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
an aarch32 guest on a host built with the ASAN/KCOV debug options.
Fixes: 021234ef37 ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2")
Fixes: 8cebe750c4 ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d250fae4 ]
Fix a racing condition in ipheth.c that can lead to slow performance.
Bug: In ipheth_tx(), netif_wake_queue() may be called on the callback
ipheth_sndbulk_callback(), _before_ netif_stop_queue() is called.
When this happens, the queue is stopped longer than it needs to be,
thus reducing network performance.
Fix: Move netif_stop_queue() in front of usb_submit_urb(). Now the order
is always correct. In case, usb_submit_urb() fails, the queue is woken up
again as callback will not fire.
Testing: This racing condition is usually not noticeable, as it has to
occur very frequently to slowdown the network. The callback from the USB
is usually triggered slow enough, so the situation does not appear.
However, on a Ubuntu Linux on VMWare Workstation, running on Windows 10,
the we loose the race quite often and the following speedup can be noticed:
Without this patch: Download: 4.10 Mbit/s, Upload: 4.01 Mbit/s
With this patch: Download: 36.23 Mbit/s, Upload: 17.61 Mbit/s
Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com>
Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55267c88c0 ]
hist_field_var_ref() is an implementation of hist_field_fn_t(), which
can be called with a null tracing_map_elt elt param when assembling a
key in event_hist_trigger().
In the case of hist_field_var_ref() this doesn't make sense, because a
variable can only be resolved by looking it up using an already
assembled key i.e. a variable can't be used to assemble a key since
the key is required in order to access the variable.
Upper layers should prevent the user from constructing a key using a
variable in the first place, but in case one slips through, it
shouldn't cause a NULL pointer dereference. Also if one does slip
through, we want to know about it, so emit a one-time warning in that
case.
Link: http://lkml.kernel.org/r/64ec8dc15c14d305295b64cdfcc6b2b9dd14753f.1555597045.git.tom.zanussi@linux.intel.com
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe48319243 ]
When running under a pipe, some timer tests would not report output in
real-time because stdout flushes were missing after printf()s that lacked
a newline. This adds them to restore real-time status output that humans
can enjoy.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc82d93e57 ]
The IPv4 testing address are all in 192.51.100.0 subnet. It doesn't make
sense to set a 198.51.100.1 local address. Should be a typo.
Fixes: 65b2b4939a ("selftests: net: initial fib rule tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c01dafad77 ]
Several places (dimm_devs.c, core.c etc) include label.h but only
label.c uses NSINDEX_SIGNATURE, so move its definition to label.c
instead.
In file included from drivers/nvdimm/dimm_devs.c:23:
drivers/nvdimm/label.h:41:19: warning: 'NSINDEX_SIGNATURE' defined but
not used [-Wunused-const-variable=]
Also, some places abuse "/**" which is only reserved for the kernel-doc.
drivers/nvdimm/bus.c:648: warning: cannot understand function prototype:
'struct attribute_group nd_device_attribute_group = '
drivers/nvdimm/bus.c:677: warning: cannot understand function prototype:
'struct attribute_group nd_numa_attribute_group = '
Those are just some member assignments for the "struct attribute_group"
instances and it can't be expressed in the kernel-doc.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0c0d90233 ]
Currently an int is being shifted and the result is being cast to a u64
which leads to undefined behaviour if the shift is more than 31 bits. Fix
this by casting the integer value 1 to u64 before the shift operation.
Addresses-Coverity: ("Bad shift operation")
Fixes: 7b59476912 ("[SCSI] bnx2fc: Handle REC_TOV error code from firmware")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41552199b5 ]
drivers/scsi/myrs.c: In function 'myrs_log_event':
drivers/scsi/myrs.c:821:24: warning: 'sshdr.sense_key' may be used uninitialized in this function [-Wmaybe-uninitialized]
struct scsi_sense_hdr sshdr;
If ev->ev_code is not 0x1C, sshdr.sense_key may be used uninitialized. Fix
this by initializing variable 'sshdr' to 0.
Fixes: 7726618639 ("scsi: myrs: Add Mylex RAID controller (SCSI interface)")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6423bd030 ]
There are several Beckhoff Automation industrial PC boards which use
pmc_plt_clk* clocks for ethernet controllers. This adds affected boards
to critclk_systems DMI table so the clocks are marked as CLK_CRITICAL and
not turned off.
Fixes: 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel@beckhoff.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d0818f5eb ]
The Lex 3I380D industrial PC has 4 ethernet controllers on board
which need pmc_plt_clk0 - 3 to function, add it to the critclk_systems
DMI table, so that drivers/clk/x86/clk-pmc-atom.c will mark the clocks
as CLK_CRITICAL and they will not get turned off.
Fixes: 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Reported-and-tested-by: Semyon Verchenko <semverchenko@factor-ts.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 510a405d94 ]
Unconditionally hide device pm latency tolerance when uninitializing
the controller to ensure all qos resources are released so that we're
not leaking this memory. This is safe to call if none were allocated in
the first place, or were previously freed.
Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
Suggested-by: Keith Busch <keith.busch@intel.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
[changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5fb4aac756 ]
Holding the SRCU critical section protecting the namespace list can
cause deadlocks when using the per-namespace admin passthrough ioctl to
delete as namespace. Release it earlier when performing per-controller
ioctls to avoid that.
Reported-by: Kenneth Heitke <kenneth.heitke@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 100c815cbd ]
If we can't get a namespace don't leak the SRCU lock. nvme_ioctl was
working around this, but nvme_pr_command wasn't handling this properly.
Just do what callers would usually expect.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac4e0e055f ]
For a host which has a lower rlimit for max locked memory (e.g., 64KB),
the following error occurs in one of our production systems:
# /usr/sbin/bpftool prog load /paragon/pods/52877437/home/mark.o \
/sys/fs/bpf/paragon_mark_21 type cgroup/skb \
map idx 0 pinned /sys/fs/bpf/paragon_map_21
libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
Error: failed to open object file
The reason is due to low locked memory during bpf_object__probe_name()
which probes whether program name is supported in kernel or not
during __bpf_object__open_xattr().
bpftool program load already tries to relax mlock rlimit before
bpf_object__load(). Let us move set_max_rlimit() before
__bpf_object__open_xattr(), which fixed the issue here.
Fixes: 47eff61777 ("bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ba36eccb3 ]
The arm64 ptdump code can race with concurrent modification of the
kernel page tables. At the time this was added, this was sound as:
* Modifications to leaf entries could result in stale information being
logged, but would not result in a functional problem.
* Boot time modifications to non-leaf entries (e.g. freeing of initmem)
were performed when the ptdump code cannot be invoked.
* At runtime, modifications to non-leaf entries only occurred in the
vmalloc region, and these were strictly additive, as intermediate
entries were never freed.
However, since commit:
commit 324420bf91 ("arm64: add support for ioremap() block mappings")
... it has been possible to create huge mappings in the vmalloc area at
runtime, and as part of this existing intermediate levels of table my be
removed and freed.
It's possible for the ptdump code to race with this, and continue to
walk tables which have been freed (and potentially poisoned or
reallocated). As a result of this, the ptdump code may dereference bogus
addresses, which could be fatal.
Since huge-vmap is a TLB and memory optimization, we can disable it when
the runtime ptdump code is in use to avoid this problem.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 324420bf91 ("arm64: add support for ioremap() block mappings")
Acked-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5fa2ca7c4a ]
The tcp_bpf_wait_data() routine needs to check timeo != 0 before
calling sk_wait_event() otherwise we may see unexpected stalls
on receiver.
Arika did all the leg work here I just formatted, posted and ran
a few tests.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Arika Chen <eaglesora@gmail.com>
Suggested-by: Arika Chen <eaglesora@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4a0be84d7 ]
For the unlikely case of TxBD extensions (i.e. ptp)
the driver tries to unmap the tx_swbd corresponding
to the extension, which is bogus as it has no buffer
attached.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f413cbb332 ]
Errors are negative numbers. Using %u shows them as very large positive
numbers such as 4294967277 that don't make sense. Use the %d format
instead, and get a much nicer -19.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Fixes: b48e0bab14 ("net: macb: Migrate to devm clock interface")
Fixes: 93b31f48b3 ("net/macb: unify clock management")
Fixes: 421d9df062 ("net/macb: merge at91_ether driver into macb driver")
Fixes: aead88bd0e ("net: ethernet: macb: Add support for rx_clk")
Fixes: f5473d1d44 ("net: macb: Support clock management for tsu_clk")
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48caebf7e1 ]
When dumping the page table in response to an unexpected kernel page
fault, we print the virtual (hashed) address of the page table base, but
display physical addresses for everything else.
Make the page table dumping code in show_pte() consistent, by printing
the page table base pointer as a physical address.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2a8be5696 ]
There were a number of erroneous comments and incorrect older lockdep
checks that were causing a number of warnings.
Resolve the following:
- Inconsistent lock state warnings in lpfc_nvme_info_show().
- Fixed comments and code on sequences where ring lock is now held instead
of hbalock.
- Reworked calling sequences around lpfc_sli_iocbq_lookup(). Rather than
locking prior to the routine and have routine guess on what lock, take
the lock within the routine. The lockdep check becomes unnecessary.
- Fixed comments and removed erroneous hbalock checks.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0adee5d12 ]
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/qedi/qedi_iscsi.c: In function 'qedi_ep_connect':
drivers/scsi/qedi/qedi_iscsi.c:813:23: warning: variable 'udev' set but not used [-Wunused-but-set-variable]
drivers/scsi/qedi/qedi_iscsi.c:812:18: warning: variable 'cdev' set but not used [-Wunused-but-set-variable]
These have never been used since introduction.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c09581a527 ]
KASAN reports this:
BUG: KASAN: global-out-of-bounds in qedi_dbg_err+0xda/0x330 [qedi]
Read of size 31 at addr ffffffffc12b0ae0 by task syz-executor.0/2429
CPU: 0 PID: 2429 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xfa/0x1ce lib/dump_stack.c:113
print_address_description+0x1c4/0x270 mm/kasan/report.c:187
kasan_report+0x149/0x18d mm/kasan/report.c:317
memcpy+0x1f/0x50 mm/kasan/common.c:130
qedi_dbg_err+0xda/0x330 [qedi]
? 0xffffffffc12d0000
qedi_init+0x118/0x1000 [qedi]
? 0xffffffffc12d0000
? 0xffffffffc12d0000
? 0xffffffffc12d0000
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f2d57e55c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bfa0 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000003
RBP: 00007f2d57e55c70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d57e566bc
R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004
The buggy address belongs to the variable:
__func__.67584+0x0/0xffffffffffffd520 [qedi]
Memory state around the buggy address:
ffffffffc12b0980: fa fa fa fa 00 04 fa fa fa fa fa fa 00 00 05 fa
ffffffffc12b0a00: fa fa fa fa 00 00 04 fa fa fa fa fa 00 05 fa fa
> ffffffffc12b0a80: fa fa fa fa 00 06 fa fa fa fa fa fa 00 02 fa fa
^
ffffffffc12b0b00: fa fa fa fa 00 00 04 fa fa fa fa fa 00 00 03 fa
ffffffffc12b0b80: fa fa fa fa 00 00 02 fa fa fa fa fa 00 00 04 fa
Currently the qedi_dbg_* family of functions can overrun the end of the
source string if it is less than the destination buffer length because of
the use of a fixed sized memcpy. Remove the memset/memcpy calls to nfunc
and just use func instead as it is always a null terminated string.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5386a4e6c7 ]
During EEH error recovery testing it was discovered that driver's reset()
callback partially frees resources used by driver, leaving some stale
memory. After reset() is done and when resume() callback in driver uses
old data which results into error leaving adapter disabled due to PCIe
error.
This patch does cleanup for EEH recovery code path and prevents adapter
from getting disabled.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cabede8b4f ]
When converting a skb to msg->sg we forget to set the size after the
latest ktls/tls code conversion. This patch can be reached by doing
a redir into ingress path from BPF skb sock recv hook. Then trying to
read the size fails.
Fix this by setting the size.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c42253cc88 ]
In tcp bpf remove we free the cork list and purge the ingress msg
list. However we do this before the ref count reaches zero so it
could be possible some other access is in progress. In this case
(tcp close and/or tcp_unhash) we happen to also hold the sock
lock so no path exists but lets fix it otherwise it is extremely
fragile and breaks the reference counting rules. Also we already
check the cork list and ingress msg queue and free them once the
ref count reaches zero so its wasteful to check twice.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 014894360e ]
If we try to call strp_done on a parser that has never been
initialized, because the sockmap user is only using TX side for
example we get the following error.
[ 883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0
...
[ 883.422095] Workqueue: events sk_psock_destroy_deferred
[ 883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0
This had been wrapped in a 'if (psock->parser.enabled)' logic which
was broken because the strp_done() was never actually being called
because we do a strp_stop() earlier in the tear down logic will
set parser.enabled to false. This could result in a use after free
if work was still in the queue and was resolved by the patch here,
1d79895aef ("sk_msg: Always cancel strp work before freeing the
psock"). However, calling strp_stop(), done by the patch marked in
the fixes tag, only is useful if we never initialized a strp parser
program and never initialized the strp to start with. Because if
we had initialized a stream parser strp_stop() would have been called
by sk_psock_drop() earlier in the tear down process. By forcing the
strp to stop we get past the WARNING in strp_done that checks
the stopped flag but calling cancel_work_sync on work that has never
been initialized is also wrong and generates the warning above.
To fix check if the parser program exists. If the program exists
then the strp work has been initialized and must be sync'd and
cancelled before free'ing any structures. If no program exists we
never initialized the stream parser in the first place so skip the
sync/cancel logic implemented by strp_done.
Finally, remove the strp_done its not needed and in the case where we
are using the stream parser has already been called.
Fixes: e8e3437762 ("bpf: Stop the psock parser before canceling its work")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14ae42a6f0 ]
Since commit 5768402fd9 ("perf/ring_buffer: Use high order allocations
for AUX buffers optimistically"), the perf core tends to back aux buffer
allocations with high-order pages with the order encoded in the
PagePrivate data. The Arm SPE driver explicitly rejects such pages,
causing the perf tool to fail with:
| failed to mmap with 12 (Cannot allocate memory)
In actual fact, we can simply treat these pages just like any other
since the perf core takes care to populate the page array appropriately.
In theory we could try to map with PMDs where possible, but for now,
let's just get things working again.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 5768402fd9 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically")
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b281218ad4 ]
There is an out-of-bounds access to "config[len - 1]" array when the
variable "len" is zero.
See commit dada6a43b0 ("kgdboc: fix KASAN global-out-of-bounds bug
in param_set_kgdboc_var()") for details.
Signed-off-by: Young Xiao <YangX92@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0654ba94e ]
This reverts commit feb689025f.
The fix attempt was incorrect, leading to the mutex deadlock through
the close of OSS sequencer client. The proper fix needs more
consideration, so let's revert it now.
Fixes: feb689025f ("ALSA: seq: Protect in-kernel ioctl calls with mutex")
Reported-by: syzbot+47ded6c0f23016cde310@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2eabc5ec8a ]
The snd_seq_ioctl_get_subscription() retrieves the port subscriber
information as a pointer, while the object isn't protected, hence it
may be deleted before the actual reference. This race was spotted by
syzkaller and may lead to a UAF.
The fix is simply copying the data in the lookup function that
performs in the rwsem to protect against the deletion.
Reported-by: syzbot+9437020c82413d00222d@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit feb689025f ]
ALSA OSS sequencer calls the ioctl function indirectly via
snd_seq_kernel_client_ctl(). While we already applied the protection
against races between the normal ioctls and writes via the client's
ioctl_mutex, this code path was left untouched. And this seems to be
the cause of still remaining some rare UAF as spontaneously triggered
by syzkaller.
For the sake of robustness, wrap the ioctl_mutex also for the call via
snd_seq_kernel_client_ctl(), too.
Reported-by: syzbot+e4c8abb920efa77bace9@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 326fb6dd14 upstream.
While loading the DMC firmware we were double checking the headers made
sense, but in no place we checked that we were actually reading memory
we were supposed to. This could be wrong in case the firmware file is
truncated or malformed.
Before this patch:
# ls -l /lib/firmware/i915/icl_dmc_ver1_07.bin
-rw-r--r-- 1 root root 25716 Feb 1 12:26 icl_dmc_ver1_07.bin
# truncate -s 25700 /lib/firmware/i915/icl_dmc_ver1_07.bin
# modprobe i915
# dmesg| grep -i dmc
[drm:intel_csr_ucode_init [i915]] Loading i915/icl_dmc_ver1_07.bin
[drm] Finished loading DMC firmware i915/icl_dmc_ver1_07.bin (v1.7)
i.e. it loads random data. Now it fails like below:
[drm:intel_csr_ucode_init [i915]] Loading i915/icl_dmc_ver1_07.bin
[drm:csr_load_work_fn [i915]] *ERROR* Truncated DMC firmware, rejecting.
i915 0000:00:02.0: Failed to load DMC firmware i915/icl_dmc_ver1_07.bin. Disabling runtime power management.
i915 0000:00:02.0: DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
Before reading any part of the firmware file, validate the input first.
Fixes: eb805623d8 ("drm/i915/skl: Add support to load SKL CSR firmware.")
Cc: stable@vger.kernel.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190605235535.17791-1-lucas.demarchi@intel.com
(cherry picked from commit bc7b488b1d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d74408f528 upstream.
Our SDVO audio support is pretty bogus. We can't push audio over the
SDVO bus, so trying to enable audio in the SDVO control register doesn't
do anything. In fact it looks like the SDVO encoder will always mix in
the audio coming over HDA, and there's no (at least documented) way to
disable that from our side. So HDMI audio does work currently on gen4
but only by luck really. On gen3 it got broken by the referenced commit.
And what has always been missing on every platform is the ELD.
To pass the ELD to the audio driver we need to write it to magic buffer
in the SDVO encoder hardware which then gets pulled out via HDA in the
other end. Ie. pretty much the same thing we had for native HDMI before
we started to just pass the ELD between the drivers. This sort of
explains why we even have that silly hardware buffer with native HDMI.
$ cat /proc/asound/card0/eld#1.0
-monitor_present 0
-eld_valid 0
+monitor_present 1
+eld_valid 1
+monitor_name LG TV
+connection_type HDMI
+...
This also fixes our state readout since we can now query the SDVO
encoder about the state of the "ELD valid" and "presence detect"
bits. As mentioned those don't actually control whether audio
gets sent over the HDMI cable, but it's the best we can do. And with
the state checker appeased we can re-enable HDMI audio for gen3.
Cc: stable@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: zardam@gmail.com
Tested-by: zardam@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108976
Fixes: de44e256b9 ("drm/i915/sdvo: Shut up state checker with hdmi cards on gen3")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190409144054.24561-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit dc49a56bd4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 517b91f4cd upstream.
[What]
readptr read always returns zero, since most likely
these blocks are either power or clock gated.
[How]
fetch rptr after amdgpu_ring_alloc() which informs
the power management code that the block is about to be
used and hence the gating is turned off.
Signed-off-by: Louis Li <Ching-shih.Li@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29040d1ac5 upstream.
commit 53e947a0e1 ("ASoC: soc-core: merge card resources cleanup
method") merged cleanup method of snd_soc_instantiate_card() and
soc_cleanup_card_resources().
But, after this commit, if user uses unbind/bind to Component factor
drivers, Kernel might indicates refcount error at
soc_cleanup_card_resources().
The 1st reason is card->snd_card is still exist even though
snd_card_free() was called, but it is already cleaned.
We need to set NULL to it.
2nd is card->dapm and card create debugfs, but its dentry is still
exist even though it was removed. We need to set NULL to it.
Fixes: 53e947a0e1 ("ASoC: soc-core: merge card resources cleanup method")
Cc: stable@vger.kernel.org # for v5.1
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b06c58c2a1 upstream.
When the output sample rate is [8kHz, 30kHz], the limitation
of the supported ratio range is [1/24, 8]. In the driver
we use (8kHz, 30kHz) instead of [8kHz, 30kHz].
So this patch is to fix this issue and the potential rounding
issue with divider.
Fixes: fff6e03c7b ("ASoC: fsl_asrc: add support for 8-30kHz
output sample rate")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad6eecbfc0 upstream.
Add regcache_mark_dirty before regcache_sync for power
of codec may be lost at suspend, then all the register
need to be reconfigured.
Fixes: 0c516b4ff8 ("ASoC: cs42xx8: Add codec driver
support for CS42448/CS42888")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18fa84a2db upstream.
A PF_EXITING task can stay associated with an offline css. If such
task calls task_get_css(), it can get stuck indefinitely. This can be
triggered by BSD process accounting which writes to a file with
PF_EXITING set when racing against memcg disable as in the backtrace
at the end.
After this change, task_get_css() may return a css which was already
offline when the function was called. None of the existing users are
affected by this change.
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
...
NMI backtrace for cpu 0
...
Call Trace:
<IRQ>
dump_stack+0x46/0x68
nmi_cpu_backtrace.cold.2+0x13/0x57
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x9e/0xce
rcu_check_callbacks.cold.74+0x2af/0x433
update_process_times+0x28/0x60
tick_sched_timer+0x34/0x70
__hrtimer_run_queues+0xee/0x250
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x56/0x110
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0
...
btrfs_file_write_iter+0x31b/0x563
__vfs_write+0xfa/0x140
__kernel_write+0x4f/0x100
do_acct_process+0x495/0x580
acct_process+0xb9/0xdb
do_exit+0x748/0xa00
do_group_exit+0x3a/0xa0
get_signal+0x254/0x560
do_signal+0x23/0x5c0
exit_to_usermode_loop+0x5d/0xa0
prepare_exit_to_usermode+0x53/0x80
retint_user+0x8/0x8
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.2+
Fixes: ec438699a9 ("cgroup, block: implement task_get_css() and use it in bio_associate_current()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f0ffa6734 upstream.
When people set a writeback percent via sysfs file,
/sys/block/bcache<N>/bcache/writeback_percent
current code directly sets BCACHE_DEV_WB_RUNNING to dc->disk.flags
and schedules kworker dc->writeback_rate_update.
If there is no cache set attached to, the writeback kernel thread is
not running indeed, running dc->writeback_rate_update does not make
sense and may cause NULL pointer deference when reference cache set
pointer inside update_writeback_rate().
This patch checks whether the cache set point (dc->disk.c) is NULL in
sysfs interface handler, and only set BCACHE_DEV_WB_RUNNING and
schedule dc->writeback_rate_update when dc->disk.c is not NULL (it
means the cache device is attached to a cache set).
This problem might be introduced from initial bcache commit, but
commit 3fd47bfe55 ("bcache: stop dc->writeback_rate_update properly")
changes part of the original code piece, so I add 'Fixes: 3fd47bfe55b0'
to indicate from which commit this patch can be applied.
Fixes: 3fd47bfe55 ("bcache: stop dc->writeback_rate_update properly")
Reported-by: Bjørn Forsman <bjorn.forsman@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Bjørn Forsman <bjorn.forsman@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31b90956b1 upstream.
Recently people report bcache code compiled with gcc9 is broken, one of
the buggy behavior I observe is that two adjacent 4KB I/Os should merge
into one but they don't. Finally it turns out to be a stack corruption
caused by macro PRECEDING_KEY().
See how PRECEDING_KEY() is defined in bset.h,
437 #define PRECEDING_KEY(_k) \
438 ({ \
439 struct bkey *_ret = NULL; \
440 \
441 if (KEY_INODE(_k) || KEY_OFFSET(_k)) { \
442 _ret = &KEY(KEY_INODE(_k), KEY_OFFSET(_k), 0); \
443 \
444 if (!_ret->low) \
445 _ret->high--; \
446 _ret->low--; \
447 } \
448 \
449 _ret; \
450 })
At line 442, _ret points to address of a on-stack variable combined by
KEY(), the life range of this on-stack variable is in line 442-446,
once _ret is returned to bch_btree_insert_key(), the returned address
points to an invalid stack address and this address is overwritten in
the following called bch_btree_iter_init(). Then argument 'search' of
bch_btree_iter_init() points to some address inside stackframe of
bch_btree_iter_init(), exact address depends on how the compiler
allocates stack space. Now the stack is corrupted.
Fixes: 0eacac2203 ("bcache: PRECEDING_KEY()")
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reviewed-by: Pierre JUHEN <pierre.juhen@orange.fr>
Tested-by: Shenghui Wang <shhuiw@foxmail.com>
Tested-by: Pierre JUHEN <pierre.juhen@orange.fr>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca21f851cc upstream.
The Acorn i2c driver (for RiscPC) triggers the "i2c adapter has no name"
warning in the I2C core driver, resulting in the RTC being inaccessible.
Fix this.
Fixes: 2236baa75f ("i2c: Sanity checks on adapter registration")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e7739fc93 upstream.
The 5.1 mount system rework changed the smackfsdef mount option to
smackfsdefault. This fixes the regression by making smackfsdef treated
the same way as smackfsdefault.
Also fix the smack_param_specs[] to have "smack" prefixes on all the
names. This isn't visible to a user unless they either:
(a) Try to mount a filesystem that's converted to the internal mount API
and that implements the ->parse_monolithic() context operation - and
only then if they call security_fs_context_parse_param() rather than
security_sb_eat_lsm_opts().
There are no examples of this upstream yet, but nfs will probably want
to do this for nfs2 or nfs3.
(b) Use fsconfig() to configure the filesystem - in which case
security_fs_context_parse_param() will be called.
This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix
on the options, but smack_fs_context_parse_param() does not.
Fixes: c3300aaf95 ("smack: get rid of match_token()")
Fixes: 2febd254ad ("smack: Implement filesystem context security hooks")
Cc: stable@vger.kernel.org
Reported-by: Jose Bollo <jose.bollo@iot.bzh>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e4abae311 upstream.
Apparently, some Qualcomm arm64 platforms which appear to expose their
SMMU global register space are still, in fact, using a hypervisor to
mediate it by trapping and emulating register accesses. Sadly, some
deployed versions of said trapping code have bugs wherein they go
horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
register.
While this can be mitigated for GCC today by tweaking the constraints
for the implementation of writel_relaxed(), to avoid any potential
arms race with future compilers more aggressively optimising register
allocation, the simple way is to just remove all the problematic
constant zeros. For the write-only TLB operations, the actual value is
irrelevant anyway and any old nearby variable will provide a suitable
GPR to encode. The one point at which we really do need a zero to clear
a context bank happens before any of the TLB maintenance where crashes
have been reported, so is apparently not a problem... :/
Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb96e57b91 upstream.
This can be a debug message. Favour dev_dbg() over dprintk() as this is
already used much more than dprintk().
dvb_frontend: dvb_frontend_get_frequency_limits: frequency interval: tuner: 45000000...860000000, frontend: 44250000...867250000
Fixes: 00ecd6bc71 ("media: dvb_frontend: add debug message for frequency intervals")
Cc: <stable@vger.kernel.org> # 5.0
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6581f5b55 upstream.
Restore the read memory barrier in __ptrace_may_access() that was deleted
a couple years ago. Also add comments on this barrier and the one it pairs
with to explain why they're there (as far as I understand).
Fixes: bfedb58925 ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks")
Cc: stable@vger.kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6e2aa91a4 upstream.
Recently syzbot in conjunction with KMSAN reported that
ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
Inspecting ptrace_peek_siginfo confirms this.
The problem is that off when initialized from args.off can be
initialized to a negaive value. At which point the "if (off >= 0)"
test to see if off became negative fails because off started off
negative.
Prevent the core problem by adding a variable found that is only true
if a siginfo is found and copied to a temporary in preparation for
being copied to userspace.
Prevent args.off from being truncated when being assigned to off by
testing that off is <= the maximum possible value of off. Convert off
to an unsigned long so that we should not have to truncate args.off,
we have well defined overflow behavior so if we add another check we
won't risk fighting undefined compiler behavior, and so that we have a
type whose maximum value is easy to test for.
Cc: Andrei Vagin <avagin@gmail.com>
Cc: stable@vger.kernel.org
Reported-by: syzbot+0d602a1b0d8c95bdf299@syzkaller.appspotmail.com
Fixes: 84c751bd4a ("ptrace: add ability to retrieve signals without removing from a queue (v4)")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a58f2cef26 upstream.
There was the below bug report from Wu Fangsuo.
On the CMA allocation path, isolate_migratepages_range() could isolate
unevictable LRU pages and reclaim_clean_page_from_list() can try to
reclaim them if they are clean file-backed pages.
page:ffffffbf02f33b40 count:86 mapcount:84 mapping:ffffffc08fa7a810 index:0x24
flags: 0x19040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked)
raw: 000000000019040c ffffffc08fa7a810 0000000000000024 0000005600000053
raw: ffffffc009b05b20 ffffffc009b05b20 0000000000000000 ffffffc09bf3ee80
page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page))
page->mem_cgroup:ffffffc09bf3ee80
------------[ cut here ]------------
kernel BUG at /home/build/farmland/adroid9.0/kernel/linux/mm/vmscan.c:1350!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 7125 Comm: syz-executor Tainted: G S 4.14.81 #3
Hardware name: ASR AQUILAC EVB (DT)
task: ffffffc00a54cd00 task.stack: ffffffc009b00000
PC is at shrink_page_list+0x1998/0x3240
LR is at shrink_page_list+0x1998/0x3240
pc : [<ffffff90083a2158>] lr : [<ffffff90083a2158>] pstate: 60400045
sp : ffffffc009b05940
..
shrink_page_list+0x1998/0x3240
reclaim_clean_pages_from_list+0x3c0/0x4f0
alloc_contig_range+0x3bc/0x650
cma_alloc+0x214/0x668
ion_cma_allocate+0x98/0x1d8
ion_alloc+0x200/0x7e0
ion_ioctl+0x18c/0x378
do_vfs_ioctl+0x17c/0x1780
SyS_ioctl+0xac/0xc0
Wu found it's due to commit ad6b67041a ("mm: remove SWAP_MLOCK in
ttu"). Before that, unevictable pages go to cull_mlocked so that we
can't reach the VM_BUG_ON_PAGE line.
To fix the issue, this patch filters out unevictable LRU pages from the
reclaim_clean_pages_from_list in CMA.
Link: http://lkml.kernel.org/r/20190524071114.74202-1-minchan@kernel.org
Fixes: ad6b67041a ("mm: remove SWAP_MLOCK in ttu")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Debugged-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Tested-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Suryawanshi <pankaj.suryawanshi@einfochips.com>
Cc: <stable@vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be99ca2716 upstream.
ocfs2_dentry_attach_lock() can be executed in parallel threads against the
same dentry. Make that race safe. The race is like this:
thread A thread B
(A1) enter ocfs2_dentry_attach_lock,
seeing dentry->d_fsdata is NULL,
and no alias found by
ocfs2_find_local_alias, so kmalloc
a new ocfs2_dentry_lock structure
to local variable "dl", dl1
.....
(B1) enter ocfs2_dentry_attach_lock,
seeing dentry->d_fsdata is NULL,
and no alias found by
ocfs2_find_local_alias so kmalloc
a new ocfs2_dentry_lock structure
to local variable "dl", dl2.
......
(A2) set dentry->d_fsdata with dl1,
call ocfs2_dentry_lock() and increase
dl1->dl_lockres.l_ro_holders to 1 on
success.
......
(B2) set dentry->d_fsdata with dl2
call ocfs2_dentry_lock() and increase
dl2->dl_lockres.l_ro_holders to 1 on
success.
......
(A3) call ocfs2_dentry_unlock()
and decrease
dl2->dl_lockres.l_ro_holders to 0
on success.
....
(B3) call ocfs2_dentry_unlock(),
decreasing
dl2->dl_lockres.l_ro_holders, but
see it's zero now, panic
Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reported-by: Daniel Sobe <daniel.sobe@nxp.com>
Tested-by: Daniel Sobe <daniel.sobe@nxp.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 355e8d26f7 upstream.
Opening and closing an io_uring instance leaks a UNIX domain socket
inode. This is because the ->file of the io_uring instance's internal
UNIX domain socket is set to point to the io_uring file, but then
sock_release() sees the non-NULL ->file and assumes the inode reference
is held by the file so doesn't call iput(). That's not the case here,
since the reference is still meant to be held by the socket; the actual
inode of the io_uring file is different.
Fix this leak by NULL-ing out ->file before releasing the socket.
Reported-by: syzbot+111cb28d9f583693aefa@syzkaller.appspotmail.com
Fixes: 2b188cc1bb ("Add io_uring IO interface")
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fec6375320 upstream.
In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
should be freed when error.
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Fixes: 99dbbb593f ("selinux: rewrite selinux_sb_eat_lsm_opts()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2e0e09758 upstream.
In selinux_add_mnt_opt(), 'val' is allocated by kmemdup_nul(). It returns
NULL when fails. So 'val' should be checked. And 'mnt_opts' should be
freed when error.
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Fixes: 757cbe597f ("LSM: new method: ->sb_add_mnt_opt()")
Cc: <stable@vger.kernel.org>
[PM: fixed some indenting problems]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aff7ed4851 upstream.
These strings may come from untrusted sources (e.g. file xattrs) so they
need to be properly escaped.
Reproducer:
# setenforce 0
# touch /tmp/test
# setfattr -n security.selinux -v 'kuřecí řízek' /tmp/test
# runcon system_u:system_r:sshd_t:s0 cat /tmp/test
(look at the generated AVCs)
Actual result:
type=AVC [...] trawcon=kuřecí řízek
Expected result:
type=AVC [...] trawcon=6B75C5996563C3AD20C599C3AD7A656B
Fixes: fede148324 ("selinux: log invalid contexts in AVCs")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 352bcae97f upstream.
Check for exact and correct return value to snd_i2c_sendbytes
call for EWS/DMX 6Fire (snd_ice1712).
Fixes a systemic error on every boot starting from kernel 5.1
onwards to snd_ice1712 driver ("cannot send pca") on Terratec
EWS/DMX 6Fire PCI soundcards.
Check for exact and correct return value to snd_i2c_sendbytes
call for EWS/DMX 6Fire (snd_ice1712).
Fixes a systemic error on every boot to snd_ice1712 driver
("cannot send pca") on Terratec EWS/DMX 6Fire PCI soundcards.
Fixes: c99776cc40 ("ALSA: ice1712: fix a missing check of snd_i2c_sendbytes")
Signed-off-by: Rui Nuno Capela <rncbc@rncbc.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8fa87c368 upstream.
Stanton SCS.1m can transfer isochronous packet with Multi Bit Linear
Audio data channels, therefore it allows software to capture PCM
substream. However, ALSA oxfw driver doesn't.
This commit changes the driver to add one PCM substream for capture
direction.
Fixes: de5126cc3c ("ALSA: oxfw: add stream format quirk for SCS.1 models")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69dbdfffef upstream.
The Bluetooth interface of the 2nd-gen Intuos Pro batches together four
independent "frames" of finger data into a single report. Each frame
is essentially equivalent to a single USB report, with the up-to-10
fingers worth of information being spread across two frames. At the
moment the driver only calls `input_sync` after processing all four
frames have been processed, which can result in the driver sending
multiple updates for a single slot within the same SYN_REPORT. This
can confuse userspace, so modify the driver to sync more often if
necessary (i.e., after reporting the state of all fingers).
Fixes: 4922cd26f0 ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface")
Cc: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6441fc781c upstream.
The button numbering of the 2nd-gen Intuos Pro is not consistent between
the USB and Bluetooth interfaces. Over USB, the HID_GENERIC codepath
enumerates the eight ExpressKeys first (BTN_0 - BTN_7) followed by the
center modeswitch button (BTN_8). The Bluetooth codepath, however, has
the center modeswitch button as BTN_0 and the the eight ExpressKeys as
BTN_1 - BTN_8. To ensure userspace button mappings do not change
depending on how the tablet is connected, modify the Bluetooth codepath
to report buttons in the same order as USB.
To ensure the mode switch LED continues to toggle in response to the
mode switch button, the `wacom_is_led_toggled` function also requires
a small update.
Link: https://github.com/linuxwacom/input-wacom/pull/79
Fixes: 4922cd26f0 ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface")
Cc: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Skomra <aaron.skomra@wacom.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe7f8d73d1 upstream.
The Bluetooth reports from the 2nd-gen Intuos Pro have separate bits for
indicating if the tip or eraser is in contact with the tablet. At the
moment, only the tip contact bit controls the state of the BTN_TOUCH
event. This prevents the eraser from working as expected. This commit
changes the driver to send BTN_TOUCH whenever either the tip or eraser
contact bit is set.
Fixes: 4922cd26f0 ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface")
Cc: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Skomra <aaron.skomra@wacom.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e92a7be7fe upstream.
If the tool spends some time in prox before entering range, a series of
events (e.g. ABS_DISTANCE, MSC_SERIAL) can be sent before we or userspace
have any clue about the pen whose data is being reported. We need to hold
off on reporting anything until the pen has entered range. Since we still
want to report events that occur "in prox" after the pen has *left* range
we use 'wacom-tool[0]' as the indicator that the pen did at one point
enter range and provide us/userspace with tool type and serial number
information.
Fixes: a48324de6d ("HID: wacom: Bluetooth IRQ for Intuos Pro should handle prox/range")
Cc: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cc08800a6 upstream.
The serial number and tool type information that is reported by the tablet
while a pen is merely "in prox" instead of fully "in range" can be stale
and cause us to report incorrect tool information. Serial number, tool
type, and other information is only valid once the pen comes fully in range
so we should be careful to not use this information until that point.
In particular, this issue may cause the driver to incorectly report
BTN_TOOL_RUBBER after switching from the eraser tool back to the pen.
Fixes: a48324de6d ("HID: wacom: Bluetooth IRQ for Intuos Pro should handle prox/range")
Cc: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81bcbad53b upstream.
Since kernel v5.0, one single win8 touchscreen device failed.
And it turns out this is because it reports 2 InRange usage per touch.
It's a first, and I *really* wonder how this was allowed by Microsoft in
the first place. But IIRC, Breno told me this happened *after* a firmware
upgrade...
Anyway, better be safe for those crappy devices, and make sure we have
a full slot before jumping to the next.
This won't prevent all crappy devices to fail here, but at least we will
have a safeguard as long as the contact ID and the X and Y coordinates
are placed in the report after the grabage.
Fixes: 01eaac7e57 ("HID: multitouch: remove one copy of values")
CC: stable@vger.kernel.org # v5.0+
Reported-and-tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15fc1b5c86 upstream.
This reverts commit 94a9992f7d.
The commit allows for more than 32 bits in hid_field_extract(),
but the return value is a 32 bits int.
So basically what this commit is doing is just silencing those
legitimate errors.
Revert to a previous situation in the hope that a proper
fix will be impletemented.
Fixes: 94a9992f7d ("HID: Increase maximum report size allowed by hid_field_extract()")
Cc: stable@vger.kernel.org # v5.1
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39b3c3a5fb upstream.
The value field is actually an array of .maxfield. We should assign the
correct number to the correct usage.
Not that we never encounter a device that requires this ATM, but better
have the proper code path.
Fixes: 2dc702c991 ("HID: input: use the Resolution Multiplier for
high-resolution scrolling")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d43c17ead8 upstream.
Some old mice have a tendency to not accept the high resolution multiplier.
They reply with a -EPIPE which was previously ignored.
Force the call to resolution multiplier to be synchronous and actually
check for the answer. If this fails, consider the mouse like a normal one.
Fixes: 2dc702c991 ("HID: input: use the Resolution Multiplier for
high-resolution scrolling")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1700071
Reported-and-tested-by: James Feeney <james@nurealm.net>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Not-entirely-upstream-sha1-but-equivalent: bed2dd8421
("drm/ttm: Quick-test mmap offset in ttm_bo_mmap()")
Setting CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=n (added by commit: b30a43ac71)
causes the build to fail with:
ERROR: "drm_legacy_mmap" [drivers/gpu/drm/nouveau/nouveau.ko] undefined!
This does not happend upstream as the offending code got removed in:
bed2dd8421 ("drm/ttm: Quick-test mmap offset in ttm_bo_mmap()")
Fix that by adding check for CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT around
the drm_legacy_mmap() call.
Also, as Sven Joachim pointed out, we need to make the check in
CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=n case return -EINVAL as its done
for basically all other gpu drivers, especially in upstream kernels
drivers/gpu/drm/ttm/ttm_bo_vm.c as of the upstream commit bed2dd8421.
NOTE. This is a minimal stable-only fix for trees where b30a43ac71 is
backported as the build error affects nouveau only.
Fixes: b30a43ac71 ("drm/nouveau: add kconfig option to turn off nouveau
legacy contexts. (v3)")
Signed-off-by: Thomas Backlund <tmb@mageia.org>
Cc: stable@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b30a43ac71 upstream.
There was a nouveau DDX that relied on legacy context ioctls to work,
but we fixed it years ago, give distros that have a modern DDX the
option to break the uAPI and close the mess of holes that legacy
context support is.
Full context of the story:
commit 0e975980d4
Author: Peter Antoine <peter.antoine@intel.com>
Date: Tue Jun 23 08:18:49 2015 +0100
drm: Turn off Legacy Context Functions
The context functions are not used by the i915 driver and should not
be used by modeset drivers. These driver functions contain several bugs
and security holes. This change makes these functions optional can be
turned on by a setting, they are turned off by default for modeset
driver with the exception of the nouvea driver that may require them with
an old version of libdrm.
The previous attempt was
commit 7c510133d9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Aug 8 15:41:21 2013 +0200
drm: mark context support as a legacy subsystem
but this had to be reverted
commit c21eb21cb5
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Sep 20 08:32:59 2013 +1000
Revert "drm: mark context support as a legacy subsystem"
v2: remove returns from void function, and formatting (Daniel Vetter)
v3:
- s/Nova/nouveau/ in the commit message, and add references to the
previous attempts
- drop the part touching the drm hw lock, that should be a separate
patch.
Signed-off-by: Peter Antoine <peter.antoine@intel.com> (v2)
Cc: Peter Antoine <peter.antoine@intel.com> (v2)
Reviewed-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: move DRM_VM dependency into legacy config.
v3: fix missing dep (kbuild robot)
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f3e2bf008 upstream.
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value
to a saner value.
We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f070ef2ac6 upstream.
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.
TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.
A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.
Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.
CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
socket is already using more than half the allowed space
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b4929f65b upstream.
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Fixes: 832d11c5cd ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44a9bd18a0 upstream.
The test case we have is rightfully failing with the current kernel:
io_uring_setup(1, 0x7ffe2cafebe0), flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 4
expected -1, got 3
This is in a vm, and CPU3 is the last valid one, hence asking for 4
should fail the setup with -EINVAL, not succeed. The problem is that
we're using array_index_nospec() with nr_cpu_ids as the index, hence we
wrap and end up using CPU0 instead of CPU4. This makes the setup
succeed where it should be failing.
We don't need to use array_index_nospec() as we're not indexing any
array with this. Instead just compare with nr_cpu_ids directly. This
is fine as we're checking with cpu_online() afterwards.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c32ae35fb upstream.
The call of unsubscribe_port() which manages the group count and
module refcount from delete_and_unsubscribe_port() looks racy; it's
not covered by the group list lock, and it's likely a cause of the
reported unbalance at port deletion. Let's move the call inside the
group list_mutex to plug the hole.
Reported-by: syzbot+e4c8abb920efa77bace9@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e46b840c7 upstream.
Overlay file f_pos is the master copy that is preserved
through copy up and modified on read/write, but only real
fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
limitations that are more strict than ->s_maxbytes for specific
files, so we use the real file to perform seeks.
We do not call real fs for SEEK_CUR:0 query and for SEEK_SET:0
requests.
Fixes: d1d04ef857 ("ovl: stack file ops")
Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98487de318 upstream.
We found that it return success when we set IMMUTABLE_FL flag to a file in
docker even though the docker didn't have the capability
CAP_LINUX_IMMUTABLE.
The commit d1d04ef857 ("ovl: stack file ops") and dab5ca8fd9 ("ovl: add
lsattr/chattr support") implemented chattr operations on a regular overlay
file. ovl_real_ioctl() overridden the current process's subjective
credentials with ofs->creator_cred which have the capability
CAP_LINUX_IMMUTABLE so that it will return success in
vfs_ioctl()->cap_capable().
Fix this by checking the capability before cred overridden. And here we
only care about APPEND_FL and IMMUTABLE_FL, so get these information from
inode.
[SzM: move check and call to underlying fs inside inode locked region to
prevent two such calls from racing with each other]
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 1e07d63749 which is
commit b30a43ac71 upstream.
Sven reports:
Commit 1e07d63749 ("drm/nouveau: add kconfig option to turn off nouveau
legacy contexts. (v3)") has caused a build failure for me when I
actually tried that option (CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=n):
,----
| Kernel: arch/x86/boot/bzImage is ready (#1)
| Building modules, stage 2.
| MODPOST 290 modules
| ERROR: "drm_legacy_mmap" [drivers/gpu/drm/nouveau/nouveau.ko] undefined!
| scripts/Makefile.modpost:91: recipe for target '__modpost' failed
`----
Upstream does not have that problem, as commit bed2dd8421 ("drm/ttm:
Quick-test mmap offset in ttm_bo_mmap()") has removed the use of
drm_legacy_mmap from nouveau_ttm.c. Unfortunately that commit does not
apply in 5.1.9.
The ensuing discussion proposed a number of one-off patches, but no
solid agreement was made, so just revert the commit for now to get
people's systems building again.
Reported-by: Sven Joachim <svenjoac@gmx.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 07e38998a1 which is
commit d5bb334a8e upstream.
Lots of people have reported issues with this patch, and as there does
not seem to be a fix going into Linus's kernel tree any time soon,
revert the commit in the stable trees so as to get people's machines
working properly again.
Reported-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reported-by: Hans de Goede <hdegoede@redhat.com>
Cc: Jeremy Cline <jeremy@jcline.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c43004af0 ]
pcpu_find_block_fit() guarantees that a fit is found within
PCPU_BITMAP_BLOCK_BITS. Iteration is used to determine the first fit as
it compares against the block's contig_hint. This can lead to
incorrectly scanning past the end of the bitmap. The behavior was okay
given the check after for bit_off >= end and the correctness of the
hints from pcpu_find_block_fit().
This patch fixes this by bounding the end offset by the number of bits
in a chunk.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32a155b1a8 ]
The datasheet says the vconn MUST be off when we start toggling. The
tcpm.c state-machine is responsible to make sure vconn is off, but lets
add a WARN to catch any cases where vconn is not off for some reason.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d8e3e951a ]
During early system resume on Exynos5422 with performance counters enabled
the following kernel oops happens:
Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1433 Comm: bash Tainted: G W 5.0.0-rc5-next-20190208-00023-gd5fb5a8a13e6-dirty #5480
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
...
Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 4451006a DAC: 00000051
Process bash (pid: 1433, stack limit = 0xb7e0e22f)
...
(reset_ctrl_regs) from [<c0112ad0>] (dbg_cpu_pm_notify+0x1c/0x24)
(dbg_cpu_pm_notify) from [<c014c840>] (notifier_call_chain+0x44/0x84)
(notifier_call_chain) from [<c014cbc0>] (__atomic_notifier_call_chain+0x7c/0x128)
(__atomic_notifier_call_chain) from [<c01ffaac>] (cpu_pm_notify+0x30/0x54)
(cpu_pm_notify) from [<c055116c>] (syscore_resume+0x98/0x3f4)
(syscore_resume) from [<c0189350>] (suspend_devices_and_enter+0x97c/0xe74)
(suspend_devices_and_enter) from [<c0189fb8>] (pm_suspend+0x770/0xc04)
(pm_suspend) from [<c0187740>] (state_store+0x6c/0xcc)
(state_store) from [<c09fa698>] (kobj_attr_store+0x14/0x20)
(kobj_attr_store) from [<c030159c>] (sysfs_kf_write+0x4c/0x50)
(sysfs_kf_write) from [<c0300620>] (kernfs_fop_write+0xfc/0x1e0)
(kernfs_fop_write) from [<c0282be8>] (__vfs_write+0x2c/0x160)
(__vfs_write) from [<c0282ea4>] (vfs_write+0xa4/0x16c)
(vfs_write) from [<c0283080>] (ksys_write+0x40/0x8c)
(ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Undefined instruction is triggered during CP14 reset, because bits: #16
(Secure privileged invasive debug disabled) and #17 (Secure privileged
noninvasive debug disable) are set in DSCR. Those bits depend on SPNIDEN
and SPIDEN lines, which are provided by Secure JTAG hardware block. That
block in turn is powered from cluster 0 (big/Eagle), but the Exynos5422
boots on cluster 1 (LITTLE/KFC).
To fix this issue it is enough to turn on the power on the cluster 0 for
a while. This lets the Secure JTAG block to propagate the needed signals
to LITTLE/KFC cores and change their DSCR.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ab99cf7d5 ]
The PVDD_APIO_1V8 (LDO2) and PVDD_ABB_1V8 (LDO8) regulators were turned
off by Linux kernel as unused. However they supply critical parts of
SoC so they should be always on:
1. PVDD_APIO_1V8 supplies SYS pins (gpx[0-3], PSHOLD), HDMI level shift,
RTC, VDD1_12 (DRAM internal 1.8 V logic), pull-up for PMIC interrupt
lines, TTL/UARTR level shift, reset pins and SW-TACT1 button.
It also supplies unused blocks like VDDQ_SRAM (for SROM controller) and
VDDQ_GPIO (gpm7, gpy7).
The LDO2 cannot be turned off (S2MPS11 keeps it on anyway) so
marking it "always-on" only reflects its real status.
2. PVDD_ABB_1V8 supplies Adaptive Body Bias Generator for ARM cores,
memory and Mali (G3D).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d3863736a ]
The lack of defaults provided by the caller to
v4l2_fwnode_endpoint_parse() signals the use of the default lane mapping.
The default lane mapping must not be used however if the firmmare contains
the lane mapping. Disable the default lane mapping in that case, and
improve the debug messages telling of the use of the defaults.
This was missed previously since the default mapping will only unsed in
this case if the bus type is set, and no driver did both while still
needing the lane mapping configuration.
Fixes: b4357d21d6 ("media: v4l: fwnode: Support default CSI-2 lane mapping for drivers")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b00ef53053 ]
It must be made sure that immediate mode is not already set, when
modifying shadow register value in ehrpwm_pwm_disable(). Otherwise
modifications to the action-qualifier continuous S/W force
register(AQSFRC) will be done in the active register.
This may happen when both channels are being disabled. In this case,
only the first channel state will be recorded as disabled in the shadow
register. Later, when enabling the first channel again, the second
channel would be enabled as well. Setting RLDCSF to zero, first, ensures
that the shadow register is updated as desired.
Fixes: 38dabd91ff ("pwm: tiehrpwm: Fix disabling of output of PWMs")
Signed-off-by: Christoph Vogtländer <c.vogtlaender@sigma-surface-science.com>
[vigneshr@ti.com: Improve commit message]
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ba846b1ee ]
Intel IOMMU, when enabled, tries to find the domain of the device,
assuming it's a PCI one, during DMA operations, such as mapping or
unmapping. Since we are splitting the actual PCI device to couple of
children via MFD framework (see drivers/mfd/intel-lpss.c for details),
the DMA device appears to be a platform one, and thus not an actual one
that performs DMA. In a such situation IOMMU can't find or allocate
a proper domain for its operations. As a result, all DMA operations are
failed.
In order to fix this, supply parent of the platform device
to the DMA engine framework and fix filter functions accordingly.
We may rely on the fact that parent is a real PCI device, because no
other configuration is present in the wild.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [for tty parts]
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2d8b9a6c1 ]
The send functions in batman-adv are expected to consume the skb when
either the data is queued up for the underlying driver or when some
precondition failed. batadv_dat_send_data didn't do this and instead
created a copy of the skb, modified it and queued the copy up for
transmission. The caller has to take care that the skb is handled correctly
(for example free'd) when batadv_dat_send_data returns.
This unclear behavior already lead to memory leaks in the recent past.
Renaming the function to batadv_dat_forward_data should make it easier to
identify that the data is forwarded but the skb is not actually
send+consumed.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 203a068ac9 ]
Currently we aren't checking for the ICE_FC_NONE case for the current
flow control mode. This is causing "Unknown" to be printed for the
current flow control method if flow control is disabled. Fix this by
adding the case for ICE_FC_NONE to print "None".
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21e2118f47 ]
We need to only apply errata 1.101 handling to clear non-wakeup edge gpios
for idle to the gpio bank(s) in the wkup domain to prevent spurious wake-up
events.
And we must restore what we did after idle manually as the gpio bank in
wkup domain is not restored otherwise.
Let's keep bank->saved_datain register reading separate, that's not related
to the 1.101 errata and is used separately on restore.
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Tero Kristo <t-kristo@ti.com>
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f95f57e437 ]
The regulator definition got their supply names cleaned up during
upstreaming, so they no longer match the driver defined names. Update
the supply names.
Also fill out the missing voltage of SMPS 5.
Fixes: 0b363f5b87 ("arm64: dts: qcom: qcs404: Add PMS405 RPM regulators")
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 778c02a236 ]
If a sync bfq_queue has a higher weight than some other queue, and
remains temporarily empty while in service, then, to preserve the
bandwidth share of the queue, it is necessary to plug I/O dispatching
until a new request arrives for the queue. In addition, a timeout
needs to be set, to avoid waiting for ever if the process associated
with the queue has actually finished its I/O.
Even with the above timeout, the device is however not fed with new
I/O for a while, if the process has finished its I/O. If this happens
often, then throughput drops and latencies grow. For this reason, the
timeout is kept rather low: 8 ms is the current default.
Unfortunately, such a low value may cause, on the opposite end, a
violation of bandwidth guarantees for a process that happens to issue
new I/O too late. The higher the system load, the higher the
probability that this happens to some process. This is a problem in
scenarios where service guarantees matter more than throughput. One
important case are weight-raised queues, which need to be granted a
very high fraction of the bandwidth.
To address this issue, this commit lower-bounds the plugging timeout
for weight-raised queues to 20 ms. This simple change provides
relevant benefits. For example, on a PLEXTOR PX-256M5S, with which
gnome-terminal starts in 0.6 seconds if there is no other I/O in
progress, the same applications starts in
- 0.8 seconds, instead of 1.2 seconds, if ten files are being read
sequentially in parallel
- 1 second, instead of 2 seconds, if, in parallel, five files are
being read sequentially, and five more files are being written
sequentially
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec7f6aad57 ]
When ioremap fails, hga_vram should not be dereferenced. The fix
check the failure to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Cc: Aditya Pakki <pakki001@umn.edu>
Cc: Ferenc Bakonyi <fero@drama.obuda.kando.hu>
[b.zolnierkie: minor patch summary fixup]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5f50c5013 ]
GT5663 is capacitive touch controller with customized smart
wakeup gestures.
Add support for it by adding compatible and supported chip data.
The chip data on GT5663 is similar to GT1151, like
- config data register has 0x8050 address
- config data register max len is 240
- config data checksum has 16-bit
Signed-off-by: Jagan Teki <jagan@amarulasolutions.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0257eda08e ]
Driver maintains state machine for processing and completing switch
commands. This patch resets FCF_ASYNC_{SENT|ACTIVE} flag to indicate if the
previous command is active or sent, in order for next GPSC command to
advance the state machine.
[mkp: commit desc typo]
Signed-off-by: Giridhar Malavali <gmalavali@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 954b4b752a ]
The MSI message address in the RC address space can be 64 bit. The
R-Car PCIe RC supports such a 64bit MSI message address as well.
The code currently uses virt_to_phys(__get_free_pages()) to obtain
a reserved page for the MSI message address, and the return value
of which can be a 64 bit physical address on 64 bit system.
However, the driver only programs PCIEMSIALR register with the bottom
32 bits of the virt_to_phys(__get_free_pages()) return value and does
not program the top 32 bits into PCIEMSIAUR, but rather programs the
PCIEMSIAUR register with 0x0. This worked fine on older 32 bit R-Car
SoCs, however may fail on new 64 bit R-Car SoCs.
Since from a PCIe controller perspective, an inbound MSI is a memory
write to a special address (in case of this controller, defined by
the value in PCIEMSIAUR:PCIEMSIALR), which triggers an interrupt, but
never hits the DRAM _and_ because allocation of an MSI by a PCIe card
driver obtains the MSI message address by reading PCIEMSIAUR:PCIEMSIALR
in rcar_msi_setup_irqs(), incorrectly programmed PCIEMSIAUR cannot
cause memory corruption or other issues.
There is however the possibility that if virt_to_phys(__get_free_pages())
returned address above the 32bit boundary _and_ PCIEMSIAUR was programmed
to 0x0 _and_ if the system had physical RAM at the address matching the
value of PCIEMSIALR, a PCIe card driver could allocate a buffer with a
physical address matching the value of PCIEMSIALR and a remote write to
such a buffer by a PCIe card would trigger a spurious MSI.
Fixes: e015f88c36 ("PCI: rcar: Add support for R-Car H3 to pcie-rcar")
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd8a44bd5b ]
Platforms which populate msi_host_init() have their own MSI controller
logic. Writing to MSI control registers on platforms which do not use
Designware's MSI controller logic might have side effects.
To be safe, do not write to MSI control registers if the platform uses
its own MSI controller logic instead of Designware's MSI one.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72110b5674 ]
When set 2 same MAC to different function of one port, IMP
will return error as the later one may modify the origin one.
This will cause bond fail for 2 VFs of one port.
Driver just print warning and return 0 with this patch, so
if set same MAC address, it will return 0 but do not really
configure HW.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 186857c5a1 ]
As Hagbard Celine reported:
Hi, this is a long standing bug that I've hit before on older kernels,
but I was not able to get the syslog saved because of the nature of
the bug. This time I had booted form a pen-drive, and was able to save
the log to it's efi-partition.
What i did to trigger it was to create a partition and format it f2fs,
then mount it with options:
"rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
Then I unpacked a big .tar.xz to the partition (I used a
gentoo-stage3-tarball as I was in process of installing Gentoo).
Same options just without data_flush gives no problems.
Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
properly unmounted. Some data may be corrupt. Please run fsck.
Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
stack depth: 12064 bytes left
Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
......
Mar 20 21:06:40 usbgentoo kernel: Call Trace:
Mar 20 21:06:40 usbgentoo kernel: read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: ? xas_load+0x8/0x50
Mar 20 21:06:40 usbgentoo kernel: __get_node_page+0x73/0x2a0
Mar 20 21:06:40 usbgentoo kernel: f2fs_get_dnode_of_data+0x34e/0x580
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_inline_data+0x5e/0x2a0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x421/0x690
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_cache_pages+0x1cf/0x460
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_data_pages+0x2b3/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_chksum_verify+0x1d/0xc0
Mar 20 21:06:40 usbgentoo kernel: ? read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: do_writepages+0x3c/0xd0
Mar 20 21:06:40 usbgentoo kernel: __filemap_fdatawrite_range+0x7c/0xb0
Mar 20 21:06:40 usbgentoo kernel: f2fs_sync_dirty_inodes+0xf2/0x200
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs_bg+0x2a3/0x2c0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_dirtied+0x21/0xc0
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs+0xd6/0x2b0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x4fb/0x690
......
Mar 20 21:06:40 usbgentoo kernel: __writeback_single_inode+0x2a1/0x340
Mar 20 21:06:40 usbgentoo kernel: ? soft_cursor+0x1b4/0x220
Mar 20 21:06:40 usbgentoo kernel: writeback_sb_inodes+0x1d5/0x3e0
Mar 20 21:06:40 usbgentoo kernel: __writeback_inodes_wb+0x58/0xa0
Mar 20 21:06:40 usbgentoo kernel: wb_writeback+0x250/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? 0xffffffff8c000000
Mar 20 21:06:40 usbgentoo kernel: ? cpumask_next+0x16/0x20
Mar 20 21:06:40 usbgentoo kernel: wb_workfn+0x2f6/0x3b0
Mar 20 21:06:40 usbgentoo kernel: ? __switch_to_asm+0x40/0x70
Mar 20 21:06:40 usbgentoo kernel: process_one_work+0x1f5/0x3f0
Mar 20 21:06:40 usbgentoo kernel: worker_thread+0x28/0x3c0
Mar 20 21:06:40 usbgentoo kernel: ? rescuer_thread+0x330/0x330
Mar 20 21:06:40 usbgentoo kernel: kthread+0x10e/0x130
Mar 20 21:06:40 usbgentoo kernel: ? kthread_create_on_node+0x60/0x60
Mar 20 21:06:40 usbgentoo kernel: ret_from_fork+0x35/0x40
The root cause is that we run into an infinite recursive calling in
between f2fs_balance_fs_bg and writepage() as described below:
- f2fs_write_data_pages --- A
- __write_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg --- B
- f2fs_sync_dirty_inodes
- filemap_fdatawrite
- f2fs_write_data_pages --- A
...
- f2fs_balance_fs_bg --- B
...
In order to fix this issue, let's detect such condition in __write_data_page()
and just skip calling f2fs_balance_fs() recursively.
Reported-by: Hagbard Celine <hagbardcelin@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cd0e49711 ]
Call order on probe():
- max14656_hw_init() enables interrupts on the chip
- devm_request_irq() starts processing interrupts, isr
could be called immediately
- isr: schedules delayed work (irq_work)
- irq_work: calls power_supply_changed()
- devm_power_supply_register() registers the power supply
Depending on timing, it's possible that power_supply_changed()
is called on an unregistered power supply structure.
Fix by registering the power supply before requesting the irq.
Cc: Alexander Kurz <akurz@blala.de>
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2fef327668 ]
In current driver, SET_LATE_SYSTEM_SLEEP_PM_OPS is used to install the
callbacks for suspend/resume.
GPIO pin may be used as the interrupt pin by some device. However, using
SET_LATE_SYSTEM_SLEEP_PM_OPS() to install the callbacks, the resume
callback is called after resume_device_irqs(). Unintended interrupts may
arrive due to resuming device irqs first, but the GPIO controller is not
properly restored.
Normally, for a SMP system, there are multiple cores, so even when there are
unintended interrupts, BSP gets the chance to initialize the GPIO chip soon.
But when there is only 1 core is active (other cores are offlined or
single core) during resume, it is more easily to observe the unintended
interrupts.
This patch renames the suspend/resume function by adding suffix "_noirq",
and installs the callbacks using SET_NOIRQ_SYSTEM_SLEEP_PM_OPS().
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72aff4ecf1 ]
This area is used to store keys by HSPPA in case of AM438x SOC. Leave it
active.
Signed-off-by: Kabir Sahane <x0153567@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1e07ba89d ]
[Why]
The input color space for the plane was previously ignored even if it
was set.
If a limited range YUV format was given to DC then the
wrong color transformation matrix was being used since DC assumed that
it was full range instead.
[How]
Respect the given color_space format for the plane if it isn't
COLOR_SPACE_UNKNOWN. Otherwise, use the implicit default since DM
didn't specify.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15ae3b28f8 ]
[Why]
If link is already enabled at a different rate (for example 5.4 Gbps)
then calling VBIOS command table to switch to a new rate
(for example 2.7 Gbps) will not take effect.
This can lead to link training failure to occur.
[How]
If the requested link rate is different than the current link rate,
the link must be disabled in order to re-enable at the new
link rate.
In today's logic it is currently only impacting eDP since DP
connection types will always disable the link during display
detection, when initial link verification occurs.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Tony Cheng <Tony.Cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb26228bfc ]
The find_dlpar_node() helper returns a device node with its reference
incremented. Both the add and remove paths use this helper for find the
appropriate node, but fail to release the reference when done.
Annotate the find_dlpar_node() helper with a comment about the incremented
reference count and call of_node_put() on the obtained device_node in the
add and remove paths. Also, fixup a reference leak in the find_vio_slot()
helper where we fail to call of_node_put() on the vdevice node after we
iterate over its children.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbdc00a7de ]
The rk3288 SoC has two PWM implementations available, the "old"
implementation and the "new" one. You can switch between the two of
them by flipping a bit in the grf.
The "old" implementation is the default at chip power up but isn't the
one that's officially supposed to be used. ...and, in fact, the
driver that gets selected in Linux using the rk3288 device tree only
supports the "new" implementation.
Long ago I tried to get a switch to the right IP block landed in the
PWM driver (search for "rk3288: Switch to use the proper PWM IP") but
that got rejected. In the mean time the grf has grown a full-fledged
driver that already sets other random bits like this. That means we
can now get the fix landed.
For those wondering how things could have possibly worked for the last
4.5 years, folks have mostly been relying on the bootloader to set
this bit. ...but occasionally folks have pointed back to my old patch
series [1] in downstream kernels.
[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1391597.html
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7b0c4ce8c ]
By default, for performance consideration, Intel IOMMU
driver won't flush IOTLB immediately after a buffer is
unmapped. It schedules a thread and flushes IOTLB in a
batched mode. This isn't suitable for untrusted device
since it still can access the memory even if it isn't
supposed to do so.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Xu Pengfei <pengfei.xu@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d327330185 ]
Historically the power supply management in this driver has been handled
in two separate places in parallel. Device-tree users simply defined an
appropriate regulator, while two boards with no DT support (da830-evm and
omapl138-hawk) passed functions defined in their respective board files
over platform data. These functions simply used legacy GPIO calls to
watch the oc GPIO for interrupts and disable the vbus GPIO when the irq
fires.
Commit d193abf1c9 ("usb: ohci-da8xx: add vbus and overcurrent gpios")
updated these GPIO calls to the modern API and moved them inside the
driver.
This however is not the optimal solution for the vbus GPIO which should
be modeled as a fixed regulator that can be controlled with a GPIO.
In order to keep the overcurrent protection available once we move the
board files to using fixed regulators we need to disable the enable_reg
regulator when the overcurrent indicator interrupt fires. Since we
cannot call regulator_disable() from interrupt context, we need to
switch to using a oneshot threaded interrupt.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 57a20248ef ]
Experimentally it can be seen that going into deep sleep (specifically
setting PMU_CLR_DMA and PMU_CLR_BUS in RK3288_PMU_PWRMODE_CON1)
appears to fail unless "aclk_dmac1" is on. The failure is that the
system never signals that it made it into suspend on the GLOBAL_PWROFF
pin and it just hangs.
NOTE that it's confirmed that it's the actual suspend that fails, not
one of the earlier calls to read/write registers. Specifically if you
comment out the "PMU_GLOBAL_INT_DISABLE" setting in
rk3288_slp_mode_set() and then comment out the "cpu_do_idle()" call in
rockchip_lpmode_enter() then you can exercise the whole suspend path
without any crashing.
This is currently not a problem with suspend upstream because there is
no current way to exercise the deep suspend code. However, anyone
trying to make it work will run into this issue.
This was not a problem on shipping rk3288-based Chromebooks because
those devices all ran on an old kernel based on 3.14. On that kernel
"aclk_dmac1" appears to be left on all the time.
There are several ways to skin this problem.
A) We could add "aclk_dmac1" to the list of critical clocks and that
apperas to work, but presumably that wastes power.
B) We could keep a list of "struct clk" objects to enable at suspend
time in clk-rk3288.c and use the standard clock APIs.
C) We could make the rk3288-pmu driver keep a list of clocks to enable
at suspend time. Presumably this would require a dts and bindings
change.
D) We could just whack the clock on in the existing syscore suspend
function where we whack a bunch of other clocks. This is particularly
easy because we know for sure that the clock's only parent
("aclk_cpu") is a critical clock so we don't need to do anything more
than ungate it.
In this case I have chosen D) because it seemed like the least work,
but any of the other options would presumably also work fine.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89e28da828 ]
When building with -Wsometimes-uninitialized, Clang warns:
drivers/soc/mediatek/mtk-pmic-wrap.c:1358:6: error: variable 'rdata' is
used uninitialized whenever '||' condition is true
[-Werror,-Wsometimes-uninitialized]
If pwrap_write returns non-zero, pwrap_read will not be called to
initialize rdata, meaning that we will use some random uninitialized
stack value in our print statement. Zero initialize rdata in case this
happens.
Link: https://github.com/ClangBuiltLinux/linux/issues/401
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f316a2b53c ]
hook_fault_code() is an ARM32 specific API for hooking into data abort.
AM65X platforms (that integrate ARM v8 cores and select CONFIG_ARM64 as
arch) rely on pci-keystone.c but on them the enumeration of a
non-present BDF does not trigger a bus error, so the fixup exception
provided by calling hook_fault_code() is not needed and can be guarded
with CONFIG_ARM.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[lorenzo.pieralisi@arm.com: commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b22af42b3e ]
SERDES connected to the PCIe controller in AM654 requires
power on reset enable (POR_EN) to be set in the SERDES. The
SERDES driver sets POR_EN in the reset ops and it has to be
invoked before init or enable ops. In order for SERDES driver
to set POR_EN, invoke the phy_reset() API in pci-keystone driver.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d4e7af14 ]
As new transfer mechanisms are added to the EC codebase, they may
not support v2 of the EC protocol.
If the v3 initial handshake transfer fails, the kernel will try
and call cmd_xfer as a fallback. If v2 is not supported, cmd_xfer
will be NULL, and the code will end up causing a kernel panic.
Add a check for NULL before calling the transfer function, along
with a helpful comment explaining how one might end up in this
situation.
Signed-off-by: Enrico Granata <egranata@chromium.org>
Reviewed-by: Jett Rink <jettrink@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c68b901ac4 ]
The accumulator sample register is signed 32-bits wide register on
droid 4. And only the earlier version of cpcap has a signed 24-bits
wide register. We're currently passing it around as unsigned, so
let's fix that and use sign_extend32() for the earlier revision.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e957b377b ]
Added a new local variable in the i40e_setup_tc function named
old_queue_pairs so num_queue_pairs can be restored to the correct
value in case configuring queue channels fails. Additionally, moved
the exit label in the i40e_setup_tc function so the if (need_reset)
block can be executed.
Also, fixed data packing in the i40e_setup_tc function.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a46b51cd2a ]
Commit 5f84bb1a40 ("soc/tegra: pmc: Add sysfs entries for reset info")
added sysfs entries for Tegra reset source and level. However, these
sysfs are not removed on error and so if the registering of PMC device
is probe deferred, then the next time we attempt to probe the PMC device
warnings such as the following will be displayed on boot ...
sysfs: cannot create duplicate filename '/devices/platform/7000e400.pmc/reset_reason'
Fix this by calling device_remove_file() for each sysfs entry added on
failure. Note that we call device_remove_file() unconditionally without
checking if the sysfs entry was created in the first place, but this
should be OK because kernfs_remove_by_name_ns() will fail silently.
Fixes: 5f84bb1a40 ("soc/tegra: pmc: Add sysfs entries for reset info")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea094d5358 ]
In pcibios_irq_init(), the PCI IRQ routing table 'pirq_table' is first
found through pirq_find_routing_table(). If the table is not found and
CONFIG_PCI_BIOS is defined, the table is then allocated in
pcibios_get_irq_routing_table() using kmalloc(). Later, if the I/O APIC is
used, this table is actually not used. In that case, the allocated table
is not freed, which is a memory leak.
Free the allocated table if it is not used.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
[bhelgaas: added Ingo's reviewed-by, since the only change since v1 was to
use the irq_routing_table local variable name he suggested]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9872760eb7 ]
The XDomain protocol messages may start as soon as Thunderbolt control
channel is started. This means that if the other host starts sending
ThunderboltIP packets early enough they will be passed to the network
driver which then gets confused because its resume hook is not called
yet.
Fix this by unregistering the ThunderboltIP protocol handler when
suspending and registering it back on resume.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 083c1b5e50 ]
When running application tool switchtec-user's `firmware update` and `event
wait` commands concurrently, sometimes the firmware update speed reduced
significantly.
It is because when the MRPC event happened after MRPC event occurrence
check but before the event mask loop reaches its header register in event
ISR, the MRPC event would be masked unintentionally. Since there's no
chance to enable it again except for a module reload, all the following
MRPC execution completion checks time out.
Fix this bug by skipping the mask operation for MRPC event in event ISR,
same as what we already do for LINK event.
Fixes: 52eabba5bc ("switchtec: Add IOCTLs to the Switchtec driver")
Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f54c447df ]
Disabling the SMMU when probing from within a kdump kernel so that all
incoming transactions are terminated can prevent the core of the crashed
kernel from being transferred off the machine if all I/O devices are
behind the SMMU.
Instead, continue to probe the SMMU after it is disabled so that we can
reinitialise it entirely and re-attach the DMA masters as they are reset.
Since the kdump kernel may not have drivers for all of the active DMA
masters, we suppress fault reporting to avoid spamming the console and
swamping the IRQ threads.
Reported-by: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
Tested-by: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41be3e2618 ]
vfio_dev_present() which is the condition to
wait_event_interruptible_timeout(), will call vfio_group_get_device
and try to acquire the mutex group->device_lock.
wait_event_interruptible_timeout() will set the state of the current
task to TASK_INTERRUPTIBLE, before doing the condition check. This
means that we will try to acquire the mutex while already in a
sleeping state. The scheduler warns us by giving the following
warning:
[ 4050.264464] ------------[ cut here ]------------
[ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
[ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
....
4050.264756] Call Trace:
[ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
[ 4050.264774] [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
[ 4050.264782] [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
[ 4050.264793] [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
[ 4050.264803] [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
[ 4050.264813] [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
[ 4050.264825] [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
[ 4050.264834] [<00000000008de692>] bus_remove_device+0x162/0x190
[ 4050.264843] [<00000000008daf42>] device_del+0x1e2/0x368
[ 4050.264851] [<00000000008db12c>] device_unregister+0x64/0x88
[ 4050.264862] [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
[ 4050.264872] [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
[ 4050.264881] [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
[ 4050.264890] [<00000000003c1530>] __vfs_write+0x38/0x1a8
[ 4050.264899] [<00000000003c187c>] vfs_write+0xb4/0x198
[ 4050.264908] [<00000000003c1af2>] ksys_write+0x5a/0xb0
[ 4050.264916] [<0000000000b9e270>] system_call+0xdc/0x2d8
[ 4050.264925] 4 locks held by sh/35924:
[ 4050.264933] #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
[ 4050.264948] #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
[ 4050.264963] #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
[ 4050.264979] #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
[ 4050.264993] Last Breaking-Event-Address:
[ 4050.265002] [<000000000017bbaa>] __might_sleep+0x72/0x90
[ 4050.265010] irq event stamp: 7039
[ 4050.265020] hardirqs last enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
[ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
[ 4050.265040] softirqs last enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
[ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
[ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
Let's fix this as described in the article
https://lwn.net/Articles/628628/.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
[remove now redundant vfio_dev_present()]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ab88ca4bc ]
clang warns that 'contextlen' may be accessed without an initialization:
fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
contextlen);
^~~~~~~~~~
fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
int contextlen;
^
= 0
Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
Adding another #ifdef like the other two in this function
avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b8f62625d ]
A fuzzer recently triggered lockdep warnings about potential sb_writers
deadlocks caused by fh_want_write().
Looks like we aren't careful to pair each fh_want_write() with an
fh_drop_write().
It's not normally a problem since fh_put() will call fh_drop_write() for
us. And was OK for NFSv3 where we'd do one operation that might call
fh_want_write(), and then put the filehandle.
But an NFSv4 protocol fuzzer can do weird things like call unlink twice
in a compound, and then we get into trouble.
I'm a little worried about this approach of just leaving everything to
fh_put(). But I think there are probably a lot of
fh_want_write()/fh_drop_write() imbalances so for now I think we need it
to be more forgiving.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7640682e67 ]
FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header + that
max_write bytes. A filesystem server is free to set its max_write in
anywhere in the range between [1*page, fc->max_pages*page]. In particular
go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
corresponds to 128K. Libfuse also allows users to configure max_write, but
by default presets it to possible maximum.
If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract, will
be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server request
buffer. In turn the filesystem server could get stuck waiting indefinitely
for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
understood by clients as that NOTIFY_REPLY was queued and will be sent
back.
Cap requested size to negotiate max_write to avoid the problem. This
aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages. This
way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
was originally requested.
Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da75b89097 ]
The device tree binding already lists compatible strings for these two
SoCs. They don't have the defect as seen on the H3, and the size and
register layout is the same as the A64. Furthermore, the driver does
not include nvmem cell definitions.
Add support for these two compatible strings, re-using the config for
the A64.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2fe518fecb ]
When the bit_offset in the cell is zero, the pointer to the msb will
not be properly initialized (ie, will still be pointing to the first
byte in the buffer).
This being the case, if there are bits to clear in the msb, those will
be left untouched while the mask will incorrectly clear bit positions
on the first byte.
This commit also makes sure that any byte unused in the cell is
cleared.
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b3320e6b1 ]
Commit 7ee7ef24d0 ("scsi: arm64: defconfig: enable configs for Hisilicon ufs")
set 'CONFIG_SCSI_UFS_HISI=y', but the configs it depends
on
(CONFIG_SCSI_HFSHCD_PLATFORM && CONFIG_SCSI_UFSHCD)
were left to being built as modules.
Commit 1f4fa50dd4 ("arm64: defconfig: Regenerate for v4.20") "fixed"
that by reverting to 'CONFIG_SCSI_UFS_HISI=m'.
Thing is, if the rootfs is stored in the on-board flash (which
is the "canonical" way of doing things), we either need these drivers
to be built-in, or we need to fiddle with an initramfs to access that
flash and eventually load the modules installed over there.
The former is the easiest, do that.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2d3b5ee66 ]
When removing memory we need to remove the memory from the node
it was added to instead of looking up the node it should be in
in the device tree.
During testing we have seen scenarios where the affinity for a
LMB changes due to a partition migration or PRRN event. In these
cases the node the LMB exists in may not match the node the device
tree indicates it belongs in. This can lead to a system crash
when trying to DLPAR remove the LMB after a migration or PRRN
event. The current code looks up the node in the device tree to
remove the LMB from, the crash occurs when we try to offline this
node and it does not have any data, i.e. node_data[nid] == NULL.
36:mon> e
cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
lr: c0000000003a14ec: remove_memory+0xbc/0x110
sp: c0000001828b7a90
msr: 800000000280b033
dar: 9a28
dsisr: 40000000
current = 0xc0000006329c4c80
paca = 0xc000000007a55200 softe: 0 irq_happened: 0x01
pid = 76926, comm = kworker/u320:3
36:mon> t
[link register ] c0000000003a14ec remove_memory+0xbc/0x110
[c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
[c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
[c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
[c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
[c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
[c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
[c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
[c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
[c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
[c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
To resolve this we need to track the node a LMB belongs to when
it is added to the system so we can remove it from that node instead
of the node that the device tree indicates it should belong to.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f495222e28 ]
Currently the IRQ handler in HD-audio controller driver is registered
before the chip initialization. That is, we have some window opened
between the azx_acquire_irq() call and the CORB/RIRB setup. If an
interrupt is triggered in this small window, the IRQ handler may
access to the uninitialized RIRB buffer, which leads to a NULL
dereference Oops.
This is usually no big problem since most of Intel chips do register
the IRQ via MSI, and we've already fixed the order of the IRQ
enablement and the CORB/RIRB setup in the former commit b61749a89f
("sound: enable interrupt after dma buffer initialization"), hence the
IRQ won't be triggered in that room. However, some platforms use a
shared IRQ, and this may allow the IRQ trigger by another source.
Another possibility is the kdump environment: a stale interrupt might
be present in there, the IRQ handler can be falsely triggered as well.
For covering this small race, let's move the azx_acquire_irq() call
after hda_intel_init_chip() call. Although this is a bit radical
change, it can cover more widely than checking the CORB/RIRB setup
locally in the callee side.
Reported-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26a302afbe ]
flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
nf_route() calls ip_route_output_key() that allocates a dst_entry and
holds it. So, a dst_entry should be released by dst_release() if
nf_route() is successful.
Otherwise, netns exit routine cannot be finished and the following
message is printed:
[ 257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33cc3c0cfa ]
nf_flow_offload_ip_hook() and nf_flow_offload_ipv6_hook() do not check
ttl value. So, ttl value overflow may occur.
Fixes: 97add9f0d6 ("netfilter: flow table support for IPv4")
Fixes: 0995210753 ("netfilter: flow table support for IPv6")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9dc1a38ef1 ]
We do not restart a controller in a deleting state for timeout errors.
When in this state, unblock potential request dispatchers with failed
completions by shutting down the controller on timeout detection.
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8e9e9b764 ]
Just like IO queues, the admin queue also will not be restarted after a
controller shutdown. Unquiesce this queue so that we do not block
request dispatch on a permanently disabled controller.
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b7330303a ]
Certain platforms like K2G reguires the outbound ATU window to be
aligned. The alignment size is already present in mem->page_size.
Use the alignment size present in mem->page_size to configure an
aligned ATU window. In order to raise an interrupt, CPU has to write
to address offset from the start of the window unlike before where
writes were always to the beginning of the ATU window.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f22066457 ]
commit 834b905199 ("misc: pci_endpoint_test: Add support for
PCI_ENDPOINT_TEST regs to be mapped to any BAR") while adding
test_reg_bar in order to map PCI_ENDPOINT_TEST regs to be mapped to any
BAR failed to update test_reg_bar in pci_endpoint_test, resulting in
test_reg_bar having invalid value when used outside probe.
Fix it.
Fixes: 834b905199 ("misc: pci_endpoint_test: Add support for PCI_ENDPOINT_TEST regs to be mapped to any BAR")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf1ec4539a ]
The intel_iommu_gfx_mapped flag is exported by the Intel
IOMMU driver to indicate whether an IOMMU is used for the
graphic device. In a virtualized IOMMU environment (e.g.
QEMU), an include-all IOMMU is used for graphic device.
This flag is found to be clear even the IOMMU is used.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Reported-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fixes: c0771df8d5 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.")
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a223770bfa ]
CONFIG_WATCHDOG_PRETIMEOUT_GOV build symbol adds watchdog_pretimeout.o
object to watchdog.o, the latter is compiled only if CONFIG_WATCHDOG_CORE
is selected, so it rightfully makes sense to add it as a dependency.
The change fixes the next compilation errors, if CONFIG_WATCHDOG_CORE=n
and CONFIG_WATCHDOG_PRETIMEOUT_GOV=y are selected:
drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_register':
drivers/watchdog/pretimeout_noop.c:35: undefined reference to `watchdog_register_governor'
drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_unregister':
drivers/watchdog/pretimeout_noop.c:40: undefined reference to `watchdog_unregister_governor'
drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_register':
drivers/watchdog/pretimeout_panic.c:35: undefined reference to `watchdog_register_governor'
drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_unregister':
drivers/watchdog/pretimeout_panic.c:40: undefined reference to `watchdog_unregister_governor'
Reported-by: Kuo, Hsuan-Chi <hckuo2@illinois.edu>
Fixes: ff84136cb6 ("watchdog: add watchdog pretimeout governor framework")
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b07e228eee ]
The documentated behavior is: if max_hw_heartbeat_ms is implemented, the
minimum of the set_timeout argument and max_hw_heartbeat_ms should be used.
This patch implements this behavior.
Previously only the first 7bits were used and the input argument was
returned.
Signed-off-by: Georg Hofmann <georg@hofmannsweb.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edbd82c5fb ]
Following splat gets triggered when nfnetlink monitor is running while
xtables-nft selftests are running:
net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
1 lock held by xtables-nft-mul/27006:
#0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
Call Trace:
nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
nf_tables_chain_notify+0xf8/0x1a0
nf_tables_commit+0x165c/0x1740
nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
or from the transaction path if a userspace process subscribed to nftables
notifications.
In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
hold transaction mutex so the pointer can be NULLed right after the check.
Just unconditionally fetch the value, then have the helper return
immediately if its NULL.
In the notification case we don't hold the rcu read lock, but updates are
prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
aware of this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5e85ce8e7 ]
Since commit bc7d811ace ("netfilter: nf_ct_h323: Convert
CHECK_BOUND macro to function"), NAT traversal for H.323
doesn't work, failing to parse H323-UserInformation.
nf_h323_error_boundary() compares contents of the bitstring,
not the addresses, preventing valid H.323 packets from being
conntrack'd.
This looks like an oversight from when CHECK_BOUND macro was
converted to a function.
To fix it, stop dereferencing bs->cur and bs->end.
Fixes: bc7d811ace ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
Signed-off-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43c8f13118 ]
rhashtable_insert_fast() may return an error value when memory
allocation fails, but flow_offload_add() does not check for errors.
This patch just adds missing error checking.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8520ce1e17 ]
The IRQ handler, mmci_irq(), loops until all status bits have been cleared.
However, the status bit signaling busy in variant->busy_detect_flag, may be
set even if busy detection isn't monitored for the current request.
This may be the case for the CMD11 when switching the I/O voltage, which
leads to that mmci_irq() busy loops in IRQ context. Fix this problem, by
clearing the status bit for busy, before continuing to validate the
condition for the loop. This is safe, because the busy status detection has
already been taken care of by mmci_cmd_irq().
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d989903058 ]
Overlayfs "fake" path is used for stacked file operations on underlying
files. Operations on files with "fake" path must not generate fsnotify
events with path data, because those events have already been generated at
overlayfs layer and because the reported event->fd for fanotify marks on
underlying inode/filesystem will have the wrong path (the overlayfs path).
Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Fixes: d1d04ef857 ("ovl: stack file ops")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e2b5de560 ]
If we ever did MSI-related initializations, we need to call
dw_pcie_free_msi() in the error code path.
Remove the IS_ENABLED(CONFIG_PCI_MSI) check for MSI init because
pci_msi_enabled() already has a stub for !CONFIG_PCI_MSI.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 198790d9a3 ]
In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock. There are other places where we call
pcpu_schedule_balance_work() without hold pcpu_lock, and this case
doesn't need to be different.
Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock:
000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
but task is already holding lock:
00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (pcpu_lock){..-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
pcpu_alloc+0xfa/0x780
__alloc_percpu_gfp+0x12/0x20
alloc_htab_elem+0x184/0x2b0
__htab_percpu_map_update_elem+0x252/0x290
bpf_percpu_hash_update+0x7c/0x130
__do_sys_bpf+0x1912/0x1be0
__x64_sys_bpf+0x1a/0x20
do_syscall_64+0x59/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #3 (&htab->buckets[i].lock){....}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
htab_map_update_elem+0x1af/0x3a0
-> #2 (&rq->lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
task_fork_fair+0x37/0x160
sched_fork+0x211/0x310
copy_process.part.43+0x7b1/0x2160
_do_fork+0xda/0x6b0
kernel_thread+0x29/0x30
rest_init+0x22/0x260
arch_call_rest_init+0xe/0x10
start_kernel+0x4fd/0x520
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x6f/0x72
secondary_startup_64+0xa4/0xb0
-> #1 (&p->pi_lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
try_to_wake_up+0x41/0x600
wake_up_process+0x15/0x20
create_worker+0x16b/0x1e0
workqueue_init+0x279/0x2ee
kernel_init_freeable+0xf7/0x288
kernel_init+0xf/0x180
ret_from_fork+0x24/0x30
-> #0 (&(&pool->lock)->rlock){-.-.}:
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
&(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcpu_lock);
lock(&htab->buckets[i].lock);
lock(pcpu_lock);
lock(&(&pool->lock)->rlock);
*** DEADLOCK ***
3 locks held by kworker/23:255/18872:
#0: 00000000b36a6e16 ((wq_completion)events){+.+.},
at: process_one_work+0x17a/0x580
#1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
at: process_one_work+0x17a/0x580
#2: 00000000e3e7a6aa (pcpu_lock){..-.},
at: free_percpu+0x36/0x260
stack backtrace:
CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
Hardware name: ...
Workqueue: events bpf_map_free_deferred
Call Trace:
dump_stack+0x67/0x95
print_circular_bug.isra.38+0x1c6/0x220
check_prev_add.constprop.50+0x9f6/0xd20
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e4e25c495 ]
The subsystem will free the asd memory on notifier cleanup, if the asd is
added to the notifier.
However the memory is freed using kfree.
Thus, we cannot allocate the asd using devm_*
This can lead to crashes and problems.
To test this issue, just return an error at probe, but cleanup the
notifier beforehand.
Fixes: 106267444f ("[media] atmel-isc: add the Image Sensor Controller code")
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b42b179bda ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203221
- Overview
When mounting the attached crafted image and running program, this error is reported.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_07.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
- Messages
kernel BUG at fs/f2fs/node.c:1279!
RIP: 0010:read_node_page+0xcf/0xf0
Call Trace:
__get_node_page+0x6b/0x2f0
f2fs_iget+0x8f/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_fchmodat+0x3e/0xa0
__x64_sys_chmod+0x12/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
On below paths, we can have opportunity to readahead inode page
- gc_node_segment -> f2fs_ra_node_page
- gc_data_segment -> f2fs_ra_node_page
- f2fs_fill_dentries -> f2fs_ra_node_page
Unlike synchronized read, on readahead path, we can set page uptodate
before verifying page's checksum, then read_node_page() will trigger
kernel panic once it encounters a uptodated page w/ incorrect checksum.
So considering readahead scenario, we have to do checksum each time
when loading inode page even if it is uptodated.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45a7468815 ]
With below mkfs and mount option, generic/339 of fstest will report that
scratch image becomes corrupted.
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
[ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...)
level:8, dir_level:0, pgofs:951, correct range:[900, 901]
In old kernel, inline data and directory always reserved 200 bytes in
inode layout, even if inline_xattr is disabled, then new kernel tries
to retrieve that space for non-inline xattr inode, but for inline dentry,
its layout size should be fixed, so we just keep that reserved space.
But the problem here is that, after inline dentry conversion, inline
dentry layout no longer exists, if we still reserve inline xattr space,
after dents updates, there will be a hole in inline xattr space, which
can break hierarchy hash directory structure.
This patch fixes this issue by retrieving inline xattr space after
inline dentry conversion.
Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 793ab1c8a7 ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203211
- Overview
When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cd test
touch t
- Messages
[ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
... (repeating)
During GC, if segment type stored in SSA and SIT is inconsistent, we just
skip migrating current segment directly, since we need to know the exact
type to decide the migration function we use.
So in foreground GC, we will easily run into a infinite loop as we may
select the same victim segment which has inconsistent type due to greedy
policy. In order to end up this, we choose to shutdown filesystem. For
backgrond GC, we need to do that as well, so that we can avoid latter
potential infinite looped foreground GC.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e95bcdb2fe ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203233
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_13.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Kernel messages
F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
kernel BUG at fs/f2fs/segment.c:2102!
RIP: 0010:update_sit_entry+0x394/0x410
Call Trace:
f2fs_allocate_data_block+0x16f/0x660
do_write_page+0x62/0x170
f2fs_do_write_node_page+0x33/0xa0
__write_node_page+0x270/0x4e0
f2fs_sync_node_pages+0x5df/0x670
f2fs_write_checkpoint+0x372/0x1400
f2fs_sync_fs+0xa3/0x130
f2fs_do_sync_file+0x1a6/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
sit.vblocks and sum valid block count in sit.valid_map may be
inconsistent, segment w/ zero vblocks will be treated as free
segment, while allocating in free segment, we may allocate a
free block, if its bitmap is valid previously, it can cause
kernel crash due to bitmap verification failure.
Anyway, to avoid further serious metadata inconsistence and
corruption, it is necessary and worth to detect SIT
inconsistence. So let's enable check_block_count() to verify
vblocks and valid_map all the time rather than do it only
CONFIG_F2FS_CHECK_FS is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea6d7e72fe ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203213
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
kernel BUG at fs/f2fs/f2fs.h:2012!
RIP: 0010:truncate_node+0x2c9/0x2e0
Call Trace:
f2fs_truncate_xattr_node+0xa1/0x130
f2fs_remove_inode_page+0x82/0x2d0
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_node_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 622927f3b8 ]
With below mkfs and mount option:
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
MOUNT_OPTIONS -- -o noinline_xattr
We may miss xattr data with below testcase:
- mkdir dir
- setfattr -n "user.name" -v 0 dir
- for ((i = 0; i < 190; i++)) do touch dir/$i; done
- umount
- mount
- getfattr -n "user.name" dir
user.name: No such attribute
The root cause is that we persist xattr data into reserved inline xattr
space, even if inline_xattr is not enable in inline directory inode, after
inline dentry conversion, reserved space no longer exists, so that xattr
data missed.
Let's use inline xattr space only if inline_xattr flag is set on inode
to fix this iusse.
Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e159cd349 ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203209
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_01.c
./run.sh f2fs
sync
kernel BUG at fs/f2fs/f2fs.h:1788!
RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
Call Trace:
f2fs_truncate_blocks+0x36d/0x3c0
f2fs_truncate+0x88/0x110
f2fs_setattr+0x3e1/0x460
notify_change+0x2da/0x400
do_truncate+0x6d/0xb0
do_sys_ftruncate+0xf1/0x160
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_block_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 546d22f070 ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203217
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_test_05.c
mkdir test
mount -t f2fs tmp.img test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/inode.c:707!
RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
Call Trace:
evict+0xba/0x180
f2fs_iget+0x598/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_readlinkat+0x56/0x110
__x64_sys_readlink+0x16/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During inode loading, __recover_inline_status() can recovery inode status
and set inode dirty, once we failed in following process, it will fail
the check in f2fs_evict_inode, result in trigger BUG_ON().
Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 626bcf2b7c ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203225
- Overview
When mounting the attached crafted image and unmounting it, following errors are reported.
Additionally, it hangs on sync after unmounting.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
touch test/t
umount test
sync
- Messages
kernel BUG at fs/f2fs/node.c:3073!
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
Call Trace:
f2fs_put_super+0xf4/0x270
generic_shutdown_super+0x62/0x110
kill_block_super+0x1c/0x50
kill_f2fs_super+0xad/0xd0
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x75/0x90
exit_to_usermode_loop+0x93/0xa0
do_syscall_64+0xba/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
NAT table is corrupted, so reserved meta/node inode ids were added into
free list incorrectly, during file creation, since reserved id has cached
in inode hash, so it fails the creation and preallocated nid can not be
released later, result in kernel panic.
To fix this issue, let's do nid boundary check during free nid loading.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b6810f8ac ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203219
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_06.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/node.c:1183!
RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
Call Trace:
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_remove_inode_page() will trigger kernel panic due to
inconsistent i_blocks value of inode.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing of potential image corruption.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 988385795c ]
There are some places in where we missed to unlock page or unlock page
incorrectly, fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05573d6ccf ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203239
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_15.c
./run.sh f2fs
sync
- Kernel messages
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:3162!
RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
Call Trace:
f2fs_do_write_data_page+0x3c1/0x820
__write_data_page+0x156/0x720
f2fs_write_cache_pages+0x20d/0x460
f2fs_write_data_pages+0x1b4/0x300
do_writepages+0x15/0x60
__filemap_fdatawrite_range+0x7c/0xb0
file_write_and_wait_range+0x2c/0x80
f2fs_do_sync_file+0x102/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_inplace_write_data() will trigger kernel panic due
to data block locates in node type segment.
To avoid panic, let's just return error code and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22d61e286e ]
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203227
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
- Messages
kernel BUG at fs/f2fs/recovery.c:549!
RIP: 0010:recover_data+0x167a/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdc6bae940 ]
The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
It is typically set by NTP/PTP implementations and it is automatically
updated by the kernel on leap seconds. The initial value is zero (which
applications may interpret as unknown), but this value cannot be set by
adjtimex. This limitation seems to go back to the original "nanokernel"
implementation by David Mills.
Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
order to allow setting it back to the initial value.
Fixes: 153b5d054a ("ntp: support for TAI")
Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: https://lkml.kernel.org/r/20190417084833.7401-1-mlichvar@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68a1c8485c ]
On failure of_irq_get() returns a negative value or zero, which is
not handled as an error in the existing implementation.
Instead of using this API, use platform_get_irq() that returns
exclusively a negative value on failure.
Also, do not output an error log in case of defer probe error.
Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f173747fff ]
Holding the spin-lock for all of the code in meson_pwm_apply() can
result in a "BUG: scheduling while atomic". This can happen because
clk_get_rate() (which is called from meson_pwm_calc()) may sleep.
Only hold the spin-lock when modifying registers to solve this.
The reason why we need a spin-lock in the driver is because the
REG_MISC_AB register is shared between the two channels provided by one
PWM controller. The only functions where REG_MISC_AB is modified are
meson_pwm_enable() and meson_pwm_disable() so the register reads/writes
in there need to be protected by the spin-lock.
The original code also used the spin-lock to protect the values in
struct meson_pwm_channel. This could be necessary if two consumers can
use the same PWM channel. However, PWM core doesn't allow this so we
don't need to protect the values in struct meson_pwm_channel with a
lock.
Fixes: 211ed63075 ("pwm: Add support for Meson PWM Controller")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b8358a951 ]
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:
ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.
This seems to have been broken since at least
c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")
(Nov 2013).
[ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
as a module. ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2f7fc0ac6 ]
Commit 31fd85816d ("bpf: permits narrower load from bpf program
context fields") made the verifier add AND instructions to clear the
unwanted bits with a mask when doing a narrow load. The mask is
computed with
(1 << size * 8) - 1
where "size" is the size of the narrow load. When doing a 4 byte load
of a an 8 byte field the verifier shifts the literal 1 by 32 places to
the left. This results in an overflow of a signed integer, which is an
undefined behavior. Typically, the computed mask was zero, so the
result of the narrow load ended up being zero too.
Cast the literal to long long to avoid overflows. Note that narrow
load of the 4 byte fields does not have the undefined behavior,
because the load size can only be either 1 or 2 bytes, so shifting 1
by 8 or 16 places will not overflow it. And reading 4 bytes would not
be a narrow load of a 4 bytes field.
Fixes: 31fd85816d ("bpf: permits narrower load from bpf program context fields")
Reviewed-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Iago López Galeiras <iago@kinvolk.io>
Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d2434e4d94 ]
Cursor position updates were accidentally causing us to attempt to interlock
window with window immediate, and without a matching window immediate update,
NVDisplay could hang forever in some circumstances.
Fixes suspend/resume on (at least) Quadro RTX4000 (TU104).
Reported-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6da956795 ]
The ignore flag is set on fake jumps in order to keep
add_jump_destinations() from setting their jump_dest, since it already
got set when the fake jump was created.
But using the ignore flag is a bit of a hack. It's normally used to
skip validation of an instruction, which doesn't really make sense for
fake jumps.
Also, after the next patch, using the ignore flag for fake jumps can
trigger a false "why am I validating an ignored function?" warning.
Instead just add an explicit check in add_jump_destinations() to skip
fake jumps.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/71abc072ff48b2feccc197723a9c52859476c068.1557766718.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67793bd3b3 ]
The driver currently sets register 0xfb (Low Refresh Rate) based on the
value of mode->vrefresh. Firstly, this field is specified to be in Hz,
but the magic numbers used by the code are Hz * 1000. This essentially
leads to the low refresh rate always being set to 0x01, since the
vrefresh value will always be less than 24000. Fix the magic numbers to
be in Hz.
Secondly, according to the comment in drm_modes.h, the field is not
supposed to be used in a functional way anyway. Instead, use the helper
function drm_mode_vrefresh().
Fixes: 9c8af882bf ("drm: Add adv7511 encoder driver")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Matt Redfearn <matt.redfearn@thinci.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424132210.26338-1-matt.redfearn@thinci.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4a52d6696 ]
nv50_head_atomic_duplicate_state() makes a copy of nv50_head_atom
struct. This patch adds copying of struct member named "or", which
previously was left uninitialized in the duplicated structure.
Due to this bug, incorrect nhsync and nvsync values were sometimes used.
In my particular case, that lead to a mismatch between the output
resolution of the graphics device (GeForce GT 630 OEM) and the reported
input signal resolution on the display. xrandr reported 1680x1050, but
the display reported 1280x1024. As a result of this mismatch, the output
on the display looked like it was cropped (only part of the output was
actually visible on the display).
git bisect pointed to commit 2ca7fb5c1c ("drm/nouveau/kms/nv50: handle
SetControlOutputResource from head"), which added the member "or" to
nv50_head_atom structure, but forgot to copy it in
nv50_head_atomic_duplicate_state().
Fixes: 2ca7fb5c1c ("drm/nouveau/kms/nv50: handle SetControlOutputResource from head")
Signed-off-by: Peteris Rudzusiks <peteris.rudzusiks@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0b694d0af ]
HW has error checks in place which check that pixel depth is explicitly
provided on DP, while HDMI has a "default" setting that we use.
In multi-display configurations with identical modelines, but different
protocols (HDMI + DP, in this case), it was possible for the DP head to
get swapped to the head which previously drove the HDMI output, without
updating HeadSetControlOutputResource(), triggering the error check and
hanging the core update.
Reported-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48171d0ea7 ]
I noticed that we can get a -EREMOTEIO errors on at least omap4 duovero:
twl6040 0-004b: Failed to write 2d = 19: -121
And then any following register access will produce errors.
There 2d offset above is register ACCCTL that gets written on twl6040
powerup. With error checking added to the related regcache_sync() call,
the -EREMOTEIO error is reproducable on twl6040 powerup at least
duovero.
To fix the error, we need to wait until twl6040 is accessible after the
powerup. Based on tests on omap4 duovero, we need to wait over 8ms after
powerup before register write will complete without failures. Let's also
make sure we warn about possible errors too.
Note that we have twl6040_patch[] reg_sequence with the ACCCTL register
configuration and regcache_sync() will write the new value to ACCCTL.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13d03e9daf ]
Where possible, we want the failsafe link configuration (one which won't
hang the OR during modeset because of not enough bandwidth for the mode)
to also be supported by the sink.
This prevents "link rate unsupported by sink" messages when link training
fails.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e364e87ad ]
MODULE_DEVICE_TABLE(of, <of_match_table> should be called to complete DT
OF mathing mechanism and register it.
Before this patch:
modinfo drivers/mfd/tps65912-spi.ko | grep alias
alias: spi:tps65912
After this patch:
modinfo drivers/mfd/tps65912-spi.ko | grep alias
alias: of:N*T*Cti,tps65912C*
alias: of:N*T*Cti,tps65912
alias: spi:tps65912
Reported-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Daniel Gomez <dagmcr@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc7d18cf6a ]
We print a calibration failure message on -EPROBE_DEFER from
nvmem/qfprom as follows:
[ 3.003090] qcom-tsens 4a9000.thermal-sensor: version: 1.4
[ 3.005376] qcom-tsens 4a9000.thermal-sensor: tsens calibration failed
[ 3.113248] qcom-tsens 4a9000.thermal-sensor: version: 1.4
This confuses people when, in fact, calibration succeeds later when
nvmem/qfprom device is available. Don't print this message on a
-EPROBE_DEFER.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63f55fcea5 ]
Currently IRQ remains enabled after .remove, later if device is probed,
IRQ is requested before .thermal_init, this may cause IRQ function be
called before device is initialized.
this patch disables interrupt in .remove, to ensure irq function
only be called after device is fully initialized.
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9e73998f9 ]
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long). These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.
As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL. From the image dump of a program:
| "mm_start_code": "0x400000",
| "mm_end_code": "0x8f5fb4",
| "mm_start_data": "0xf1bfb0",
| "mm_end_data": "0xf1bfb0",
Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.
Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 745e10146c ]
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),
do {
set_store_user_clean(cachep);
drain_cpu_caches(cachep);
...
} while (!is_store_user_clean(cachep));
For example,
do_drain
slabs_destroy
slab_destroy
kmem_cache_free
__cache_free
___cache_free
kmemleak_free_recursive
delete_object_full
__delete_object
put_object
free_object_rcu
kmem_cache_free
cache_free_debugcheck --> dirty kmemleak_object
One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to set store_user_clean
after drain_cpu_caches() which leaves a small window between
drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
be dirty again lead to slightly wrong information has been stored but
could also speed up things significantly which sounds like a good
compromise. For example,
# cat /proc/slab_allocators
0m42.778s # 1st approach
0m0.737s # 2nd approach
[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
Fixes: d31676dfde ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 024eee0e83 ]
MADV_DONTNEED is handled with mmap_sem taken in read mode. We call
page_mkclean without holding mmap_sem.
MADV_DONTNEED implies that pages in the region are unmapped and subsequent
access to the pages in that range is handled as a new page fault. This
implies that if we don't have parallel access to the region when
MADV_DONTNEED is run we expect those range to be unallocated.
w.r.t page_mkclean() we need to make sure that we don't break the
MADV_DONTNEED semantics. MADV_DONTNEED check for pmd_none without holding
pmd_lock. This implies we skip the pmd if we temporarily mark pmd none.
Avoid doing that while marking the page clean.
Keep the sequence same for dax too even though we don't support
MADV_DONTNEED for dax mapping
The bug was noticed by code review and I didn't observe any failures w.r.t
test run. This is similar to
commit 58ceeb6bec
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Thu Apr 13 14:56:26 2017 -0700
thp: fix MADV_DONTNEED vs. MADV_FREE race
commit ced108037c
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Thu Apr 13 14:56:20 2017 -0700
thp: fix MADV_DONTNEED vs. numa balancing race
Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc:"Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b59e01a3a ]
Currently one bit in cma bitmap represents number of pages rather than
one page, cma->count means cma size in pages. So to find available pages
via find_next_zero_bit()/find_next_bit() we should use cma size not in
pages but in bits although current free pages number is correct due to
zero value of order_per_bit. Once order_per_bit is changed the bitmap
status will be incorrect.
The size input in cma_debug_show_areas() is not correct. It will
affect the available pages at some position to debug the failure issue.
This is an example with order_per_bit = 1
Before this change:
[ 4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages
After this change:
[ 4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages
Obviously the bitmap status before is incorrect.
Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd7ef7bd14 ]
In a low-memory situation, cc->fast_search_fail can keep increasing as it
is unable to find an available page to isolate in
fast_isolate_freepages(). As the result, it could trigger an error below,
so just compare with the maximum bits can be shifted first.
UBSAN: Undefined behaviour in mm/compaction.c:1160:30
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
W L 5.0.0+ #17
Call trace:
dump_backtrace+0x0/0x450
show_stack+0x20/0x2c
dump_stack+0xc8/0x14c
__ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
compaction_alloc+0x2344/0x2484
unmap_and_move+0xdc/0x1dbc
migrate_pages+0x274/0x1310
compact_zone+0x26ec/0x43bc
kcompactd+0x15b8/0x1a24
kthread+0x374/0x390
ret_from_fork+0x10/0x18
[akpm@linux-foundation.org: code cleanup]
Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw
Fixes: 70b44595ea ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54c7a8916a ]
Patch series "initramfs tidyups".
I've spent some time chasing down behavior in initramfs and found
plenty of opportunity to improve the code. A first stab on that is
contained in this series.
This patch (of 7):
We free the initrd memory for all successful or error cases except for the
case where opening /initrd.image fails, which looks like an oversight.
Steven said:
: This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
: - specifically it means that the initrd is freed (previously it was
: ignored and never freed). But that seems like reasonable behaviour and
: the previous behaviour looks like another oversight.
Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 299c83dce9 ]
342332e6a9 ("mm/page_alloc.c: introduce kernelcore=mirror option") and
later patches rewrote the calculation of node spanned pages.
e506b99696 ("mem-hotplug: fix node spanned pages when we have a movable
node"), but the current code still has problems,
When we have a node with only zone_movable and the node id is not zero,
the size of node spanned pages is double added.
That's because we have an empty normal zone, and zone_start_pfn or
zone_end_pfn is not between arch_zone_lowest_possible_pfn and
arch_zone_highest_possible_pfn, so we need to use clamp to constrain the
range just like the commit <96e907d13602> (bootmem: Reimplement
__absent_pages_in_range() using for_each_mem_pfn_range()).
e.g.
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000023fffffff]
Movable zone start for each node
Node 0: 0x0000000100000000
Node 1: 0x0000000140000000
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x00000000bffdffff]
node 0: [mem 0x0000000100000000-0x000000013fffffff]
node 1: [mem 0x0000000140000000-0x000000023fffffff]
node 0 DMA spanned:0xfff present:0xf9e absent:0x61
node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020
node 0 Normal spanned:0 present:0 absent:0
node 0 Movable spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA spanned:0 present:0 absent:0
node 1 DMA32 spanned:0 present:0 absent:0
node 1 Normal spanned:0x100000 present:0x100000 absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 2097152
node_spanned_pages:2097152
Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K
cma-reserved)
It shows that the current memory of node 1 is double added.
After this patch, the problem is fixed.
node 0 DMA spanned:0xfff present:0xf9e absent:0x61
node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020
node 0 Normal spanned:0 present:0 absent:0
node 0 Movable spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA spanned:0 present:0 absent:0
node 1 DMA32 spanned:0 present:0 absent:0
node 1 Normal spanned:0 present:0 absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 1048576
node_spanned_pages:1048576
memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K
cma-reserved)
Link: http://lkml.kernel.org/r/1554178276-10372-1-git-send-email-fanglinxu@huawei.com
Signed-off-by: Linxu Fang <fanglinxu@huawei.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9eb1417c7 ]
Patch series "mm/memory_hotplug: Better error handling when removing
memory", v1.
Error handling when removing memory is somewhat messed up right now. Some
errors result in warnings, others are completely ignored. Memory unplug
code can essentially not deal with errors properly as of now.
remove_memory() will never fail.
We have basically two choices:
1. Allow arch_remov_memory() and friends to fail, propagating errors via
remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
pieces added/removed separately).
2. Don't allow the functions to fail, handling errors in a nicer way.
It seems like most errors that can theoretically happen are really corner
cases and mostly theoretical (e.g. "section not valid"). However e.g.
aborting removal of sections while all callers simply continue in case of
errors is not nice.
If we can gurantee that removal of memory always works (and WARN/skip in
case of theoretical errors so we can figure out what is going on), we can
go ahead and implement better error handling when adding memory.
E.g. via add_memory():
arch_add_memory()
ret = do_stuff()
if (ret) {
arch_remove_memory();
goto error;
}
Handling here that arch_remove_memory() might fail is basically
impossible. So I suggest, let's avoid reporting errors while removing
memory, warning on theoretical errors instead and continuing instead of
aborting.
This patch (of 4):
__add_pages() doesn't add the memory resource, so __remove_pages()
shouldn't remove it. Let's factor it out. Especially as it is a special
case for memory used as system memory, added via add_memory() and friends.
We now remove the resource after removing the sections instead of doing it
the other way around. I don't think this change is problematic.
add_memory()
register memory resource
arch_add_memory()
remove_memory
arch_remove_memory()
release memory resource
While at it, explain why we ignore errors and that it only happeny if
we remove memory in a different granularity as we added it.
[david@redhat.com: fix printk warning]
Link: http://lkml.kernel.org/r/20190417120204.6997-1-david@redhat.com
Link: http://lkml.kernel.org/r/20190409100148.24703-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0919e1b69a ]
When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation. When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored. PagePrivate
being set at free huge page time mostly happens on error paths.
When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem. If so,
pages are also reserved within the filesystem. The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem. However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd. Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G -4.0M 4.1G - /opt/hugepool
To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.
I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.
Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e01ae2612 ]
The following warning is seen on systems with broken clock divider.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 1 Comm: swapper Not tainted 5.1.0-09698-g1fb3b52 #1
Hardware name: ARM Integrator/CP (Device Tree)
[<c0011be8>] (unwind_backtrace) from [<c000ebb8>] (show_stack+0x10/0x18)
[<c000ebb8>] (show_stack) from [<c07d3fd0>] (dump_stack+0x18/0x24)
[<c07d3fd0>] (dump_stack) from [<c0060d48>] (register_lock_class+0x674/0x6f8)
[<c0060d48>] (register_lock_class) from [<c005de2c>]
(__lock_acquire+0x68/0x2128)
[<c005de2c>] (__lock_acquire) from [<c0060408>] (lock_acquire+0x110/0x21c)
[<c0060408>] (lock_acquire) from [<c07f755c>] (_raw_spin_lock+0x34/0x48)
[<c07f755c>] (_raw_spin_lock) from [<c0536c8c>]
(pl111_display_enable+0xf8/0x5fc)
[<c0536c8c>] (pl111_display_enable) from [<c0502f54>]
(drm_atomic_helper_commit_modeset_enables+0x1ec/0x244)
Since commit eedd6033b4 ("drm/pl111: Support variants with broken clock
divider"), the spinlock is not initialized if the clock divider is broken.
Initialize it earlier to fix the problem.
Fixes: eedd6033b4 ("drm/pl111: Support variants with broken clock divider")
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1557758781-23586-1-git-send-email-linux@roeck-us.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc8670d1f7 ]
media_device_cleanup() and v4l2_m2m_unregister_media_controller() were
missing in the probe error path.
While at it, re-order calls in the remove path to unregister/cleanup
things in the reverse order they were initialized/registered.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbb9247167 ]
This reverts commit 8059add047.
This commit while seemingly a good idea, breaks a radv check,
for a node being master because something succeeds where it failed
before now.
Apply the Linus rule, revert early and try again, we don't break
userspace.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4cdd17ba1d upstream.
We need to compute the uart state only on the first open. This is
usually what is done in the ->install hook. serial_core used to do this
in ->open on every open. So move it to ->install.
As a side effect, it ensures the state is set properly in the window
after tty_init_dev is called, but before uart_open. This fixes a bunch
of races between tty_open and flush_to_ldisc we were dealing with
recently.
One of such bugs was attempted to fix in commit fedb576064 (serial:
fix race between flush_to_ldisc and tty_open), but it only took care of
a couple of functions (uart_start and uart_unthrottle). I was able to
reproduce the crash on a SLE system, but in uart_write_room which is
also called from flush_to_ldisc via process_echoes. I was *unable* to
reproduce the bug locally. It is due to having this patch in my queue
since 2012!
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L 4.12.14-396-default #1 SLE15-SP1 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
task: ffff8800427d8040 task.stack: ffff8800427f0000
RIP: 0010:uart_write_room+0xc4/0x590
RSP: 0018:ffff8800427f7088 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90
RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400
R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12
R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178
FS: 0000000000000000(0000) GS:ffff880043300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0
Call Trace:
tty_write_room+0x6d/0xc0
__process_echoes+0x55/0x870
n_tty_receive_buf_common+0x105e/0x26d0
tty_ldisc_receive_buf+0xb7/0x1c0
tty_port_default_receive_buf+0x107/0x180
flush_to_ldisc+0x35d/0x5c0
...
0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is
tried to be dereferenced (0x178 >> 3 is 0x2f in rdx) at
uart_write_room+0xc4. 0x178 is exactly (struct uart_state *)NULL->refcount
used in uart_port_lock from uart_write_room.
So revert the upstream commit here as my local patch should fix the
whole family.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Wang Li <wangli39@baidu.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89a4aac0ab upstream.
In the case of a normal sync update, the preparation of framebuffers (be
it calling drm_atomic_helper_prepare_planes() or doing setups with
drm_framebuffer_get()) are performed in the new_state and the respective
cleanups are performed in the old_state.
In the case of async updates, the preparation is also done in the
new_state but the cleanups are done in the new_state (because updates
are performed in place, i.e. in the current state).
The current code blocks async udpates when the fb is changed, turning
async updates into sync updates, slowing down cursor updates and
introducing regressions in igt tests with errors of type:
"CRITICAL: completed 97 cursor updated in a period of 30 flips, we
expect to complete approximately 15360 updates, with the threshold set
at 7680"
Fb changes in async updates were prevented to avoid the following scenario:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2 (wrong)
Where we have a single call to prepare fb2 but double cleanup call to fb2.
To solve the above problems, instead of blocking async fb changes, we
place the old framebuffer in the new_state object, so when the code
performs cleanups in the new_state it will cleanup the old_fb and we
will have the following scenario instead:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, no cleanup
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb1
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
Where calls to prepare/cleanup are balanced.
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: 25dc194b34 ("drm: Block fb changes for async plane updates")
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190603165610.24614-6-helen.koike@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 551bd3368a upstream.
With Sphinx 2.0 (or prior versions with the deprecation warnings fixed) the
docs build fails with:
Documentation/gpu/i915.rst:403: WARNING: Title level inconsistent:
Global GTT Fence Handling
~~~~~~~~~~~~~~~~~~~~~~~~~
reST markup error:
Documentation/gpu/i915.rst:403: (SEVERE/4) Title level inconsistent:
I "fixed" it by changing the subsections in i915.rst, but that didn't seem
like the correct change. It turns out that a couple of i915 files create
their own subsections in kerneldoc comments using apostrophes as the
heading marker:
Layout
''''''
That breaks the normal subsection marker ordering, and newer Sphinx is
rather more strict about enforcing that ordering. So fix the offending
comments to make Sphinx happy.
(This is unfortunate, in that kerneldoc comments shouldn't need to be aware
of where they might be included in the heading hierarchy, but I don't see
a better way around it).
Cc: stable@vger.kernel.org # v4.14+
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8c2d5ab9e upstream.
"To track whether a request has started on HW, we can emit a breadcrumb at
the beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request." It means all the
request which timeline's has_init_breadcrumb is true, then the
emit_init_breadcrumb process must have before emitting the real commands,
otherwise, the scheduler might get a wrong state of this request during
reset. If the request is exactly the guilty one, the scheduler won't
terminate it with the wrong state. To avoid this, do emit_init_breadcrumb
for all the requests from gvt.
v2: cc to stable kernel
Fixes: 8547444137 ("drm/i915: Identify active requests")
Cc: stable@vger.kernel.org
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce0e22f5d8 upstream.
[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <vce_v3_0> failed -110
[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).
[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.
V2: remove reduntant fetch of rptr
Signed-off-by: Louis Li <Ching-shih.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cbd0adc44 upstream.
As discussed with Nicholas and Daniel Vetter (patchwork
link to discussion below), the VRR timestamping behaviour
produced utterly useless and bogus vblank/pageflip
timestamps. We have found a way to fix this and provide
sane behaviour.
As of Linux 5.2, the amdgpu driver will be able to
provide exactly the same vblank / pageflip timestamp
semantic in variable refresh rate mode as in standard
fixed refresh rate mode. This is achieved by deferring
core vblank handling (drm_crtc_handle_vblank()) until
the end of front porch, and also defer the sending of
pageflip completion events until end of front porch,
when we can safely compute correct pageflip/vblank
timestamps.
The same approach will be possible for other VRR
capable kms drivers, so we can actually have sane
and useful timestamps in VRR mode.
This patch removes the section of the docs that
describes the broken timestamp behaviour present
in Linux 5.0/5.1.
Fixes: ab7a664f7a ("drm: Document variable refresh properties")
Link: https://patchwork.freedesktop.org/patch/285333/
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418060157.18968-1-mario.kleiner.de@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b30a43ac71 upstream.
There was a nouveau DDX that relied on legacy context ioctls to work,
but we fixed it years ago, give distros that have a modern DDX the
option to break the uAPI and close the mess of holes that legacy
context support is.
Full context of the story:
commit 0e975980d4
Author: Peter Antoine <peter.antoine@intel.com>
Date: Tue Jun 23 08:18:49 2015 +0100
drm: Turn off Legacy Context Functions
The context functions are not used by the i915 driver and should not
be used by modeset drivers. These driver functions contain several bugs
and security holes. This change makes these functions optional can be
turned on by a setting, they are turned off by default for modeset
driver with the exception of the nouvea driver that may require them with
an old version of libdrm.
The previous attempt was
commit 7c510133d9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Aug 8 15:41:21 2013 +0200
drm: mark context support as a legacy subsystem
but this had to be reverted
commit c21eb21cb5
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Sep 20 08:32:59 2013 +1000
Revert "drm: mark context support as a legacy subsystem"
v2: remove returns from void function, and formatting (Daniel Vetter)
v3:
- s/Nova/nouveau/ in the commit message, and add references to the
previous attempts
- drop the part touching the drm hw lock, that should be a separate
patch.
Signed-off-by: Peter Antoine <peter.antoine@intel.com> (v2)
Cc: Peter Antoine <peter.antoine@intel.com> (v2)
Reviewed-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: move DRM_VM dependency into legacy config.
v3: fix missing dep (kbuild robot)
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d985a35332 upstream.
In the case of async update, modifications are done in place, i.e. in the
current plane state, so the new_state is prepared and the new_state is
cleaned up (instead of the old_state, unlike what happens in a
normal sync update).
To cleanup the old_fb properly, it needs to be placed in the new_state
in the end of async_update, so cleanup call will unreference the old_fb
correctly.
Also, the previous code had a:
plane_state = plane->funcs->atomic_duplicate_state(plane);
...
swap(plane_state, plane->state);
if (plane->state->fb && plane->state->fb != new_state->fb) {
...
}
Which was wrong, as the fb were just assigned to be equal, so this if
statement nevers evaluates to true.
Another details is that the function drm_crtc_vblank_get() can only be
called when vop->is_enabled is true, otherwise it has no effect and
trows a WARN_ON().
Calling drm_atomic_set_fb_for_plane() (which get a referent of the new
fb and pus the old fb) is not required, as it is taken care by
drm_mode_cursor_universal() when calling
drm_atomic_helper_update_plane().
Fixes: 15609559a8 ("drm/rockchip: update cursors asynchronously through atomic.")
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190603165610.24614-2-helen.koike@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd17cc5a20 upstream.
The limit here is supposed to be how much of the page is left, but it's
just using PAGE_SIZE as the limit.
The other thing to remember is that snprintf() returns the number of
bytes which would have been copied if we had had enough room. So that
means that if we run out of space then this code would end up passing a
negative value as the limit and the kernel would print an error message.
I have change the code to use scnprintf() which returns the number of
bytes that were successfully printed (not counting the NUL terminator).
Fixes: c92316bf8e ("test_firmware: add batched firmware tests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 110080cea0 upstream.
There are a couple potential integer overflows here.
round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);
The first thing is that the "m->size + (...)" addition could overflow,
and the second is that round_up() overflows to zero if the result is
within PAGE_SIZE of the type max.
In this code, the "m->size" variable is an u64 but we're saving the
result in "map_size" which is an unsigned long and genwqe_user_vmap()
takes an unsigned long as well. So I have used ULONG_MAX as the upper
bound. From a practical perspective unsigned long is fine/better than
trying to change all the types to u64.
Fixes: eaf4722d46 ("GenWQE Character device and DDCB queue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4f2d1af71 upstream.
The pistachio platform uses the U-Boot bootloader & generally boots a
kernel in the uImage format. As such it's useful to build one when
building the kernel, but to do so currently requires the user to
manually specify a uImage target on the make command line.
Make uImage.gz the pistachio platform's default build target, so that
the default is to build a kernel image that we can actually boot on a
board such as the MIPS Creator Ci40.
Marked for stable backport as far as v4.1 where pistachio support was
introduced. This is primarily useful for CI systems such as kernelci.org
which will benefit from us building a suitable image which can then be
booted as part of automated testing, extending our test coverage to the
affected stable branches.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
URL: https://groups.io/g/kernelci/message/388
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 074a1e1167 upstream.
The virt_addr_valid() function is meant to return true iff
virt_to_page() will return a valid struct page reference. This is true
iff the address provided is found within the unmapped address range
between PAGE_OFFSET & MAP_BASE, but we don't currently check for that
condition. Instead we simply mask the address to obtain what will be a
physical address if the virtual address is indeed in the desired range,
shift it to form a PFN & then call pfn_valid(). This can incorrectly
return true if called with a virtual address which, after masking,
happens to form a physical address corresponding to a valid PFN.
For example we may vmalloc an address in the kernel mapped region
starting a MAP_BASE & obtain the virtual address:
addr = 0xc000000000002000
When masked by virt_to_phys(), which uses __pa() & in turn CPHYSADDR(),
we obtain the following (bogus) physical address:
addr = 0x2000
In a common system with PHYS_OFFSET=0 this will correspond to a valid
struct page which should really be accessed by virtual address
PAGE_OFFSET+0x2000, causing virt_addr_valid() to incorrectly return 1
indicating that the original address corresponds to a struct page.
This is equivalent to the ARM64 change made in commit ca219452c6
("arm64: Correctly bounds check virt_addr_valid").
This fixes fallout when hardened usercopy is enabled caused by the
related commit 517e1fbeb6 ("mm/usercopy: Drop extra
is_vmalloc_or_module() check") which removed a check for the vmalloc
range that was present from the introduction of the hardened usercopy
feature.
Signed-off-by: Paul Burton <paul.burton@mips.com>
References: ca219452c6 ("arm64: Correctly bounds check virt_addr_valid")
References: 517e1fbeb6 ("mm/usercopy: Drop extra is_vmalloc_or_module() check")
Reported-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: YunQiang Su <ysu@wavecomp.com>
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929366
Cc: stable@vger.kernel.org # v4.12+
Cc: linux-mips@vger.kernel.org
Cc: Yunqiang Su <ysu@wavecomp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5651cd3c43 upstream.
When the controller supports less queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports less queues than requested.
The rules are:
1. if no write/poll queues are requested, we assign the available queues
to the default queue map. The default and read queue maps share the
existing queues.
2. if write queues are requested:
- first make sure that read queue map gets the requested
nr_io_queues count
- then grant the default queue map the minimum between the requested
nr_write_queues and the remaining queues. If there are no available
queues to dedicate to the default queue map, fallback to (1) and
share all the queues in the existing queue map.
3. if poll queues are requested:
- map the remaining queues to the poll queue map.
Also, provide a log indication on how we constructed the different
queue maps.
Reported-by: Harris, James R <james.r.harris@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 962f0af83c upstream.
Commit 0aaba41b58 ("s390: remove all code using the access register
mode") removed access register mode from the kernel, and also from the
address space detection logic. However, user space could still switch
to access register mode (trans_exc_code == 1), and exceptions in that
mode would not be correctly assigned.
Fix this by adding a check for trans_exc_code == 1 to get_fault_type(),
and remove the wrong comment line before that function.
Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49b8095867 upstream.
This driver does not support reading more than 255 bytes at once because
the register for storing the number of bytes to read is only 8 bits. Add
a max_read_len quirk to enforce this.
This was found when using this driver with the SFP driver, which was
previously reading all 256 bytes in the SFP EEPROM in one transaction.
This caused a bunch of hard-to-debug errors in the xiic driver since the
driver/logic was treating the number of bytes to read as zero.
Rejecting transactions that aren't supported at least allows the problem
to be diagnosed more easily.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de9f869616 upstream.
get_desc() computes a pointer into the LDT while holding a lock that
protects the LDT from being freed, but then drops the lock and returns the
(now potentially dangling) pointer to its caller.
Fix it by giving the caller a copy of the LDT entry instead.
Fixes: 670f928ba0 ("x86/insn-eval: Add utility function to get segment descriptor")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec527c3180 upstream.
As explained in
0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.
That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine
reboots.
Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.
Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.
Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51b72656bb upstream.
If an SCC error occurs during a read/write command execution, a false
positive CRC error message is output.
mmcblk0: response CRC error sending r/w cmd command, card status 0x900
check_scc_error() checks SCC_RVSREQ.RVSERR bit. RVSERR detects a
correction error in the next (up or down) delay tap position. However,
since the command is successful, only retuning needs to be executed.
This has been confirmed by HW engineers.
Thus, on SCC error, set retuning flag instead of setting an error code.
Fixes: b85fb0a1c8 ("mmc: tmio: Fix SCC error detection")
Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
[wsa: updated comment and commit message, removed some braces]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61009f82a9 upstream.
We accidentally changed the error code from -EAGAIN to 1 when we did the
blk-mq conversion.
Maybe a contributing factor to this mistake is that it wasn't obvious
that the "while (chunk) {" condition is always true. I have cleaned
that up as well.
Fixes: d0be12274d ("mspro_block: convert to blk-mq")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 913ab9780f upstream.
To print the pathname that will be used by shell in the current
environment, 'command -v' is a standardized way. [1]
'which' is also often used in scripts, but it is less portable.
When I worked on commit bd55f96fa9 ("kbuild: refactor cc-cross-prefix
implementation"), I was eager to use 'command -v' but it did not work.
(The reason is explained below.)
I kept 'which' as before but got rid of '> /dev/null 2>&1' as I
thought it was no longer needed. Sorry, I was wrong.
It works well on my Ubuntu machine, but Alexey Brodkin reports noisy
warnings on CentOS7 when 'which' fails to find the given command in
the PATH environment.
$ which foo
which: no foo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
Given that behavior of 'which' depends on system (and it may not be
installed by default), I want to try 'command -v' once again.
The specification [1] clearly describes the behavior of 'command -v'
when the given command is not found:
Otherwise, no output shall be written and the exit status shall reflect
that the name was not found.
However, we need a little magic to use 'command -v' from Make.
$(shell ...) passes the argument to a subshell for execution, and
returns the standard output of the command.
Here is a trick. GNU Make may optimize this by executing the command
directly instead of forking a subshell, if no shell special characters
are found in the command and omitting the subshell will not change the
behavior.
In this case, no shell special character is used. So, Make will try
to run it directly. However, 'command' is a shell-builtin command,
then Make would fail to find it in the PATH environment:
$ make ARCH=m68k defconfig
make: command: Command not found
make: command: Command not found
make: command: Command not found
In fact, Make has a table of shell-builtin commands because it must
ask the shell to execute them.
Until recently, 'command' was missing in the table.
This issue was fixed by the following commit:
| commit 1af314465e5dfe3e8baa839a32a72e83c04f26ef
| Author: Paul Smith <psmith@gnu.org>
| Date: Sun Nov 12 18:10:28 2017 -0500
|
| * job.c: Add "command" as a known shell built-in.
|
| This is not a POSIX shell built-in but it's common in UNIX shells.
| Reported by Nick Bowler <nbowler@draconx.ca>.
Because the latest release is GNU Make 4.2.1 in 2016, this commit is
not included in any released versions. (But some distributions may
have back-ported it.)
We need to trick Make to spawn a subshell. There are various ways to
do so:
1) Use a shell special character '~' as dummy
$(shell : ~; command -v $(c)gcc)
2) Use a variable reference that always expands to the empty string
(suggested by David Laight)
$(shell command$${x:+} -v $(c)gcc)
3) Use redirect
$(shell command -v $(c)gcc 2>/dev/null)
I chose 3) to not confuse people. The stderr would not be polluted
anyway, but it will provide extra safety, and is easy to understand.
Tested on Make 3.81, 3.82, 4.0, 4.1, 4.2, 4.2.1
[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/command.html
Fixes: bd55f96fa9 ("kbuild: refactor cc-cross-prefix implementation")
Cc: linux-stable <stable@vger.kernel.org> # 5.1
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8880fa32c5 upstream.
The ram pstore backend has always had the crash dumper frontend enabled
unconditionally. However, it was possible to effectively disable it
by setting a record_size=0. All the machinery would run (storing dumps
to the temporary crash buffer), but 0 bytes would ultimately get stored
due to there being no przs allocated for dumps. Commit 89d328f637
("pstore/ram: Correctly calculate usable PRZ bytes"), however, assumed
that there would always be at least one allocated dprz for calculating
the size of the temporary crash buffer. This was, of course, not the
case when record_size=0, and would lead to a NULL deref trying to find
the dprz buffer size:
BUG: unable to handle kernel NULL pointer dereference at (null)
...
IP: ramoops_probe+0x285/0x37e (fs/pstore/ram.c:808)
cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
Instead, we need to only enable the frontends based on the success of the
prz initialization and only take the needed actions when those zones are
available. (This also fixes a possible error in detecting if the ftrace
frontend should be enabled.)
Reported-and-tested-by: Yaro Slav <yaro330@gmail.com>
Fixes: 89d328f637 ("pstore/ram: Correctly calculate usable PRZ bytes")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9fb94a99b upstream.
Set tfm to NULL on free_buf_for_compression() after crypto_free_comp().
This avoid a use-after-free when allocate_buf_for_compression()
and free_buf_for_compression() are called twice. Although
free_buf_for_compression() freed the tfm, allocate_buf_for_compression()
won't reinitialize the tfm since the tfm pointer is not NULL.
Fixes: 95047b0519 ("pstore: Refactor compression initialization")
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2bc923629 upstream.
Prior to sending COPY_FILE_RANGE to userspace filesystem, we must flush all
dirty pages in both the source and destination files.
This patch adds the missing flush of the source file.
Tested on libfuse-3.5.0 with:
libfuse/example/passthrough_ll /mnt/fuse/ -o writeback
libfuse/test/test_syscalls /mnt/fuse/tmp/test
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba851a39c9 upstream.
When a waiter is waked by CB_NOTIFY_LOCK, it will retry
nfs4_proc_setlk(). The waiter may fail to nfs4_proc_setlk() and sleep
again. However, the waiter is already removed from clp->cl_lock_waitq
when handling CB_NOTIFY_LOCK in nfs4_wake_lock_waiter(). So any
subsequent CB_NOTIFY_LOCK won't wake this waiter anymore. We should
put the waiter back to clp->cl_lock_waitq before retrying.
Cc: stable@vger.kernel.org #4.9+
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52b042ab99 upstream.
Commit b7dbcc0e43 "NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter"
found this bug. However it didn't fix it.
This commit replaces schedule_timeout() with wait_woken() and
default_wake_function() with woken_wake_function() in function
nfs4_retry_setlk() and nfs4_wake_lock_waiter(). wait_woken() uses
memory barriers in its implementation to avoid potential race condition
when putting a process into sleeping state and then waking it up.
Fixes: a1d617d8f1 ("nfs: allow blocking locks to be awoken by lock callbacks")
Cc: stable@vger.kernel.org #4.9+
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7987b694ad upstream.
The addition of rpc_check_timeout() to call_decode causes an Oops
when the RPCSEC_GSS credential is rejected.
The reason is that rpc_decode_header() will call xprt_release() in
order to free task->tk_rqstp, which is needed by rpc_check_timeout()
to check whether or not we should exit due to a soft timeout.
The fix is to move the call to xprt_release() into call_decode() so
we can perform it after rpc_check_timeout().
Reported-by: Olga Kornievskaia <olga.kornievskaia@gmail.com>
Reported-by: Nick Bowler <nbowler@draconx.ca>
Fixes: cea57789e4 ("SUNRPC: Clean up")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec6017d903 upstream.
If call_status returns ENOTCONN, we need to re-establish the connection
state after. Otherwise the client goes into an infinite loop of call_encode,
call_transmit, call_status (ENOTCONN), call_encode.
Fixes: c8485e4d63 ("SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # v2.6.29+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 527a1d1ede upstream.
According to the found documentation, data cache flushes and sync
instructions are needed on the PCX-U+ (PA8200, e.g. C200/C240)
platforms, while PCX-W (PA8500, e.g. C360) platforms aparently don't
need those flushes when changing the IO PDIR data structures.
We have no documentation for PCX-W+ (PA8600) and PCX-W2 (PA8700) CPUs,
but Carlo Pisani reported that his C3600 machine (PA8600, PCX-W+) fails
when the fdc instructions were removed. His firmware didn't set the NIOP
bit, so one may assume it's a firmware bug since other C3750 machines
had the bit set.
Even if documentation (as mentioned above) states that PCX-W (PA8500,
e.g. J5000) does not need fdc flushes, Sven could show that an Adaptec
29320A PCI-X SCSI controller reliably failed on a dd command during the
first five minutes in his J5000 when fdc flushes were missing.
Going forward, we will now NOT replace the fdc and sync assembler
instructions by NOPS if:
a) the NP iopdir_fdc bit was set by firmware, or
b) we find a CPU up to and including a PCX-W+ (PA8600).
This fixes the HPMC crashes on a C240 and C36XX machines. For other
machines we rely on the firmware to set the bit when needed.
In case one finds HPMC issues, people could try to boot their machines
with the "no-alternatives" kernel option to turn off any alternative
patching.
Reported-by: Sven Schnelle <svens@stackframe.org>
Reported-by: Carlo Pisani <carlojpisani@gmail.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Fixes: 3847dab774 ("parisc: Add alternative coding infrastructure")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 5.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63923d2c38 upstream.
We only support I/O to kernel space. Using %sr1 to load the coherence
index may be racy unless interrupts are disabled. This patch changes the
code used to load the coherence index to use implicit space register
selection. This saves one instruction and eliminates the race.
Tested on rp3440, c8000 and c3750.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8c715b4dd upstream.
As of today if userspace process tries to access a kernel virtual addres
(0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
exists, that process hangs instead of being killed with SIGSEGV
Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
in kernel mode.
And given this, we can also simplify the code a bit. Now a vmalloc fault
implies kernel mode so its failure (for some reason) can reuse the
@no_context label and we can remove @bad_area_nosemaphore.
Reproduce user test for original problem:
------------------------>8-----------------
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char *argv[])
{
volatile uint32_t temp;
temp = *(uint32_t *)(0x70000000);
}
------------------------>8-----------------
Cc: <stable@vger.kernel.org>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66be4e66a7 upstream.
Herbert Xu pointed out that commit bb73c52bad ("rcu: Don't disable
preemption for Tiny and Tree RCU readers") was incorrect in making the
preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
a no-op, but still is a compiler barrier.
And RCU locking still _needs_ that compiler barrier.
It is simply fundamentally not true that RCU locking would be a complete
no-op: we still need to guarantee (for example) that things that can
trap and cause preemption cannot migrate into the RCU locked region.
The way we do that is by making it a barrier.
See for example commit 386afc9114 ("spinlocks and preemption points
need to be at least compiler barriers") from back in 2013 that had
similar issues with spinlocks that become no-ops on UP: they must still
constrain the compiler from moving other operations into the critical
region.
Now, it is true that a lot of RCU operations already use READ_ONCE() and
WRITE_ONCE() (which in practice likely would never be re-ordered wrt
anything remotely interesting), but it is also true that that is not
globally the case, and that it's not even necessarily always possible
(ie bitfields etc).
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: bb73c52bad ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
Cc: stable@kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e52972c11d ]
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem. Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing. Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.
Note that there is a leftover "flags" field in struct
tls_context already.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 135dd9594f ]
Querying EEPROM high pages data for SFP module is currently
not supported by our driver but is still tried, resulting in
invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module to
limit the reading for page 0 only and prevent invalid FW queries.
Fixes: 7202da8b7f ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7fcd1e033d ]
e is the counter used to save the location of a dump when an
skb is filled. Once the walk of the table is complete, mr_table_dump
needs to return without resetting that index to 0. Dump of a specific
table is looping because of the reset because there is no way to
indicate the walk of the table is done.
Move the reset to the caller so the dump of each table starts at 0,
but the loop counter is maintained if a dump fills an skb.
Fixes: e1cedae1ba ("ipmr: Refactor mr_rtm_dumproute")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b2a2bfeb3 ]
Commit cd9ff4de01 changed the key for IFF_POINTOPOINT devices to
INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not
updated to use the altered key. The result is that every packet Tx does
a lookup on the gateway address which does not find an entry, a new one
is created only to find the existing one in the table right before the
insert since arp_constructor was updated to reset the primary key. This
is seen in the allocs and destroys counters:
ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
Fixes: cd9ff4de01 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY")
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 64c6f4bbca ]
Ian and Alan both reported seeing overflows after upgrades to 5.x kernels:
neighbour: arp_cache: neighbor table overflow!
Alan's mpls script helped get to the bottom of this bug. When a new entry
is created the gc_entries counter is bumped in neigh_alloc to check if a
new one is allowed to be created. ___neigh_create then searches for an
existing entry before inserting the just allocated one. If an entry
already exists, the new one is dropped in favor of the existing one. In
this case the cleanup path needs to drop the gc_entries counter. There
is no memory leak, only a counter leak.
Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 930b9a0543 ]
WoL magic packet configuration sometimes does not work due to
couple of leakages found.
Mainly there was a regression introduced during readx_poll refactoring.
Next, fw request waiting time was too small. Sometimes that
caused sleep proxy config function to return with an error
and to skip WoL configuration.
At last, WoL data were passed to FW from not clean buffer.
That could cause FW to accept garbage as a random configuration data.
Fixes: 6a7f227731 ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic")
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b9aa52c4cb ]
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.
The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52 ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.
Only eat these 2 bytes if hdrincl == 0.
Fixes: 715f504b11 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59e3e4b526 ]
As it was done in commit 8f659a03a0 ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d7997 ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 82ba25c6de ]
By default, packets received in another VRF should not be passed to an
unbound socket in the default VRF. This patch updates the IPv4 UDP
multicast logic to match the unicast VRF logic (in compute_score()),
as well as the IPv6 mcast logic (in __udp_v6_is_mcast_sock()).
The particular case I noticed was DHCP discover packets going
to the 255.255.255.255 address, which are handled by
__udp4_lib_mcast_deliver(). The previous code meant that running
multiple different DHCP server or relay agent instances across VRFs
did not work correctly - any server/relay agent in the default VRF
received DHCP discover packets for all other VRFs.
Fixes: 6da5b0f027 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4970b42d5c ]
This reverts commit e9919a24d3.
Nathan reported the new behaviour breaks Android, as Android just add
new rules and delete old ones.
If we return 0 without adding dup rules, Android will remove the new
added rules and causing system to soft-reboot.
Fixes: e9919a24d3 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Yaro Slav <yaro330@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 720f1de402 ]
Currently, the process issuing a "start" command on the pktgen procfs
interface, acquires the pktgen thread lock and never release it, until
all pktgen threads are completed. The above can blocks indefinitely any
other pktgen command and any (even unrelated) netdevice removal - as
the pktgen netdev notifier acquires the same lock.
The issue is demonstrated by the following script, reported by Matteo:
ip -b - <<'EOF'
link add type dummy
link add type veth
link set dummy0 up
EOF
modprobe pktgen
echo reset >/proc/net/pktgen/pgctrl
{
echo rem_device_all
echo add_device dummy0
} >/proc/net/pktgen/kpktgend_0
echo count 0 >/proc/net/pktgen/dummy0
echo start >/proc/net/pktgen/pgctrl &
sleep 1
rmmod veth
Fix the above releasing the thread lock around the sleep call.
Additionally we must prevent racing with forcefull rmmod - as the
thread lock no more protects from them. Instead, acquire a self-reference
before waiting for any thread. As a side effect, running
rmmod pktgen
while some thread is running now fails with "module in use" error,
before this patch such command hanged indefinitely.
Note: the issue predates the commit reported in the fixes tag, but
this fix can't be applied before the mentioned commit.
v1 -> v2:
- no need to check for thread existence after flipping the lock,
pktgen threads are freed only at net exit time
-
Fixes: 6146e6a43b ("[PKTGEN]: Removes thread_{un,}lock() macros.")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit afa0925c6f ]
Rollover used to use a complex RCU mechanism for assignment, which had
a race condition. The below patch fixed the bug and greatly simplified
the logic.
The feature depends on fanout, but the state is private to the socket.
Fanout_release returns f only when the last member leaves and the
fanout struct is to be freed.
Destroy rollover unconditionally, regardless of fanout state.
Fixes: 57f015f5ec ("packet: fix crash in fanout_demux_rollover()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 28e74a7cfd ]
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time. This behaviour is not specified
in the SFP MSAs, which specifies:
"The serial interface uses the 2-wire serial CMOS E2PROM protocol
defined for the ATMEL AT24C01A/02/04 family of components."
and
"As long as the SFP+ receives an acknowledge, it shall serially clock
out sequential data words. The sequence is terminated when the host
responds with a NACK and a STOP instead of an acknowledge."
We must avoid breaking a read across a 16-bit quantity in the diagnostic
page, thankfully all 16-bit quantities in that page are naturally
aligned.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 85cb928787 ]
When the following tests last for several hours, the problem will occur.
Server:
rds-stress -r 1.1.1.16 -D 1M
Client:
rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
The following will occur.
"
Starting up....
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu
%
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
"
>From vmcore, we can find that clean_list is NULL.
>From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.
In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
/* more than one entry in llist nodes */
if (clean_nodes->next)
llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
"
When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
instead of clean_nodes is added in clean_list.
So clean_nodes is discarded. It can not be used again.
The workqueue is executed periodically. So more and more clean_nodes are
discarded. Finally the clean_list is NULL.
Then this problem will occur.
Fixes: 1bc144b625 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d37acd5aa9 ]
Use a safe strscpy call to copy the ethtool stat strings into the
relevant buffers, instead of a memcpy that will be accessing
out-of-bound data.
Fixes: 118d6298f6 ("net: mvpp2: add ethtool GOP statistics")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 09faf5a7d7 ]
Fix ability to set RX descriptor number, the reason - initially
"tx_max_pending" was set incorrectly, but the issue appears after
adding sanity check, so fix is for "sanity" patch.
Fixes: 37e2d99b59 ("ethtool: Ensure new ring parameters are within bounds during SRINGPARAM")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b7999b0772 ]
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a90478b93 ]
With the topo:
h1 ---| rp1 |
| route rp3 |--- h3 (192.168.200.1)
h2 ---| rp2 |
If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
h2, and the packets can still be forwared.
This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.
This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.
Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.
Fixes: 5cbf777cfd ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a8dd9f67c ]
syzbot found the following leak in sctp_process_init
BUG: memory leak
unreferenced object 0xffff88810ef68400 (size 1024):
comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
hex dump (first 32 bytes):
1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25 ..(........h...%
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55
[inline]
[<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
[<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
[<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
[<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
[<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
[<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20
net/sctp/sm_make_chunk.c:2437
[<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682
[inline]
[<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384
[inline]
[<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194
[inline]
[<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
[<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200
net/sctp/associola.c:1074
[<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
[<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
[<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
[<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
[<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
[<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
[<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
[<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
[<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
[<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
[<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
[<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
[<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
[<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3
The problem was that the peer.cookie value points to an skb allocated
area on the first pass through this function, at which point it is
overwritten with a heap allocated value, but in certain cases, where a
COOKIE_ECHO chunk is included in the packet, a second pass through
sctp_process_init is made, where the cookie value is re-allocated,
leaking the first allocation.
Fix is to always allocate the cookie value, and free it when we are done
using it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ee4e76937 ]
ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
and pass it to the kernel driver via ops->get_regs() for filling.
There is no restriction about what the kernel drivers can or cannot do
with the open ethtool_regs structure. They usually set regs->version
and ignore regs->len or set it to the same size as ops->get_regs_len().
But if userspace allocates a smaller buffer for the registers dump,
we would cause a userspace buffer overflow in the final copy_to_user()
call, which uses the regs.len value potentially reset by the driver.
To fix this, make this case obvious and store regs.len before calling
ops->get_regs(), to only copy as much data as requested by userspace,
up to the value returned by ops->get_regs_len().
While at it, remove the redundant check for non-null regbuf.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 137caa702f upstream.
The current buffer check halves the frame rate on non-plus i.MX6Q,
as the IDMAC current buffer pointer is not yet updated when
ipu_plane_atomic_update_pending is called from the EOF irq handler.
Fixes: 70e8a0c71e ("drm/imx: ipuv3-plane: add function to query atomic update status")
Tested-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63cb444418 upstream.
This may confuse user-space clients like plymouth that opens a drm
file descriptor as a result of a hotplug event and then generates a
new event...
Cc: <stable@vger.kernel.org>
Fixes: 5ea1734827 ("drm/vmwgfx: Send a hotplug event at master_set")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e41c20cf50 upstream.
In compat mode, we allowed host-backed user-space with guest-backed
kernel / device. In this mode, set shader commands was broken since
no relocations were emitted. Fix this.
Cc: <stable@vger.kernel.org>
Fixes: e8c66efbfe ("drm/vmwgfx: Make user resource lookups reference-free during validation")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8407f8a1d9 upstream.
User-space handles equal to zero are interpreted as uninitialized or
illegal by some drm systems (most notably kms). This means that a
dumb buffer or surface with a zero user-space handle can never be
used as a kms frame-buffer.
Cc: <stable@vger.kernel.org>
Fixes: c7eae62666 ("drm/vmwgfx: Make the object handles idr-generated")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61b51fb51c upstream.
The allocated pages need to be invalidated in CPU caches. On ARM32 the
DMA_BIDIRECTIONAL flag only ensures that data is written-back to DRAM and
the data stays in CPU cache lines. While the DMA_FROM_DEVICE flag ensures
that the corresponding CPU cache lines are getting invalidated and nothing
more, that's exactly what is needed for a newly allocated pages.
This fixes randomly failing rendercheck tests on Tegra30 using the
Opentegra driver for tests that use small-sized pixmaps (10x10 and less,
i.e. 1-2 memory pages) because apparently CPU reads out stale data from
caches and/or that data is getting evicted to DRAM at the time of HW job
execution.
Fixes: bd43c9f0fa ("drm/tegra: gem: Map pages via the DMA API")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7210e06015 upstream.
The gcc-common.h file did not take into account certain macros that
might have already been defined in the build environment. This updates
the header to avoid redefining the macros, as seen on a Darwin host
using gcc 4.9.2:
HOSTCXX -fPIC scripts/gcc-plugins/arm_ssp_per_task_plugin.o - due to: scripts/gcc-plugins/gcc-common.h
In file included from scripts/gcc-plugins/arm_ssp_per_task_plugin.c:3:0:
scripts/gcc-plugins/gcc-common.h:153:0: warning: "__unused" redefined
^
In file included from /usr/include/stdio.h:64:0,
from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/system.h:40,
from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/gcc-plugin.h:28,
from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/plugin.h:23,
from scripts/gcc-plugins/gcc-common.h:9,
from scripts/gcc-plugins/arm_ssp_per_task_plugin.c:3:
/usr/include/sys/cdefs.h:161:0: note: this is the location of the previous definition
^
Reported-and-tested-by: "H. Nikolaus Schaller" <hns@goldelico.com>
Fixes: 189af46571 ("ARM: smp: add support for per-task stack canaries")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 141731d15d upstream.
This reverts most of commit b8eee0e90f ("lockd: Show pid of lockd for
remote locks"), which caused remote locks to not be differentiated between
remote processes for NLM.
We retain the fixup for setting the client's fl_pid to a negative value.
Fixes: b8eee0e90f ("lockd: Show pid of lockd for remote locks")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: XueWei Zhang <xueweiz@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31fad7d41e upstream.
In cifs_read_allocate_pages, in case of ENOMEM, we go through
whole rdata->pages array but we have failed the allocation before
nr_pages, therefore we may end up calling put_page with NULL
pointer, causing oops
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 210782038b upstream.
Currently in the case where SMB2_ioctl returns the -EOPNOTSUPP error
there is a memory leak of pneg_inbuf. Fix this by returning via
the out_free_inbuf exit path that will perform the relevant kfree.
Addresses-Coverity: ("Resource leak")
Fixes: 969ae8e8d4 ("cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED")
CC: Stable <stable@vger.kernel.org> # v5.1+
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a67fedd788 upstream.
Commit e895f00a84 ("Staging: wlan-ng: hfa384x_usb.c Fixed too long
code line warnings.") moved the retrieval of the transfer buffer from
the URB from the top of function hfa384x_usbin_callback to a point
after reposting of the URB via a call to submit_rx_urb. The reposting
of the URB allocates a new transfer buffer so the new buffer is
retrieved instead of the buffer containing the response passed into
the callback. This results in failure to initialize the adapter with
an error reported in the system log (something like "CTLX[1] error:
state(Request failed)").
This change moves the retrieval to just before the point where the URB
is reposted so that the correct transfer buffer is retrieved and
initialization of the device succeeds.
Signed-off-by: Tim Collier <osdevtc@gmail.com>
Fixes: e895f00a84 ("Staging: wlan-ng: hfa384x_usb.c Fixed too long code line warnings.")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca641bae6d upstream.
The create_pagelist() "count" parameter comes from the user in
vchiq_ioctl() and it could overflow. If you look at how create_page()
is called in vchiq_prepare_bulk_data(), then the "size" variable is an
int so it doesn't make sense to allow negatives or larger than INT_MAX.
I don't know this code terribly well, but I believe that typical values
of "count" are typically quite low and I don't think this check will
affect normal valid uses at all.
The "pagelist_size" calculation can also overflow on 32 bit systems, but
not on 64 bit systems. I have added an integer overflow check for that
as well.
The Raspberry PI doesn't offer the same level of memory protection that
x86 does so these sorts of bugs are probably not super critical to fix.
Fixes: 71bad7f086 ("staging: add bcm2708 vchiq driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1ad1cc970 upstream.
After memory allocation failure vc_allocate() doesn't clean up data
which has been initialized in visual_init(). In case of fbcon this
leads to divide-by-0 in fbcon_init() on next open of the same tty.
memory allocation in vc_allocate() may fail here:
1097: vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_KERNEL);
on next open() fbcon_init() skips vc_font.data initialization:
1088: if (!p->fontdata) {
division by zero in fbcon_init() happens here:
1149: new_cols /= vc->vc_font.width;
Additional check is needed in fbcon_deinit() to prevent
usage of uninitialized vc_screenbuf:
1251: if (vc->vc_hi_font_mask && vc->vc_screenbuf)
1252: set_vc_hi_font(vc, false);
Crash:
#6 [ffffc90001eafa60] divide_error at ffffffff81a00be4
[exception RIP: fbcon_init+463]
RIP: ffffffff814b860f RSP: ffffc90001eafb18 RFLAGS: 00010246
...
#7 [ffffc90001eafb60] visual_init at ffffffff8154c36e
#8 [ffffc90001eafb80] vc_allocate at ffffffff8154f53c
#9 [ffffc90001eafbc8] con_install at ffffffff8154f624
...
Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
Reviewed-by: Oleksandr Natalenko <oleksandr@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 221be106d7 upstream.
This patch prevents memory access beyond the evm_tfm array by checking the
validity of the index (hash algorithm) passed to init_desc(). The hash
algorithm can be arbitrarily set if the security.ima xattr type is not
EVM_XATTR_HMAC.
Fixes: 5feeb61183 ("evm: Allow non-SHA1 digital signatures")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 558b523d46 upstream.
Checking efi_enabled(EFI_BOOT) is not sufficient to ensure that
EFI runtime services are available, e.g. if efi=noruntime is used.
Without this, I get an oops on a PREEMPT_RT kernel where efi=noruntime is
the default.
Fixes: 399574c64e ("x86/ima: retry detecting secure boot mode")
Cc: stable@vger.kernel.org (linux-5.0)
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 096ea522e8 upstream.
Recent versions of sphinx will emit messages like:
Documentation/sphinx/kerneldoc.py:103:
RemovedInSphinx20Warning: app.warning() is now deprecated.
Use sphinx.util.logging instead.
Switch to sphinx.util.logging to make this unsightly message go away.
Alas, that interface was only added in version 1.6, so we have to add a
version check to keep things working with older sphinxes.
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2404dad1f6 upstream.
AutoReporter is going away; recent versions of sphinx emit a warning like:
Documentation/sphinx/kerneldoc.py:125:
RemovedInSphinx20Warning: AutodocReporter is now deprecated.
Use sphinx.util.docutils.switch_source_input() instead.
Make the switch. But switch_source_input() only showed up in 1.7, so we
have to do ugly version checks to keep things working in older versions.
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bc8088464 upstream.
Our version check in Documentation/conf.py never envisioned a world where
Sphinx moved beyond 1.x. Now that the unthinkable has happened, fix our
version check to handle higher version numbers correctly.
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e577c8b64d upstream.
When we have holes in a normal memory zone, we could endup having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(),
triggered by kswapd).
Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as valid page. This
could lead to accessing NULL pointer derefernces like below, due to an
invalid mem_section pointer.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
[0000000000000008] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
[...]
Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
Call trace:
set_pfnblock_flags_mask+0x58/0xe8
compaction_alloc+0x300/0x950
migrate_pages+0x1a4/0xbb0
compact_zone+0x750/0xde8
compact_zone_order+0xd8/0x118
try_to_compact_pages+0xb4/0x290
__alloc_pages_direct_compact+0x84/0x1e0
__alloc_pages_nodemask+0x5e0/0xe18
alloc_pages_vma+0x1cc/0x210
do_huge_pmd_anonymous_page+0x108/0x7c8
__handle_mm_fault+0xdd4/0x1190
handle_mm_fault+0x114/0x1c0
__get_user_pages+0x198/0x3c0
get_user_pages_unlocked+0xb4/0x1d8
__gfn_to_pfn_memslot+0x12c/0x3b8
gfn_to_pfn_prot+0x4c/0x60
kvm_handle_guest_abort+0x4b0/0xcd8
handle_exit+0x140/0x1b8
kvm_arch_vcpu_ioctl_run+0x260/0x768
kvm_vcpu_ioctl+0x490/0x898
do_vfs_ioctl+0xc4/0x898
ksys_ioctl+0x8c/0xa0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common+0x74/0x118
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
---[ end trace af6a35219325a9b6 ]---
The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.
This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com
Fixes: 5a811889de ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d76cac67db upstream.
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable@vger.kernel.org
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: af40ff687b ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e85899637 upstream.
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause are list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9852ae3fe5 upstream.
memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.
The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files. For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour. The same confusion applies to other counters in this file when
debugging issues.
Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.
After this patch, events are propagated up the hierarchy:
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0
[root@ktst ~]# systemd-run -p MemoryMax=1 true
Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 7
oom 1
oom_kill 1
As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set. However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.
akpm: this is a behaviour change, so Cc:stable. THis is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.
Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d24f455c1 upstream.
The datasheet states:
Bit 4: ClockEnSet the ClockEn bit high to enable an external clocking
(crystal or clock generator at XIN). Set the ClockEn bit to 0 to disable
clocking
Bit 1: CrystalEnSet the CrystalEn bit high to enable the crystal
oscillator. When using an external clock source at XIN, CrystalEn must
be set low.
The bit 4, MAX310X_CLKSRC_EXTCLK_BIT, should be set and was not.
This was required to make the MAX3107 with an external crystal on our
board able to send or receive data.
Signed-off-by: Joe Burmeister <joe.burmeister@devtank.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61c0e37950 upstream.
When the tty layer requests the uart to throttle, the current code
executing in msm_serial will trigger "Bad mode in Error Handler" and
generate an invalid stack frame in pstore before rebooting (that is if
pstore is indeed configured: otherwise the user shall just notice a
reboot with no further information dumped to the console).
This patch replaces the PIO byte accessor with the word accessor
already used in PIO mode.
Fixes: 68252424a7 ("tty: serial: msm: Support big-endian CPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13067ef73f upstream.
Fix wrong order in probing routine initialization - field `base_addr'
is used before it's initialized. Move assignment of 'priv->base_addr`
to the beginning, prior the call to mlxcpld_i2c_read_comm().
Wrong order caused the first read of capability register to be executed
at wrong offset 0x0 instead of 0x2000. By chance it was a "good
garbage" at 0x0 offset.
Fixes: 313ce648b5 ("i2c: mlxcpld: Add support for extended transaction length for i2c-mlxcpld")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 342406e4fb upstream.
For a while, we've had the problem of i2c bus access not grabbing
a runtime PM ref when it's being used in userspace by i2c-dev, resulting
in nouveau spamming the kernel log with errors if anything attempts to
access the i2c bus while the GPU is in runtime suspend. An example:
[ 130.078386] nouveau 0000:01:00.0: i2c: aux 000d: begin idle timeout ffffffff
Since the GPU is in runtime suspend, the MMIO region that the i2c bus is
on isn't accessible. On x86, the standard behavior for accessing an
unavailable MMIO region is to just return ~0.
Except, that turned out to be a lie. While computers with a clean
concious will return ~0 in this scenario, some machines will actually
completely hang a CPU on certian bad MMIO accesses. This was witnessed
with someone's Lenovo ThinkPad P50, where sensors-detect attempting to
access the i2c bus while the GPU was suspended would result in a CPU
hang:
CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1
Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017
RIP: 0010:ioread32+0x2b/0x30
Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3
48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 <c3> 0f 1f
40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48
RSP: 0018:ffffaac3c5007b48 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
RAX: 0000000001111000 RBX: 0000000001111000 RCX: 0000043017a97186
RDX: 0000000000000aaa RSI: 0000000000000005 RDI: ffffaac3c400e4e4
RBP: ffff9e6443902c00 R08: ffffaac3c400e4e4 R09: ffffaac3c5007be7
R10: 0000000000000004 R11: 0000000000000001 R12: ffff9e6445dd0000
R13: 000000000000e4e4 R14: 00000000000003c4 R15: 0000000000000000
FS: 00007f253155a740(0000) GS:ffff9e644f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005630d1500358 CR3: 0000000417c44006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
g94_i2c_aux_xfer+0x326/0x850 [nouveau]
nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau]
__i2c_transfer+0x14b/0x620
i2c_smbus_xfer_emulated+0x159/0x680
? _raw_spin_unlock_irqrestore+0x1/0x60
? rt_mutex_slowlock.constprop.0+0x13d/0x1e0
? __lock_is_held+0x59/0xa0
__i2c_smbus_xfer+0x138/0x5a0
i2c_smbus_xfer+0x4f/0x80
i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev]
i2cdev_ioctl+0x1db/0x2c0 [i2c_dev]
do_vfs_ioctl+0x408/0x750
ksys_ioctl+0x5e/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x60/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f25317f546b
Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc88caab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005630d0fe7260 RCX: 00007f25317f546b
RDX: 00005630d1598e80 RSI: 0000000000000720 RDI: 0000000000000003
RBP: 00005630d155b968 R08: 0000000000000001 R09: 00005630d15a1da0
R10: 0000000000000070 R11: 0000000000000246 R12: 00005630d1598e80
R13: 00005630d12f3d28 R14: 0000000000000720 R15: 00005630d12f3ce0
watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438]
Yikes! While I wanted to try to make it so that accessing an i2c bus on
nouveau would wake up the GPU as needed, airlied pointed out that pretty
much any usecase for userspace accessing an i2c bus on a GPU (mainly for
the DDC brightness control that some displays have) is going to only be
useful while there's at least one display enabled on the GPU anyway, and
the GPU never sleeps while there's displays running.
Since teaching the i2c bus to wake up the GPU on userspace accesses is a
good deal more difficult than it might seem, mostly due to the fact that
we have to use the i2c bus during runtime resume of the GPU, we instead
opt for the easiest solution: don't let userspace access i2c busses on
the GPU at all while it's in runtime suspend.
Changes since v1:
* Also disable i2c busses that run over DP AUX
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a86cb413f4 upstream.
KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
architectures. However, on s390x, the amount of usable CPUs is determined
during runtime - it is depending on the features of the machine the code
is running on. Since we are using the vcpu_id as an index into the SCA
structures that are defined by the hardware (see e.g. the sca_add_vcpu()
function), it is not only the amount of CPUs that is limited by the hard-
ware, but also the range of IDs that we can use.
Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
code into the architecture specific code, and on s390x we have to return
the same value here as for KVM_CAP_MAX_VCPUS.
This problem has been discovered with the kvm_create_max_vcpus selftest.
With this change applied, the selftest now passes on s390x, too.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190523164309.13345-9-thuth@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cb40eb184 upstream.
We met another Acer Aspire laptop which has the problem on the
headset-mic, the Pin 0x19 is not set the corret configuration for a
mic and the pin presence can't be detected too after plugging a
headset. Kailang suggested that we should set the coeff to enable the
mic and apply the ALC269_FIXUP_LIFEBOOK_EXTMIC. After doing that,
both headset-mic presence and headset-mic work well.
The existing ALC255_FIXUP_ACER_MIC_NO_PRESENCE set the headset-mic
jack to be a phantom jack. Now since the jack can support presence
unsol event, let us imporve it to set the jack to be a normal jack.
https://bugs.launchpad.net/bugs/1821269
Fixes: 5824ce8de7 ("ALSA: hda/realtek - Add support for Acer Aspire E5-475 headset mic")
Cc: Chris Chiu <chiu@endlessm.com>
CC: Daniel Drake <drake@endlessm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 317d931392 upstream.
I measured power consumption between power_save_node=1 and power_save_node=0.
It's almost the same.
Codec will enter to runtime suspend and suspend.
That pin also will enter to D3. Don't need to enter to D3 by single pin.
So, Disable power_save_node as default. It will avoid more issues.
Windows Driver also has not this option at runtime PM.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b074ab7fc upstream.
The current code performs the cancel of a delayed work at the late
stage of disconnection procedure, which may lead to the access to the
already cleared state.
This patch assures to call cancel_delayed_work_sync() at the beginning
of the disconnection procedure for avoiding that race. The delayed
work object is now assigned in the common line6 object instead of its
derivative, so that we can call cancel_delayed_work_sync().
Along with the change, the startup function is called via the new
callback instead. This will make it easier to port other LINE6
drivers to use the delayed work for startup in later patches.
Reported-by: syzbot+5255458d5e0a2b10bbb9@syzkaller.appspotmail.com
Fixes: 7f84ff68be ("ALSA: line6: toneport: Fix broken usage of timer for delayed execution")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b909e3548 upstream.
Commit b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()")
changed kexec_add_buffer() to skip searching for a memory location if
kexec_buf.mem is already set, and use the address that is there.
In powerpc code we reuse a kexec_buf variable for loading both the
kernel and the initramfs by resetting some of the fields between those
uses, but not mem. This causes kexec_add_buffer() to try to load the
kernel at the same address where initramfs will be loaded, which is
naturally rejected:
# kexec -s -l --initrd initramfs vmlinuz
kexec_file_load failed: Invalid argument
Setting the mem field before every call to kexec_add_buffer() fixes
this regression.
Fixes: b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3202e35ec1 upstream.
Consider a scenario where user creates two events:
1st event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
fd = perf_event_open(attr, 0, 1, -1, 0);
This sets cpuhw->bhrb_filter to 0 and returns valid fd.
2nd event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
fd = perf_event_open(attr, 0, 1, -1, 0);
It overrides cpuhw->bhrb_filter to -1 and returns with error.
Now if power_pmu_enable() gets called by any path other than
power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1.
Fixes: 3925f46bb5 ("powerpc/perf: Enable branch stack sampling framework")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d724c9e549 upstream.
The sprgs are a set of 4 general purpose sprs provided for software use.
SPRG3 is special in that it can also be read from userspace. Thus it is
used on linux to store the cpu and numa id of the process to speed up
syscall access to this information.
This register is overwritten with the guest value on kvm guest entry,
and so needs to be restored on exit again. Thus restore the value on
the guest exit path in kvmhv_p9_guest_entry().
Cc: stable@vger.kernel.org # v4.20+
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c2c7029c0 upstream.
This patch fixes a complain about possible sleep during
spinlock aquired
"BUG: sleeping function called from invalid context at
include/crypto/algapi.h:426"
for the ctr(aes) and ctr(des) s390 specific ciphers.
Instead of using a spinlock this patch introduces a mutex
which is save to be held in sleeping context. Please note
a deadlock is not possible as mutex_trylock() is used.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bef9f0ba30 upstream.
The current kernel uses improved crypto selftests. These
tests showed that the current implementation of gcm-aes-s390
is not able to deal with chunks of output buffers which are
not a multiple of 16 bytes. This patch introduces a rework
of the gcm aes s390 scatter walk handling which now is able
to handle any input and output scatter list chunk sizes
correctly.
Code has been verified by the crypto selftests, the tcrypt
kernel module and additional tests ran via the af_alg interface.
Cc: <stable@vger.kernel.org>
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Patrick Steuer <steuer@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30d40577e3 upstream.
[BUG]
When a fs has orphan reloc tree along with unfinished balance:
...
item 16 key (TREE_RELOC ROOT_ITEM FS_TREE) itemoff 12090 itemsize 439
generation 12 root_dirid 256 bytenr 300400640 level 1 refs 0 <<<
lastsnap 8 byte_limit 0 bytes_used 1359872 flags 0x0(none)
uuid 7c48d938-33a3-4aae-ab19-6e5c9d406e46
item 17 key (BALANCE TEMPORARY_ITEM 0) itemoff 11642 itemsize 448
temporary item objectid BALANCE offset 0
balance status flags 14
Then at mount time, we can hit the following kernel BUG_ON():
BTRFS info (device dm-3): relocating block group 298844160 flags metadata|dup
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:1413!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 897 Comm: btrfs-balance Tainted: G O 5.2.0-rc1-custom #15
RIP: 0010:create_reloc_root+0x1eb/0x200 [btrfs]
Call Trace:
btrfs_init_reloc_root+0x96/0xb0 [btrfs]
record_root_in_trans+0xb2/0xe0 [btrfs]
btrfs_record_root_in_trans+0x55/0x70 [btrfs]
select_reloc_root+0x7e/0x230 [btrfs]
do_relocation+0xc4/0x620 [btrfs]
relocate_tree_blocks+0x592/0x6a0 [btrfs]
relocate_block_group+0x47b/0x5d0 [btrfs]
btrfs_relocate_block_group+0x183/0x2f0 [btrfs]
btrfs_relocate_chunk+0x4e/0xe0 [btrfs]
btrfs_balance+0x864/0xfa0 [btrfs]
balance_kthread+0x3b/0x50 [btrfs]
kthread+0x123/0x140
ret_from_fork+0x27/0x50
[CAUSE]
In btrfs, reloc trees are used to record swapped tree blocks during
balance.
Reloc tree either get merged (replace old tree blocks of its parent
subvolume) in next transaction if its ref is 1 (fresh).
Or is already merged and will be cleaned up if its ref is 0 (orphan).
After commit d2311e6985 ("btrfs: relocation: Delay reloc tree deletion
after merge_reloc_roots"), reloc tree cleanup is delayed until one block
group is balanced.
Since fresh reloc roots are recorded during merge, as long as there
is no power loss, those orphan reloc roots converted from fresh ones are
handled without problem.
However when power loss happens, orphan reloc roots can be recorded
on-disk, thus at next mount time, we will have orphan reloc roots from
on-disk data directly, and ignored by clean_dirty_subvols() routine.
Then when background balance starts to balance another block group, and
needs to create new reloc root for the same root, btrfs_insert_item()
returns -EEXIST, and trigger that BUG_ON().
[FIX]
For orphan reloc roots, also queue them to rc->dirty_subvol_roots, so
all reloc roots no matter orphan or not, can be cleaned up properly and
avoid above BUG_ON().
And to cooperate with above change, clean_dirty_subvols() will check if
the queued root is a reloc root or a subvol root.
For a subvol root, do the old work, and for a orphan reloc root, clean it
up.
Fixes: d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b1f72e5b8 upstream.
When using the no-holes feature, if we have a file with prealloc extents
with a start offset beyond the file's eof, doing an incremental send can
cause corruption of the file due to incorrect hole detection. Such case
requires that the prealloc extent(s) exist in both the parent and send
snapshots, and that a hole is punched into the file that covers all its
extents that do not cross the eof boundary.
Example reproducer:
$ mkfs.btrfs -f -O no-holes /dev/sdb
$ mount /dev/sdb /mnt/sdb
$ xfs_io -f -c "pwrite -S 0xab 0 500K" /mnt/sdb/foobar
$ xfs_io -c "falloc -k 1200K 800K" /mnt/sdb/foobar
$ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
$ btrfs send -f /tmp/base.snap /mnt/sdb/base
$ xfs_io -c "fpunch 0 500K" /mnt/sdb/foobar
$ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
$ btrfs send -p /mnt/sdb/base -f /tmp/incr.snap /mnt/sdb/incr
$ md5sum /mnt/sdb/incr/foobar
816df6f64deba63b029ca19d880ee10a /mnt/sdb/incr/foobar
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ btrfs receive -f /tmp/base.snap /mnt/sdc
$ btrfs receive -f /tmp/incr.snap /mnt/sdc
$ md5sum /mnt/sdc/incr/foobar
cf2ef71f4a9e90c2f6013ba3b2257ed2 /mnt/sdc/incr/foobar
--> Different checksum, because the prealloc extent beyond the
file's eof confused the hole detection code and it assumed
a hole starting at offset 0 and ending at the offset of the
prealloc extent (1200Kb) instead of ending at the offset
500Kb (the file's size).
Fix this by ensuring we never cross the file's size when issuing the
write operations for a hole.
Fixes: 16e7549f04 ("Btrfs: incompatible format change to remove hole extents")
CC: stable@vger.kernel.org # 3.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57949d033a upstream.
[BUG]
When mounting a fs with reloc tree and has qgroup enabled, it can cause
NULL pointer dereference at mount time:
BUG: kernel NULL pointer dereference, address: 00000000000000a8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:btrfs_qgroup_add_swapped_blocks+0x186/0x300 [btrfs]
Call Trace:
replace_path.isra.23+0x685/0x900 [btrfs]
merge_reloc_root+0x26e/0x5f0 [btrfs]
merge_reloc_roots+0x10a/0x1a0 [btrfs]
btrfs_recover_relocation+0x3cd/0x420 [btrfs]
open_ctree+0x1bc8/0x1ed0 [btrfs]
btrfs_mount_root+0x544/0x680 [btrfs]
legacy_get_tree+0x34/0x60
vfs_get_tree+0x2d/0xf0
fc_mount+0x12/0x40
vfs_kern_mount.part.12+0x61/0xa0
vfs_kern_mount+0x13/0x20
btrfs_mount+0x16f/0x860 [btrfs]
legacy_get_tree+0x34/0x60
vfs_get_tree+0x2d/0xf0
do_mount+0x81f/0xac0
ksys_mount+0xbf/0xe0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
In btrfs_recover_relocation(), we don't have enough info to determine
which block group we're relocating, but only to merge existing reloc
trees.
Thus in btrfs_recover_relocation(), rc->block_group is NULL.
btrfs_qgroup_add_swapped_blocks() hasn't taken this into consideration,
and causes a NULL pointer dereference.
The bug is introduced by commit 3d0174f78e ("btrfs: qgroup: Only trace
data extents in leaves if we're relocating data block group"), and
later qgroup refactoring still keeps this optimization.
[FIX]
Thankfully in the context of btrfs_recover_relocation(), there is no
other progress can modify tree blocks, thus those swapped tree blocks
pair will never affect qgroup numbers, no matter whatever we set for
block->trace_leaf.
So we only need to check if @bg is NULL before accessing @bg->flags.
Reported-by: Juan Erbes <jerbes@gmail.com>
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1134806
Fixes: 3d0174f78e ("btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group")
CC: stable@vger.kernel.org # 4.20+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60d9f50308 upstream.
While logging an inode we follow its ancestors and for each one we mark
it as logged in the current transaction, even if we have not logged it.
As a consequence if we change an attribute of an ancestor, such as the
UID or GID for example, and then explicitly fsync it, we end up not
logging the inode at all despite returning success to user space, which
results in the attribute being lost if a power failure happens after
the fsync.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ chown 6007:6007 /mnt/dir
$ sync
$ chown 9003:9003 /mnt/dir
$ touch /mnt/dir/file
$ xfs_io -c fsync /mnt/dir/file
# fsync our directory after fsync'ing the new file, should persist the
# new values for the uid and gid.
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ stat -c %u:%g /mnt/dir
6007:6007
--> should be 9003:9003, the uid and gid were not persisted, despite
the explicit fsync on the directory prior to the power failure
Fix this by not updating the logged_trans field of ancestor inodes when
logging an inode, since we have not logged them. Let only future calls to
btrfs_log_inode() to mark inodes as logged.
This could be triggered by my recent fsync fuzz tester for fstests, for
which an fstests patch exists titled "fstests: generic, fsync fuzz tester
with fsstress".
Fixes: 12fcfd22fe ("Btrfs: tree logging unlink/rename fixes")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06989c799f upstream.
When syncing the log, the final phase of a fsync operation, we need to
either create a log root's item or update the existing item in the log
tree of log roots, and that depends on the current value of the log
root's log_transid - if it's 1 we need to create the log root item,
otherwise it must exist already and we update it. Since there is no
synchronization between updating the log_transid and checking it for
deciding whether the log root's item needs to be created or updated, we
end up with a tiny race window that results in attempts to update the
item to fail because the item was not yet created:
CPU 1 CPU 2
btrfs_sync_log()
lock root->log_mutex
set log root's log_transid to 1
unlock root->log_mutex
btrfs_sync_log()
lock root->log_mutex
sets log root's
log_transid to 2
unlock root->log_mutex
update_log_root()
sees log root's log_transid
with a value of 2
calls btrfs_update_root(),
which fails with -EUCLEAN
and causes transaction abort
Until recently the race lead to a BUG_ON at btrfs_update_root(), but after
the recent commit 7ac1e464c4 ("btrfs: Don't panic when we can't find a
root key") we just abort the current transaction.
A sample trace of the BUG_ON() on a SLE12 kernel:
------------[ cut here ]------------
kernel BUG at ../fs/btrfs/root-tree.c:157!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 NUMA pSeries
(...)
Supported: Yes, External
CPU: 78 PID: 76303 Comm: rtas_errd Tainted: G X 4.4.156-94.57-default #1
task: c00000ffa906d010 ti: c00000ff42b08000 task.ti: c00000ff42b08000
NIP: d000000036ae5cdc LR: d000000036ae5cd8 CTR: 0000000000000000
REGS: c00000ff42b0b860 TRAP: 0700 Tainted: G X (4.4.156-94.57-default)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 22444484 XER: 20000000
CFAR: d000000036aba66c SOFTE: 1
GPR00: d000000036ae5cd8 c00000ff42b0bae0 d000000036bda220 0000000000000054
GPR04: 0000000000000001 0000000000000000 c00007ffff8d37c8 0000000000000000
GPR08: c000000000e19c00 0000000000000000 0000000000000000 3736343438312079
GPR12: 3930373337303434 c000000007a3a800 00000000007fffff 0000000000000023
GPR16: c00000ffa9d26028 c00000ffa9d261f8 0000000000000010 c00000ffa9d2ab28
GPR20: c00000ff42b0bc48 0000000000000001 c00000ff9f0d9888 0000000000000001
GPR24: c00000ffa9d26000 c00000ffa9d261e8 c00000ffa9d2a800 c00000ff9f0d9888
GPR28: c00000ffa9d26028 c00000ffa9d2aa98 0000000000000001 c00000ffa98f5b20
NIP [d000000036ae5cdc] btrfs_update_root+0x25c/0x4e0 [btrfs]
LR [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs]
Call Trace:
[c00000ff42b0bae0] [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] (unreliable)
[c00000ff42b0bba0] [d000000036b53610] btrfs_sync_log+0x2d0/0xc60 [btrfs]
[c00000ff42b0bce0] [d000000036b1785c] btrfs_sync_file+0x44c/0x4e0 [btrfs]
[c00000ff42b0bd80] [c00000000032e300] vfs_fsync_range+0x70/0x120
[c00000ff42b0bdd0] [c00000000032e44c] do_fsync+0x5c/0xb0
[c00000ff42b0be10] [c00000000032e8dc] SyS_fdatasync+0x2c/0x40
[c00000ff42b0be30] [c000000000009488] system_call+0x3c/0x100
Instruction dump:
7f43d378 4bffebb9 60000000 88d90008 3d220000 e8b90000 3b390009 e87a01f0
e8898e08 e8f90000 4bfd48e5 60000000 <0fe00000> e95b0060 39200004 394a0ea0
---[ end trace 8f2dc8f919cabab8 ]---
So fix this by doing the check of log_transid and updating or creating the
log root's item while holding the root's log_mutex.
Fixes: 7237f18336 ("Btrfs: fix tree logs parallel sync")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5338e43abb upstream.
When replaying a log that contains a new file or directory name that needs
to be added to its parent directory, we end up updating the mtime and the
ctime of the parent directory to the current time after we have set their
values to the correct ones (set at fsync time), efectivelly losing them.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/file
# fsync of the directory is optional, not needed
$ xfs_io -c fsync /mnt/dir
$ xfs_io -c fsync /mnt/dir/file
$ stat -c %Y /mnt/dir
1557856079
<power failure>
$ sleep 3
$ mount /dev/sdb /mnt
$ stat -c %Y /mnt/dir
1557856082
--> should have been 1557856079, the mtime is updated to the current
time when replaying the log
Fix this by not updating the mtime and ctime to the current time at
btrfs_add_link() when we are replaying a log tree.
This could be triggered by my recent fsync fuzz tester for fstests, for
which an fstests patch exists titled "fstests: generic, fsync fuzz tester
with fsstress".
Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef4021fe5f upstream.
When the user tries to remove a zfcp port via sysfs, we only rejected it if
there are zfcp unit children under the port. With purely automatically
scanned LUNs there are no zfcp units but only SCSI devices. In such cases,
the port_remove erroneously continued. We close the port and this
implicitly closes all LUNs under the port. The SCSI devices survive with
their private zfcp_scsi_dev still holding a reference to the "removed"
zfcp_port (still allocated but invisible in sysfs) [zfcp_get_port_by_wwpn
in zfcp_scsi_slave_alloc]. This is not a problem as long as the fc_rport
stays blocked. Once (auto) port scan brings back the removed port, we
unblock its fc_rport again by design. However, there is no mechanism that
would recover (open) the LUNs under the port (no "ersfs_3" without
zfcp_unit [zfcp_erp_strategy_followup_success]). Any pending or new I/O to
such LUN leads to repeated:
Done: NEEDS_RETRY Result: hostbyte=DID_IMM_RETRY driverbyte=DRIVER_OK
See also v4.10 commit 6f2ce1c6af ("scsi: zfcp: fix rport unblock race
with LUN recovery"). Even a manual LUN recovery
(echo 0 > /sys/bus/scsi/devices/H:C:T:L/zfcp_failed)
does not help, as the LUN links to the old "removed" port which remains
to lack ZFCP_STATUS_COMMON_RUNNING [zfcp_erp_required_act].
The only workaround is to first ensure that the fc_rport is blocked
(e.g. port_remove again in case it was re-discovered by (auto) port scan),
then delete the SCSI devices, and finally re-discover by (auto) port scan.
The port scan includes an fc_rport unblock, which in turn triggers
a new scan on the scsi target to freshly get new pure auto scan LUNs.
Fix this by rejecting port_remove also if there are SCSI devices
(even without any zfcp_unit) under this port. Re-use mechanics from v3.7
commit d99b601b63 ("[SCSI] zfcp: restore refcount check on port_remove").
However, we have to give up zfcp_sysfs_port_units_mutex earlier in unit_add
to prevent a deadlock with scsi_host scan taking shost->scan_mutex first
and then zfcp_sysfs_port_units_mutex now in our zfcp_scsi_slave_alloc().
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: b62a8d9b45 ("[SCSI] zfcp: Use SCSI device data zfcp scsi dev instead of zfcp unit")
Fixes: f8210e3488 ("[SCSI] zfcp: Allow midlayer to scan for LUNs when running in NPIV mode")
Cc: <stable@vger.kernel.org> #2.6.37+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a47686636d upstream.
Most Siano devices require an alignment for the response.
Changeset f3be52b0056a ("media: usb: siano: Fix general protection fault in smsusb")
changed the logic with gets such aligment, but it now produces a
sparce warning:
drivers/media/usb/siano/smsusb.c: In function 'smsusb_init_device':
drivers/media/usb/siano/smsusb.c:447:37: warning: 'in_maxp' may be used uninitialized in this function [-Wmaybe-uninitialized]
447 | dev->response_alignment = in_maxp - sizeof(struct sms_msg_hdr);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
The sparse message itself is bogus, but a broken (or fake) USB
eeprom could produce a negative value for response_alignment.
So, change the code in order to check if the result is not
negative.
Fixes: 31e0456de5 ("media: usb: siano: Fix general protection fault in smsusb")
CC: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45457c0117 upstream.
GCC complains about an apparently uninitialized variable recently
added to smsusb_init_device(). It's a false positive, but to silence
the warning this patch adds a trivial initialization.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: kbuild test robot <lkp@intel.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31e0456de5 upstream.
The syzkaller USB fuzzer found a general-protection-fault bug in the
smsusb part of the Siano DVB driver. The fault occurs during probe
because the driver assumes without checking that the device has both
IN and OUT endpoints and the IN endpoint is ep1.
By slightly rearranging the driver's initialization code, we can make
the appropriate checks early on and thus avoid the problem. If the
expected endpoints aren't present, the new code safely returns -ENODEV
from the probe routine.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+53f029db71c19a47325a@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea26111338 upstream.
Without USB_QUIRK_NO_LPM ethernet will not work and rtl8152 will
complain with
r8152 <device...>: Stop submitting intr, status -71
Adding the quirk resolves this. As the dock is externally powered, this
should not have any drawbacks.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a03ff54460 upstream.
The syzkaller USB fuzzer found a slab-out-of-bounds write bug in the
USB core, caused by a failure to check the actual size of a BOS
descriptor. This patch adds a check to make sure the descriptor is at
least as large as it is supposed to be, so that the code doesn't
inadvertently access memory beyond the end of the allocated region
when assigning to dev->bos->desc->bNumDeviceCaps later on.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+71f1e64501a309fcc012@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ea3091f1b upstream.
Fix the following sparse context imbalance regression introduced in
a patch that fixed sleeping function called from invalid context bug.
kbuild test robot reported on:
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus
Regressions in current branch:
drivers/usb/usbip/stub_dev.c:399:9: sparse: sparse: context imbalance in 'stub_probe' - different lock contexts for basic block
drivers/usb/usbip/stub_dev.c:418:13: sparse: sparse: context imbalance in 'stub_disconnect' - different lock contexts for basic block
drivers/usb/usbip/stub_dev.c:464:1-10: second lock on line 476
Error ids grouped by kconfigs:
recent_errors
├── i386-allmodconfig
│ └── drivers-usb-usbip-stub_dev.c:second-lock-on-line
├── x86_64-allmodconfig
│ ├── drivers-usb-usbip-stub_dev.c:sparse:sparse:context-imbalance-in-stub_disconnect-different-lock-contexts-for-basic-block
│ └── drivers-usb-usbip-stub_dev.c:sparse:sparse:context-imbalance-in-stub_probe-different-lock-contexts-for-basic-block
└── x86_64-allyesconfig
└── drivers-usb-usbip-stub_dev.c:second-lock-on-line
This is a real problem in an error leg where spin_lock() is called on an
already held lock.
Fix the imbalance in stub_probe() and stub_disconnect().
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fixes: 0c9e8b3cad ("usbip: usbip_host: fix BUG: sleeping function called from invalid context")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7fac17ca9 upstream.
Xhci_handshake() implements the algorithm already captured by
readl_poll_timeout_atomic(). Convert the former to use the latter to
avoid repetition.
Turned out this patch also fixes a bug on the AMD Stoneyridge platform
where usleep(1) sometimes takes over 10ms.
This means a 5 second timeout can easily take over 15 seconds which will
trigger the watchdog and reboot the system.
[Add info about patch fixing a bug to commit message -Mathias]
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Tested-by: Raul E Rangel <rrangel@chromium.org>
Reviewed-by: Raul E Rangel <rrangel@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1a145a3ed upstream.
Commit 597c56e372 ("xhci: update bounce buffer with correct sg num")
caused the following build warnings:
drivers/usb/host/xhci-ring.c:676:19: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
Use %zu for printing size_t type in order to fix the warnings.
Fixes: 597c56e372 ("xhci: update bounce buffer with correct sg num")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 597c56e372 upstream.
This change fixes a data corruption issue occurred on USB hard disk for
the case that bounce buffer is used during transferring data.
While updating data between sg list and bounce buffer, current
implementation passes mapped sg number (urb->num_mapped_sgs) to
sg_pcopy_from_buffer() and sg_pcopy_to_buffer(). This causes data
not get copied if target buffer is located in the elements after
mapped sg elements. This change passes sg number for full list to
fix issue.
Besides, for copying data from bounce buffer, calling dma_unmap_single()
on the bounce buffer before copying data to sg list can avoid cache issue.
Fixes: f9c589e142 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Henry Lin <henryl@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef4d6f6b27 upstream.
The ror32 implementation (word >> shift) | (word << (32 - shift) has
undefined behaviour if shift is outside the [1, 31] range. Similarly
for the 64 bit variants. Most callers pass a compile-time constant
(naturally in that range), but there's an UBSAN report that these may
actually be called with a shift count of 0.
Instead of special-casing that, we can make them DTRT for all values of
shift while also avoiding UB. For some reason, this was already partly
done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes
these patterns as rotates, so for example
__u32 rol32(__u32 word, unsigned int shift)
{
return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
compiles to
0000000000000020 <rol32>:
20: 89 f8 mov %edi,%eax
22: 89 f1 mov %esi,%ecx
24: d3 c0 rol %cl,%eax
26: c3 retq
Older compilers unfortunately do not do as well, but this only affects
the small minority of users that don't pass constants.
Due to integer promotions, ro[lr]8 were already well-defined for shifts
in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16]
(only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set,
word << 16 is undefined). For consistency, update those as well.
Link: http://lkml.kernel.org/r/20190410211906.2190-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Vadim Pasternak <vadimp@mellanox.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3c976c14a upstream.
Previously, %g2 would end up with the value PAGE_SIZE, but after the
commit mentioned below it ends up with the value 1 due to being reused
for a different purpose. We need it to be PAGE_SIZE as we use it to step
through pages in our demap loop, otherwise we set different flags in the
low 12 bits of the address written to, thereby doing things other than a
nucleus page flush.
Fixes: a74ad5e660 ("sparc64: Handle extremely large kernel TLB range flushes more gracefully.")
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 526f5b851a upstream.
Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit 7e27e8d613
("tipc: switch order of device registration to fix a crash")
Because sock_create_kern(net, AF_TIPC, ...) called by
tipc_topsrv_create_listener() in the initialization process
of tipc_init_net(), so tipc_socket_init() must be execute before that.
Meanwhile, tipc_net_id need to be initialized when sock_create()
called, and tipc_socket_init() is no need to be called for each namespace.
I add a variable tipc_topsrv_net_ops, and split the
register_pernet_subsys() of tipc into two parts, and split
tipc_socket_init() with initialization of pernet params.
By the way, I fixed resources rollback error when tipc_bcast_init()
failed in tipc_init_net().
Fixes: 7e27e8d613 ("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com
Reviewed-by: Kang Zhou <zhoukang7@huawei.com>
Reviewed-by: Suanming Mou <mousuanming@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 357d065a44 upstream.
VMX ghash was using a fallback that did not support interleaving simd
and nosimd operations, leading to failures in the extended test suite.
If I understood correctly, Eric's suggestion was to use the same
data format that the generic code uses, allowing us to call into it
with the same contexts. I wasn't able to get that to work - I think
there's a very different key structure and data layout being used.
So instead steal the arm64 approach and perform the fallback
operations directly if required.
Fixes: cc333cd68d ("crypto: vmx - Adding GHASH routines for VMX module")
Cc: stable@vger.kernel.org # v4.1+
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 100f6d8e09 ]
TCP zerocopy takes a uarg reference for every skb, plus one for the
tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero
as it builds, sends and frees skbs inside its inner loop.
UDP and RAW zerocopy do not send inside the inner loop so do not need
the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit
52900d22288ed ("udp: elide zerocopy operation in hot path") introduced
extra_uref to pass the initial reference taken in sock_zerocopy_alloc
to the first generated skb.
But, sock_zerocopy_realloc takes this extra reference at the start of
every call. With MSG_MORE, no new skb may be generated to attach the
extra_uref to, so refcnt is incorrectly 2 with only one skb.
Do not take the extra ref if uarg && !tcp, which implies MSG_MORE.
Update extra_uref accordingly.
This conditional assignment triggers a false positive may be used
uninitialized warning, so have to initialize extra_uref at define.
Changes v1->v2: fix typo in Fixes SHA1
Fixes: 52900d2228 ("udp: elide zerocopy operation in hot path")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab0610efab ]
This reverts commit 2391b0030e which has
introduced regression. Now SGE's BAR2 Doorbell/GTS Page Size is
interpreted correctly in the firmware itself by using actual host
page size. Hence previous commit needs to be reverted.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c3f4a6c39c ]
On device surprise removal path (the notifier) we can't
bail just because the features are disabled. They may
have been enabled during the lifetime of the device.
This bug leads to leaking netdev references and
use-after-frees if there are active connections while
device features are cleared.
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3686637e50 ]
TLS offload drivers shouldn't (and currently don't) block
the TLS offload feature changes based on whether there are
active offloaded connections or not.
This seems to be a good idea, because we want the admin to
be able to disable the TLS offload at any time, and there
is no clean way of disabling it for active connections
(TX side is quite problematic). So if features are cleared
existing connections will stay offloaded until they close,
and new connections will not attempt offload to a given
device.
However, the offload state removal handling is currently
broken if feature flags get cleared while there are
active TLS offloads.
RX side will completely bail from cleanup, even on normal
remove path, leaving device state dangling, potentially
causing issues when the 5-tuple is reused. It will also
fail to release the netdev reference.
Remove the RX-side warning message, in next release cycle
it should be printed when features are disabled, rather
than when connection dies, but for that we need a more
efficient method of finding connection of a given netdev
(a'la BPF offload code).
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 043556d091 ]
Add a test which sends 15 bytes of data, and then tries
to read 10 byes twice. Previously the second read would
sleep indifinitely, since the record was already decrypted
and there is only 5 bytes left, not full 10.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 04b25a5411 ]
When tls_sw_recvmsg() partially copies a record it pops that
record from ctx->recv_pkt and places it on rx_list.
Next iteration of tls_sw_recvmsg() reads from rx_list via
process_rx_list() before it enters the decryption loop.
If there is no more records to be read tls_wait_data()
will put the process on the wait queue and got to sleep.
This is incorrect, because some data was already copied
in process_rx_list().
In case of RPC connections process may never get woken up,
because peer also simply blocks in read().
I think this may also fix a similar issue when BPF is at
play, because after __tcp_bpf_recvmsg() returns some data
we subtract it from len and use continue to restart the
loop, but len could have just reached 0, so again we'd
sleep unnecessarily. That's added by:
commit d3b18ad31f ("tls: add bpf support to sk_msg handling")
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Tested-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46a1695960 ]
If some of the data came from the previous record, i.e. from
the rx_list it had already been decrypted, so it's not counted
towards the "decrypted" variable, but the "copied" variable.
Take that into account when checking lowat.
When calculating lowat target we need to pass the original len.
E.g. if lowat is at 80, len is 100 and we had 30 bytes on rx_list
target would currently be incorrectly calculated as 70, even though
we only need 50 more bytes to make up the 80.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Tested-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d629522e1d ]
Skip RDMA context memory allocations, reduce to 1 ring, and disable
TPA when running in the kdump kernel. Without this patch, the driver
fails to initialize with memory allocation errors when running in a
typical kdump kernel.
Fixes: cf6daed098 ("bnxt_en: Increase context memory allocations on 57500 chips for RDMA.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b3f0b75c3 ]
When making configuration changes, the driver calls bnxt_close_nic()
and then bnxt_open_nic() for the changes to take effect. A parameter
irq_re_init is passed to the call sequence to indicate if IRQ
should be re-initialized. This irq_re_init parameter needs to
be included in the bnxt_reserve_rings() call. bnxt_reserve_rings()
can only call pci_disable_msix() if the irq_re_init parameter is
true, otherwise it may hit BUG() because some IRQs may not have been
freed yet.
Fixes: 41e8d79837 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 296d5b5416 ]
For every RX packet, the driver replenishes all buffers used for that
packet and puts them back into the RX ring and RX aggregation ring.
In one code path where the RX packet has one RX buffer and one or more
aggregation buffers, we missed recycling the aggregation buffer(s) if
we are unable to allocate a new SKB buffer. This leads to the
aggregation ring slowly running out of buffers over time. Fix it
by properly recycling the aggregation buffers.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Rakesh Hemnani <rhemnani@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
stmmac_init_chan() needs to be called before stmmac_init_rx_chan() and
stmmac_init_tx_chan(). This is because if PBLx8 is to be used,
"DMA_CH(#i)_Control.PBLx8" needs to be set before programming
"DMA_CH(#i)_TX_Control.TxPBL" and "DMA_CH(#i)_RX_Control.RxPBL".
Fixes: 47f2a9ce52 ("net: stmmac: dma channel init prepared for multiple queues")
Reviewed-by: Zhang, Baoli <baoli.zhang@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Weifeng Voon <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently ethtool was not able to get/set the flow control due to a
missing "!". It will always return -EOPNOTSUPP even the device is
flow control supported.
This patch fixes the condition check for ethtool flow control get/set
function for ETHTOOL_LINK_MODE_Asym_Pause_BIT.
Fixes: 3c1bcc8614 (“net: ethernet: Convert phydev advertize and supported from u32 to link mode”)
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon, Weifeng <weifeng.voon@intel.com@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0194e2d0e ]
When CQE compression is enabled (Multi-host systems), compressed CQEs
might arrive to the driver rx, compressed CQEs don't have a valid hash
offload and the driver already reports a hash value of 0 and invalid hash
type on the skb for compressed CQEs, but this is not good enough.
On a congested PCIe, where CQE compression will kick in aggressively,
gro will deliver lots of out of order packets due to the invalid hash
and this might cause a serious performance drop.
The only valid solution, is to disable rxhash offload at all when CQE
compression is favorable (Multi-host systems).
Fixes: 7219ab34f1 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 25fa506b70 ]
root ns is yet another fs core node which is freed using kfree() by
tree_put_node().
Rest of the other fs core objects are also allocated using kmalloc
variants.
However, root ns memory is allocated using kvzalloc().
Hence allocate root ns memory using kzalloc().
Fixes: 2530236303 ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TLV_SET is called with a data pointer and a len parameter that tells us
how many bytes are pointed to by data. When invoking memcpy() we need
to careful to only copy len bytes.
Previously we would copy TLV_LENGTH(len) bytes which would copy an extra
4 bytes past the end of the data pointer which newer GCC versions
complain about.
In file included from test.c:17:
In function 'TLV_SET',
inlined from 'test' at test.c:186:5:
/usr/include/linux/tipc_config.h:317:3:
warning: 'memcpy' forming offset [33, 36] is out of the bounds [0, 32]
of object 'bearer_name' with type 'char[32]' [-Warray-bounds]
memcpy(TLV_DATA(tlv_ptr), data, tlv_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c: In function 'test':
test.c::161:10: note:
'bearer_name' declared here
char bearer_name[TIPC_MAX_BEARER_NAME];
^~~~~~~~~~~
We still want to ensure any padding bytes at the end are initialised, do
this with a explicit memset() rather than copy bytes past the end of
data. Apply the same logic to TCM_SET.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9414277a5d ]
In below code flow, for ingress acl table root ns memory leads
to double free.
mlx5_init_fs
init_ingress_acls_root_ns()
init_ingress_acl_root_ns
kfree(steering->esw_ingress_root_ns);
/* steering->esw_ingress_root_ns is not marked NULL */
mlx5_cleanup_fs
cleanup_ingress_acls_root_ns
steering->esw_ingress_root_ns non NULL check passes.
kfree(steering->esw_ingress_root_ns);
/* double free */
Similar issue exist for other tables.
Hence zero out the pointers to not process the table again.
Fixes: 9b93ab981e ("net/mlx5: Separate ingress/egress namespaces for each vport")
Fixes: 40c3eebb49e51 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ad70411a97 ]
When disconnecting cdc_ncm the kernel sporadically crashes shortly
after the disconnect:
[ 57.868812] Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[ 58.006653] PC is at 0x0
[ 58.009202] LR is at call_timer_fn+0xec/0x1b4
[ 58.013567] pc : [<0000000000000000>] lr : [<ffffff80080f5130>] pstate: 00000145
[ 58.020976] sp : ffffff8008003da0
[ 58.024295] x29: ffffff8008003da0 x28: 0000000000000001
[ 58.029618] x27: 000000000000000a x26: 0000000000000100
[ 58.034941] x25: 0000000000000000 x24: ffffff8008003e68
[ 58.040263] x23: 0000000000000000 x22: 0000000000000000
[ 58.045587] x21: 0000000000000000 x20: ffffffc68fac1808
[ 58.050910] x19: 0000000000000100 x18: 0000000000000000
[ 58.056232] x17: 0000007f885aff8c x16: 0000007f883a9f10
[ 58.061556] x15: 0000000000000001 x14: 000000000000006e
[ 58.066878] x13: 0000000000000000 x12: 00000000000000ba
[ 58.072201] x11: ffffffc69ff1db30 x10: 0000000000000020
[ 58.077524] x9 : 8000100008001000 x8 : 0000000000000001
[ 58.082847] x7 : 0000000000000800 x6 : ffffff8008003e70
[ 58.088169] x5 : ffffffc69ff17a28 x4 : 00000000ffff138b
[ 58.093492] x3 : 0000000000000000 x2 : 0000000000000000
[ 58.098814] x1 : 0000000000000000 x0 : 0000000000000000
...
[ 58.205800] [< (null)>] (null)
[ 58.210521] [<ffffff80080f5298>] expire_timers+0xa0/0x14c
[ 58.215937] [<ffffff80080f542c>] run_timer_softirq+0xe8/0x128
[ 58.221702] [<ffffff8008081120>] __do_softirq+0x298/0x348
[ 58.227118] [<ffffff80080a6304>] irq_exit+0x74/0xbc
[ 58.232009] [<ffffff80080e17dc>] __handle_domain_irq+0x78/0xac
[ 58.237857] [<ffffff8008080cf4>] gic_handle_irq+0x80/0xac
...
The crash happens roughly 125..130ms after the disconnect. This
correlates with the 'delay' timer that is started on certain USB tx/rx
errors in the URB completion handler.
The problem is a race of usbnet_stop() with usbnet_start_xmit(). In
usbnet_stop() we call usbnet_terminate_urbs() to cancel all URBs in
flight. This only makes sense if no new URBs are submitted
concurrently, though. But the usbnet_start_xmit() can run at the same
time on another CPU which almost unconditionally submits an URB. The
error callback of the new URB will then schedule the timer after it was
already stopped.
The fix adds a check if the tx queue is stopped after the tx list lock
has been taken. This should reliably prevent the submission of new URBs
while usbnet_terminate_urbs() does its job. The same thing is done on
the rx side even though it might be safe due to other flags that are
checked there.
Signed-off-by: Jan Klötzke <Jan.Kloetzke@preh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59715171fb ]
(At least) RTL8168e forgets its MAC address in PCI D3. To fix this set
the MAC address when resuming. For resuming from runtime-suspend we
had this in place already, for resuming from S3/S5 it was missing.
The commit referenced as being fixed isn't wrong, it's just the first
one where the patch applies cleanly.
Fixes: 0f07bd850d ("r8169: use dev_get_drvdata where possible")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Albert Astals Cid <aacid@kde.org>
Tested-by: Albert Astals Cid <aacid@kde.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 49ce881c0d ]
Commit 984203ceff ("net: stmmac: mdio: remove reset gpio free")
removed the reset gpio free, when the driver is unbinded or rmmod,
we miss the gpio free.
This patch uses managed API to request the reset gpio, so that the
gpio could be freed properly.
Fixes: 984203ceff ("net: stmmac: mdio: remove reset gpio free")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4097e9d250 ]
Function tcf_action_dump() relies on tc_action->order field when starting
nested nla to send action data to userspace. This approach breaks in
several cases:
- When multiple filters point to same shared action, tc_action->order field
is overwritten each time it is attached to filter. This causes filter
dump to output action with incorrect attribute for all filters that have
the action in different position (different order) from the last set
tc_action->order value.
- When action data is displayed using tc action API (RTM_GETACTION), action
order is overwritten by tca_action_gd() according to its position in
resulting array of nl attributes, which will break filter dump for all
filters attached to that shared action that expect it to have different
order value.
Don't rely on tc_action->order when dumping actions. Set nla according to
action position in resulting array of actions instead.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3d3ced2ec5 ]
Some boards do not have the PHY firmware programmed in the 3310's flash,
which leads to the PHY not working as expected. Warn the user when the
PHY fails to boot the firmware and refuse to initialise.
Fixes: 20b2af32ff ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2180843721 ]
MVPP2_TXQ_SCHED_TOKEN_CNTR_REG() expects the logical queue id but
the current code is passing the global tx queue offset, so it ends
up writing to unknown registers (between 0x8280 and 0x82fc, which
seemed to be unused by the hardware). This fixes the issue by using
the logical queue id instead.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d484e06e25 ]
Fix below issues in err code path of probe:
1. we don't need to unregister_netdev() because the netdev isn't
registered.
2. when register_netdev() fails, we also need to destroy bm pool for
HWBM case.
Fixes: dc35a10f68 ("net: mvneta: bm: add support for hardware buffer management")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a4270d6795 ]
If a network driver provides to napi_gro_frags() an
skb with a page fragment of exactly 14 bytes, the call
to gro_pull_from_frag0() will 'consume' the fragment
by calling skb_frag_unref(skb, 0), and the page might
be freed and reused.
Reading eth->h_proto at the end of napi_frags_skb() might
read mangled data, or crash under specific debugging features.
BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
napi_frags_skb net/core/dev.c:5833 [inline]
napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
call_write_iter include/linux/fs.h:1872 [inline]
do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
do_iter_write fs/read_write.c:970 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:951
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
do_writev+0x15b/0x330 fs/read_write.c:1058
Fixes: a50e233c50 ("net-gro: restore frag0 optimization")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ce8d24f9a5 ]
Fix the clk mismatch in the error path "failed_reset" because
below error path will disable clk_ahb and clk_ipg directly, it
should use pm_runtime_put_noidle() instead of pm_runtime_put()
to avoid to call runtime resume callback.
Reported-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Tested-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 84b3fd1fc9 ]
Currently, the upper half of a 4-byte STATS_TYPE_PORT statistic ends
up in bits 47:32 of the return value, instead of bits 31:16 as they
should.
Fixes: 6e46e2d821 ("net: dsa: mv88e6xxx: Fix u64 statistics")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ef74422020 ]
When identical rules are inserted, the latter one goes to C-TCAM. For
that, a second eRP with the same mask is created. These 2 eRPs by the
nature cannot be merged and also one cannot be parent of another.
Teach mlxsw_sp_acl_erp_delta_fill() about this possibility and handle it
gracefully.
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Fixes: c22291f7cf ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 31680ac265 ]
IPv6 redirect is broken for VRF. __ip6_route_redirect walks the FIB
entries looking for an exact match on ifindex. With VRF the flowi6_oif
is updated by l3mdev_update_flow to the l3mdev index and the
FLOWI_FLAG_SKIP_NH_OIF set in the flags to tell the lookup to skip the
device match. For redirects the device match is requires so use that
flag to know when the oif needs to be reset to the skb device index.
Fixes: ca254490c8 ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 72f7cfab6f ]
IPv6 does not consider if the socket is bound to a device when binding
to an address. The result is that a socket can be bound to eth0 and
then bound to the address of eth1. If the device is a VRF, the result
is that a socket can only be bound to an address in the default VRF.
Resolve by considering the device if sk_bound_dev_if is set.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 903869bd10 ]
ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST
Fixes: 3580d04aa6 ("ipv4/igmp: fix another memory leak in igmpv3_del_delrec()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df453700e8 ]
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.
Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.
It is time to switch to siphash and its 128bit keys.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b73484b2fc ]
When parsing an ethtool flow spec to build a flow_rule, the code checks
if both the vlan etype and the vlan tci are specified by the user to add
a FLOW_DISSECTOR_KEY_VLAN match.
However, when the user only specified a vlan etype or a vlan tci, this
check silently ignores these parameters.
For example, the following rule :
ethtool -N eth0 flow-type udp4 vlan 0x0010 action -1 loc 0
will result in no error being issued, but the equivalent rule will be
created and passed to the NIC driver :
ethtool -N eth0 flow-type udp4 action -1 loc 0
In the end, neither the NIC driver using the rule nor the end user have
a way to know that these keys were dropped along the way, or that
incorrect parameters were entered.
This kind of check should be left to either the driver, or the ethtool
flow spec layer.
This commit makes so that ethtool parameters are forwarded as-is to the
NIC driver.
Since none of the users of ethtool_rx_flow_rule_create are using the
VLAN dissector, I don't think this qualifies as a regression.
Fixes: eca4205f9e ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5730061d1 ]
VLAN flows never get offloaded unless ivlan_vld is set in filter spec.
It's not compulsory for vlan_ethtype to be set.
So, always enable ivlan_vld bit for offloading VLAN flows regardless of
vlan_ethtype is set or not.
Fixes: ad9af3e09c (cxgb4: add tc flower match support for vlan)
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 334031219a ]
Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.
The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
if the link has never been up (last_link_up == 0), so the link state
doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
in this case, it simply hasn't been up yet, and this prevents the
unnecessary state change from DOWN to FAIL and getting stuck in an init
failure w/o a partner mac.
Fixes: ea53abfab9 ("bonding/802.3ad: fix link_failure_count tracking")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Tested-by: Heesoon Kim <Heesoon.Kim@stratus.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5abac9d7e1 ]
Currently we check if the __ICE_PREPARED_FOR_RESET bit is set prior to
calling ice_prepare_for_reset in ice_reset_subtask(), but we aren't
checking that bit in ice_do_reset() before calling
ice_prepare_for_reset(). This is not consistent and can cause issues if
ice_prepare_for_reset() is called prior to ice_do_reset(). Fix this by
checking if the __ICE_PREPARED_FOR_RESET bit is set internal to
ice_prepare_for_reset().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa3c098c2d ]
As Hans de Goede pointed, using this driver without ACPI
makes little sense, so add ACPI dependency to Kconfig entry
to fix a build error while CONFIG_ACPI is not set.
drivers/extcon/extcon-axp288.c: In function 'axp288_extcon_probe':
drivers/extcon/extcon-axp288.c:363:20: error: dereferencing pointer to incomplete type
put_device(&adev->dev);
Fixes: 0cf064db94 ("extcon: axp288: Convert to use acpi_dev_get_first_match_dev()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1ffa760d2 ]
The quiesce function calls cio_cancel_halt_clear() and if we
get an -EBUSY we go into a loop where we:
- wait for any interrupts
- flush all I/O in the workqueue
- retry cio_cancel_halt_clear
During the period where we are waiting for interrupts or
flushing all I/O, the channel subsystem could have completed
a halt/clear action and turned off the corresponding activity
control bits in the subchannel status word. This means the next
time we call cio_cancel_halt_clear(), we will again start by
calling cancel subchannel and so we can be stuck between calling
cancel and halt forever.
Rather than calling cio_cancel_halt_clear() immediately after
waiting, let's try to disable the subchannel. If we succeed in
disabling the subchannel then we know nothing else can happen
with the device.
Suggested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <4d5a4b98ab1b41ac6131b5c36de18b76c5d66898.1555449329.git.alifm@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da676c6aa6 ]
The current calculation for the video start delay in the current DSI driver
is that it is the total vertical size, minus the front porch and sync length,
plus 1. This equals to the active vertical size plus the back porch plus 1.
That 1 is coming in the Allwinner BSP from an variable that is set to 1.
However, if we look at the Allwinner BSP more closely, and especially in
the "legacy" code for the display (in drivers/video/sunxi/legacy/), we can
see that this variable is actually computed from the porches and the sync
minus 10, clamped between 8 and 100.
This fixes the start delay symptom we've seen on some panels (vblank
timeouts with vertical white stripes at the bottom of the panel).
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e5f72e68f47ca0223877464bf12f0c3f3978de8.1549896081.git-series.maxime.ripard@bootlin.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bc46da4a3 ]
[Why]
Seamless boot tries to reuse planes that were enabled for the first
commit applied.
In the case where Raven is booting with two monitors connected and the
first commit contains two streams the screen corruption would occur
because the second stream was trying to re-use a tg and plane that
weren't previously enabled.
The state on the first commit looks something like the following:
TG0: enabled=1
TG1: enabled=0
TG2: enabled=0
TG3: enabled=0
New state: pipe=0, stream=0, plane=0, new_tg=0
New state: pipe=1, stream=1, plane=1, new_tg=1
New state: pipe=2, stream=NULL, plane=NULL, new_tg=NULL
New state: pipe=3, stream=NULL, plane=NULL, new_tg=NULL
Only one plane/tg is setup before we enter accelerated mode so
we really want to disabling everything but that first plane.
[How]
Check if the stream is not NULL and if the tg is enabled before
deciding whether to skip the plane disable.
Also ensure we're also disabling on the current state's pipe_ctx so
we don't overwrite the fields in the new pending state.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcf1a98867 ]
[Why]
AUX arbitration occurs between SW and FW components.
When AUX acquire fails, it causes engine->ddc to be NULL,
which leads to an exception when we try to release the AUX
engine.
[How]
When AUX engine acquire fails, it should return from the
function without trying to continue the operation.
The upper level will determine if it wants to retry.
i.e. dce_aux_transfer_with_retries will be used and retry.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d7b25d05e ]
The SND_SOC_DAVINCI_MCASP driver can use either edma or sdma as
a back-end, and it takes the presence of the respective dma engine
drivers in the configuration as an indication to which ones should be
built. However, this is flawed in multiple ways:
- With CONFIG_TI_EDMA=m and CONFIG_SND_SOC_DAVINCI_MCASP=y,
is enabled as =m, and we get a link error:
sound/soc/ti/davinci-mcasp.o: In function `davinci_mcasp_probe':
davinci-mcasp.c:(.text+0x930): undefined reference to `edma_pcm_platform_register'
- When CONFIG_SND_SOC_DAVINCI_MCASP=m has already been selected by
another driver, the same link error appears even if CONFIG_TI_EDMA
is disabled
There are possibly other issues here, but it seems that the only reasonable
solution is to always build both SND_SOC_TI_EDMA_PCM and
SND_SOC_TI_SDMA_PCM as a dependency here. Both are fairly small and
do not have any other compile-time dependencies, so the cost is
very small, and makes the configuration stage much more consistent.
Fixes: f2055e145f ("ASoC: ti: Merge davinci and omap directories")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ca5104715 ]
Building with clang shows a variable that is only used by the
suspend/resume functions but defined outside of their #ifdef block:
sound/soc/ti/davinci-mcasp.c:48:12: error: variable 'context_regs' is not needed and will not be emitted
We commonly fix these by marking the PM functions as __maybe_unused,
but here that would grow the davinci_mcasp structure, so instead
add another #ifdef here.
Fixes: 1cc0c054f3 ("ASoC: davinci-mcasp: Convert the context save/restore to use array")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5442dcaa0d ]
This fixes a bug for messages containing both zero length and
unidirectional xfers.
The function spi_map_msg will allocate dummy tx and/or rx buffers
for use with unidirectional transfers when the hardware can only do
a bidirectional transfer. That dummy buffer will be used in place
of a NULL buffer even when the xfer length is 0.
Then in the function __spi_map_msg, if he hardware can dma,
the zero length xfer will have spi_map_buf called on the dummy
buffer.
Eventually, __sg_alloc_table is called and returns -EINVAL
because nents == 0.
This fix prevents the error by not using the dummy buffer when
the xfer length is zero.
Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e6afb3832 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: f6130be652 ("regulator: DA9055 regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 978995def0 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 4068e5182a ("regulator: da9062: DA9062 regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 275513b769 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: c90456e36d ("regulator: pv88090: new regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 119c4f5085 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: e4ee831f94 ("regulator: Add WM831x DC-DC buck convertor support")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1867af94cf ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 99cf3af5e2 ("regulator: pv88080: new regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29d40b4a57 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 69ca3e58d1 ("regulator: da9063: Add Dialog DA9063 voltage regulators support.")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65378de335 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 1028a37daa ("regulator: da9211: new regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89b2758c19 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: b59320cc5a ("regulator: lp8755: new driver for LP8755")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c842749ea1 ]
Commit 71abd29057 ("spi: imx: Add support for SPI Slave mode") added
an RX FIFO flush before start of a transfer. In slave mode, the master
may have sent more data than expected and this data will still be in the
RX FIFO at the start of the next transfer, and so needs to be flushed.
However, the code to do the flush was accidentally saving this data into
the previous transfer's RX buffer, clobbering the contents of whatever
followed that buffer.
Change it to empty the FIFO and throw away the data. Every one of the
RX functions for the different eCSPI versions and modes reads the RX
FIFO data using the same readl() call, so just use that, rather than
using the spi_imx->rx function pointer and making sure all the different
rx functions have a working "throw away" mode.
There is another issue, which affects master mode when switching from
DMA to PIO. There can be extra data in the RX FIFO which triggers this
flush code, causing memory corruption in the same manner. I don't know
why this data is unexpectedly in the FIFO. It's likely there is a
different bug or erratum responsible for that. But regardless of that,
I think this is proper fix the for bug at hand here.
Fixes: 71abd29057 ("spi: imx: Add support for SPI Slave mode")
Cc: Jiada Wang <jiada_wang@mentor.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Trent Piepho <tpiepho@impinj.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f582136372 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: f307a7e9b7 ("regulator: pv88060: new regulator driver")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f132da2534 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 3eb2c7ecb7 ("regulator: Add LTC3589 support")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 769fc8d418 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: 37b918a034 ("regulator: Add LTC3676 support")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7a621728a ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: d4d6b722e7 ("regulator: Add WM831x ISINK support")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8be64b6d87 ]
The mutex for the regulator_dev must be controlled by the caller of
the regulator_notifier_call_chain(), as described in the comment
for that function.
Failure to mutex lock and unlock surrounding the notifier call results
in a kernel WARN_ON_ONCE() which will dump a backtrace for the
regulator_notifier_call_chain() when that function call is first made.
The mutex can be controlled using the regulator_lock/unlock() API.
Fixes: d1c6b4fe66 ("regulator: Add WM831x LDO support")
Suggested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26843bb128 ]
While the sequencer is reset after each SPI message since commit
880c6d114f ("spi: rspi: Add support for Quad and Dual SPI
Transfers on QSPI"), it was never reset for the first message, thus
relying on reset state or bootloader settings.
Fix this by initializing it explicitly during configuration.
Fixes: 0b2182ddac ("spi: add support for Renesas RSPI")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe4ed1b457 ]
Currently dsi_display_init_dsi() calls dss_pll_enable() but it is not
paired with dss_pll_disable() in dsi_display_uninit_dsi(). This leaves
the DSS clocks enabled when the display is blanked wasting about extra
5mW of power while idle.
The clock that is left on by not calling dss_pll_disable() is
DSS_CLKCTRL bit 10 OPTFCLKEN_SYS_CLK that is the source clock for
DSI PLL.
We can fix this issue by by making the current dsi_pll_uninit() into
dsi_pll_disable(). This way we can just call dss_pll_disable() from
dsi_display_uninit_dsi() and the code becomes a bit easier to follow.
However, we need to also consider that DSI PLL can be muxed for DVI too
as pointed out by Tomi Valkeinen <tomi.valkeinen@ti.com>. In the DVI
case, we want to unconditionally disable the clocks. To get around this
issue, we separate out the DSI lane handling from dsi_pll_enable() and
dsi_pll_disable() as suggested by Tomi in an earlier experimental patch.
So we must only toggle the DSI regulator based on the vdds_dsi_enabled
flag from dsi_display_init_dsi() and dsi_display_uninit_dsi().
We need to make these two changes together to avoid breaking things
for DVI when fixing the DSI clock handling. And this all causes a
slight renumbering of the error path for dsi_display_init_dsi().
Suggested-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e482ae9b5f ]
Writeback jobs are allocated when the WRITEBACK_FB_ID is set, and
deleted when the jobs complete. This results in both a memory leak of
the job and a leak of the framebuffer if the atomic commit returns
before the job is queued for processing, for instance if the atomic
check fails or if the commit runs in test-only mode.
Fix this by implementing the drm_writeback_cleanup_job() function and
calling it from __drm_atomic_helper_connector_destroy_state(). As
writeback jobs are removed from the state when they're queued for
processing, any job left in the state when the state gets destroyed
needs to be cleaned up.
The existing declaration of the drm_writeback_cleanup_job() function
without an implementation hints that this problem was considered, but
never addressed.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f37d8e67f3 ]
pch_alloc_dma_buf allocated tx, rx DMA buffers which can fail. Further,
these buffers are used without a check. The patch checks for these
failures and sends the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b16406864 ]
When unloading the driver, mailbox commands may be sent without holding a
reference on the ndlp. By the time the mailbox command completes, the ndlp
may have reduced its ref counts and been freed. The problem was reported
by KASAN.
While unregistering due to driver unload, have the completion noop'd by
setting the ndlp context NULL'd. Due to the unload, no further action was
necessary. Also, while reviewing this path, the generic nulling of the
context after handling should be slightly moved.
Reported by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50e3f871fb ]
A patch in the 12.2.0.0 set caused a new lockdep warning:
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.0.0-rc8-next-20190301-dbg+ #1 Not tainted
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&qp->io_buf_list_put_lock)->rlock);
local_irq_disable();
lock(&(&phba->hbalock)->rlock);
lock(&(&qp->io_buf_list_put_lock)->rlock);
<Interrupt>
lock(&(&phba->hbalock)->rlock);
see: https://www.spinics.net/lists/linux-scsi/msg128389.html
In summary, the new patch added taking the io_buf_list_put_lock while under
an irq-disabled hbalock. This created a lock heirarchy dependent upon irq
being disabled, and there are paths that take the io_buf_list_put_lock
without disabling irq.
Looking at the lpfc_io_free routine, which is where the new heirarchy was
introduced, there is no reason to be taking out the hbalock and raising
irq, as the functionality is replaced by the io_buf_list_xxx locks.
Resolve by removing the hbalock/irq calls in lpfc_io_free.
Fixes: 5e5b511d8b ("scsi: lpfc: Partition XRI buffer list across Hardware Queues")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff6bf89717 ]
A prior patch which added support for non-uniform allocation of MSIX
vectors now causes a smatch complaint:
drivers/scsi/lpfc/lpfc_scsi.c:3674 lpfc_scsi_cmd_iocb_cmpl()
error: we previously assumed 'phba->sli4_hba.hdwq' could be
null (see line 3667)
Resolve by removing the unnecessary check for a NULL hdwq table.
Fixes 6a828b0f61: ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8869f5b0a ]
The adapter initialization sequence enables interrupts, initializes the
adapter link_state to LINK_DOWN, then issues commands to initialize the
adapter. The interrupt handler on the adapter validates the link_state (has
to be at least LINK_DOWN) and if invalid, will discard the interrupting
event.
In most cases, there is not a command completion, thus an interrupt until
the initialization commands have been sent which is post the setting of
state to LINK_DOWN. However, in cases of firmware reset, the reset will
modify the link_state to an invalid value (indicating a reset of the
adapter) and there occasionally are cases where the adapter will generate
an asynchronous event which shares the eq/cq used for mailbox commands. In
the failure case, an interrupt is generated immediately after enabling them
due to the async event. As link_state is invalid, the eq is list and the
CQ not serviced. At this point link_state is initialized and the mailbox
command sent. As the CQ has not been serviced, it is not armed, so no
interrupt event is generated when the mailbox command completes.
Modify the initialization sequence so that interrupts are enabled after
link_state is properly initialized, which avoids the race condition with
the async event.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c95a3b4b0f ]
During debug, it was seen that the driver is issuing commands specific to
SLI3 on SLI4 devices. Although the adapter correctly rejected the command,
this should not be done.
Revise the code to stop sending these commands on a SLI4 adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03aa4f191a ]
Two saa7146/hexium files contain a construct that causes a warning
when built with clang:
drivers/media/pci/saa7146/hexium_orion.c:210:12: error: stack frame size of 2272 bytes in function 'hexium_probe'
[-Werror,-Wframe-larger-than=]
static int hexium_probe(struct saa7146_dev *dev)
^
drivers/media/pci/saa7146/hexium_gemini.c:257:12: error: stack frame size of 2304 bytes in function 'hexium_attach'
[-Werror,-Wframe-larger-than=]
static int hexium_attach(struct saa7146_dev *dev, struct saa7146_pci_extension_data *info)
^
This one happens regardless of KASAN, and the problem is that a
constructor to initialize a dynamically allocated structure leads
to a copy of that structure on the stack, whereas gcc initializes
it in place.
Link: https://bugs.llvm.org/show_bug.cgi?id=40776
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: fix checkpatch warnings]
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c66a919746 ]
If the driver undergoes repeated host resets it starts losing exchange
structures and eventually returns SCSI_MLQUEUE_HOST_BUSY and does not
recover. The offline path is not reclaiming the outstanding ios on the fcp
pring txcmplq before calling lpfc_destroy_multixripool, which causes the
txmcplq to be reinit and the resources lost.
Flush the fcp rings before destroying the multixripools.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32a80c093b ]
The driver is reporting support for NVME even when not configured for NVME
operation.
Fix (and make more readable) when NVME protocol support is indicated.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea6c7e34f3 ]
It is not possible to use devm_kzalloc since that memory is
freed immediately when the device instance is unbound.
Various objects like the video device may still be in use
since someone has the device node open, and when that is closed
it expects the memory to be around.
So use kzalloc and release it at the appropriate time.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f74267b51c ]
The media_device is part of a static global vimc_device struct.
The media framework expects this to be zeroed before it is
used, however, since this is a global this is not the case if
vimc is unbound and then bound again.
So call memset to ensure any left-over values are cleared.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce3c2433b0 ]
Restore a default case to prepare_vdi_in_buffers() to fix the following
smatch errors:
drivers/staging/media/imx/imx-media-vdic.c:236 prepare_vdi_in_buffers() error: uninitialized symbol 'prev_phys'.
drivers/staging/media/imx/imx-media-vdic.c:237 prepare_vdi_in_buffers() error: uninitialized symbol 'curr_phys'.
drivers/staging/media/imx/imx-media-vdic.c:238 prepare_vdi_in_buffers() error: uninitialized symbol 'next_phys'.
Fixes: 6e537b58de ("media: imx: vdic: rely on VDIC for correct field order")
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3235d39464 ]
Commit 0650a91499 ("media: mtk-vcodec: Correct return type for mem2mem
buffer helpers") fixed the return types for mem2mem buffer helper
functions, but omitted two occurrences that are accessed in the
mtk_v4l2_debug() macro. These only trigger compiler errors when DEBUG is
defined.
Fixes: 0650a91499 ("media: mtk-vcodec: Correct return type for mem2mem buffer helpers")
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed713a4a13 ]
clang-8 warns about one function here when KASAN is enabled, even
without the 'asan-stack' option:
drivers/media/usb/go7007/go7007-fw.c:1551:5: warning: stack frame size of 2656 bytes in function
I have reported this issue in the llvm bugzilla, but to make
it work with the clang-8 release, a small annotation is still
needed.
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: fix checkpatch warning]
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e855165f3d ]
Clang-9 makes some different inlining decisions compared to gcc, which
leads to a warning about a possible stack overflow problem when building
with CONFIG_KASAN, including when setting asan-stack=0, which avoids
most other frame overflow warnings:
drivers/media/platform/vicodec/codec-fwht.c:673:12: error: stack frame size of 2224 bytes in function 'encode_plane'
Manually adding noinline_for_stack annotations in those functions
called by encode_plane() or decode_plane() that require a significant
amount of kernel stack makes this impossible to happen with any
compiler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6f8bd59c2 ]
When streaming is stopped all URBs are killed, but in fill_frame and in
bulk_irq this results in an attempt to resubmit the killed URB. That is
not what you want and causes spurious kernel messages.
So check if streaming has stopped before resubmitting.
Also check against gspca_dev->streaming rather than vb2_start_streaming_called()
since vb2_start_streaming_called() will return true when in stop_streaming,
but gspca_dev->streaming is set to false when stop_streaming is called.
Fixes: 6992effe53 ("gspca: Kill all URBs before releasing any of them")
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2978a505aa ]
The state TASK_UNINTERRUPTIBLE should be set just before
schedule_timeout() call, so it knows the sleep mode it should enter.
There is no point in setting TASK_UNINTERRUPTIBLE at the initialization
of the thread as schedule_timeout() will set the state back to
TASK_RUNNING.
This fixes a warning in __might_sleep() call, as it's expecting the
task to be in TASK_RUNNING state just before changing the state to
a sleeping state.
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63a06181d7 ]
devm_reset_control_get could fail, so the fix checks its return value and
passes the error code upstream in case it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9952f93cd ]
[Why]
The kms_plane@plane-position-covered-pipe-*-planes subtests can produce
a sequence of atomic commits such that neither active_changed nor
mode_changed but connectors_changed.
When this happens we remove the old stream from the context and add
a new stream but the new stream doesn't have mode_changed=true set.
This incorrect programming sequence causes CRC mismatches to occur in
the test.
The stream->mode_changed value should be set whenever a new stream
is created.
[How]
A new stream is created whenever drm_atomic_crtc_needs_modeset is true.
We previously covered the active_changed and mode_changed conditions
for the CRTC but connectors_changed is also checked within
drm_atomic_crtc_needs_modeset.
So just use drm_atomic_crtc_needs_modeset directly to determine the
mode_changed flag.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 162f807858 ]
[Why]
used to be unable to run 4:2:0 if using a dongle because 4k60 bandwidth
exceeded dongle caps
[How]
half pixel clock during comparison to dongle cap. *Could get stuck on black
screen on monitor that don't support 420 but will be selecting 420 as
preferred mode*
Signed-off-by: Martin Leung <martin.leung@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Aidan Wood <Aidan.Wood@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f91813992c ]
[Why]
The dc_gamma_type CUSTOM_GAMMA is used to represent degamma
mappings passed in by drm. This type of gamma must be interpolated
into a transfer function by apply_1d_lut. The line in
mod_color_calculate_degamma_params that handled this case
was erroneously removed.
[How]
For CUSTOM_GAMMA degamma, calculate the lut as before.
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49dc762cff ]
The driver should really call dm365_isif_setup_pinmux() through a callback,
but uses a hack to include a davinci specific machine header file when
compile testing instead. This works almost everywhere, but not on the
ARM omap1 platform, which has another header named mach/mux.h. This
causes a build failure:
drivers/staging/media/davinci_vpfe/dm365_isif.c:2028:2: error: implicit declaration of function 'davinci_cfg_reg' [-Werror,-Wimplicit-function-declaration]
davinci_cfg_reg(DM365_VIN_CAM_WEN);
^
drivers/staging/media/davinci_vpfe/dm365_isif.c:2028:2: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes]
drivers/staging/media/davinci_vpfe/dm365_isif.c:2028:18: error: use of undeclared identifier 'DM365_VIN_CAM_WEN'
davinci_cfg_reg(DM365_VIN_CAM_WEN);
^
drivers/staging/media/davinci_vpfe/dm365_isif.c:2029:18: error: use of undeclared identifier 'DM365_VIN_CAM_VD'
davinci_cfg_reg(DM365_VIN_CAM_VD);
^
drivers/staging/media/davinci_vpfe/dm365_isif.c:2030:18: error: use of undeclared identifier 'DM365_VIN_CAM_HD'
davinci_cfg_reg(DM365_VIN_CAM_HD);
^
drivers/staging/media/davinci_vpfe/dm365_isif.c:2031:18: error: use of undeclared identifier 'DM365_VIN_YIN4_7_EN'
davinci_cfg_reg(DM365_VIN_YIN4_7_EN);
^
drivers/staging/media/davinci_vpfe/dm365_isif.c:2032:18: error: use of undeclared identifier 'DM365_VIN_YIN0_3_EN'
davinci_cfg_reg(DM365_VIN_YIN0_3_EN);
^
7 errors generated.
Exclude omap1 from compile-testing, under the assumption that all others
still work.
Fixes: 4907c73dee ("media: staging: davinci_vpfe: allow building with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6bddf6c67 ]
[why]
Stream update will adjust both info packets and stream params,
need to make sure all things are applied togather.
[how]
add pipe lock during stream update
Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 981fbe3da2 ]
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=199323
Users are experiencing problems with the DVBSky S960/S960C USB devices
since the following commit:
9d659ae: ("locking/mutex: Add lock handoff to avoid starvation")
The device malfunctions after running for an indeterminable period of
time, and the problem can only be cleared by rebooting the machine.
It is possible to encourage the problem to surface by blocking the
signal to the LNB.
Further debugging revealed the cause of the problem.
In the following capture:
- thread #1325 is running m88ds3103_set_frontend
- thread #42 is running ts2020_stat_work
a> [1325] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 07 80
[1325] usb 1-1: dvb_usb_v2_generic_io: <<< 08
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 09 01 01 68 3f
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 08 ff
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 03 11
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 09 01 01 60 3d
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07 ff
b> [1325] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 07 00
[1325] usb 1-1: dvb_usb_v2_generic_io: <<< 07
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 03 11
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 09 01 01 60 21
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07 ff
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 03 11
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07
[42] usb 1-1: dvb_usb_v2_generic_io: >>> 09 01 01 60 66
[42] usb 1-1: dvb_usb_v2_generic_io: <<< 07 ff
[1325] usb 1-1: dvb_usb_v2_generic_io: >>> 08 68 02 03 11
[1325] usb 1-1: dvb_usb_v2_generic_io: <<< 07
[1325] usb 1-1: dvb_usb_v2_generic_io: >>> 08 60 02 10 0b
[1325] usb 1-1: dvb_usb_v2_generic_io: <<< 07
Two i2c messages are sent to perform a reset in m88ds3103_set_frontend:
a. 0x07, 0x80
b. 0x07, 0x00
However, as shown in the capture, the regmap mutex is being handed over
to another thread (ts2020_stat_work) in between these two messages.
>From here, the device responds to every i2c message with an 07 message,
and will only return to normal operation following a power cycle.
Use regmap_multi_reg_write to group the two reset messages, ensuring
both are processed before the regmap mutex is unlocked.
Signed-off-by: James Hutchinson <jahutchinson99@googlemail.com>
Reviewed-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdfa59cd63 ]
Commit 14f4eaedda ("media: dvbsky: fix driver unregister logic") fixed
a use-after-free by removing the reference to the frontend after deleting
the backing i2c device.
This has the unfortunate side effect the frontend device is never freed
in the dvb core leaving a dangling device, leading to errors when the
dvb core tries to register the frontend after e.g. a replug as reported
here: https://www.spinics.net/lists/linux-media/msg138181.html
media: dvbsky: issues with DVBSky T680CI
===
[ 561.119145] sp2 8-0040: CIMaX SP2 successfully attached
[ 561.119161] usb 2-3: DVB: registering adapter 0 frontend 0 (Silicon Labs
Si2168)...
[ 561.119174] sysfs: cannot create duplicate filename '/class/dvb/
dvb0.frontend0'
===
The use after free happened as dvb_usbv2_disconnect calls in this order:
- dvb_usb_device::props->exit(...)
- dvb_usbv2_adapter_frontend_exit(...)
+ if (fe) dvb_unregister_frontend(fe)
+ dvb_usb_device::props->frontend_detach(...)
Moving the release of the i2c device from exit() to frontend_detach()
avoids the dangling pointer access and allows the core to unregister
the frontend.
This was originally reported for a DVBSky T680CI, but it also affects
the MyGica T230C. As all supported devices structure the registration/
unregistration identically, apply the change for all device types.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ab34a0881 ]
si2165_readreg8() may fail. Looking into si2165_readreg8(), we will find
that "val_tmp" will be an uninitialized value when regmap_read() fails.
"val_tmp" is then assigned to "val". So if si2165_readreg8() fails,
"val" will be a random value. Further use will lead to undefined
behaviors. The fix checks if si2165_readreg8() fails, and if so, returns
its error code upstream.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Matthias Schwarzott <zzam@gentoo.org>
Tested-by: Matthias Schwarzott <zzam@gentoo.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b6e13216b ]
igb sets different WoL settings in system suspend callback and runtime
suspend callback.
The suspend direct complete optimization leaves igb in runtime suspended
state with wrong WoL setting during system suspend.
To fix this, we need to disable suspend direct complete optimization to
let igb always use suspend callback to set correct WoL during system
suspend.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 459d69c407 ]
There are some new e1000e devices can only be woken up from D3 one time,
by plugging Ethernet cable. Subsequent cable plugging does set PME bit
correctly, but it still doesn't get woken up.
Since e1000e connects to the root complex directly, we rely on ACPI to
wake it up. In this case, the GPE from _PRW only works once and stops
working after that. Though it appears to be a platform bug, e1000e
maintainers confirmed that I219 does not support D3.
So disable runtime PM on CNP+ chips. We may need to disable earlier
generations if this bug also hit older platforms.
Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=280819
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42b2cc83af ]
This patch fixes issues with VF queues being disabled, and VF netdev
network carrier being lost after reset. Basically, we need to check if VF
is enabled, and queue configured in reset_all_vfs flow, and disable/enable
those queues appropriately whenever the function is called after
Global/CORER/PFR reset/rebuild/replay.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76eb24fc23 ]
When start_streaming was called both last_src_buf and last_dst_buf
pointers were set to NULL, but this depends on whether the capture
or output queue starts streaming.
When decoding with resolution changes in between the capture queue
has to restart streaming whenever a resolution change occurs. And
that would reset last_src_buf as well, which causes a problem if
the decoder was stopped by the application. Since last_src_buf
is now NULL, the LAST flag is never set for the last capture
buffer.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 948dff7cfa ]
The imgu_rpm_dummy_cb() looks like an API misuse that is explained
in the comment above it. Aside from that, it also causes a warning
when power management support is disabled:
drivers/staging/media/ipu3/ipu3.c:794:12: error: 'imgu_rpm_dummy_cb' defined but not used [-Werror=unused-function]
The warning is at least easy to fix by marking the function as
__maybe_unused.
Fixes: 7fc7af649c ("media: staging/intel-ipu3: Add imgu top level pci device driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32ab688b28 ]
Since commit 3d6a8fe256 ("media: ov7670: hook s_power onto v4l2 core"),
the device is actually powered off while the video stream is stopped.
The frame format and framerate are restored right after power-up, but
restoring the default register settings is forgotten.
Fixes: 3d6a8fe256 ("media: ov7670: hook s_power onto v4l2 core")
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Lubomir Rintel <lkundrak@v3.sk>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 106204b56f ]
In case kzalloc fails, the fix releases resources and returns
-ENOMEM to avoid the NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eec3d5efd1 ]
[Why]
The plane_reset callback is subclassed but hasn't been updated since
the drm helper got updated to include resetting alpha related state
(state->alpha and state->pixel_blend_mode). The overlay planes
exposed by amdgpu_dm were therefore being rendered as invisible by
default ever since supported was exposed for alpha blending properties
on overlays.
This caused regressions in igt@kms_plane_multiple@atomic-tiling-none
and igt@kms_plane@plane-position-covered-pipe tests.
[How]
Reset the plane state values to their correct values as defined in
the drm helper.
This fixes the IGT test regression.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b05e2c5e81 ]
[Why]
Somewhere in the atomic check reshuffle ABM got lost.
ABM is a crtc property (copied from a connector property).
It can change without a modeset, just like underscan.
[How]
In the skip_modeset branch of atomic check crtc updates,
copy over the abm property.
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66acd4418d ]
[Why]
In certain cases we do link training when we don't have a backend.
[How]
In dc_link_set_preferred_link_settings(), store preferred link settings
first and then verify that the link is DP and the link stream's backend is
enabled. If either is false, then we will not do any link retraining.
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7287275b43 ]
The regulator header has empty inline functions for most interfaces,
but not regulator_get_linear_step(), which has just grown a user
that does not depend on regulators otherwise:
drivers/clk/tegra/clk-tegra124-dfll-fcpu.c: In function 'get_alignment_from_regulator':
drivers/clk/tegra/clk-tegra124-dfll-fcpu.c:555:19: error: implicit declaration of function 'regulator_get_linear_step'; did you mean 'regulator_get_drvdata'? [-Werror=implicit-function-declaration]
align->step_uv = regulator_get_linear_step(reg);
^~~~~~~~~~~~~~~~~~~~~~~~~
regulator_get_drvdata
cc1: all warnings being treated as errors
scripts/Makefile.build:278: recipe for target 'drivers/clk/tegra/clk-tegra124-dfll-fcpu.o' failed
Add the missing stub along the others.
Fixes: b3cf8d0695 ("clk: tegra: dfll: CVB calculation alignment with the regulator")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca1438dcb3 ]
The newly added tracepoints in the spi-mxs driver cause a link
error when the driver is a loadable module:
ERROR: "__tracepoint_spi_transfer_stop" [drivers/spi/spi-mxs.ko] undefined!
ERROR: "__tracepoint_spi_transfer_start" [drivers/spi/spi-mxs.ko] undefined!
I'm not quite sure where to put the export statements, but
directly after the inclusion of the header seems as good as
any other place.
Fixes: f3fdea3af4 ("spi: mxs: add tracing to custom .transfer_one_message callback")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cc12751cf ]
Memory allocated via kmemdup might fail and return a NULL pointer.
This patch adds a check on the return value of kmemdup and passes the
error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9aabb68568 ]
In enumerate_services, ida_simple_get on failure can return an error and
leaks memory. The patch ensures that the dev_set_name is set on non
failure cases, and releases memory during failure.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62f95ae805 ]
Newer combinations of the glibc, kernel and openssh can result in long initial
startup times on OMAP devices:
[ 6.671425] systemd-rc-once[102]: Creating ED25519 key; this may take some time ...
[ 142.652491] systemd-rc-once[102]: Creating ED25519 key; done.
due to the blocking getrandom(2) system call:
[ 142.610335] random: crng init done
Set the quality level for the omap hwrng driver allowing the kernel to use the
hwrng as an entropy source at boot.
Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4223e06b6 ]
The buffer descriptor setup loop is correct only if it is setting up at
least one bd struct. Besides, there is an error if dma_map_sg() returns
0, which is possible and must be handled.
Additionally, remove the BUG_ON() checking sglen, which is unnecessary
because we configure DMA with that constraint during init.
Signed-off-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee576ec1c1 ]
Amazingly a mlx5e_tc function is being called from the eswitch layer,
which is by itself very terrible! The function was declared locally in
eswitch_offloads.c so it could be used there, which caused the following
compilation warning, fix that.
drivers/.../mlx5/core/en_tc.c:3242:6: [-Werror=missing-prototypes]
error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’
Fixes: 04de7dda73 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0ceeae708 ]
The Hygon family 18h multi-die processor platform supports 1, 2 or
4-Dies per socket. The topology looks like this:
System View (with 1-Die 2-Socket):
|------------|
------ -----
SOCKET0 | D0 | | D1 | SOCKET1
------ -----
System View (with 2-Die 2-socket):
--------------------
| -------------|------
| | | |
------------ ------------
SOCKET0 | D1 -- D0 | | D3 -- D2 | SOCKET1
------------ ------------
System View (with 4-Die 2-Socket) :
--------------------
| -------------|------
| | | |
------------ ------------
| D1 -- D0 | | D7 -- D6 |
| | \/ | | | | \/ | |
SOCKET0 | | /\ | | | | /\ | | SOCKET1
| D2 -- D3 | | D4 -- D5 |
------------ ------------
| | | |
------|------------| |
--------------------
Currently
phys_proc_id = initial_apicid >> bits
calculates the physical processor ID from the initial_apicid by shifting
*bits*.
However, this does not work for 1-Die and 2-Die 2-socket systems.
According to document [1] section 2.1.11.1, the bits is the value of
CPUID_Fn80000008_ECX[12:15]. The possible values are 4, 5 or 6 which
mean:
4 - 1 die
5 - 2 dies
6 - 3/4 dies.
Hygon programs the initial ApicId the same way as AMD. The ApicId is
read from CPUID_Fn00000001_EBX (see section 2.1.11.1 of referrence [1])
and the definition is as below (see section 2.1.10.2.1.3 of [1]):
-------------------------------------------------
Bit | 6 | 5 4 | 3 | 2 1 0 |
|-----------|---------|--------|----------------|
IDs | Socket ID | Node ID | CCX ID | Core/Thread ID |
-------------------------------------------------
So for 3/4-Die configurations, the bits variable is 6, which is the same
as the ApicID definition field.
For 1-Die and 2-Die configurations, bits is 4 or 5, which will cause the
right shifted result to not be exactly the value of socket ID.
However, the socket ID should be obtained from ApicId[6]. To fix the
problem and match the ApicID field definition, set the shift bits to 6
for all Hygon family 18h multi-die CPUs.
Because AMD doesn't have 2-Socket systems with 1-Die/2-Die processors
(see reference [2]), this doesn't need to be changed on the AMD side but
only for Hygon.
References:
[1] https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
[2] https://www.amd.com/en/products/specifications/processors
[bp: heavily massage commit message. ]
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1553355740-19999-1-git-send-email-puwen@hygon.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6ed6491d5 ]
adma driver is using pm_clk_*() interface for managing clock resources.
With this it is observed that clocks remain ON always. This happens on
Tegra devices which use BPMP co-processor to manage clock resources,
where clocks are enabled during prepare phase. This is necessary because
clocks to BPMP are always blocking. When pm_clk_*() interface is used on
such Tegra devices, clock prepare count is not balanced till remove call
happens for the driver and hence clocks are seen ON always. Thus this
patch replaces pm_clk_*() with devm_clk_*() framework.
Suggested-by: Mohan Kumar D <mkumard@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 099e6cc158 ]
Currently incoming ARP Replies, for example via a DHT-PUT message, do
not update the timeout for an already existing DAT entry. These ARP
Replies are dropped instead.
This however defeats the purpose of the DHCPACK snooping, for instance.
Right now, a DAT entry in the DHT will be purged every five minutes,
likely leading to a mesh-wide ARP Request broadcast after this timeout.
Which then recreates the entry. The idea of the DHCPACK snooping is to
be able to update an entry before a timeout happens, to avoid ARP Request
flooding.
This patch fixes this issue by updating a DAT entry on incoming
ARP Replies even if a matching DAT entry already exists. While still
filtering the ARP Reply towards the soft-interface, to avoid duplicate
messages on the client device side.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98bbbb76f2 ]
clang correctly points out a code path that would lead
to an uninitialized variable use:
security/selinux/netlabel.c:310:6: error: variable 'addr' is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
if (ip_hdr(skb)->version == 4) {
^~~~~~~~~~~~~~~~~~~~~~~~~
security/selinux/netlabel.c:322:40: note: uninitialized use occurs here
rc = netlbl_conn_setattr(ep->base.sk, addr, &secattr);
^~~~
security/selinux/netlabel.c:310:2: note: remove the 'if' if its condition is always true
if (ip_hdr(skb)->version == 4) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
security/selinux/netlabel.c:291:23: note: initialize the variable 'addr' to silence this warning
struct sockaddr *addr;
^
= NULL
This is probably harmless since we should not see ipv6 packets
of CONFIG_IPV6 is disabled, but it's better to rearrange the code
so this cannot happen.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[PM: removed old patchwork link, fixed checkpatch.pl style errors]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ebd4428d9 ]
In the current implementation of ice_reset_subtask, if multiple reset
types are set in the pf->state, the most intrusive one is meant to be
performed only, but the bits requesting the other types are not being
cleared. This would lead to another reset being performed the next time
the service task is scheduled.
Change the flow of ice_reset_subtask so that all reset request bits in
pf->state are cleared, and we still perform the most intrusive of the
resets requested.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8eead25cbd ]
The function 'v4l2_m2m_buf_copy_metadata' should
be called even if decoding/encoding ends with
status VB2_BUF_STATE_ERROR, so that the metadata
is copied from the source buffer to the dest buffer.
Signed-off-by: Dafna Hirschfeld <dafna3@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit faf5a744f4 ]
clang -Wuninitialized incorrectly sees a variable being used without
initialization:
drivers/scsi/lpfc/lpfc_nvme.c:2102:37: error: variable 'localport' is uninitialized when used here
[-Werror,-Wuninitialized]
lport = (struct lpfc_nvme_lport *)localport->private;
^~~~~~~~~
drivers/scsi/lpfc/lpfc_nvme.c:2059:38: note: initialize the variable 'localport' to silence this warning
struct nvme_fc_local_port *localport;
^
= NULL
1 error generated.
This is clearly in dead code, as the condition leading up to it is always
false when CONFIG_NVME_FC is disabled, and the variable is always
initialized when nvme_fc_register_localport() got called successfully.
Change the preprocessor conditional to the equivalent C construct, which
makes the code more readable and gets rid of the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 608f729c31 ]
Clang -Wuninitialized notices that on is_qla40XX we never allocate any DMA
memory in get_fw_boot_info() but attempt to free it anyway:
drivers/scsi/qla4xxx/ql4_os.c:5915:7: error: variable 'buf_dma' is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
if (!(val & 0x07)) {
^~~~~~~~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5985:47: note: uninitialized use occurs here
dma_free_coherent(&ha->pdev->dev, size, buf, buf_dma);
^~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5915:3: note: remove the 'if' if its condition is always true
if (!(val & 0x07)) {
^~~~~~~~~~~~~~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5885:20: note: initialize the variable 'buf_dma' to silence this warning
dma_addr_t buf_dma;
^
= 0
Skip the call to dma_free_coherent() here.
Fixes: 2a991c2159 ("[SCSI] qla4xxx: Boot from SAN support for open-iscsi")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64a59d05a4 ]
commit 63f545ed12 ("ice: Add support for adaptive interrupt moderation")
was meant to add support for adaptive interrupt moderation but there was
an error on my part while formatting the patch, and thus only part of the
patch ended up being submitted.
This patch rectifies the error by adding the rest of the code.
Fixes: 63f545ed12 ("ice: Add support for adaptive interrupt moderation")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ead7e8172 ]
If ohci-platform is runtime suspended, we can currently get an "imprecise
external abort" on reboot with ohci-platform loaded when PM runtime
is implemented for the SoC.
Let's fix this by adding PM runtime support to usb_hcd_platform_shutdown.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a88eceb17a ]
This patch adds spi_master_put in release function
to drop the controller's refcount.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b699cce160 ]
The rcu_head_after_call_rcu() function reads the rhp->func pointer twice,
which can result in a false-positive WARN_ON_ONCE() if the callback
were passed to call_rcu() between the two reads. Although racing
rcu_head_after_call_rcu() with call_rcu() is to be a dubious use case
(the return value is not reliable in that case), intermittent and
irreproducible warnings are also quite dubious. This commit therefore
uses a single READ_ONCE() to pick up the value of rhp->func once, then
tests that value twice, thus guaranteeing consistent processing within
rcu_head_after_call_rcu()().
Neverthless, racing rcu_head_after_call_rcu() with call_rcu() is still
a dubious use case.
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
[ paulmck: Add blank line after declaration per checkpatch.pl. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad092c0277 ]
If the specified rcuperf.perf_type is not in the rcu_perf_init()
function's perf_ops[] array, rcuperf prints some console messages and
then invokes rcu_perf_cleanup() to set state so that a future torture
test can run. However, rcu_perf_cleanup() also attempts to end the
test that didn't actually start, and in doing so relies on the value
of cur_ops, a value that is not particularly relevant in this case.
This can result in confusing output or even follow-on failures due to
attempts to use facilities that have not been properly initialized.
This commit therefore sets the value of cur_ops to NULL in this case and
inserts a check near the beginning of rcu_perf_cleanup(), thus avoiding
relying on an irrelevant cur_ops value.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 006c077041 ]
Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a
CPU. Currently, this number is the same for all CPUs and a warning is
shown if there is a difference. The number of banks is overwritten with
the MCG_CAP[Count] value of each following CPU that boots.
According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives
the number of banks that are available to a "processor implementation".
The AMD BKDGs/PPRs further clarify that this value is per core. This
value has historically been the same for every core in the system, but
that is not an architectural requirement.
Future AMD systems may have different MCG_CAP[Count] values per core,
so the assumption that all CPUs will have the same MCG_CAP[Count] value
will no longer be valid.
Also, the first CPU to boot will allocate the struct mce_banks[] array
using the number of banks based on its MCG_CAP[Count] value. The machine
check handler and other functions use the global number of banks to
iterate and index into the mce_banks[] array. So it's possible to use an
out-of-bounds index on an asymmetric system where a following CPU sees a
MCG_CAP[Count] value greater than its predecessors.
Thus, allocate the mce_banks[] array to the maximum number of banks.
This will avoid the potential out-of-bounds index since the value of
mca_cfg.banks is capped to MAX_NR_BANKS.
Set the value of mca_cfg.banks equal to the max of the previous value
and the value for the current CPU. This way mca_cfg.banks will always
represent the max number of banks detected on any CPU in the system.
This will ensure that all CPUs will access all the banks that are
visible to them. A CPU that can access fewer than the max number of
banks will find the registers of the extra banks to be read-as-zero.
Furthermore, print the resulting number of MCA banks in use. Do this in
mcheck_late_init() so that the final value is printed after all CPUs
have been initialized.
Finally, get bank count from target CPU when doing injection with mce-inject
module.
[ bp: Remove out-of-bounds example, passify and cleanup commit message. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b813afae7a ]
If the specified rcutorture.torture_type is not in the rcu_torture_init()
function's torture_ops[] array, rcutorture prints some console messages
and then invokes rcu_torture_cleanup() to set state so that a future
torture test can run. However, rcu_torture_cleanup() also attempts to
end the test that didn't actually start, and in doing so relies on the
value of cur_ops, a value that is not particularly relevant in this case.
This can result in confusing output or even follow-on failures due to
attempts to use facilities that have not been properly initialized.
This commit therefore sets the value of cur_ops to NULL in this case
and inserts a check near the beginning of rcu_torture_cleanup(),
thus avoiding relying on an irrelevant cur_ops value.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f19501aa07 ]
There has been a lurking "TBD" in the machine check poll routine ever
since it was first split out from the machine check handler. The
potential issue is that the poll routine may have just begun a read from
the STATUS register in a machine check bank when the hardware logs an
error in that bank and signals a machine check.
That race used to be pretty small back when machine checks were
broadcast, but the addition of local machine check means that the poll
code could continue running and clear the error from the bank before the
local machine check handler on another CPU gets around to reading it.
Fix the code to be sure to only process errors that need to be processed
in the poll code, leaving other logged errors alone for the machine
check handler to find and process.
[ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ]
Fixes: b79109c3bb ("x86, mce: separate correct machine check poller and fatal exception handler")
Fixes: ed7290d0ee ("x86, mce: implement new status bits")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-desk
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ca8c2c8bb ]
The module was initializing completions whenever it was going to wait on
them, and not when the completion was allocated. This is incorrect
according to the completion docs:
Calling init_completion() on the same completion object twice is
most likely a bug [...]
Re-initialization is also unnecessary because the module never uses
complete_all(). Fix this by only ever initializing the completion a
single time, and log if the completions are not consumed as intended
(this is not a fatal problem, but should not go unnoticed).
Signed-off-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bbb1c318c ]
ipw->attr_memory and ipw->common_memory are assigned with the
return value of ioremap. ioremap may fail, but no checks
are enforced. The fix inserts the checks to avoid potential
NULL pointer dereferences.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b0a2c5ff7 ]
For regular serial ports we do not initialize value of vtermno
variable. A garbage value is assigned for non console ports.
The value can be observed as a random integer with [1].
[1] vim /sys/kernel/debug/virtio-ports/vport*p*
This patch initialize the value of vtermno for console serial
ports to '1' and regular serial ports are initiaized to '0'.
Reported-by: siliu@redhat.com
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b49f6d83e2 ]
This patch fixes the error exit path of fastrpc_init_create_process().
If the DMA allocation or the DSP invoke fails the fastrpc_map was freed
but not removed from the mapping list leading to a double free once the
mapping list is emptied in fastrpc_device_release().
[srinivas kandagatla]: Cleaned up error path labels and reset init mem
to NULL after free
Fixes: d73f71c7c6ee("misc: fastrpc: Add support for create remote init process")
Signed-off-by: Thierry Escande <thierry.escande@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 415a0729bd ]
dma_alloc_coherent buffers could have writes queued in store buffers so
commit them before sending buffer to DSP using correct dma barriers.
Same with vice-versa.
Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80f3afd72b ]
While passing address phy address to DSP, take care of the offset
calculated from virtual address vma.
Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d623dfd283 ]
The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:
* 24 bit index in the most significant bits of the R_Key, which is owned
by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
the Consumer.
This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.
Fixes: db570d7dea ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a8e61f847 ]
Several people reported testing failures after setting CLOCK_REALTIME close
to the limits of the kernel internal representation in nanoseconds,
i.e. year 2262.
The failures are exposed in subsequent operations, i.e. when arming timers
or when the advancing CLOCK_MONOTONIC makes the calculation of
CLOCK_REALTIME overflow into negative space.
Now people start to paper over the underlying problem by clamping
calculations to the valid range, but that's just wrong because such
workarounds will prevent detection of real issues as well.
It is reasonable to force an upper bound for the various methods of setting
CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum
uptime of 30 years which is plenty enough even for esoteric embedded
systems. That results in an upper bound of year 2232 for setting the time.
Once that limit is reached in reality this limit is only a small part of
the problem space. But until then this stops people from trying to paper
over the problem at the wrong places.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reported-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903231125480.2157@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 167e535438 ]
The PLL parameters are computed by looping over the range of acceptable
M, N and E values, and selecting the combination that produces the
output frequency closest to the target. The internal frequency
constraints are taken into account by restricting the tested values for
the PLL parameters, reducing the search space. The target frequency,
however, is only taken into account when computing the post-PLL divider,
which can result in a 0 value for the divider when the PLL output
frequency being tested is lower than half of the target frequency.
Subsequent loops will produce a better set of PLL parameters, but for
some of the iterations this can result in a division by 0.
Fix it by clamping the divider value. We could instead restrict the E
values being tested in the inner loop, but that would require additional
calculation that would likely be less efficient as the E parameter can
only take three different values.
Fixes: c25c013611 ("drm: rcar-du: lvds: D3/E3 support")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00d082cc4e ]
On the D3 SoC the LVDS PHY must be enabled in the same register write
that enables the LVDS output. Skip writing the LVEN bit independently
on that platform, it will be set by the write that sets LVRES.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52fafc58c3 ]
Commit 0650a91499 ("media: mtk-vcodec: Correct return type for mem2mem
buffer helpers") fixed the return types for mem2mem buffer helper
functions by changing a few local variables from vb2_buffer to
vb2_v4l2_buffer. However, it left a few accesses to vb2_buffer::planes
as-is, accidentally turning them into accesses to
vb2_v4l2_buffer::planes and resulting in values being read from/written
to the wrong place.
Fix this by inserting vb2_buf into these accesses so they mimic their
original behavior.
Fixes: 0650a91499 ("media: mtk-vcodec: Correct return type for mem2mem buffer helpers")
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b9ea7c2b5 ]
In order to prevent ISOC URBs from being infinitely resubmitted,
the driver's USB disconnect handler must kill all the in-flight URBs.
While here, change the URB packet status message to a debug level,
to avoid spamming the console too much.
This commit fixes a lockup caused by an interrupt storm coming
from the URB completion handler.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c2ccc324b ]
Smatch marks skb->data as untrusted so it warns that "evt_hdr->dlen"
can copy up to 255 bytes and we only have room for two bytes. Even
if this comes from the firmware and we trust it, the new policy
generally is just to fix it as kernel hardenning.
I can't test this code so I tried to be very conservative. I considered
not allowing "evt_hdr->dlen == 1" because it doesn't initialize the
whole variable but in the end I decided to allow it and manually
initialized "asic_id" and "asic_ver" to zero.
Fixes: e8454ff7b9 ("[media] drivers:media:radio: wl128x: FM Driver Common sources")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c03a0fd0b6 ]
syzbot is hitting use-after-free bug in uinput module [1]. This is because
kobject_uevent(KOBJ_REMOVE) is called again due to commit 0f4dafc056
("Kobject: auto-cleanup on final unref") after memory allocation fault
injection made kobject_uevent(KOBJ_REMOVE) from device_del() from
input_unregister_device() fail, while uinput_destroy_device() is expecting
that kobject_uevent(KOBJ_REMOVE) is not called after device_del() from
input_unregister_device() completed.
That commit intended to catch cases where nobody even attempted to send
"remove" uevents. But there is no guarantee that an event will ultimately
be sent. We are at the point of no return as far as the rest of the kernel
is concerned; there are no repeats or do-overs.
Also, it is not clear whether some subsystem depends on that commit.
If no subsystem depends on that commit, it will be better to remove
the state_{add,remove}_uevent_sent logic. But we don't want to risk
a regression (in a patch which will be backported) by trying to remove
that logic. Therefore, as a first step, let's avoid the use-after-free bug
by making sure that kobject_uevent(KOBJ_REMOVE) won't be triggered twice.
[1] https://syzkaller.appspot.com/bug?id=8b17c134fe938bbddd75a45afaa9e68af43a362d
Reported-by: syzbot <syzbot+f648cfb7e0b52bf7ae32@syzkaller.appspotmail.com>
Analyzed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Fixes: 0f4dafc056 ("Kobject: auto-cleanup on final unref")
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e850b89f50 ]
Unmapping ptes in the device MMU on Palladium can take a long time, which
can cause a kernel BUG of CPU soft lockup.
This patch minimize the chances for this bug by sleeping a little between
unmapping ptes.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0191949333 ]
Fixes: SPI driver can be built as module so perform SPI controller reset
on probe to make sure it is in valid state before initiating transfer.
Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f87b0cd32 ]
According to hidpp20_batterylevel_get_battery_info my Logitech K270
keyboard reports only 2 battery levels. This matches with what I've seen
after testing with batteries at varying level of fullness, it always
reports either 5% or 30%.
Windows reports "battery good" for the 30% level. I've captured an USB
trace of Windows reading the battery and it is getting the same info
as the Linux hidpp code gets.
Now that Linux handles these devices as hidpp devices, it reports the
battery as being low as it treats anything under 31% as low, this leads
to the user constantly getting a "Keyboard battery is low" warning from
GNOME3, which is very annoying.
This commit fixes this by changing the low threshold to anything under
30%, which I assume is what Windows does.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9df2ea2b8 ]
The clock sources of the AXI-bus clock (266.66 MHz) used for Audio-DMAC
DMA transfers are:
Channel R-Car H3 R-Car M3-W R-Car M3-N R-Car E3
---------------------------------------------------------------
Audio-DMAC0 S1D2 S1D2 S1D2 S1D2
Audio-DMAC1 S1D2 S1D2 S1D2 -
As a result, change the parent clocks of the Audio-DMAC{0,1} module
clocks on R-Car H3, R-Car M3-W, and R-Car M3-N to S1D2, and change the
parent clock of the Audio-DMAC0 module on R-Car E3 to S1D2.
NOTE: This information will be reflected in a future revision of the
R-Car Gen3 Hardware Manual.
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
[geert: Update R-Car D3, RZ/G2M, and RZ/G2E]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c772f71a5 ]
The clock sources of the AXI BUS clock (266.66 MHz) used for SYS-DMAC
DMA transfers are:
Channel R-Car H3 R-Car M3-W R-Car M3-N
-------------------------------------------------
SYS-DMAC0 S0D3 S0D3 S0D3
SYS-DMAC1 S3D1 S3D1 S3D1
SYS-DMAC2 S3D1 S3D1 S3D1
As a result, change the parent clocks of the SYS-DMAC{1,2} module clocks
on R-Car H3, R-Car M3-W, and R-Car M3-N to S3D1.
NOTE: This information will be reflected in a future revision of the
R-Car Gen3 Hardware Manual.
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
[geert: Update RZ/G2M]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7649773293 ]
The use of zero-sized array causes undefined behaviour when it is not
the last member in a structure. As it happens to be in this case.
Also, the current code makes use of a language extension to the C90
standard, but the preferred mechanism to declare variable-length
types such as this one is a flexible array member, introduced in
C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last. Which is beneficial
to cultivate a high-quality code.
Fixes: e48f129c2f ("[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b820d52e7e ]
The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./sound/soc/fsl/eukrea-tlv320.c:121:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 102, but without a correspo nding object release within this function.
./sound/soc/fsl/eukrea-tlv320.c:127:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 102, but without a correspo nding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: alsa-devel@alsa-project.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58e7515500 ]
As seen on some USB wireless keyboards manufactured by Primax, the HID
parser was using some assumptions that are not always true. In this case
it's s the fact that, inside the scope of a main item, an Usage Page
will always precede an Usage.
The spec is not pretty clear as 6.2.2.7 states "Any usage that follows
is interpreted as a Usage ID and concatenated with the Usage Page".
While 6.2.2.8 states "When the parser encounters a main item it
concatenates the last declared Usage Page with a Usage to form a
complete usage value." Being somewhat contradictory it was decided to
match Window's implementation, which follows 6.2.2.8.
In summary, the patch moves the Usage Page concatenation from the local
item parsing function to the main item parsing function.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Terry Junge <terry.junge@poly.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8440bb9b94 ]
When compile-testing on arm:
arch/sh/include/cpu-sh4/cpu/sh7786.h: In function ‘sh7786_mm_sel’:
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:21: warning: passing argument 1 of ‘__raw_readl’ makes pointer from integer without a cast [-Wint-conversion]
return __raw_readl(0xFC400020) & 0x7;
^~~~~~~~~~
In file included from include/linux/io.h:25:0,
from arch/sh/include/cpu-sh4/cpu/sh7786.h:14,
from drivers/pinctrl/sh-pfc/pfc-sh7786.c:15:
arch/arm/include/asm/io.h:113:21: note: expected ‘const volatile void *’ but argument is of type ‘unsigned int’
#define __raw_readl __raw_readl
^
arch/arm/include/asm/io.h:114:19: note: in expansion of macro ‘__raw_readl’
static inline u32 __raw_readl(const volatile void __iomem *addr)
^~~~~~~~~~~
__raw_readl() on SuperH is a macro that casts the passed I/O address to
the correct type, while the implementations on most other architectures
expect to be passed the correct pointer type.
Add an explicit cast to fix this.
Note that this also gets rid of a sparse warning on SuperH:
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: warning: incorrect type in argument 1 (different base types)
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: expected void const volatile [noderef] <asn:2>*<noident>
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: got unsigned int
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6734b29735 ]
port_pd is treated as le32 in declaration and read, fix assignment to be
in le32 too. This change fixes the following compilation warnings.
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: warning: incorrect type
in assignment (different base types)
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: expected restricted __le32 [usertype] port_pd
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: got restricted __be32 [usertype]
Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Lijun Ou <ouliun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b69656fa7e ]
New tooling got confused about this:
arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x7: return with UACCESS enabled
While the code isn't wrong, it is tedious (if at all possible) to
figure out what function a particular chunk of .fixup belongs to.
This then confuses the objtool uaccess validation. Instead of
returning directly from the .fixup, jump back into the right function.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88e4718275 ]
Occasionally GCC is less agressive with inlining and the following is
observed:
arch/x86/kernel/signal.o: warning: objtool: restore_sigcontext()+0x3cc: call to force_valid_ss.isra.5() with UACCESS enabled
arch/x86/kernel/signal.o: warning: objtool: do_signal()+0x384: call to frame_uc_flags.isra.0() with UACCESS enabled
Cure this by moving this code out of the AC=1 region, since it really
isn't needed for the user access.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 192a7e1f73 ]
Back in commit 4d339989ac ("iwlwifi: mvm: support ibss in dqa mode")
we changed queue selection for IBSS to be:
if (ieee80211_is_probe_resp(fc) || ieee80211_is_auth(fc) ||
ieee80211_is_deauth(fc))
return IWL_MVM_DQA_AP_PROBE_RESP_QUEUE;
if (info->hw_queue == info->control.vif->cab_queue)
return info->hw_queue;
return IWL_MVM_DQA_AP_PROBE_RESP_QUEUE;
Clearly, the thought at the time must've been that mac80211 will
select the hw_queue as the cab_queue, so that we'll return and use
that, where we store the multicast queue for IBSS. This, however,
isn't true because mac80211 doesn't implement powersave for IBSS
and thus selects the normal IBSS interface AC queue (best effort).
This therefore always used the probe response queue, which maps to
the BE FIFO.
In commit cfbc6c4c5b ("iwlwifi: mvm: support mac80211 TXQs model")
we rethought this code, and as a consequence now started mapping the
multicast traffic to the multicast hardware queue since we no longer
relied on mac80211 selecting the queue, doing it ourselves instead.
This queue is mapped to the MCAST FIFO. however, this isn't actually
enabled/controlled by the firmware in IBSS mode because we don't
implement powersave, and frames from this queue can never go out in
this case.
Therefore, we got queue hang reports such as
https://bugzilla.kernel.org/show_bug.cgi?id=201707
Fix this by mapping the multicast queue to the BE FIFO in IBSS so
that all the frames can go out.
Fixes: cfbc6c4c5b ("iwlwifi: mvm: support mac80211 TXQs model")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49122ec426 ]
The functions that send management TX frame have 3 possible
results: success and other side acknowledged receive (ACK=1),
success and other side did not acknowledge receive(ACK=0) and
failure to send the frame. The current implementation
incorrectly reports the ACK=0 case as failure.
Signed-off-by: Lior David <liord@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6752bea8b0 ]
[Why]
The actual position for the cursor on the screen is essentially:
x_out = x - x_plane - x_hotspot
y_out = y - y_plane - y_hotspot
The register values for cursor position and cursor hotspot need to be
greater than zero when programmed, but we also need to subtract off
the plane position to display the cursor at the correct position.
Since we don't want x or y to be less than zero, we add the plane
position as a positive value to x_hotspot or y_hotspot. However, what
this doesn't take into account is that the hotspot registers are limited
by the maximum cursor size.
On DCN10 the cursor hotspot regitsers are masked to 0xFF, so they have
a maximum value of 0-255. Values greater this will wrap, causing the
cursor to display in the wrong position.
In practice this means that for sufficiently large plane positions, the
cursor will be drawn twice on the screen, and can cause screen flashes
or p-state WARNS depending on what the wrapped value is.
So we need a way to remove the value from x_plane and y_plane without
exceeding the maximum cursor size.
[How]
Subtract as much as x_plane/y_plane as possible from x and y and place
the remainder in the cursor hotspot register.
The value for x_hotspot and y_hotspot can still wrap around but it
won't happen in a case where the cursor is actually enabled.
The cursor plane needs to intersect at least one pixel of the plane's
rectangle to be enabled, so the cursor position + hotspot provided by
userspace must always be strictly less than the maximum cursor size for
the cursor to actually be enabled.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b141e8cfd ]
For regulators used by UFS, vcc, vccq and vccq2 will have voltage range
initialized by ufshcd_populate_vreg(), however other regulators may have
undefined voltage range if dt-bindings have no such definition.
In above undefined case, both "min_uV" and "max_uV" fields in ufs_vreg
struct will be zero values and these values will be configured on
regulators in different power modes.
Currently this may have no harm if both "min_uV" and "max_uV" always keep
"zero values" because regulator_set_voltage() will always bypass such
invalid values and return "good" results.
However improper values shall be fixed to avoid potential bugs. Simply
bypass voltage configuration if voltage range is not defined.
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0487fff766 ]
Currently if a regulator has "<name>-fixed-regulator" property in device
tree, it will skip current limit initialization. This lead to a zero
"max_uA" value in struct ufs_vreg.
However, "regulator_set_load" operation shall be required on regulators
which have valid current limits, otherwise a zero "max_uA" set by
"regulator_set_load" may cause unexpected behavior when this regulator is
enabled or set as high power mode.
Similarly, in device's icc_level configuration flow, the target icc_level
shall be updated if regulator also has valid current limit, otherwise a
wrong icc_level will be calculated by zero "max_uA" and thus causes
unexpected results after it is written to device.
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1723fdec5f ]
While devm_gpiod_get_index_optional() returns NULL if the GPIO is not
present (i.e. -ENOENT), it may still return other error codes, like
-EPROBE_DEFER. Currently these are not handled, leading to
unrecoverable failures later in case of probe deferral:
gpiod_set_consumer_name: invalid GPIO (errorpointer)
gpiod_direction_output: invalid GPIO (errorpointer)
gpiod_set_value_cansleep: invalid GPIO (errorpointer)
gpiod_set_value_cansleep: invalid GPIO (errorpointer)
gpiod_set_value_cansleep: invalid GPIO (errorpointer)
Detect and propagate errors to fix this.
Fixes: f3186dd876 ("spi: Optionally use GPIO descriptors for CS GPIOs")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a652e00ee1 ]
The IRQ is requested before the struct rtc is allocated and registered, but
this struct is used in the IRQ handler. This may lead to a NULL pointer
dereference.
Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
struct before requesting the IRQ.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db3b9e2e1d ]
It was observed that rarely during USB disconnect happening shortly after
connect (before full initialization completes) usb_hub_wq would wait
forever for the dev_init_lock to be unlocked. dev_init_lock would remain
locked though because of infinite wait during usb_kill_urb:
[ 2730.656472] kworker/0:2 D 0 260 2 0x00000000
[ 2730.660700] Workqueue: events request_firmware_work_func
[ 2730.664807] [<809dca20>] (__schedule) from [<809dd164>] (schedule+0x4c/0xac)
[ 2730.670587] [<809dd164>] (schedule) from [<8069af44>] (usb_kill_urb+0xdc/0x114)
[ 2730.676815] [<8069af44>] (usb_kill_urb) from [<7f258b50>] (brcmf_usb_free_q+0x34/0xa8 [brcmfmac])
[ 2730.684833] [<7f258b50>] (brcmf_usb_free_q [brcmfmac]) from [<7f2517d4>] (brcmf_detach+0xa0/0xb8 [brcmfmac])
[ 2730.693557] [<7f2517d4>] (brcmf_detach [brcmfmac]) from [<7f251a34>] (brcmf_attach+0xac/0x3d8 [brcmfmac])
[ 2730.702094] [<7f251a34>] (brcmf_attach [brcmfmac]) from [<7f2587ac>] (brcmf_usb_probe_phase2+0x468/0x4a0 [brcmfmac])
[ 2730.711601] [<7f2587ac>] (brcmf_usb_probe_phase2 [brcmfmac]) from [<7f252888>] (brcmf_fw_request_done+0x194/0x220 [brcmfmac])
[ 2730.721795] [<7f252888>] (brcmf_fw_request_done [brcmfmac]) from [<805748e4>] (request_firmware_work_func+0x4c/0x88)
[ 2730.731125] [<805748e4>] (request_firmware_work_func) from [<80141474>] (process_one_work+0x228/0x808)
[ 2730.739223] [<80141474>] (process_one_work) from [<80141a80>] (worker_thread+0x2c/0x564)
[ 2730.746105] [<80141a80>] (worker_thread) from [<80147bcc>] (kthread+0x13c/0x16c)
[ 2730.752227] [<80147bcc>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
[ 2733.099695] kworker/0:3 D 0 1065 2 0x00000000
[ 2733.103926] Workqueue: usb_hub_wq hub_event
[ 2733.106914] [<809dca20>] (__schedule) from [<809dd164>] (schedule+0x4c/0xac)
[ 2733.112693] [<809dd164>] (schedule) from [<809e2a8c>] (schedule_timeout+0x214/0x3e4)
[ 2733.119621] [<809e2a8c>] (schedule_timeout) from [<809dde2c>] (wait_for_common+0xc4/0x1c0)
[ 2733.126810] [<809dde2c>] (wait_for_common) from [<7f258d00>] (brcmf_usb_disconnect+0x1c/0x4c [brcmfmac])
[ 2733.135206] [<7f258d00>] (brcmf_usb_disconnect [brcmfmac]) from [<8069e0c8>] (usb_unbind_interface+0x5c/0x1e4)
[ 2733.143943] [<8069e0c8>] (usb_unbind_interface) from [<8056d3e8>] (device_release_driver_internal+0x164/0x1fc)
[ 2733.152769] [<8056d3e8>] (device_release_driver_internal) from [<8056c078>] (bus_remove_device+0xd0/0xfc)
[ 2733.161138] [<8056c078>] (bus_remove_device) from [<8056977c>] (device_del+0x11c/0x310)
[ 2733.167939] [<8056977c>] (device_del) from [<8069cba8>] (usb_disable_device+0xa0/0x1cc)
[ 2733.174743] [<8069cba8>] (usb_disable_device) from [<8069507c>] (usb_disconnect+0x74/0x1dc)
[ 2733.181823] [<8069507c>] (usb_disconnect) from [<80695e88>] (hub_event+0x478/0xf88)
[ 2733.188278] [<80695e88>] (hub_event) from [<80141474>] (process_one_work+0x228/0x808)
[ 2733.194905] [<80141474>] (process_one_work) from [<80141a80>] (worker_thread+0x2c/0x564)
[ 2733.201724] [<80141a80>] (worker_thread) from [<80147bcc>] (kthread+0x13c/0x16c)
[ 2733.207913] [<80147bcc>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
It was traced down to a case where usb_kill_urb would be called on an URB
structure containing more or less random data, including large number in
its use_count. During the debugging it appeared that in brcmf_usb_free_q()
the traversal over URBs' lists is not synchronized with operations on those
lists in brcmf_usb_rx_complete() leading to handling
brcmf_usbdev_info structure (holding lists' head) as lists' element and in
result causing above problem.
Fix it by walking through all URBs during brcmf_cancel_all_urbs using the
arrays of requests instead of linked lists.
Signed-off-by: Piotr Figiel <p.figiel@camlintechnologies.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c80d26e81e ]
brcmu_pkt_buf_free_skb emits WARNING when attempting to free a sk_buff
which is part of any queue. After USB disconnect this may have happened
when brcmf_fws_hanger_cleanup() is called as per-interface psq was never
cleaned when removing the interface.
Change brcmf_fws_macdesc_cleanup() in a way that it removes the
corresponding packets from hanger table (to avoid double-free when
brcmf_fws_hanger_cleanup() is called) and add a call to clean-up the
interface specific packet queue.
Below is a WARNING during USB disconnect with Raspberry Pi WiFi dongle
running in AP mode. This was reproducible when the interface was
transmitting during the disconnect and is fixed with this commit.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1171 at drivers/net/wireless/broadcom/brcm80211/brcmutil/utils.c:49 brcmu_pkt_buf_free_skb+0x3c/0x40
Modules linked in: nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables usb_f_mass_storage usb_f_rndis u_ether cdc_acm smsc95xx usbnet ci_hdrc_imx ci_hdrc ulpi usbmisc_imx 8250_exar 8250_pci 8250 8250_base libcomposite configfs udc_core
CPU: 0 PID: 1171 Comm: kworker/0:0 Not tainted 4.19.23-00075-gde33ed8 #99
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: usb_hub_wq hub_event
[<8010ff84>] (unwind_backtrace) from [<8010bb64>] (show_stack+0x10/0x14)
[<8010bb64>] (show_stack) from [<80840278>] (dump_stack+0x88/0x9c)
[<80840278>] (dump_stack) from [<8011f5ec>] (__warn+0xfc/0x114)
[<8011f5ec>] (__warn) from [<8011f71c>] (warn_slowpath_null+0x40/0x48)
[<8011f71c>] (warn_slowpath_null) from [<805a476c>] (brcmu_pkt_buf_free_skb+0x3c/0x40)
[<805a476c>] (brcmu_pkt_buf_free_skb) from [<805bb6c4>] (brcmf_fws_cleanup+0x1e4/0x22c)
[<805bb6c4>] (brcmf_fws_cleanup) from [<805bc854>] (brcmf_fws_del_interface+0x58/0x68)
[<805bc854>] (brcmf_fws_del_interface) from [<805b66ac>] (brcmf_remove_interface+0x40/0x150)
[<805b66ac>] (brcmf_remove_interface) from [<805b6870>] (brcmf_detach+0x6c/0xb0)
[<805b6870>] (brcmf_detach) from [<805bdbb8>] (brcmf_usb_disconnect+0x30/0x4c)
[<805bdbb8>] (brcmf_usb_disconnect) from [<805e5d64>] (usb_unbind_interface+0x5c/0x1e0)
[<805e5d64>] (usb_unbind_interface) from [<804aab10>] (device_release_driver_internal+0x154/0x1ec)
[<804aab10>] (device_release_driver_internal) from [<804a97f4>] (bus_remove_device+0xcc/0xf8)
[<804a97f4>] (bus_remove_device) from [<804a6fc0>] (device_del+0x118/0x308)
[<804a6fc0>] (device_del) from [<805e488c>] (usb_disable_device+0xa0/0x1c8)
[<805e488c>] (usb_disable_device) from [<805dcf98>] (usb_disconnect+0x70/0x1d8)
[<805dcf98>] (usb_disconnect) from [<805ddd84>] (hub_event+0x464/0xf50)
[<805ddd84>] (hub_event) from [<80135a70>] (process_one_work+0x138/0x3f8)
[<80135a70>] (process_one_work) from [<80135d5c>] (worker_thread+0x2c/0x554)
[<80135d5c>] (worker_thread) from [<8013b1a0>] (kthread+0x124/0x154)
[<8013b1a0>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xecf8dfb0 to 0xecf8dff8)
dfa0: 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 38d234018e9e2a90 ]---
------------[ cut here ]------------
Signed-off-by: Piotr Figiel <p.figiel@camlintechnologies.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d825db3462 ]
Clang warns about what is clearly a case of passing an uninitalized
variable into a static function:
drivers/net/wireless/broadcom/b43/phy_lp.c:1852:23: error: variable 'gains' is uninitialized when used here
[-Werror,-Wuninitialized]
lpphy_papd_cal(dev, gains, 0, 1, 30);
^~~~~
drivers/net/wireless/broadcom/b43/phy_lp.c:1838:2: note: variable 'gains' is declared here
struct lpphy_tx_gains gains, oldgains;
^
1 error generated.
However, this function is empty, and its arguments are never evaluated,
so gcc in contrast does not warn here. Both compilers behave in a
reasonable way as far as I can tell, so we should change the code
to avoid the warning everywhere.
We could just eliminate the lpphy_papd_cal() function entirely,
given that it has had the TODO comment in it for 10 years now
and is rather unlikely to ever get done. I'm doing a simpler
change here, and just pass the 'oldgains' variable in that has
been initialized, based on the guess that this is what was
originally meant.
Fixes: 2c0d6100da ("b43: LP-PHY: Begin implementing calibration & software RFKILL support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 003b686ace ]
'hostcmd' is alloced by kzalloc, should be freed before
leaving from the error handling cases, otherwise it will
cause mem leak.
Fixes: 3935ccc14d ("mwifiex: add cfg80211 testmode support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 765976285a ]
In case alloc_workqueue fails, the fix reports the error and
returns to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0979ff7992 ]
Currently, ksym_search located at trace_helpers won't check symbols are
existing or not.
In ksym_search, when symbol is not found, it will return &syms[0](_stext).
But when the kernel symbols are not loaded, it will return NULL, which is
not a desired action.
This commit will add verification logic whether symbols are loaded prior
to the symbol search.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 389775a660 ]
It used netdev->uc and netdev->mc list in function
hns3_recover_hw_addr() and hns3_remove_hw_addr().
We should add protect for them.
Fixes: f05e210971 ("net: hns3: Clear mac vlan table entries when unload driver or function reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4e401e5a9 ]
hns3_get_stats() should check the resetting status firstly,
since the device will be reinitialized when resetting. If the
reset has not completed, the hns3_get_stats() may access
invalid memory.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abbde27929 ]
Indio->mlock is used for protecting the different iio device modes.
It is currently not being used in this way. Replace the lock with
an internal lock specifically used for protecting the SPI transfer
buffer.
Signed-off-by: Justin Chen <justinpopo6@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f9ca1d3eb ]
When building with -Wsometimes-uninitialized, Clang warns:
drivers/iio/common/ssp_sensors/ssp_iio.c:95:6: warning: variable
'calculated_time' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
While it isn't wrong, this will never be a problem because
iio_push_to_buffers_with_timestamp only uses calculated_time
on the same condition that it is assigned (when scan_timestamp
is not zero). While iio_push_to_buffers_with_timestamp is marked
as inline, Clang does inlining in the optimization stage, which
happens after the semantic analysis phase (plus inline is merely
a hint to the compiler).
Fix this by just zero initializing calculated_time.
Link: https://github.com/ClangBuiltLinux/linux/issues/394
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 536cc27dea ]
devm_regmap_init_i2c may fail and return NULL. The fix returns
the error when it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df1d80aee9 ]
For devices from the SigmaDelta family we need to keep CS low when doing a
conversion, since the device will use the MISO line as a interrupt to
indicate that the conversion is complete.
This is why the driver locks the SPI bus and when the SPI bus is locked
keeps as long as a conversion is going on. The current implementation gets
one small detail wrong though. CS is only de-asserted after the SPI bus is
unlocked. This means it is possible for a different SPI device on the same
bus to send a message which would be wrongfully be addressed to the
SigmaDelta device as well. Make sure that the last SPI transfer that is
done while holding the SPI bus lock de-asserts the CS signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <Alexandru.Ardelean@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc29d3a69d ]
The call to of_find_matching_node_and_match returns a node pointer with
refcount incremented thus it must be explicitly decremented after the
last usage.
Detected by coccinelle with the following warnings:
drivers/gpu/drm/pl111/pl111_versatile.c:333:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
drivers/gpu/drm/pl111/pl111_versatile.c:340:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
drivers/gpu/drm/pl111/pl111_versatile.c:346:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
drivers/gpu/drm/pl111/pl111_versatile.c:354:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
drivers/gpu/drm/pl111/pl111_versatile.c:395:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
drivers/gpu/drm/pl111/pl111_versatile.c:402:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 317, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Eric Anholt <eric@anholt.net> (supporter:DRM DRIVER FOR ARM PL111 CLCD)
Cc: David Airlie <airlied@linux.ie> (maintainer:DRM DRIVERS)
Cc: Daniel Vetter <daniel@ffwll.ch> (maintainer:DRM DRIVERS)
Cc: dri-devel@lists.freedesktop.org (open list:DRM DRIVERS)
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1554307455-40361-6-git-send-email-wen.yang99@zte.com.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f96fb7d198 ]
When the card is registered by the machine driver,
dai link components are probed after the snd_card is
created. This is done in snd_soc_bind_card() which calls
snd_soc_instantiate_card() to first create the snd_card
and then probes the link components by calling
soc_probe_link_components(). The snd_card is used by the
component driver to add the kcontrols associated
with dapm widgets to the card.
When the machine driver is unregistered, the snd_card
is freed when the card resources are cleaned up.
But the snd_card needs to be valid while unloading the
topology dapm widgets in order to remove the kcontrols
from the card.
Since, unloading topology is done when the component
driver is removed, the link components should be removed
in snd_soc_unbind_card(). This will ensure that the kcontrols
are removed before the card resources are cleaned up and
the snd_card itself is freed.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 063773011d ]
Lockdep reports the following issue on my setup:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&(&rdev->disable_work)->work));
lock(regulator_list_mutex);
lock((work_completion)(&(&rdev->disable_work)->work));
lock(regulator_list_mutex);
The problem is that regulator_unregister takes the
regulator_list_mutex and then calls flush_work on disable_work. But
regulator_disable_work calls regulator_lock_dependent which will
also take the regulator_list_mutex. Resulting in a deadlock if the
flush_work call actually needs to flush the work.
Fix this issue by moving the flush_work outside of the
regulator_list_mutex. The list mutex is not used to guard the point at
which the delayed work is queued, so its use adds no additional safety.
Fixes: f8702f9e4a ("regulator: core: Use ww_mutex for regulators locking")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a919ae492 ]
Move code calling spi_get_gpio_descs() to happen after ctlr->dev's
name is set in order to have proper GPIO consumer names.
Before:
cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 0-31, parent: platform/40049000.gpio, vf610-gpio:
gpio-6 ( |regulator-usb0-vbus ) out lo
gpiochip1: GPIOs 32-63, parent: platform/4004a000.gpio, vf610-gpio:
gpio-36 ( |scl ) in hi
gpio-37 ( |sda ) in hi
gpio-40 ( |(null) CS1 ) out lo
gpio-41 ( |(null) CS0 ) out lo ACTIVE LOW
gpio-42 ( |miso ) in hi
gpio-43 ( |mosi ) in lo
gpio-44 ( |sck ) out lo
After:
cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 0-31, parent: platform/40049000.gpio, vf610-gpio:
gpio-6 ( |regulator-usb0-vbus ) out lo
gpiochip1: GPIOs 32-63, parent: platform/4004a000.gpio, vf610-gpio:
gpio-36 ( |scl ) in hi
gpio-37 ( |sda ) in hi
gpio-40 ( |spi0 CS1 ) out lo
gpio-41 ( |spi0 CS0 ) out lo ACTIVE LOW
gpio-42 ( |miso ) in hi
gpio-43 ( |mosi ) in lo
gpio-44 ( |sck ) out lo
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Healy <cphealy@gmail.com>
Cc: linux-spi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 636e78b1cd ]
clang started to error on invalid asm clobber usage in x86 headers
and many bpf program samples failed to build with the message:
CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
In file included from ../include/linux/in.h:23:
In file included from ../include/uapi/linux/in.h:24:
In file included from ../include/linux/socket.h:8:
In file included from ../include/linux/uio.h:14:
In file included from ../include/crypto/hash.h:16:
In file included from ../include/linux/crypto.h:26:
In file included from ../include/linux/uaccess.h:5:
In file included from ../include/linux/sched.h:15:
In file included from ../include/linux/sem.h:5:
In file included from ../include/uapi/linux/sem.h:5:
In file included from ../include/linux/ipc.h:9:
In file included from ../include/linux/refcount.h:72:
../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
r->refs.counter, e, "er", i, "cx");
^
../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
r->refs.counter, e, "cx");
^
2 errors generated.
Override volatile() to workaround the problem.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit caa3c8e525 ]
This patch fixes a bug in the implementation of the function that removes
the device.
The bug can happen when the device is removed but not the driver itself
(e.g. remove by the OS due to PCI freeze in Power architecture).
In that case, there maybe open users that are calling IOCTLs while the
device is removed. This is a possible race condition that the driver must
handle. Otherwise, a kernel panic may occur.
This race is prevented in the hard-reset flow, because the driver makes
sure the users are closed before continuing with the hard-reset. This
race can not occur when the driver itself is removed because the OS makes
sure all the file descriptors are closed.
The fix is to make sure the open users close their file descriptors and if
they don't (after a certain amount of time), the driver sends them a
SIGKILL, because the remove of the device can't be stopped.
The patch re-uses the same code that is called from the hard-reset flow.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f201aba56 ]
During hard-reset, contexts are closed as part of the tear-down process.
After a context is closed, the driver cleans up the page tables of that
context in the device's DRAM. This action is both dangerous and
unnecessary.
It is unnecessary, because the device is going through a hard-reset, which
means the device's DRAM contents are no longer valid and the device's MMU
is being reset.
It is dangerous, because if the hard-reset came as a result of a PCI
freeze, this action may cause the entire host machine to hang.
Therefore, prevent all device PTE updates when a hard-reset operation is
pending.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78bf47353b ]
The implementation of IOC_OPAL_ENABLE_DISABLE_MBR handled the value
opal_mbr_data.enable_disable incorrectly: enable_disable is expected
to be one of OPAL_MBR_ENABLE(0) or OPAL_MBR_DISABLE(1). enable_disable
was passed directly to set_mbr_done and set_mbr_enable_disable where
is was interpreted as either OPAL_TRUE(1) or OPAL_FALSE(0). The end
result was that calling IOC_OPAL_ENABLE_DISABLE_MBR with OPAL_MBR_ENABLE
actually disabled the shadow MBR and vice versa.
This patch adds correct conversion from OPAL_MBR_DISABLE/ENABLE to
OPAL_FALSE/TRUE. The change affects existing programs using
IOC_OPAL_ENABLE_DISABLE_MBR but this is typically used only once when
setting up an Opal drive.
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c468966f0 ]
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/cpufreq/kirkwood-cpufreq.c:127:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 118, but without a corresponding object release within this function.
./drivers/cpufreq/kirkwood-cpufreq.c:133:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 118, but without a corresponding object release within this function.
and also do some cleanup:
- of_node_put(np);
- np = NULL;
...
of_node_put(np);
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d10dc28a9 ]
The call to of_find_node_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/cpufreq/pmac32-cpufreq.c:557:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 552, but without a corresponding object release within this function.
./drivers/cpufreq/pmac32-cpufreq.c:569:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 552, but without a corresponding object release within this function.
./drivers/cpufreq/pmac32-cpufreq.c:598:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 587, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9acc26b75 ]
The call to of_get_cpu_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/cpufreq/pasemi-cpufreq.c:212:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 147, but without a corresponding object release within this function.
./drivers/cpufreq/pasemi-cpufreq.c:220:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 147, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2332980328 ]
The call to of_get_cpu_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/cpufreq/ppc_cbe_cpufreq.c:89:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 76, but without a corresponding object release within this function.
./drivers/cpufreq/ppc_cbe_cpufreq.c:89:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 76, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4bf63482c ]
Most, if not all, Quectel devices use dynamic interface numbers, and
users are able to change the USB configuration at will. Matching on for
example interface number is therefore not possible.
Instead, the QMI device can be identified by looking at the interface
class, subclass and protocol (all 0xff), as well as the number of
endpoints. The reason we need to look at the number of endpoints, is
that the diagnostic port interface has the same class, subclass and
protocol as QMI. However, the diagnostic port only has two endpoints,
while QMI has three.
Until now, we have identified the QMI device by combining a match on
class, subclass and protocol, with a call to the function
quectel_diag_detect(). In quectel_diag_detect(), we check if the number
of endpoints matches for known Quectel vendor/product ids.
Adding new vendor/product ids to quectel_diag_detect() is not a good
long-term solution. This commit replaces the function with a quirk, and
applies the quirk to affected Quectel devices that I have been able to
test the change with (EP06, EM12 and EC25). If the quirk is set and the
number of endpoints equal two, we return from qmi_wwan_probe() with
-ENODEV.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e233516e6a ]
When hclgevf_client_start() fails or VF driver unloaded, there is
nobody to disable keep_alive_timer.
So this patch fixes them.
Fixes: a6d818e31d ("net: hns3: Add vport alive state checking support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e14d314c7a ]
Dan reported, that cleanup path in test_memcg_subtree_control()
triggers a static checker warning:
./tools/testing/selftests/cgroup/test_memcontrol.c:76 \
test_memcg_subtree_control()
error: uninitialized symbol 'child2'.
Fix this by initializing child2 and parent2 variables and
split the cleanup path into few stages.
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 84092dbcf9 ("selftests: cgroup: add memory controller self-tests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e49f69363a ]
[why]
The existing calculation uses a wrong formula to
calculate bandwidth from timing.
[how]
Expose the existing proper function that calculates the bandwidth,
so dc_link can use it to calculate timing bandwidth correctly.
Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e91012ee85 ]
clang points out that the declaration of cio_irb does not match the
definition exactly, it is missing the alignment attribute:
../drivers/s390/cio/cio.c:50:1: warning: section does not match previous declaration [-Wsection]
DEFINE_PER_CPU_ALIGNED(struct irb, cio_irb);
^
../include/linux/percpu-defs.h:150:2: note: expanded from macro 'DEFINE_PER_CPU_ALIGNED'
DEFINE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION) \
^
../include/linux/percpu-defs.h:93:9: note: expanded from macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
../include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
^
../drivers/s390/cio/cio.h:118:1: note: previous attribute is here
DECLARE_PER_CPU(struct irb, cio_irb);
^
../include/linux/percpu-defs.h:111:2: note: expanded from macro 'DECLARE_PER_CPU'
DECLARE_PER_CPU_SECTION(type, name, "")
^
../include/linux/percpu-defs.h:87:9: note: expanded from macro 'DECLARE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name
^
../include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
^
Use DECLARE_PER_CPU_ALIGNED() here, to make the two match.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81a8f2beb3 ]
If CONFIG_PGSTE is not set (e.g. when compiling without KVM), GCC complains:
CC arch/s390/mm/pgtable.o
arch/s390/mm/pgtable.c:413:15: warning: ‘pmd_alloc_map’ defined but not
used [-Wunused-function]
static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
^~~~~~~~~~~~~
Wrap the function with "#ifdef CONFIG_PGSTE" to silence the warning.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2aa632c5ff ]
The brace initialization used here generates warnings on some
compilers. For example, on GCC 4.9:
[...] In function ‘dm_determine_update_type_for_commit’:
[...] error: missing braces around initializer [-Werror=missing-braces]
struct dc_stream_update stream_update = { 0 };
^
Use memset to make this more portable.
v2: Specify the compiler / diagnostic in the commit message (Paul)
Cc: Sun peng Li <Sunpeng.Li@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24613a04ad ]
Commit
2613f36ed9 ("x86/microcode: Attempt late loading only when new microcode is present")
added the new define UCODE_NEW to denote that an update should happen
only when newer microcode (than installed on the system) has been found.
But it missed adjusting that for the old /dev/cpu/microcode loading
interface. Fix it.
Fixes: 2613f36ed9 ("x86/microcode: Attempt late loading only when new microcode is present")
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Link: https://lkml.kernel.org/r/20190405133010.24249-3-bp@alien8.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 913140e221 ]
The 'func_code' variable gets printed in debug statements without
a prior initialization in multiple functions, as reported when building
with clang:
drivers/s390/crypto/zcrypt_api.c:659:6: warning: variable 'func_code' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (mex->outputdatalength < mex->inputdatalength) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:725:29: note: uninitialized use occurs here
trace_s390_zcrypt_rep(mex, func_code, rc,
^~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:659:2: note: remove the 'if' if its condition is always false
if (mex->outputdatalength < mex->inputdatalength) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:654:24: note: initialize the variable 'func_code' to silence this warning
unsigned int func_code;
^
Add initializations to all affected code paths to shut up the warning
and make the warning output consistent.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c06e64407e ]
The firmware sets BIT(13) in clkflag to mark a divider as fractional
divider. The clock driver copies the clkflag straight to the flags of
the common clock framework. In the common clk framework flags, BIT(13)
is defined as CLK_DUTY_CYCLE_PARENT.
Add a new field to the zynqmp_clk_divider to specify if a divider is a
fractional devider. Set this field based on the clkflag when registering
a divider.
At the same time, unset BIT(13) from clkflag when copying the flags to
the common clk framework flags.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dfe7fb21cd ]
Most rk3288-based boards are derived from the EVB and thus use a PWM
regulator for the logic rail. However, most rk3288-based boards don't
specify the PWM regulator in their device tree. We'll deal with that
by making it critical.
NOTE: it's important to make it critical and not just IGNORE_UNUSED
because all PWMs in the system share the same clock. We don't want
another PWM user to turn the clock on and off and kill the logic rail.
This change is in preparation for actually having the PWMs in the
rk3288 device tree actually point to the proper PWM clock. Up until
now they've all pointed to the clock for the old IP block and they've
all worked due to the fact that rkpwm was IGNORE_UNUSED and that the
clock rates for both clocks were the same.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00053de522 ]
Microphone detection provides the button detection features on the
Arizona CODECs as such it will be running if the jack is currently
inserted. If the driver is unbound whilst the jack is still inserted
this will cause warnings from the regulator framework as the MICVDD
regulator is put but was never disabled.
Correct this by disabling microphone detection on driver removal and if
the microphone detection was running disable the regulator and put the
runtime reference that was currently held.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00c0cd9e59 ]
It appears that there is a typo in the rk3288 TRM. For
GRF_SOC_CON0[7] it says that 0 means "vepu" and 1 means "vdpu". It's
the other way around.
How do I know? Here's my evidence:
1. Prior to commit 4d3e84f996 ("clk: rockchip: describe aclk_vcodec
using the new muxgrf type on rk3288") we always pretended that we
were using "aclk_vdpu" and the comment in the code said that this
matched the default setting in the system. In fact the default
setting is 0 according to the TRM and according to reading memory
at bootup. In addition rk3288-based Chromebooks ran like this and
the video codecs worked.
2. With the existing clock code if you boot up and try to enable the
new VIDEO_ROCKCHIP_VPU as a module (and without "clk_ignore_unused"
on the command line), you get errors like "failed to get ack on
domain 'pd_video', val=0x80208". After flipping vepu/vdpu things
init OK.
3. If I export and add both the vepu and vdpu to the list of clocks
for RK3288_PD_VIDEO I can get past the power domain errors, but now
I freeze when the vpu_mmu gets initted.
4. If I just mark the "vdpu" as IGNORE_UNUSED then everything boots up
and probes OK showing that somehow the "vdpu" was important to keep
enabled. This is because we were actually using it as a parent.
5. After this change I can hack "aclk_vcodec_pre" to parent from
"aclk_vepu" using assigned-clocks and the video codec still probes
OK.
6. Rockchip has said so on the mailing list [1].
...so let's fix it.
Let's also add CLK_SET_RATE_PARENT to "aclk_vcodec_pre" as suggested
by Jonas Karlman. Prior to the same commit you could do
clk_set_rate() on "aclk_vcodec" and it would change "aclk_vdpu".
That's because "aclk_vcodec" was a simple gate clock (always gets
CLK_SET_RATE_PARENT) and its direct parent was "aclk_vdpu". After
that commit "aclk_vcodec_pre" gets in the way so we need to add
CLK_SET_RATE_PARENT to it too.
[1] https://lkml.kernel.org/r/1d17b015-9e17-34b9-baf8-c285dc1957aa@rock-chips.com
Fixes: 4d3e84f996 ("clk: rockchip: describe aclk_vcodec using the new muxgrf type on rk3288")
Suggested-by: Jonas Karlman <jonas@kwiboo.se>
Suggested-by: Randy Li <ayaka@soulik.info>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc351d4c5f ]
The dev->power.direct_complete flag may become set in device_prepare() in
case the device don't have any PM callbacks (dev->power.no_pm_callbacks is
set). This leads to a broken behaviour, when there is child having wakeup
enabled and relies on its parent to be used in the wakeup path.
More precisely, when the direct complete path becomes selected for the
child in __device_suspend(), the propagation of the dev->power.wakeup_path
becomes skipped as well.
Let's address this problem, by checking if the device is a part the wakeup
path or has wakeup enabled, then prevent the direct complete path from
being used.
Reported-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[ rjw: Comment cleanup ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc5ff6e90f ]
If there is pending skb in RX flow when close the port, and the
pending buffer is not cleaned, the new packet will be added to
the pending skb when the port opens again, and the first new
packet has error data.
This patch cleans the pending skb when clean RX ring.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05cb6b2a66 ]
eSDHC-A001: The data timeout counter (SYSCTL[DTOCV]) is not
reliable for DTOCV values 0x4(2^17 SD clock), 0x8(2^21 SD clock),
and 0xC(2^25 SD clock). The data timeout counter can count from
2^13–2^27, but for values 2^17, 2^21, and 2^25, the timeout
counter counts for only 2^13 SD clocks.
A-008358: The data timeout counter value loaded into the timeout
counter is less than expected and can result into early timeout
error in case of eSDHC data transactions. The table below shows
the expected vs actual timeout period for different values of
SYSCTL[DTOCV]:
these two erratum has the same quirk to control it, and set
SDHCI_QUIRK_RESET_AFTER_REQUEST to fix above issue.
Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dd1955225 ]
In the event of that any data error (like, IRQSTAT[DCE]) occurs
during an eSDHC data transaction where DMA is used for data
transfer to/from the system memory, setting the SYSCTL[RSTD]
register may cause a system hang. If software sets the register
SYSCTL[RSTD] to 1 for error recovery while DMA transferring is
not complete, eSDHC may hang the system bus. This happens because
the software register SYSCTL[RSTD] resets the DMA engine without
waiting for the completion of pending system transactions. This
erratum is to fix this issue.
Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a46e427125 ]
Software writing to the Transfer Type configuration register
(system clock domain) can cause a setup/hold violation in the
CRC flops (card clock domain), which can cause write accesses
to be sent with corrupt CRC values. This issue occurs only for
write preceded by read. this erratum is to fix this issue.
Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 002ee28e8b ]
pwrseq_emmc.c implements a HW reset procedure for eMMC chip by driving a
GPIO line.
It registers the .reset() cb on mmc_pwrseq_ops and it registers a system
restart notification handler; both of them perform reset by unconditionally
calling gpiod_set_value().
If the eMMC reset line is tied to a GPIO controller whose driver can sleep
(i.e. I2C GPIO controller), then the kernel would spit warnings when trying
to reset the eMMC chip by means of .reset() mmc_pwrseq_ops cb (that is
exactly what I'm seeing during boot).
Furthermore, on system reset we would gets to the system restart
notification handler with disabled interrupts - local_irq_disable() is
called in machine_restart() at least on ARM/ARM64 - and we would be in
trouble when the GPIO driver tries to sleep (which indeed doesn't happen
here, likely because in my case the machine specific code doesn't call
do_kernel_restart(), I guess..).
This patch fixes the .reset() cb to make use of gpiod_set_value_cansleep(),
so that the eMMC gets reset on boot without complaints, while, since there
isn't that much we can do, we avoid register the restart handler if the
GPIO controller has a sleepy driver (and we spit a dev_notice() message to
let people know)..
This had been tested on a downstream 4.9 kernel with backported
commit 83f37ee7ba33 ("mmc: pwrseq: Add reset callback to the struct
mmc_pwrseq_ops") and commit ae60fb031cf2 ("mmc: core: Don't do eMMC HW
reset when resuming the eMMC card"), because I couldn't boot my board
otherwise. Maybe worth to RFT.
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8649fc1c5 ]
When we discover the PHY is empty in sas_rediscover_dev(), the PHY
information (like negotiated linkrate) is not updated.
As such, for a user examining sysfs for that PHY, they would see
incorrect values:
root@(none)$ cd /sys/class/sas_phy/phy-0:0:20
root@(none)$ more negotiated_linkrate
3.0 Gbit
root@(none)$ echo 0 > enable
root@(none)$ more negotiated_linkrate
3.0 Gbit
So fix this, simply discover the PHY again, even though we know it's empty;
in the above example, this gives us:
root@(none)$ more negotiated_linkrate
Phy disabled
We must do this after unregistering the device associated with the PHY
(in sas_unregister_devs_sas_addr()).
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73e6ff71a7 ]
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Unable to handle kernel paging request at virtual address ffffffbffee0002e
pgd = ffffffc1d68d4000
[ffffffbffee0002e] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 94000046 [#1] PREEMPT SMP
Modules linked in: f71805f(+) hwmon
CPU: 3 PID: 1659 Comm: insmod Not tainted 4.5.0+ #88
Hardware name: linux,dummy-virt (DT)
task: ffffffc1f6665400 ti: ffffffc1d6418000 task.ti: ffffffc1d6418000
PC is at f71805f_find+0x6c/0x358 [f71805f]
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple
drivers is synchronized.
Fixes: e53004e20a ("hwmon: New f71805f driver")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: John Garry <john.garry@huawei.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 755a9b0f8a ]
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: ba224e2c4f ("hwmon: New PC87427 hardware monitoring driver")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: John Garry <john.garry@huawei.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c08267567 ]
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb14 ("I2C: Move hwmon drivers (2/3)")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: John Garry <john.garry@huawei.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6410408ad ]
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb14 ("I2C: Move hwmon drivers (2/3)")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: John Garry <john.garry@huawei.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14b97ba5c2 ]
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 2219cd81a6 ("hwmon/vt1211: Add probing of alternate config index port")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b53b012805 ]
The patch 23c7b54ca1: "PM / devfreq: Fix devfreq_add_device() when
drivers are built as modules." leads to the following static checker
warning:
drivers/devfreq/devfreq.c:1043 governor_store()
warn: 'governor' can also be NULL
The reason is that the try_then_request_governor() function returns both
error pointers and NULL. It should just return error pointers, so fix
this by returning a ERR_PTR to the error intead of returning NULL.
Fixes: 23c7b54ca1 ("PM / devfreq: Fix devfreq_add_device() when drivers are built as modules.")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e37a784d8b ]
->i_crypt_info starts out NULL and may later be locklessly set to a
non-NULL value by the cmpxchg() in fscrypt_get_encryption_info().
But ->i_crypt_info is used directly, which technically is incorrect.
It's a data race, and it doesn't include the data dependency barrier
needed to safely dereference the pointer on at least one architecture.
Fix this by using READ_ONCE() instead. Note: we don't need to use
smp_load_acquire(), since dereferencing the pointer only requires a data
dependency barrier, which is already included in READ_ONCE(). We also
don't need READ_ONCE() in places where ->i_crypt_info is unconditionally
dereferenced, since it must have already been checked.
Also downgrade the cmpxchg() to cmpxchg_release(), since RELEASE
semantics are sufficient on the write side.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6d2a5a92e ]
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.
Addresses-Coverity: ("Dereference null return")
Fixes: b38a0ad8ec ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81fb8736dd ]
clock_getres() in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
where 'hrtimer_resolution' depends on whether or not high resolution
timers are enabled, which is a runtime decision.
The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this
by exposing 'hrtimer_resolution' in the vDSO datapage and returning that
instead.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36a2ba0775 ]
In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:
Unable to handle kernel paging request at virtual address 0000000000001388
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[0000000000001388] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] SMP
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
pstate: 80c00009 (Nzcv daif +PAN +UAO)
pc : __alloc_pages_nodemask+0x13c/0x1068
lr : __alloc_pages_nodemask+0xdc/0x1068
...
Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
Call trace:
__alloc_pages_nodemask+0x13c/0x1068
new_slab+0xec/0x570
___slab_alloc+0x3e0/0x4f8
__slab_alloc+0x60/0x80
__kmalloc_node_track_caller+0x10c/0x478
devm_kmalloc+0x44/0xb0
pinctrl_bind_pins+0x4c/0x188
really_probe+0x78/0x2b8
driver_probe_device+0x64/0x110
device_driver_attach+0x74/0x98
__driver_attach+0x9c/0xe8
bus_for_each_dev+0x84/0xd8
driver_attach+0x30/0x40
bus_add_driver+0x170/0x218
driver_register+0x64/0x118
__platform_driver_register+0x54/0x60
arm_smmu_driver_init+0x24/0x2c
do_one_initcall+0xbc/0x328
kernel_init_freeable+0x304/0x3ac
kernel_init+0x18/0x110
ret_from_fork+0x10/0x1c
Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
---[ end trace dfeaed4c373a32da ]--
Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash.
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bfb0ebed53 ]
Modifying the VLAN stripping options when a port VLAN is configured
will break traffic for the VSI, and conceptually doesn't make sense,
so don't allow this.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06b6e2a233 ]
This patch fixes the problem with the driver being able to add only 7
multicast MAC address filters instead of 16. The problem is fixed by
changing the maximum number of MAC address filters to 16+1+1 (two extra
are needed because the driver uses 1 for unicast MAC address and 1 for
broadcast).
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df8e249be8 ]
Set the Rx flow classification enable flag only if key config
operation is successful.
Fixes 3f9b5c9 ("dpaa2-eth: Configure Rx flow classification key")
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d41ce98a12 ]
With randconfig build testing on arm64, we can run into a configuration
that has CONFIG_OMAP_CONTROL_PHY=m and CONFIG_OMAP_USB2=y, which in turn
causes a link failure:
drivers/phy/ti/phy-omap-usb2.o: In function `omap_usb_phy_power':
phy-omap-usb2.c:(.text+0x17c): undefined reference to `omap_control_phy_power'
I could not come up with a good way to correctly describe the relation
of the two symbols, but if we just select CONFIG_OMAP_CONTROL_PHY
during compile testing, we can no longer run into the broken configuration.
Fixes: 6777cee3a8 ("phy: ti: usb2: Add support for AM654 USB2 PHY")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 208d3423ee ]
gcc points out that when CONFIG_GPIOLIB is disabled,
gpiod_get_array_value_cansleep() returns 0 but fails to set its output:
drivers/phy/motorola/phy-mapphone-mdm6600.c: In function 'phy_mdm6600_status':
drivers/phy/motorola/phy-mapphone-mdm6600.c:220:24: error: 'values[0]' is used uninitialized in this function [-Werror=uninitialized]
This could be fixed more generally in gpiolib by returning a failure
code, but for this specific case, the easier workaround is to add a
gpiolib dependency.
Fixes: 5d1ebbda03 ("phy: mapphone-mdm6600: Add USB PHY driver for MDM6600 on Droid 4")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6f32efb1b ]
On platforms where the MUSB and HCI controllers share PHY0, PHY passby
is required when using the HCI controller with the PHY, but it must be
disabled when the MUSB controller is used instead.
Without this, PHY0 passby is always enabled, which results in broken
peripheral mode on such platforms (e.g. H3/H5).
Fixes: ba4bdc9e1d ("PHY: sunxi: Add driver for sunxi usb phy")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95cee0b4e3 ]
Add a required reset to the SDM845 UFS phy to express the PHY reset
bit inside the UFS controller register space. Before this change, this
reset was not expressed in the DT, and the driver utilized two different
callbacks (phy_init and phy_poweron) to implement a two-phase
initialization procedure that involved deasserting this reset between
init and poweron. This abused the two callbacks and diluted their
purpose.
That scheme does not work as regulators cannot be turned off in
phy_poweroff because they were turned on in init, rather than poweron.
The net result is that regulators are left on in suspend that shouldn't
be.
This new scheme gives the UFS reset to the PHY, so that it can fully
initialize itself in a single callback. We can then turn regulators on
during poweron and off during poweroff.
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7dbcf2b0b7 ]
Commit
37fe6a42b3 ("x86: Check stack overflow in detail")
added a broad check for the full exception stack area, i.e. it considers
the full exception stack area as valid.
That's wrong in two aspects:
1) It does not check the individual areas one by one
2) #DF, NMI and #MCE are not enabling interrupts which means that a
regular device interrupt cannot happen in their context. In fact if a
device interrupt hits one of those IST stacks that's a bug because some
code path enabled interrupts while handling the exception.
Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 381419fa72 ]
The SCSI core does not like to have devices or hosts unregistered
while error recovery is in progress. Trying to do so can lead to
self-deadlock: Part of the removal code tries to obtain a lock already
held by the error handler.
This can cause problems for the usb-storage and uas drivers, because
their error handler routines perform a USB reset, and if the reset
fails then the USB core automatically goes on to unbind all drivers
from the device's interfaces -- all while still in the context of the
SCSI error handler.
As it turns out, practically all the scenarios leading to a USB reset
failure end up causing a device disconnect (the main error pathway in
usb_reset_and_verify_device(), at the end of the routine, calls
hub_port_logical_disconnect() before returning). As a result, the
hub_wq thread will soon become aware of the problem and will unbind
all the device's drivers in its own context, not in the
error-handler's context.
This means that usb_reset_device() does not need to call
usb_unbind_and_rebind_marked_interfaces() in cases where
usb_reset_and_verify_device() has returned an error, because hub_wq
will take care of everything anyway.
This particular problem was observed in somewhat artificial
circumstances, by using usbfs to tell a hub to power-down a port
connected to a USB-3 mass storage device using the UAS protocol. With
the port turned off, the currently executing command timed out and the
error handler started running. The USB reset naturally failed,
because the hub port was off, and the error handler deadlocked as
described above. Not carrying out the call to
usb_unbind_and_rebind_marked_interfaces() fixes this issue.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com>
Tested-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com>
CC: Bart Van Assche <bvanassche@acm.org>
CC: Martin K. Petersen <martin.petersen@oracle.com>
CC: Jacky Cao <Jacky.Cao@sony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4cdc9baee ]
Subsequent code relies on the values that qeth_update_from_chp_desc()
reads from the CHP descriptor. Rather than dealing with weird errors
later on, just handle it properly here.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09f11b6c99 ]
switch_lock was introduced because it allowed serialization of device
authorization requests from userspace without need to take the big
domain lock (tb->lock). This was fine because device authorization with
ICM is just one command that is sent to the firmware. Now that we start
to handle all tunneling in the driver switch_lock is not enough because
we need to walk over the topology to establish paths.
For this reason drop switch_lock from the driver completely in favour of
big domain lock.
There is one complication, though. If userspace is waiting for the lock
in tb_switch_set_authorized(), it keeps the device_del() from removing
the sysfs attribute because it waits for active users to release the
attribute first which leads into following splat:
INFO: task kworker/u8:3:73 blocked for more than 61 seconds.
Tainted: G W 5.1.0-rc1+ #244
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:3 D12976 73 2 0x80000000
Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
Call Trace:
? __schedule+0x2e5/0x740
? _raw_spin_lock_irqsave+0x12/0x40
? prepare_to_wait_event+0xc5/0x160
schedule+0x2d/0x80
__kernfs_remove.part.17+0x183/0x1f0
? finish_wait+0x80/0x80
kernfs_remove_by_name_ns+0x4a/0x90
remove_files.isra.1+0x2b/0x60
sysfs_remove_group+0x38/0x80
sysfs_remove_groups+0x24/0x40
device_remove_attrs+0x3d/0x70
device_del+0x14c/0x360
device_unregister+0x15/0x50
tb_switch_remove+0x9e/0x1d0 [thunderbolt]
tb_handle_hotplug+0x119/0x5a0 [thunderbolt]
? process_one_work+0x1b7/0x420
process_one_work+0x1b7/0x420
worker_thread+0x37/0x380
? _raw_spin_unlock_irqrestore+0xf/0x30
? process_one_work+0x420/0x420
kthread+0x118/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x35/0x40
We deal this by following what network stack did for some of their
attributes and use mutex_trylock() with restart_syscall(). This makes
userspace release the attribute allowing sysfs attribute removal to
progress before the write is restarted and eventually fail when the
attribute is removed.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f98baa3109 ]
The frame_busy mask is used in frame_done event handling, which is not
invoked for async commits. So an async commit will leave the
frame_busy mask populated after it completes and future commits will start
with the busy mask incorrect.
This showed up on disable after cursor move. I was hitting the "this should
not happen" comment in the frame event worker since frame_busy was set,
we queued the event, but there were no frames pending (since async
also doesn't set that).
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190130163220.138637-1-sean@poorly.run
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6cd5235c31 ]
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jordan Crouse <jcrouse@codeaurora.org>
Cc: Mamta Shukla <mamtashukla555@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sharat Masetty <smasetty@codeaurora.org>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org (open list)
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a511227787 ]
The kzalloc here was being used without checking the return - if the
kzalloc fails return VCHIQ_ERROR. The call-site of
vchiq_platform_init_state() vchiq_init_state() was not responding
to an allocation failure so checks for != VCHIQ_SUCCESS
and pass VCHIQ_ERROR up to vchiq_platform_init() which then
will fail with -EINVAL.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-By: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4dcabece4c ]
The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.
The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).
To avoid this, let's additionally synchronize these counters using
the css_set_lock.
So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7d5dc2107 ]
The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
This is just a preempt_disable() which ensures that the variable is only
from the local CPU. It does not protect against users on the same CPU
from another context. It is possible that a preemptible context reads
slot 0 and then an interrupt occurs and the same value is read again.
The above scenario is confirmed by lockdep if we add a spinlock:
| ================================
| WARNING: inconsistent lock state
| 5.1.0-rc3+ #42 Not tainted
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes:
| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0
| {SOFTIRQ-ON-W} state was registered at:
| _raw_spin_lock+0x2a/0x40
| get_random_u32+0x3e/0xe0
| new_slab+0x15c/0x7b0
| ___slab_alloc+0x492/0x620
| __slab_alloc.isra.73+0x53/0xa0
| kmem_cache_alloc_node+0xaf/0x2a0
| copy_process.part.41+0x1e1/0x2370
| _do_fork+0xdb/0x6d0
| kernel_thread+0x20/0x30
| kthreadd+0x1ba/0x220
| ret_from_fork+0x3a/0x50
…
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(batched_entropy_u32.lock);
| <Interrupt>
| lock(batched_entropy_u32.lock);
|
| *** DEADLOCK ***
|
| stack backtrace:
| Call Trace:
…
| kmem_cache_alloc_trace+0x20e/0x270
| ipmi_alloc_recv_msg+0x16/0x40
…
| __do_softirq+0xec/0x48d
| run_ksoftirqd+0x37/0x60
| smpboot_thread_fn+0x191/0x290
| kthread+0xfe/0x130
| ret_from_fork+0x3a/0x50
Add a spinlock_t to the batched_entropy data structure and acquire the
lock while accessing it. Acquire the lock with disabled interrupts
because this function may be used from interrupt context.
Remove the batched_entropy_reset_lock lock. Now that we have a lock for
the data scructure, we can access it from a remote CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe6f1a6a8e ]
When the system boots with random.trust_cpu=1 it doesn't initialize the
per-NUMA CRNGs because it skips the rest of the CRNG startup code. This
means that the code from 1e7f583af6 ("random: make /dev/urandom scalable
for silly userspace programs") is not used when random.trust_cpu=1.
crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314029] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$6 = (struct crng_state **) 0x0
After adding the missing call to numa_crng_init() the per-NUMA CRNGs are
initialized again:
crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314031] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$1 = (struct crng_state **) 0xffff9a915f4014a0
The call to invalidate_batched_entropy() was also missing. This is
important for architectures like PPC and S390 which only have the
arch_get_random_seed_* functions.
Fixes: 39a8883a2b ("random: add a config option to trust the CPU's hwrng")
Signed-off-by: Jon DeVree <nuxi@vault24.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56c46bba9b ]
With STRICT_KERNEL_RWX enabled anything marked __init is placed at a 16M
boundary. This is necessary so that it can be repurposed later with
different permissions. However, in kernels with text larger than 16M,
this pushes early_setup past 32M, incapable of being reached by the
branch instruction.
Fix this by setting the CTR and branching there instead.
Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Fix it to work on BE by using DOTSYM()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d4d9b308f ]
When booted with "topology_updates=no", or when "off" is written to
/proc/powerpc/topology_updates, NUMA reassignments are inhibited for
PRRN and VPHN events. However, migration and suspend unconditionally
re-enable reassignments via start_topology_update(). This is
incoherent.
Check the topology_updates_enabled flag in
start/stop_topology_update() so that callers of those APIs need not be
aware of whether reassignments are enabled. This allows the
administrative decision on reassignments to remain in force across
migrations and suspensions.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c88e3c7ec ]
commit 2da78092dd "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev->devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.
However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:
Process1 Worker Process2
md_free
blkdev_open
del_gendisk
add delete_partition_work_fn() to wq
__blkdev_get
get_gendisk
put_disk
disk_release
kfree(disk)
find part from ext_devt_idr
get_disk_and_module(disk)
cause use after free
delete_partition_work_fn
put_device(part)
part_release
remove part from ext_devt_idr
Before <devt, hd_struct pointer> is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().
We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.
Thanks to Jan Kara for providing the solution and more clear comments
for the code.
Fixes: 2da78092dd ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1ced46c7b ]
The ctrl_check_input() function is called from pvr2_ctrl_range_check().
It's supposed to validate user supplied input and return true or false
depending on whether the input is valid or not. The problem is that
negative shifts or shifts greater than 31 are undefined in C. In
practice with GCC they result in shift wrapping so this function returns
true for some inputs which are not valid and this could result in a
buffer overflow:
drivers/media/usb/pvrusb2/pvrusb2-ctrl.c:205 pvr2_ctrl_get_valname()
warn: uncapped user index 'names[val]'
The cptr->hdw->input_allowed_mask mask is configured in pvr2_hdw_create()
and the highest valid bit is BIT(4).
Fixes: 7fb20fa38c ("V4L/DVB (7299): pvrusb2: Improve logic which handles input choice availability")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70c4cf17e4 ]
In audit_rule_change(), audit_data_to_entry() is firstly invoked to
translate the payload data to the kernel's rule representation. In
audit_data_to_entry(), depending on the audit field type, an audit tree may
be created in audit_make_tree(), which eventually invokes kmalloc() to
allocate the tree. Since this tree is a temporary tree, it will be then
freed in the following execution, e.g., audit_add_rule() if the message
type is AUDIT_ADD_RULE or audit_del_rule() if the message type is
AUDIT_DEL_RULE. However, if the message type is neither AUDIT_ADD_RULE nor
AUDIT_DEL_RULE, i.e., the default case of the switch statement, this
temporary tree is not freed.
To fix this issue, only allocate the tree when the type is AUDIT_ADD_RULE
or AUDIT_DEL_RULE.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bccb89cf9c ]
This driver returns an error if unsupported media bus pixel code is
requested by VIDIOC_SUBDEV_S_FMT.
But according to Documentation/media/uapi/v4l/vidioc-subdev-g-fmt.rst,
Drivers must not return an error solely because the requested format
doesn't match the device capabilities. They must instead modify the
format to match what the hardware can provide.
So select default format code and return success in that case.
This is detected by v4l2-compliance.
Cc: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f604f0f5af ]
If the application was streaming from both videoX and vbiX, and streaming
from videoX was stopped, then the vbi streaming also stopped.
The cause being that stop_streaming for video stopped the subdevs as well,
instead of only doing that if dev->streaming_users reached 0.
au0828_stop_vbi_streaming was also wrong since it didn't stop the subdevs
at all when dev->streaming_users reached 0.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccdd85d518 ]
In preparation for adding asynchronous subdevice support to the driver,
don't acquire v4l2_clk from the driver .probe() callback as that may
fail if the clock is provided by a bridge driver which may be not yet
initialized. Move the v4l2_clk_get() to ov6650_video_probe() helper
which is going to be converted to v4l2_subdev_internal_ops.registered()
callback, executed only when the bridge driver is ready.
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbeefa7357 ]
The error return value is not written by some firmware codecs, such as
MPEG-2 decode on CodaHx4. Clear the error return value before starting
the picture run to avoid misinterpreting unrelated values returned by
sequence initialization as error return value.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2c114c06d ]
Even if this case shouldn't happen when controller is properly programmed,
it's still better to avoid dumping a kernel Oops for this.
As the sequence may happen only for debugging purposes, log the error and
just finish the tasklet call.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d2e2a82d4 ]
Uncore PMU drivers face an awkward cyclic dependency wherein:
- They have to pick a valid online CPU to associate with before
registering the PMU device, since it will get exposed to userspace
immediately.
- The PMU registration has to be be at least partly complete before
hotplug events can be handled, since trying to migrate an
uninitialised context would be bad.
- The hotplug handler has to be ready as soon as a CPU is chosen, lest
it go offline without the user-visible cpumask value getting updated.
The arm-cci driver has tried to solve this by using get_cpu() to pick
the current CPU and prevent it from disappearing while both
registrations are performed, but that results in taking mutexes with
preemption disabled, which makes certain configurations very unhappy:
[ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
[ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[ 1.983342] Preemption disabled at:
[ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
[ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
[ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
[ 1.983364] Call trace:
[ 1.983369] dump_backtrace+0x0/0x158
[ 1.983372] show_stack+0x24/0x30
[ 1.983378] dump_stack+0x80/0xa4
[ 1.983383] ___might_sleep+0x138/0x160
[ 1.983386] __might_sleep+0x58/0x90
[ 1.983391] __rt_mutex_lock_state+0x30/0xc0
[ 1.983395] _mutex_lock+0x24/0x30
[ 1.983400] perf_pmu_register+0x2c/0x388
[ 1.983404] cci_pmu_probe+0x2bc/0x488
[ 1.983409] platform_drv_probe+0x58/0xa8
It is not feasible to resolve all the possible races outside of the perf
core itself, so address the immediate bug by following the example of
nearly every other PMU driver and not even trying to do so. Registering
the hotplug notifier first should minimise the window in which things
can go wrong, so that's about as much as we can reasonably do here. This
also revealed an additional race in assigning the global pointer too
late relative to the hotplug notifier, which gets fixed in the process.
Reported-by: Li, Meng <Meng.Li@windriver.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4033db5b8 ]
This is mostly a revert of commit 55bb6a633c ("clk: rockchip: mark
noc and some special clk as critical on rk3288") except that we're
keeping "pmu_hclk_otg0" as critical still.
NOTE: turning these clocks off doesn't seem to do a whole lot in terms
of power savings (checking the power on the logic rail). It appears
to save maybe 1-2mW. ...but still it seems like we should turn the
clocks off if they aren't needed.
About "pmu_hclk_otg0" (the one clock from the original commit we're
still keeping critical) from an email thread:
> pmu ahb clock
>
> Function: Clock to pmu module when hibernation and/or ADP is
> enabled. Must be greater than or equal to 30 MHz.
>
> If the SOC design does not support hibernation/ADP function, only have
> hclk_otg, this clk can be switched according to the usage of otg.
> If the SOC design support hibernation/ADP, has two clocks, hclk_otg and
> pmu_hclk_otg0.
> Hclk_otg belongs to the closed part of otg logic, which can be switched
> according to the use of otg.
>
> pmu_hclk_otg0 belongs to the always on part.
>
> As for whether pmu_hclk_otg0 can be turned off when otg is not in use,
> we have not tested. IC suggest make pmu_hclk_otg0 always on.
For the rest of the clocks:
atclk: No documentation about this clock other than that it goes to
the CPU. CPU functions fine without it on. Maybe needed for JTAG?
jtag: Presumably this clock is only needed if you're debugging with
JTAG. It doesn't seem like it makes sense to waste power for every
rk3288 user. In any case to do JTAG you'd need private patches to
adjust the pinctrl the mux the JTAG out anyway.
pclk_dbg, pclk_core_niu: On veyron Chromebooks we turn these two
clocks on only during kernel panics in order to access some coresight
registers. Since nothing in the upstream kernel does this we should
be able to leave them off safely. Maybe also needed for JTAG?
hsicphy12m_xin12m: There is no indication of why this clock would need
to be turned on for boards that don't use HSIC.
pclk_ddrupctl[0-1], pclk_publ0[0-1]: On veyron Chromebooks we turn
these 4 clocks on only when doing DDR transitions and they are off
otherwise. I see no reason why they'd need to be on in the upstream
kernel which doesn't support DDRFreq.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44b9f86cd4 ]
The call to of_find_compatible_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/samsung/pinctrl-exynos-arm.c:76:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 66, but without a corresponding object release within this function.
./drivers/pinctrl/samsung/pinctrl-exynos-arm.c:82:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 66, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Tomasz Figa <tomasz.figa@gmail.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-gpio@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 483d70d73b ]
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/pinctrl-st.c:1188:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1188:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1199:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1199:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44a4455ac2 ]
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/pinctrl-pistachio.c:1422:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1360, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 096377525c ]
According to the logitech_hidpp_2.0_specification_draft_2012-06-04.pdf doc:
https://lekensteyn.nl/files/logitech/logitech_hidpp_2.0_specification_draft_2012-06-04.pdf
We should use a register-access-protocol request using the short input /
output report ids. This is necessary because 27MHz HID++ receivers have
a max-packetsize on their HIP++ endpoint of 8, so they cannot support
long reports. Using a feature-access-protocol request (which is always
long or very-long) with these will cause a timeout error, followed by
the hidpp driver treating the device as not being HID++ capable.
This commit fixes this by switching to using a rap request to get the
protocol version.
Besides being tested with a (046d:c517) 27MHz receiver with various
27MHz keyboards and mice, this has also been tested to not cause
regressions on a non-unifying dual-HID++ nano receiver (046d:c534) with
k270 and m185 HID++-2.0 devices connected and on a unifying/dj receiver
(046d:c52b) with a HID++-2.0 Logitech Rechargeable Touchpad T650.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cac63f9b16 ]
Fixed warning: incorrect type in assignment reported by kbuild test robot.
The detailed warning is shown as below.
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
All warnings (new ones prefixed by >>):
btmtkuart.c:671:18: sparse: warning: incorrect type in assignment
(different base types)
btmtkuart.c:671:18: sparse: expected unsigned int [usertype] baudrate
btmtkuart.c:671:18: sparse: got restricted __le32 [usertype]
sparse warnings: (new ones prefixed by >>)
btmtkuart.c:671:18: sparse: warning: incorrect type in assignment
(different base types)
btmtkuart.c:671:18: sparse: expected unsigned int [usertype] baudrate
btmtkuart.c:671:18: sparse: got restricted __le32 [usertype]
vim +671 drivers/bluetooth/btmtkuart.c
659
660 static int btmtkuart_change_baudrate(struct hci_dev *hdev)
661 {
662 struct btmtkuart_dev *bdev = hci_get_drvdata(hdev);
663 struct btmtk_hci_wmt_params wmt_params;
664 u32 baudrate;
665 u8 param;
666 int err;
667
668 /* Indicate the device to enter the probe state the host is
669 * ready to change a new baudrate.
670 */
> 671 baudrate = cpu_to_le32(bdev->desired_speed);
672 wmt_params.op = MTK_WMT_HIF;
Fixes: 22eaf6c994 ("Bluetooth: mediatek: add support for MediaTek MT7663U and MT7668U UART devices")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5035726128 ]
The BCM43341B has the default MAC address 43:34:1B:00:1F:AC if none
is given. This address was found when enabling Bluetooth on multiple
Intel Edison modules. It also contains the sequence 43341B, the name
the chip identifies itself as. Using the same BD_ADDR is problematic
when having multiple Intel Edison modules in each others range.
The default address also has the LAA (locally administered address)
bit set which prevents a BNEP device from being created, needed for
BT tethering.
Add this to the list of black listed default MAC addresses and let
the user configure a valid one using f.i.
`btmgmt -i hci0 public-addr xx:xx:xx:xx:xx:xx`
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ecf2b768bd ]
qca_set_baudrate() calls serdev_device_wait_until_sent() assuming that
the HCI is always associated with a serdev device. This isn't true for
ROME controllers instantiated through ldisc, where the call causes a
crash due to a NULL pointer dereferentiation. Only call the function
when we have a serdev device. The timeout for ROME devices at the end
of qca_set_baudrate() is long enough to be reasonably sure that the
command was sent.
Fixes: fa9ad876b8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth chip wcn3990")
Reported-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Reported-by: Rocky Liao <rjliao@codeaurora.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Rocky Liao <rjliao@codeaurora.org>
Tested-by: Rocky Liao <rjliao@codeaurora.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29da93fea3 ]
Randy reported objtool triggered on his (GCC-7.4) build:
lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x315: call to __ubsan_handle_add_overflow() with UACCESS enabled
lib/strnlen_user.o: warning: objtool: strnlen_user()+0x337: call to __ubsan_handle_sub_overflow() with UACCESS enabled
This is due to UBSAN generating signed-overflow-UB warnings where it
should not. Prior to GCC-8 UBSAN ignored -fwrapv (which the kernel
uses through -fno-strict-overflow).
Make the functions use 'unsigned long' throughout.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20190424072208.754094071@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ae865615f ]
The __put_user() macro evaluates it's @ptr argument inside the
__uaccess_begin() / __uaccess_end() region. While this would normally
not be expected to be an issue, an UBSAN bug (it ignored -fwrapv,
fixed in GCC 8+) would transform the @ptr evaluation for:
drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {
into a signed-overflow-UB check and trigger the objtool AC validation.
Finish this commit:
2a418cf3f5 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")
and explicitly evaluate all 3 arguments early.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@kernel.org
Fixes: 2a418cf3f5 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")
Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4645d30b5 ]
The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.
Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f97f8f06a4 ("smpboot: Provide infrastructure for percpu hotplug threads")
Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 392bef7096 ]
When building x86 with Clang LTO and CFI, CFI jump regions are
automatically added to the end of the .text section late in linking. As a
result, the _etext position was being labelled before the appended jump
regions, causing confusion about where the boundaries of the executable
region actually are in the running kernel, and broke at least the fault
injection code. This moves the _etext mark to outside (and immediately
after) the .text area, as it already the case on other architectures
(e.g. arm64, arm).
Reported-and-tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190423183827.GA4012@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b49bdc8602 ]
When releasing the vfio-ccw mdev, we currently do not release
any existing channel program and its pinned pages. This can
lead to the following warning:
[1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
....
1038876.561921] Call Trace:
[1038876.561935] ([<00000009897fb870>] 0x9897fb870)
[1038876.561949] [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
[1038876.561965] [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
[1038876.561978] [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
[1038876.562024] [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
[1038876.562045] [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
[1038876.562065] [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
[1038876.562083] [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
[1038876.562098] [<00000000003c2dc4>] __fput+0x144/0x228
[1038876.562113] [<000000000016ee82>] task_work_run+0x8a/0xd8
[1038876.562125] [<000000000014c7a8>] do_exit+0x5d8/0xd90
[1038876.562140] [<000000000014d084>] do_group_exit+0xc4/0xc8
[1038876.562155] [<000000000015c046>] get_signal+0x9ae/0xa68
[1038876.562169] [<0000000000108d66>] do_signal+0x66/0x768
[1038876.562185] [<0000000000b9e37e>] system_call+0x1ea/0x2d8
[1038876.562195] 2 locks held by qemu-system-s39/144727:
[1038876.562205] #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
[1038876.562230] #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
[1038876.562250] Last Breaking-Event-Address:
[1038876.562262] [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
[1038876.562272] irq event stamp: 4236481
[1038876.562287] hardirqs last enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
[1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
[1038876.562311] softirqs last enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
[1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
[1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---
Similarly we do not free the channel program when we are removing
the vfio-ccw device. Let's fix this by resetting the device and freeing
the channel program and pinned pages in the release path. For the remove
path we can just quiesce the device, since in the remove path the mediated
device is going away for good and so we don't need to do a full reset.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <ae9f20dc8873f2027f7b3c5d2aaa0bdfe06850b8.1554756534.git.alifm@linux.ibm.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d7ed2f27b ]
When two netdev have same link local addresses (such as vlan and non
vlan), two rdma cm listen id should be able to bind to following different
addresses.
listener-1: addr=lla, scope_id=A, port=X
listener-2: addr=lla, scope_id=B, port=X
However while comparing the addresses only addr and port are considered,
due to which 2nd listener fails to listen.
In below example of two listeners, 2nd listener is failing with address in
use error.
$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545&
$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545
rdma_bind_addr: Address already in use
To overcome this, consider the scope_ids as well which forms the accurate
IPv6 link local address.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78d4eb8ad9 ]
clang has identified a code path in which it thinks a
variable may be unused:
drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
fifo_pop(&ca->free_inc, bucket);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
#define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front'
if (_r) { \
^~
drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here
allocator_wait(ca, bch_allocator_push(ca, bucket));
^~~~~~
drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait'
if (cond) \
^~~~
drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is always true
fifo_pop(&ca->free_inc, bucket);
^
drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
#define fifo_pop(fifo, i) fifo_pop_front(fifo, (i))
^
drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front'
if (_r) { \
^
drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to silence this warning
long bucket;
^
This cannot happen in practice because we only enter the loop
if there is at least one element in the list.
Slightly rearranging the code makes this clearer to both the
reader and the compiler, which avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce3e4cfb59 ]
Currently run_cache_set() has no return value, if there is failure in
bch_journal_replay(), the caller of run_cache_set() has no idea about
such failure and just continue to execute following code after
run_cache_set(). The internal failure is triggered inside
bch_journal_replay() and being handled in async way. This behavior is
inefficient, while failure handling inside bch_journal_replay(), cache
register code is still running to start the cache set. Registering and
unregistering code running as same time may introduce some rare race
condition, and make the code to be more hard to be understood.
This patch adds return value to run_cache_set(), and returns -EIO if
bch_journal_rreplay() fails. Then caller of run_cache_set() may detect
such failure and stop registering code flow immedidately inside
register_cache_set().
If journal replay fails, run_cache_set() can report error immediately
to register_cache_set(). This patch makes the failure handling for
bch_journal_replay() be in synchronized way, easier to understand and
debug, and avoid poetential race condition for register-and-unregister
in same time.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 631207314d ]
journal replay failed with messages:
Sep 10 19:10:43 ceph kernel: bcache: error on
bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
2057493-2057567 missing! (replaying 2057493-2076601), disabling
caching
The reason is in journal_reclaim(), when discard is enabled, we send
discard command and reclaim those journal buckets whose seq is old
than the last_seq_now, but before we write a journal with last_seq_now,
the machine is restarted, so the journal with the last_seq_now is not
written to the journal bucket, and the last_seq_wrote in the newest
journal is old than last_seq_now which we expect to be, so when we doing
replay, journals from last_seq_wrote to last_seq_now are missing.
It's hard to write a journal immediately after journal_reclaim(),
and it harmless if those missed journal are caused by discarding
since those journals are already wrote to btree node. So, if miss
seqs are started from the beginning journal, we treat it as normal,
and only print a message to show the miss journal, and point out
it maybe caused by discarding.
Patch v2 add a judgement condition to ignore the missed journal
only when discard enabled as Coly suggested.
(Coly Li: rebase the patch with other changes in bch_journal_replay())
Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Tested-by: Dennis Schridde <devurandom@gmx.net>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68d10e6979 ]
When failure happens inside bch_journal_replay(), calling
cache_set_err_on() and handling the failure in async way is not a good
idea. Because after bch_journal_replay() returns, registering code will
continue to execute following steps, and unregistering code triggered
by cache_set_err_on() is running in same time. First it is unnecessary
to handle failure and unregister cache set in an async way, second there
might be potential race condition to run register and unregister code
for same cache set.
So in this patch, if failure happens in bch_journal_replay(), we don't
call cache_set_err_on(), and just print out the same error message to
kernel message buffer, then return -EIO immediately caller. Then caller
can detect such failure and handle it in synchrnozied way.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95f18c9d13 ]
In the CACHE_SYNC branch of run_cache_set(), LIST_HEAD(journal) is used
to collect journal_replay(s) and filled by bch_journal_read().
If all goes well, bch_journal_replay() will release the list of
jounal_replay(s) at the end of the branch.
If something goes wrong, code flow will jump to the label "err:" and leave
the list unreleased.
This patch will release the list of journal_replay(s) in the case of
error detected.
v1 -> v2:
* Move the release code to the location after label 'err:' to
simply the change.
Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f87391558a ]
When nbytes < 4, end is wronlgy set to a negative value which, due to
uint, is then interpreted to a large value leading to a deadlock in the
following code.
This patch fix this problem.
Fixes: 6298e94821 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a42589654 ]
If we timeout the admin startup sequence we might not yet have
an I/O tagset allocated which causes the teardown sequence to crash.
Make nvme_tcp_teardown_io_queues safe by not iterating inflight tags
if the tagset wasn't allocated.
Fixes: 39d5775746 ("nvme-tcp: fix timeout handler")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1007709d7d ]
If we timeout the admin startup sequence we might not yet have
an I/O tagset allocated which causes the teardown sequence to crash.
Make nvme_tcp_teardown_io_queues safe by not iterating inflight tags
if the tagset wasn't allocated.
Fixes: 4c174e6366 ("nvme-rdma: fix timeout handler")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01fa017484 ]
If our target exposed a namespace with a block size that is greater
than PAGE_SIZE, set 0 capacity on the namespace as we do not support it.
This issue encountered when the nvmet namespace was backed by a tempfile.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ed2a00534 ]
In case create_singlethread_workqueue fails, the fix free the
hardware and returns NULL to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5414c2355 ]
kmalloc can fail in rsi_register_rates_channels but memcpy still attempts
to write to channels. The patch replaces these calls with kmemdup and
passes the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4c35c1722 ]
The "rate_index" is only used as an index into the phist_data->rx_rate[]
array in the mwifiex_hist_data_set() function. That array has
MWIFIEX_MAX_AC_RX_RATES (74) elements and it's used to generate some
debugfs information. The "rate_index" variable comes from the network
skb->data[] and it is a u8 so it's in the 0-255 range. We need to cap
it to prevent an array overflow.
Fixes: cbf6e05527 ("mwifiex: add rx histogram statistics support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddb351145a ]
is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.
Anyhow, if we decide to set the following configuration
SAI slave -> SAI master, is_slave_mode will remain set on true
although SAI being master it should be set to false.
Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78927aa40b ]
I went to great lengths to hand over the management of the GPIO
descriptors to the regulator core, and some stray rebased
oneliner in the old patch must have been assuming the devices
were still doing devres management of it.
We handed the management over to the regulator core, so of
course the regulator core shall issue gpiod_put() when done.
Sorry for the descriptor leak.
Fixes: 541d052d72 ("regulator: core: Only support passing enable GPIO descriptors")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32e621e554 ]
Currently, building bpf samples will cause the following error.
./tools/lib/bpf/bpf.h:132:27: error: 'UINT32_MAX' undeclared here (not in a function) ..
#define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
^
./samples/bpf/bpf_load.h:31:25: note: in expansion of macro 'BPF_LOG_BUF_SIZE'
extern char bpf_log_buf[BPF_LOG_BUF_SIZE];
^~~~~~~~~~~~~~~~
Due to commit 4519efa6f8 ("libbpf: fix BPF_LOG_BUF_SIZE off-by-one error")
hard-coded size of BPF_LOG_BUF_SIZE has been replaced with UINT32_MAX which is
defined in <stdint.h> header.
Even with this change, bpf selftests are running fine since these are built
with clang and it includes header(-idirafter) from clang/6.0.0/include.
(it has <stdint.h>)
clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /usr/include \
-idirafter /usr/lib/llvm-6.0/lib/clang/6.0.0/include -idirafter /usr/include/x86_64-linux-gnu \
-Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm -c progs/test_sysctl_prog.c -o - | \
llc -march=bpf -mcpu=generic -filetype=obj -o /linux/tools/testing/selftests/bpf/test_sysctl_prog.o
But bpf samples are compiled with GCC, and it only searches and includes
headers declared at the target file. As '#include <stdint.h>' hasn't been
declared in tools/lib/bpf/bpf.h, it causes build failure of bpf samples.
gcc -Wp,-MD,./samples/bpf/.sockex3_user.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes \
-O2 -fomit-frame-pointer -std=gnu89 -I./usr/include -I./tools/lib/ -I./tools/testing/selftests/bpf/ \
-I./tools/ lib/ -I./tools/include -I./tools/perf -c -o ./samples/bpf/sockex3_user.o ./samples/bpf/sockex3_user.c;
This commit add declaration of '#include <stdint.h>' to tools/lib/bpf/bpf.h
to fix this problem.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dc8cdce1d ]
FullMAC STAs have no way to update bss channel after CSA channel switch
completion. As a result, user-space tools may provide inconsistent
channel info. For instance, consider the following two commands:
$ sudo iw dev wlan0 link
$ sudo iw dev wlan0 info
The latter command gets channel info from the hardware, so most probably
its output will be correct. However the former command gets channel info
from scan cache, so its output will contain outdated channel info.
In fact, current bss channel info will not be updated until the
next [re-]connect.
Note that mac80211 STAs have a workaround for this, but it requires
access to internal cfg80211 data, see ieee80211_chswitch_work:
/* XXX: shouldn't really modify cfg80211-owned data! */
ifmgd->associated->channel = sdata->csa_chandef.chan;
This patch suggests to convert mac80211 workaround into cfg80211 behavior
and to update current bss channel in cfg80211_ch_switch_notify.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2da254cc79 ]
This patch kill instructs the DMAC to immediately terminate
execution of a thread. and then clear the interrupt status,
at last, stop generating interrupts for DMA_SEV. to guarantee
the next dma start is clean. otherwise, one interrupt maybe leave
to next start and make some mistake.
we can reporduce the problem as follows:
DMASEV: modify the event-interrupt resource, and if the INTEN sets
function as interrupt, the DMAC will set irq<event_num> HIGH to
generate interrupt. write INTCLR to clear interrupt.
DMA EXECUTING INSTRUCTS DMA TERMINATE
| |
| |
... _stop
| |
| spin_lock_irqsave
DMASEV |
| |
| mask INTEN
| |
| DMAKILL
| |
| spin_unlock_irqrestore
in above case, a interrupt was left, and if we unmask INTEN, the DMAC
will set irq<event_num> HIGH to generate interrupt.
to fix this, do as follows:
DMA EXECUTING INSTRUCTS DMA TERMINATE
| |
| |
... _stop
| |
| spin_lock_irqsave
DMASEV |
| |
| DMAKILL
| |
| clear INTCLR
| mask INTEN
| |
| spin_unlock_irqrestore
Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30780a8b16 ]
Since irq handler and mailbox task will both update arq's count,
so arq's count should use atomic_t instead of u32, otherwise
its value may go wrong finally.
Fixes: 07a0556a3a ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84ff7a09c3 ]
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
Cc: <stable@kernel.org>
Fixes: 6170a97460 ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46b83629de ]
clang produces a harmless warning for each use for the qeth_adp_supported
macro:
drivers/s390/net/qeth_l2_main.c:559:31: warning: implicit conversion from enumeration type 'enum qeth_ipa_setadp_cmd' to
different enumeration type 'enum qeth_ipa_funcs' [-Wenum-conversion]
if (qeth_adp_supported(card, IPA_SETADP_SET_PROMISC_MODE))
~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/net/qeth_core.h:179:41: note: expanded from macro 'qeth_adp_supported'
qeth_is_ipa_supported(&c->options.adp, f)
~~~~~~~~~~~~~~~~~~~~~ ^
Add a version of this macro that uses the correct types, and
remove the unused qeth_adp_enabled() macro that has the same
problem.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c90b795e9 ]
PHY's behave differently when being reset. Some reset registers to
defaults, some don't. Some trigger an autoneg restart, some don't.
So let's also set the autoneg restart bit when resetting. Then PHY
behavior should be more consistent. Clearing BMCR_ISOLATE serves the
same purpose and is borrowed from genphy_restart_aneg.
BMCR holds the speed / duplex settings in fixed mode. Therefore
we may have an issue if a soft reset resets BMCR to its default.
So better call genphy_setup_forced() afterwards in fixed mode.
We've seen no related complaint in the last >10 yrs, so let's
treat it as an improvement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63380a1ae4 ]
hns3_desc_unused() returns how many BD have been cleaned, but new
buffer has not been attached to them. The register of
HNS3_RING_RX_RING_FBDNUM_REG returns how many BD need allocating new
buffer to or need to cleaned. So the remaining BD need to be clean
is HNS3_RING_RX_RING_FBDNUM_REG - hns3_desc_unused().
Also, new buffer can not attach to the pending BD when the last BD is
not handled, because memcpy has not been done on the first pending BD.
This patch fixes by subtracting the pending BD num from unused_count
after 'HNS3_RING_RX_RING_FBDNUM_REG - unused_count' is used to calculate
the BD bum need to be clean.
Fixes: e559709505 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fba2efdae8 ]
When configure pause, current implementation returns directly
after setup PFC without setup BP, which is not sufficient.
So this patch fixes it, only return while setting PFC failed.
Fixes: 44e59e375b ("net: hns3: do not return GE PFC setting err when initializing")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62909da8ac ]
>From the DS2408 datasheet [1]:
"Resume Command function checks the status of the RC flag and, if it is set,
directly transfers control to the control functions, similar to a Skip ROM
command. The only way to set the RC flag is through successfully executing
the Match ROM, Search ROM, Conditional Search ROM, or Overdrive-Match ROM
command"
The function currently works perfectly fine in a multidrop bus, but when we
have only a single slave connected, then only a Skip ROM is used and Match
ROM is not called at all. This is leading to problems e.g. with single one
DS2408 connected, as the Resume Command is not working properly and the
device is responding with failing results after the Resume Command.
This commit is fixing this by using a Skip ROM instead in those cases.
The bandwidth / performance advantage is exactly the same.
Refs:
[1] https://datasheets.maximintegrated.com/en/ds/DS2408.pdf
Signed-off-by: Mariusz Bialonczyk <manio@skyboo.net>
Reviewed-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06095f34f8 ]
Now CPSW ALE will set/clean Host port bit in Unregistered Multicast Flood
Mask (UNREG_MCAST_FLOOD_MASK) for every VLAN without checking if this port
belongs to VLAN or not when ALLMULTI mode flag is set for nedev. This is
working in non dual_mac mode, but in dual_mac - it causes
enabling/disabling ALLMULTI flag for both ports.
Hence fix it by adding additional parameter to cpsw_ale_set_allmulti() to
specify ALE port number for which ALLMULTI has to be enabled and check if
port belongs to VLAN before modifying UNREG_MCAST_FLOOD_MASK.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b019acb72 ]
The NOHZ idle balancer runs on the lowest idle CPU. This can
interfere with isolated CPUs, so confine it to HK_FLAG_MISC
housekeeping CPUs.
HK_FLAG_SCHED is not used for this because it is not set anywhere
at the moment. This could be folded into HK_FLAG_SCHED once that
option is fixed.
The problem was observed with increased jitter on an application
running on CPU0, caused by NOHZ idle load balancing being run on
CPU1 (an SMT sibling).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190412042613.28930-1-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d95c51776 ]
snd_hda_codec_device_new() is used by both legacy HDA and ASoC
driver. However, we will call snd_hdac_device_unregister() in
snd_hdac_ext_bus_device_remove() for ASoC device. This patch uses
the type flag in hdac_device struct to determine is it a ASoC device
or legacy HDA device and call snd_hdac_device_unregister() in
snd_hda_codec_dev_free() only if it is a legacy HDA device.
Signed-off-by: Bard liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 729829d775 ]
To register data for the next kernel (command line, oldmem_base, etc.) the
current kernel needs to find the ELF segment that contains head.S. This is
currently done by checking ifor 'phdr->p_paddr == 0'. This works fine for
the current kernel build but in theory the first few pages could be
skipped. Make the detection more robust by checking if the entry point lies
within the segment.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f22b1ba15e ]
The device's remove() attempts to shut down the delayed_work scheduled
on the kernel-global workqueue by calling flush_scheduled_work().
Unfortunately, flush_scheduled_work() does not prevent the delayed_work
from re-scheduling itself. The delayed_work might run after the device
has been removed, and touch the already de-allocated info structure.
This is a potential use-after-free.
Fix by calling cancel_delayed_work_sync() during remove(): this ensures
that the delayed work is properly cancelled, is no longer running, and
is not able to re-schedule itself.
This issue was detected with the help of Coccinelle.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30f24eabab ]
If for some reason the device gives us an RX interrupt before we're
ready for it, perhaps during device power-on with misconfigured IRQ
causes mapping or so, we can crash trying to access the queues.
Prevent that by checking that we actually have RXQs and that they
were properly allocated.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ac1e464c4 ]
When we failed to find a root key in btrfs_update_root(), we just panic.
That's definitely not cool, fix it by outputting an unique error
message, aborting current transaction and return -EUCLEAN. This should
not normally happen as the root has been used by the callers in some
way.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff612ba784 ]
We've been seeing the following sporadically throughout our fleet
panic: kernel BUG at fs/btrfs/relocation.c:4584!
netversion: 5.0-0
Backtrace:
#0 [ffffc90003adb880] machine_kexec at ffffffff81041da8
#1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c
#2 [ffffc90003adb988] crash_kexec at ffffffff811048ad
#3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a
#4 [ffffc90003adb9c0] do_trap at ffffffff81019114
#5 [ffffc90003adba00] do_error_trap at ffffffff810195d0
#6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b
[exception RIP: btrfs_reloc_cow_block+692]
RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246
RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690
RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0
RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000
R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0
R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd
#8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3
#9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c
The way relocation moves data extents is by creating a reloc inode and
preallocating extents in this inode and then copying the data into these
preallocated extents. Once we've done this for all of our extents,
we'll write out these dirty pages, which marks the extent written, and
goes into btrfs_reloc_cow_block(). From here we get our current
reloc_control, which _should_ match the reloc_control for the current
block group we're relocating.
However if we get an ENOSPC in this path at some point we'll bail out,
never initiating writeback on this inode. Not a huge deal, unless we
happen to be doing relocation on a different block group, and this block
group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in
btrfs_reloc_cow_block(), because we expect to be done modifying the data
inode. We are in fact done modifying the metadata for the data inode
we're currently using, but not the one from the failed block group, and
thus we BUG_ON().
(This happens when writeback finishes for extents from the previous
group, when we are at btrfs_finish_ordered_io() which updates the data
reloc tree (inode item, drops/adds extent items, etc).)
Fix this by writing out the reloc data inode always, and then breaking
out of the loop after that point to keep from tripping this BUG_ON()
later.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ add note from Filipe ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39ad317315 ]
When doing fallocate, we first add the range to the reserve_list and
then reserve the quota. If quota reservation fails, we'll release all
reserved parts of reserve_list.
However, cur_offset is not updated to indicate that this range is
already been inserted into the list. Therefore, the same range is freed
twice. Once at list_for_each_entry loop, and once at the end of the
function. This will result in WARN_ON on bytes_may_use when we free the
remaining space.
At the end, under the 'out' label we have a call to:
btrfs_free_reserved_data_space(inode, data_reserved, alloc_start, alloc_end - cur_offset);
The start offset, third argument, should be cur_offset.
Everything from alloc_start to cur_offset was freed by the
list_for_each_entry_safe_loop.
Fixes: 18513091af ("btrfs: update btrfs_space_info's bytes_may_use timely")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e209783d66 ]
Implementations of the .write_pending() callback functions must guarantee
that an appropriate LIO core callback function will be called immediately or
at a later time. Make sure that this guarantee is met for aborted SCSI
commands.
[mkp: typo]
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Giridhar Malavali <gmalavali@marvell.com>
Fixes: 694833ee00 ("scsi: tcm_qla2xxx: Do not allow aborted cmd to advance.") # v4.13.
Fixes: a07100e00a ("qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM") # v4.5.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ebe36c94a ]
Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking the
kobject.
Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ae3f6e130 ]
Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.
Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89a37842b0 ]
Remove mt76_queue dependency from tx_queue_skb function pointer and
rely on mt76_tx_qid instead. This is a preliminary patch to introduce
mt76_sw_queue support
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 74dd022f9e ]
When building with -Wunused-but-set-variable, the compiler shouts about
a number of pte_unmap() users, since this expands to an empty macro on
arm64:
| mm/gup.c: In function 'gup_pte_range':
| mm/gup.c:1727:16: warning: variable 'ptem' set but not used
| [-Wunused-but-set-variable]
| mm/gup.c: At top level:
| mm/memory.c: In function 'copy_pte_range':
| mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used
| [-Wunused-but-set-variable]
| mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used
| [-Wunused-but-set-variable]
| mm/swap_state.c: In function 'swap_ra_info':
| mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used
| [-Wunused-but-set-variable]
| mm/madvise.c: In function 'madvise_free_pte_range':
| mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
| [-Wunused-but-set-variable]
Rewrite pte_unmap() as a static inline function, which silences the
warnings.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f5b62f09f ]
The VDSO code uses the kernel helper that was originally designed
to abstract the access between 32 and 64bit systems. It worked so
far because this function is declared as 'inline'.
As we're about to revamp that part of the code, the VDSO would
break. Let's fix it by doing what should have been done from
the start, a proper system register access.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f10b83de1f ]
If the BAR is zero size, it indicates it was never successfully mapped.
Ensure that the BAR is valid during initialization before attempting to
use it.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23583f7795 ]
When the DSDT tables expose devices with subdevices and a set of
hierarchical _DSD properties, the data returned by
acpi_get_next_subnode() is incorrect, with the results suggesting a bad
pointer assignment. The parser works fine with device_nodes or
data_nodes, but not with a combination of the two.
The problem is traced to an invalid pointer used when jumping from
handling device_nodes to data nodes. The existing code looks for data
nodes below the last subdevice found instead of the common root. Fix
by forcing the acpi_device pointer to be derived from the same fwnode
for the two types of subnodes.
This same problem of handling device and data nodes was already fixed
in a similar way by 'commit bf4703fdd1 ("ACPI / property: fix data
node parsing in acpi_get_next_subnode()")' but broken later by 'commit
34055190b1 ("ACPI / property: Add fwnode_get_next_child_node()")', so
this should probably go to linux-stable all the way to 4.12
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e025da3d7a ]
If "ret_len" is negative then it could lead to a NULL dereference.
The "ret_len" value comes from nl80211_vendor_cmd(), if it's negative
then we don't allocate the "dcmd_buf" buffer. Then we pass "ret_len" to
brcmf_fil_cmd_data_set() where it is cast to a very high u32 value.
Most of the functions in that call tree check whether the buffer we pass
is NULL but there are at least a couple places which don't such as
brcmf_dbg_hex_dump() and brcmf_msgbuf_query_dcmd(). We memcpy() to and
from the buffer so it would result in a NULL dereference.
The fix is to change the types so that "ret_len" can't be negative. (If
we memcpy() zero bytes to NULL, that's a no-op and doesn't cause an
issue).
Fixes: 1bacb0487d ("brcmfmac: replace cfg80211 testmode with vendor command")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f4e02193c ]
When the state of rep was introduced, it was also designed to prevent
duplicate unloading of the same rep. Considering the following two
flows when an eswitch manager is at switchdev mode with n VF reps loaded.
+--------------------------------------+--------------------------------+
| cpu-0 | cpu-1 |
| -------- | -------- |
| mlx5_ib_remove | mlx5_eswitch_disable_sriov |
| mlx5_ib_unregister_vport_reps | esw_offloads_cleanup |
| mlx5_eswitch_unregister_vport_reps | esw_offloads_unload_all_reps |
| __unload_reps_all_vport | __unload_reps_all_vport |
+--------------------------------------+--------------------------------+
These two flows will try to unload the same rep. Per original design,
once one flow unloads the rep, the state moves to REGISTERED. The 2nd
flow will no longer needs to do the unload and bails out. However, as
read and write of the state is not atomic, when 1st flow is doing the
unload, the state is still LOADED, 2nd flow is able to do the same
unload action. Kernel crash will happen.
To solve this, driver should do atomic test-and-set for the state. So
that only one flow can change the rep state from LOADED to REGISTERED,
and proceed to do the actual unloading.
Since the state is changing to atomic type, all other read/write should
be atomic action as well.
Fixes: f121e0ea95 (net/mlx5: E-Switch, Add state to eswitch vport representors)
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29f2133717 ]
Calculate the divisor for the SCR (Serial Clock Rate), avoiding
that the SSP transmission rate can be greater than the device rate.
When the division between the SSP clock and the device rate generates
a reminder, we have to increment by one the divisor.
In this way the resulting SSP clock will never be greater than the
device SPI max frequency.
For example, with:
- ssp_clk = 50 MHz
- dev freq = 15 MHz
without this patch the SSP clock will be greater than 15 MHz:
- 25 MHz for PXA25x_SSP and CE4100_SSP
- 16,56 MHz for the others
Instead, with this patch, we have in both case an SSP clock of 12.5MHz,
so the max rate of the SPI device clock is respected.
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea751227c8 ]
During randconfig builds, I occasionally run into an invalid configuration
of the freescale FIQ sound support:
WARNING: unmet direct dependencies detected for SND_SOC_IMX_PCM_FIQ
Depends on [m]: SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_IMX_SOC [=m]
Selected by [y]:
- SND_SOC_FSL_SPDIF [=y] && SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_IMX_SOC [=m]!=n && (MXC_TZIC [=n] || MXC_AVIC [=y])
sound/soc/fsl/imx-ssi.o: In function `imx_ssi_remove':
imx-ssi.c:(.text+0x28): undefined reference to `imx_pcm_fiq_exit'
sound/soc/fsl/imx-ssi.o: In function `imx_ssi_probe':
imx-ssi.c:(.text+0xa64): undefined reference to `imx_pcm_fiq_init'
The Kconfig warning is a result of the symbol being defined inside of
the "if SND_IMX_SOC" block, and is otherwise harmless. The link error
is more tricky and happens with SND_SOC_IMX_SSI=y, which may or may not
imply FIQ support. However, if SND_SOC_FSL_SSI is set to =m at the same
time, that selects SND_SOC_IMX_PCM_FIQ as a loadable module dependency,
which then causes a link failure from imx-ssi.
The solution here is to make SND_SOC_IMX_PCM_FIQ built-in whenever
one of its potential users is built-in.
Fixes: ff40260f79 ("ASoC: fsl: refine DMA/FIQ dependencies")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5c27498a0 ]
atmel_qspi objects are kept in spi_controller objects, so, first get
pointer to spi_controller object and then get atmel_qspi object from
spi_controller object.
Fixes: 2d30ac5ed6 ("mtd: spi-nor: atmel-quadspi: Use spi-mem interface for atmel-quadspi driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 860b7d2286 ]
The data structure (i.e struct imc_mem_info) to hold the memory address
information for nest imc units is allocated based on the number of nodes
in the system.
nest_imc_event_init() traverse this struct array to calculate the memory
base address for the event-cpu. If we fail to find a match for the event
cpu's chip-id in imc_mem_info struct array, then the do-while loop will
iterate until we crash.
Fix this by changing the loop exit condition based on the number of
non zero vbase elements in the array, since the allocation is done for
nr_chips + 1.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 885dcd709b ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a913e5e8b4 ]
Nest hardware counter memory resides in a per-chip reserve-memory.
During nest_imc_event_init(), chip-id of the event-cpu is considered to
calculate the base memory addresss for that cpu. Return, proper error
condition if the chip_id calculated is invalid.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 885dcd709b ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30180e8436 ]
If the hdmi codec startup fails, it should clear the current_substream
pointer to free the device. This is properly done for the audio_startup()
callback but for snd_pcm_hw_constraint_eld().
Make sure the pointer cleared if an error is reported.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41a91c606e ]
dwc3_gadget_suspend() is called under dwc->lock spinlock. In such context
calling synchronize_irq() is not allowed. Move the problematic call out
of the protected block to fix the following kernel BUG during system
suspend:
BUG: sleeping function called from invalid context at kernel/irq/manage.c:112
in_atomic(): 1, irqs_disabled(): 128, pid: 1601, name: rtcwake
6 locks held by rtcwake/1601:
#0: f70ac2a2 (sb_writers#7){.+.+}, at: vfs_write+0x130/0x16c
#1: b5fe1270 (&of->mutex){+.+.}, at: kernfs_fop_write+0xc0/0x1e4
#2: 7e597705 (kn->count#60){.+.+}, at: kernfs_fop_write+0xc8/0x1e4
#3: 8b3527d0 (system_transition_mutex){+.+.}, at: pm_suspend+0xc4/0xc04
#4: fc7f1c42 (&dev->mutex){....}, at: __device_suspend+0xd8/0x74c
#5: 4b36507e (&(&dwc->lock)->rlock){....}, at: dwc3_gadget_suspend+0x24/0x3c
irq event stamp: 11252
hardirqs last enabled at (11251): [<c09c54a4>] _raw_spin_unlock_irqrestore+0x6c/0x74
hardirqs last disabled at (11252): [<c09c4d44>] _raw_spin_lock_irqsave+0x1c/0x5c
softirqs last enabled at (9744): [<c0102564>] __do_softirq+0x3a4/0x66c
softirqs last disabled at (9737): [<c0128528>] irq_exit+0x140/0x168
Preemption disabled at:
[<00000000>] (null)
CPU: 7 PID: 1601 Comm: rtcwake Not tainted
5.0.0-rc3-next-20190122-00039-ga3f4ee4f8a52 #5252
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c01110f0>] (unwind_backtrace) from [<c010d120>] (show_stack+0x10/0x14)
[<c010d120>] (show_stack) from [<c09a4d04>] (dump_stack+0x90/0xc8)
[<c09a4d04>] (dump_stack) from [<c014c700>] (___might_sleep+0x22c/0x2c8)
[<c014c700>] (___might_sleep) from [<c0189d68>] (synchronize_irq+0x28/0x84)
[<c0189d68>] (synchronize_irq) from [<c05cbbf8>] (dwc3_gadget_suspend+0x34/0x3c)
[<c05cbbf8>] (dwc3_gadget_suspend) from [<c05bd020>] (dwc3_suspend_common+0x154/0x410)
[<c05bd020>] (dwc3_suspend_common) from [<c05bd34c>] (dwc3_suspend+0x14/0x2c)
[<c05bd34c>] (dwc3_suspend) from [<c051c730>] (platform_pm_suspend+0x2c/0x54)
[<c051c730>] (platform_pm_suspend) from [<c05285d4>] (dpm_run_callback+0xa4/0x3dc)
[<c05285d4>] (dpm_run_callback) from [<c0528a40>] (__device_suspend+0x134/0x74c)
[<c0528a40>] (__device_suspend) from [<c052c508>] (dpm_suspend+0x174/0x588)
[<c052c508>] (dpm_suspend) from [<c0182134>] (suspend_devices_and_enter+0xc0/0xe74)
[<c0182134>] (suspend_devices_and_enter) from [<c0183658>] (pm_suspend+0x770/0xc04)
[<c0183658>] (pm_suspend) from [<c0180ddc>] (state_store+0x6c/0xcc)
[<c0180ddc>] (state_store) from [<c09a9a70>] (kobj_attr_store+0x14/0x20)
[<c09a9a70>] (kobj_attr_store) from [<c02d6800>] (sysfs_kf_write+0x4c/0x50)
[<c02d6800>] (sysfs_kf_write) from [<c02d594c>] (kernfs_fop_write+0xfc/0x1e4)
[<c02d594c>] (kernfs_fop_write) from [<c02593d8>] (__vfs_write+0x2c/0x160)
[<c02593d8>] (__vfs_write) from [<c0259694>] (vfs_write+0xa4/0x16c)
[<c0259694>] (vfs_write) from [<c0259870>] (ksys_write+0x40/0x8c)
[<c0259870>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed55ffa8 to 0xed55fff0)
...
Fixes: 01c10880d2 ("usb: dwc3: gadget: synchronize_irq dwc irq in suspend")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54f37f5663 ]
Some function drivers queueing more than 128 ISOC requests at a time.
To avoid "descriptor chain full" cases, increasing descriptors count
from MAX_DMA_DESC_NUM_GENERIC to MAX_DMA_DESC_NUM_HS_ISOC for ISOC's
only.
Signed-off-by: Minas Harutyunyan <hminas@synopsys.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 16ec5dfe03 ]
On kbl_rt5663_max98927, commit 38a5882e42
("ASoC: Intel: kbl_rt5663_max98927: Map BTN_0 to KEY_PLAYPAUSE")
This key pair mapping to play/pause when playing Youtube
The Android 3.5mm Headset jack specification mentions that BTN_0 should
be mapped to KEY_MEDIA, but this is less logical than KEY_PLAYPAUSE,
which has much broader userspace support.
For example, the Chrome OS userspace now supports KEY_PLAYPAUSE to toggle
play/pause of videos and audio, but does not handle KEY_MEDIA.
Furthermore, Android itself now supports KEY_PLAYPAUSE equivalently, as the
new USB headset spec requires KEY_PLAYPAUSE for BTN_0.
https://source.android.com/devices/accessories/headset/usb-headset-spec
The same fix is required on Chrome kbl_da7219_max98357a.
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Reviewed-by: Benson Leung <bleung@chromium.org>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02d15f0d80 ]
The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/zte/pinctrl-zx.c:415:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 407, but without a corresponding object release within this function.
./drivers/pinctrl/zte/pinctrl-zx.c:422:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 407, but without a corresponding object release within this function.
./drivers/pinctrl/zte/pinctrl-zx.c:436:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 407, but without a corresponding object release within this function.
./drivers/pinctrl/zte/pinctrl-zx.c:444:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 407, but without a corresponding object release within this function.
./drivers/pinctrl/zte/pinctrl-zx.c:448:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 407, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Jun Nie <jun.nie@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f80c5dad7b ]
This commit makes the kernel not send the next queued HCI command until
a command complete arrives for the last HCI command sent to the
controller. This change avoids a problem with some buggy controllers
(seen on two SKUs of QCA9377) that send an extra command complete event
for the previous command after the kernel had already sent a new HCI
command to the controller.
The problem was reproduced when starting an active scanning procedure,
where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
command. When this happends the kernel ends up not processing the
command complete for the following commmand, LE_SET_SCAN_PARAM, and
ultimately behaving as if a passive scanning procedure was being
performed, when in fact controller is performing an active scanning
procedure. This makes it impossible to discover BLE devices as no device
found events are sent to userspace.
This problem is reproducible on 100% of the attempts on the affected
controllers. The extra command complete event can be seen at timestamp
27.420131 on the btmon logs bellow.
Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0+ (x86_64) 0.352340
= Note: Bluetooth subsystem version 2.22 0.352343
= New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344
= Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345
= Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346
@ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347
@ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366
@ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164
@ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310
Address type: 0x06
LE Public
LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #1 [hci0] 27.302496
Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4 #2 [hci0] 27.419117
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #3 [hci0] 27.419244
Type: Active (0x01)
Interval: 11.250 msec (0x0012)
Window: 11.250 msec (0x0012)
Own address type: Random (0x01)
Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #4 [hci0] 27.420131
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #5 [hci0] 27.420259
Scanning: Enabled (0x01)
Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #6 [hci0] 27.420969
LE Set Scan Parameters (0x08|0x000b) ncmd 1
Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #7 [hci0] 27.421983
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059
Start Discovery (0x0023) plen 1
Status: Success (0x00)
Address type: 0x06
LE Public
LE Random
@ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93aa4792c3 ]
When the ring buffer is almost full due to RX completion messages, a
TX packet may reach the "low watermark" and cause the queue stopped.
If the TX completion arrives earlier than queue stopping, the wakeup
may be missed.
This patch moves the check for the last pending packet to cover both
EAGAIN and success cases, so the queue will be reliably waked up when
necessary.
Reported-and-tested-by: Stephan Klein <stephan.klein@wegfinder.at>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ee8ee7fe8 ]
In some cases when a queue related allocation fails, successful past
allocations are freed but the pointer that pointed to them is not
set to NULL. This is a problem for 2 reasons:
1. This is generally a bad practice since this pointer might be
accidentally accessed in the future.
2. Future allocations using the same pointer check if the pointer
is NULL and fail if it is not.
Fixed this by setting such pointers to NULL in the allocation of
queue related objects.
Also refactored the code of ena_setup_tx_resources() to goto-style
error handling to avoid code duplication of resource freeing.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f913308879 ]
GCC 8 contains a number of new warnings as well as enhancements to existing
checkers. The warning - Wstringop-truncation - warns for calls to bounded
string manipulation functions such as strncat, strncpy, and stpncpy that
may either truncate the copied string or leave the destination unchanged.
In our case the destination string length (32 bytes) is much shorter than
the source string (64 bytes) which causes this warning to show up. In
general the destination has to be at least a byte larger than the length
of the source string with strncpy for this warning not to showup.
This can be easily fixed by using strlcpy instead which already does the
truncation to the string. Documentation for this function can be
found here:
https://elixir.bootlin.com/linux/latest/source/lib/string.c#L141
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e87eb301be ]
Just like aio/io_uring, we need to grab 2 refcount for queuing one
request, one is for submission, another is for completion.
If the request isn't queued from plug code path, the refcount grabbed
in generic_make_request() serves for submission. In theroy, this
refcount should have been released after the sumission(async run queue)
is done. blk_freeze_queue() works with blk_sync_queue() together
for avoiding race between cleanup queue and IO submission, given async
run queue activities are canceled because hctx->run_work is scheduled with
the refcount held, so it is fine to not hold the refcount when
running the run queue work function for dispatch IO.
However, if request is staggered into plug list, and finally queued
from plug code path, the refcount in submission side is actually missed.
And we may start to run queue after queue is removed because the queue's
kobject refcount isn't guaranteed to be grabbed in flushing plug list
context, then kernel oops is triggered, see the following race:
blk_mq_flush_plug_list():
blk_mq_sched_insert_requests()
insert requests to sw queue or scheduler queue
blk_mq_run_hw_queue
Because of concurrent run queue, all requests inserted above may be
completed before calling the above blk_mq_run_hw_queue. Then queue can
be freed during the above blk_mq_run_hw_queue().
Fixes the issue by grab .q_usage_counter before calling
blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
safe because the queue is absolutely alive before inserting request.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f529ff912 ]
Set features can have multiple features turned on|off in a single
call. Grouping these all in an if/else means after one condition
is met, other conditions/features will not be evaluated. Break
the if/else statements by feature to ensure all features will be
handled properly.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7d0067147 ]
tools/bpf/bpftool/.gitignore has the "bpftool" pattern, which is
intended to ignore the following build artifact:
tools/bpf/bpftool/bpftool
However, the .gitignore entry is effective not only for the current
directory, but also for any sub-directories.
So, from the point of .gitignore grammar, the following check-in file
is also considered to be ignored:
tools/bpf/bpftool/bash-completion/bpftool
As the manual gitignore(5) says "Files already tracked by Git are not
affected", this is not a problem as far as Git is concerned.
However, Git is not the only program that parses .gitignore because
.gitignore is useful to distinguish build artifacts from source files.
For example, tar(1) supports the --exclude-vcs-ignore option. As of
writing, this option does not work perfectly, but it intends to create
a tarball excluding files specified by .gitignore.
So, I believe it is better to fix this issue.
You can fix it by prefixing the pattern with a slash; the leading slash
means the specified pattern is relative to the current directory.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6cea33701e ]
Test test_libbpf.sh failed on my development server with failure
-bash-4.4$ sudo ./test_libbpf.sh
[0] libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
test_libbpf: failed at file test_l4lb.o
selftests: test_libbpf [FAILED]
-bash-4.4$
The reason is because my machine has 64KB locked memory by default which
is not enough for this program to get locked memory.
Similar to other bpf selftests, let us increase RLIMIT_MEMLOCK
to infinity, which fixed the issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e6741f092 ]
When unmapping the AF_XDP memory regions used for the rings, an
invalid address was passed to the munmap() calls. Instead of passing
the beginning of the memory region, the descriptor region was passed
to munmap.
When the userspace application tried to tear down an AF_XDP socket,
the operation failed and the application would still have a reference
to socket it wished to get rid of.
Reported-by: William Tu <u9012063@gmail.com>
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: William Tu <u9012063@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24474f2709 ]
Fixed possible memory leak in i40e_vc_add_cloud_filter function:
cfilter is being allocated and in some error conditions
the function returns without freeing the memory.
Fix of integer truncation from u16 (type of queue_id value) to u8
when calling i40e_vc_isvalid_queue_id function.
Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca31ca8247 ]
When build perf for ARC recently, there was a build failure due to lack
of __NR_bpf.
| Auto-detecting system features:
|
| ... get_cpuid: [ OFF ]
| ... bpf: [ on ]
|
| # error __NR_bpf not defined. libbpf does not support your arch.
^~~~~
| bpf.c: In function 'sys_bpf':
| bpf.c:66:17: error: '__NR_bpf' undeclared (first use in this function)
| return syscall(__NR_bpf, cmd, attr, size);
| ^~~~~~~~
| sys_bpf
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e4be8d03f ]
The SD Physical Layer Spec says the following: Since the SD Memory Card
shall support at least the two bus modes 1-bit or 4-bit width, then any SD
Card shall set at least bits 0 and 2 (SD_BUS_WIDTH="0101").
This change verifies the card has specified a bus width.
AMD SDHC Device 7806 can get into a bad state after a card disconnect
where anything transferred via the DATA lines will always result in a
zero filled buffer. Currently the driver will continue without error if
the HC is in this condition. A block device will be created, but reading
from it will result in a zero buffer. This makes it seem like the SD
device has been erased, when in actuality the data is never getting
copied from the DATA lines to the data buffer.
SCR is the first command in the SD initialization sequence that uses the
DATA lines. By checking that the response was invalid, we can abort
mounting the card.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Raul E Rangel <rrangel@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9287c6452d ]
This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.
So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.
Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.
This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c4b1996b5 ]
The work_item cancels that occur when a QP is destroyed can elicit the
following trace:
workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
Call Trace:
__flush_work.isra.29+0x8c/0x1a0
? __switch_to_asm+0x40/0x70
__cancel_work_timer+0x103/0x190
? schedule+0x32/0x80
iowait_cancel_work+0x15/0x30 [hfi1]
rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
rvt_destroy_qp+0x65/0x1f0 [rdmavt]
? _cond_resched+0x15/0x30
ib_destroy_qp+0xe9/0x230 [ib_core]
ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
process_one_work+0x171/0x370
worker_thread+0x49/0x3f0
kthread+0xf8/0x130
? max_active_store+0x80/0x80
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM.
The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
entangled with memory reclaim, so this flag is appropriate.
Fixes: 0a226edd20 ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7889f44dd9 ]
This issue is found by running liburing/test/io_uring_setup test.
When test run, the testcase "attempt to bind to invalid cpu" would not
pass with messages like:
io_uring_setup(1, 0xbfc2f7c8), \
flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, \
resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, \
sq_thread_cpu: 2
expected -1, got 3
FAIL
On my system, there is:
CPU(s) possible : 0-3
CPU(s) online : 0-1
CPU(s) offline : 2-3
CPU(s) present : 0-1
The sq_thread_cpu 2 is offline on my system, so the bind should fail.
But cpu_possible() will pass the check. We shouldn't be able to bind
to an offline cpu. Use cpu_online() to do the check.
After the change, the testcase run as expected: EINVAL will be returned
for cpu offlined.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f91821990 ]
As part of the freeze operation, gfs2_freeze_func() is left blocking
on a request to hold the sd_freeze_gl in SH. This glock is held in EX
by the gfs2_freeze() code.
A subsequent call to gfs2_unfreeze() releases the EXclusively held
sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and
resume its operation.
gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete.
If a umount is issued right after unfreeze, it could result in an
inconsistent filesystem because some journal data (statfs update) isn't
written out.
Refer to commit 24972557b1 for a more detailed explanation of how
freeze/unfreeze work.
This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to
complete before returning to the user.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 950a578c61 ]
Actually we don't do anything with return value from
nfs_wait_client_init_complete in nfs_match_client, as a
consequence if we get a fatal signal and client is not
fully initialised, we'll loop to "again" label
This has been proven to cause soft lockups on some scenarios
(no-carrier but configured network interfaces)
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2f611a3dc ]
The AFS3 FID is three 32-bit unsigned numbers and is represented as three
up-to-8-hex-digit numbers separated by colons to the afs.fid xattr.
However, with the advent of support for YFS, the FID is now a 64-bit volume
number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before).
Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it
currently ignores the upper 32 bits of the 96-bit vnode number), the size
of the stack-based buffer has not been increased to match, thereby allowing
stack corruption to occur.
Fix this by increasing the buffer size appropriately and conditionally
including the upper part of the vnode number if it is non-zero. The latter
requires the lower part to be zero-padded if the upper part is non-zero.
Fixes: 3b6492df41 ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7881ef3f33 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit eb432217d7.
There is currently no corresponding patch in master due to additional
changes that would be significantly different from plain revert in the
respective stable branch.
The range argument was not handled correctly and could cause trim to
overlap allocated areas or reach beyond the end of the device. The
address space that fitrim normally operates on is in logical
coordinates, while the discards are done on the physical device extents.
This distinction cannot be made with the current ioctl interface and
caused the confusion.
The bug depends on the layout of block groups and does not always
happen. The whole-fs trim (run by default by the fstrim tool) is not
affected.
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8e6089820 upstream.
Commit 59c08c69c2 ("netfilter: ctnetlink: Support L3 protocol-filter
on flush") introduced a user-space regression when flushing connection
track entries. Before this commit, the nfgen_family field was not used
by the kernel and all entries were removed. Since this commit,
nfgen_family is used to filter out entries that should not be removed.
One example a broken tool is conntrack. conntrack always sets
nfgen_family to AF_INET, so after 59c08c69c2 only IPv4 entries were
removed with the -F parameter.
Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
regression, and this commit implements his suggestion. nfgenmsg->version
is so far set to zero, so it is well-suited to be used as a flag for
selecting old or new flush behavior. If version is 0, nfgen_family is
ignored and all entries are used. If user-space sets the version to one
(or any other value than 0), then the new behavior is used. As version
only can have two valid values, I chose not to add a new
NFNETLINK_VERSION-constant.
Fixes: 59c08c69c2 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9419a3191d upstream.
What happens there is that we are replacing file->path.mnt of
a file we'd just opened with a clone and we need the write
count contribution to be transferred from original mount to
new one. That's it. We do *NOT* want any kind of freeze
protection for the duration of switchover.
IOW, we should just use __mnt_{want,drop}_write() for that
switchover; no need to bother with mnt_{want,drop}_write()
there.
Tested-by: Amir Goldstein <amir73il@gmail.com>
Reported-by: syzbot+2a73a6ea9507b7112141@syzkaller.appspotmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d65842f712 upstream.
Calling VIDIOC_DQBUF can release the core serialization lock pointed to
by vb2_queue->lock if it has to wait for a new buffer to arrive.
However, if userspace dup()ped the video device filehandle, then it is
possible to read or call DQBUF from two filehandles at the same time.
It is also possible to call REQBUFS from one filehandle while the other
is waiting for a buffer. This will remove all the buffers and reallocate
new ones. Removing all the buffers isn't the problem here (that's already
handled correctly by DQBUF), but the reallocating part is: DQBUF isn't
aware that the buffers have changed.
This is fixed by setting a flag whenever the lock is released while waiting
for a buffer to arrive. And checking the flag where needed so we can return
-EBUSY.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Reported-by: Syzbot <syzbot+4180ff9ca6810b06c1e9@syzkaller.appspotmail.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c40292be9 upstream.
Syzkaller hit 'WARNING in __alloc_pages_nodemask' bug.
WARNING: CPU: 1 PID: 1473 at mm/page_alloc.c:4377
__alloc_pages_nodemask+0x4da/0x2130
Kernel panic - not syncing: panic_on_warn set ...
Call Trace:
alloc_pages_current+0xb1/0x1e0
kmalloc_order+0x1f/0x60
kmalloc_order_trace+0x1d/0x120
fb_alloc_cmap_gfp+0x85/0x2b0
fb_set_user_cmap+0xff/0x370
do_fb_ioctl+0x949/0xa20
fb_ioctl+0xdd/0x120
do_vfs_ioctl+0x186/0x1070
ksys_ioctl+0x89/0xa0
__x64_sys_ioctl+0x74/0xb0
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is a warning about order >= MAX_ORDER and the order is from
userspace ioctl. Add flag __NOWARN to silence this warning.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acf3062a7e upstream.
This nasty little syzbot repro:
https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000
Creates overlay mounts where the same directory is both in upper and lower
layers. Simplified example:
mkdir foo work
mount -t overlay none foo -o"lowerdir=.,upperdir=foo,workdir=work"
The repro runs several threads in parallel that attempt to chdir into foo
and attempt to symlink/rename/exec/mkdir the file bar.
The repro hits a WARN_ON() I placed in ovl_instantiate(), which suggests
that an overlay inode already exists in cache and is hashed by the pointer
of the real upper dentry that ovl_create_real() has just created. At the
point of the WARN_ON(), for overlay dir inode lock is held and upper dir
inode lock, so at first, I did not see how this was possible.
On a closer look, I see that after ovl_create_real(), because of the
overlapping upper and lower layers, a lookup by another thread can find the
file foo/bar that was just created in upper layer, at overlay path
foo/foo/bar and hash the an overlay inode with the new real dentry as lower
dentry. This is possible because the overlay directory foo/foo is not
locked and the upper dentry foo/bar is in dcache, so ovl_lookup() can find
it without taking upper dir inode shared lock.
Overlapping layers is considered a wrong setup which would result in
unexpected behavior, but it shouldn't crash the kernel and it shouldn't
trigger WARN_ON() either, so relax this WARN_ON() and leave a pr_warn()
instead to cover all cases of failure to get an overlay inode.
The error returned from failure to insert new inode to cache with
inode_insert5() was changed to -EEXIST, to distinguish from the error
-ENOMEM returned on failure to get/allocate inode with iget5_locked().
Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com
Fixes: 01b39dcc95 ("ovl: use inode_insert5() to hash a newly...")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 969f5ea627 upstream.
Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum
that can prevent interrupts from being taken when single-stepping.
This patch implements a software workaround to prevent userspace from
effectively being able to disable interrupts.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e32773357d upstream.
A failed call to kobject_init_and_add() must be followed by a call to
kobject_put(). Currently in the error path when adding fs_devices we
are missing this call. This could be fixed by calling
btrfs_sysfs_remove_fsid() if btrfs_sysfs_add_fsid() returns an error or
by adding a call to kobject_put() directly in btrfs_sysfs_add_fsid().
Here we choose the second option because it prevents the slightly
unusual error path handling requirements of kobject from leaking out
into btrfs functions.
Add a call to kobject_put() in the error path of kobject_add_and_init().
This causes the release method to be called if kobject_init_and_add()
fails. open_tree() is the function that calls btrfs_sysfs_add_fsid()
and the error code in this function is already written with the
assumption that the release method is called during the error path of
open_tree() (as seen by the call to btrfs_sysfs_remove_fsid() under the
fail_fsdev_sysfs label).
Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 450ff83488 upstream.
If a call to kobject_init_and_add() fails we must call kobject_put()
otherwise we leak memory.
Calling kobject_put() when kobject_init_and_add() fails drops the
refcount back to 0 and calls the ktype release method (which in turn
calls the percpu destroy and kfree).
Add call to kobject_put() in the error path of call to
kobject_init_and_add().
Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c713cbab6 upstream.
When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the
inode) that happens to be ranged, which happens during a msync() or writes
for files opened with O_SYNC for example, we can end up with a corrupt log,
due to different file extent items representing ranges that overlap with
each other, or hit some assertion failures.
When doing a ranged fsync we only flush delalloc and wait for ordered
exents within that range. If while we are logging items from our inode
ordered extents for adjacent ranges complete, we end up in a race that can
make us insert the file extent items that overlap with others we logged
previously and the assertion failures.
For example, if tree-log.c:copy_items() receives a leaf that has the
following file extents items, all with a length of 4K and therefore there
is an implicit hole in the range 68K to 72K - 1:
(257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ...
It copies them to the log tree. However due to the need to detect implicit
holes, it may release the path, in order to look at the previous leaf to
detect an implicit hole, and then later it will search again in the tree
for the first file extent item key, with the goal of locking again the
leaf (which might have changed due to concurrent changes to other inodes).
However when it locks again the leaf containing the first key, the key
corresponding to the extent at offset 72K may not be there anymore since
there is an ordered extent for that range that is finishing (that is,
somewhere in the middle of btrfs_finish_ordered_io()), and it just
removed the file extent item but has not yet replaced it with a new file
extent item, so the part of copy_items() that does hole detection will
decide that there is a hole in the range starting from 68K to 76K - 1,
and therefore insert a file extent item to represent that hole, having
a key offset of 68K. After that we now have a log tree with 2 different
extent items that have overlapping ranges:
1) The file extent item copied before copy_items() released the path,
which has a key offset of 72K and a length of 4K, representing the
file range 72K to 76K - 1.
2) And a file extent item representing a hole that has a key offset of
68K and a length of 8K, representing the range 68K to 76K - 1. This
item was inserted after releasing the path, and overlaps with the
extent item inserted before.
The overlapping extent items can cause all sorts of unpredictable and
incorrect behaviour, either when replayed or if a fast (non full) fsync
happens later, which can trigger a BUG_ON() when calling
btrfs_set_item_key_safe() through __btrfs_drop_extents(), producing a
trace like the following:
[61666.783269] ------------[ cut here ]------------
[61666.783943] kernel BUG at fs/btrfs/ctree.c:3182!
[61666.784644] invalid opcode: 0000 [#1] PREEMPT SMP
(...)
[61666.786253] task: ffff880117b88c40 task.stack: ffffc90008168000
[61666.786253] RIP: 0010:btrfs_set_item_key_safe+0x7c/0xd2 [btrfs]
[61666.786253] RSP: 0018:ffffc9000816b958 EFLAGS: 00010246
[61666.786253] RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000030000
[61666.786253] RDX: 0000000000000000 RSI: ffffc9000816ba4f RDI: ffffc9000816b937
[61666.786253] RBP: ffffc9000816b998 R08: ffff88011dae2428 R09: 0000000000001000
[61666.786253] R10: 0000160000000000 R11: 6db6db6db6db6db7 R12: ffff88011dae2418
[61666.786253] R13: ffffc9000816ba4f R14: ffff8801e10c4118 R15: ffff8801e715c000
[61666.786253] FS: 00007f6060a18700(0000) GS:ffff88023f5c0000(0000) knlGS:0000000000000000
[61666.786253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[61666.786253] CR2: 00007f6060a28000 CR3: 0000000213e69000 CR4: 00000000000006e0
[61666.786253] Call Trace:
[61666.786253] __btrfs_drop_extents+0x5e3/0xaad [btrfs]
[61666.786253] ? time_hardirqs_on+0x9/0x14
[61666.786253] btrfs_log_changed_extents+0x294/0x4e0 [btrfs]
[61666.786253] ? release_extent_buffer+0x38/0xb4 [btrfs]
[61666.786253] btrfs_log_inode+0xb6e/0xcdc [btrfs]
[61666.786253] ? lock_acquire+0x131/0x1c5
[61666.786253] ? btrfs_log_inode_parent+0xee/0x659 [btrfs]
[61666.786253] ? arch_local_irq_save+0x9/0xc
[61666.786253] ? btrfs_log_inode_parent+0x1f5/0x659 [btrfs]
[61666.786253] btrfs_log_inode_parent+0x223/0x659 [btrfs]
[61666.786253] ? arch_local_irq_save+0x9/0xc
[61666.786253] ? lockref_get_not_zero+0x2c/0x34
[61666.786253] ? rcu_read_unlock+0x3e/0x5d
[61666.786253] btrfs_log_dentry_safe+0x60/0x7b [btrfs]
[61666.786253] btrfs_sync_file+0x317/0x42c [btrfs]
[61666.786253] vfs_fsync_range+0x8c/0x9e
[61666.786253] SyS_msync+0x13c/0x1c9
[61666.786253] entry_SYSCALL_64_fastpath+0x18/0xad
A sample of a corrupt log tree leaf with overlapping extents I got from
running btrfs/072:
item 14 key (295 108 200704) itemoff 2599 itemsize 53
extent data disk bytenr 0 nr 0
extent data offset 0 nr 458752 ram 458752
item 15 key (295 108 659456) itemoff 2546 itemsize 53
extent data disk bytenr 4343541760 nr 770048
extent data offset 606208 nr 163840 ram 770048
item 16 key (295 108 663552) itemoff 2493 itemsize 53
extent data disk bytenr 4343541760 nr 770048
extent data offset 610304 nr 155648 ram 770048
item 17 key (295 108 819200) itemoff 2440 itemsize 53
extent data disk bytenr 4334788608 nr 4096
extent data offset 0 nr 4096 ram 4096
The file extent item at offset 659456 (item 15) ends at offset 823296
(659456 + 163840) while the next file extent item (item 16) starts at
offset 663552.
Another different problem that the race can trigger is a failure in the
assertions at tree-log.c:copy_items(), which expect that the first file
extent item key we found before releasing the path exists after we have
released path and that the last key we found before releasing the path
also exists after releasing the path:
$ cat -n fs/btrfs/tree-log.c
4080 if (need_find_last_extent) {
4081 /* btrfs_prev_leaf could return 1 without releasing the path */
4082 btrfs_release_path(src_path);
4083 ret = btrfs_search_slot(NULL, inode->root, &first_key,
4084 src_path, 0, 0);
4085 if (ret < 0)
4086 return ret;
4087 ASSERT(ret == 0);
(...)
4103 if (i >= btrfs_header_nritems(src_path->nodes[0])) {
4104 ret = btrfs_next_leaf(inode->root, src_path);
4105 if (ret < 0)
4106 return ret;
4107 ASSERT(ret == 0);
4108 src = src_path->nodes[0];
4109 i = 0;
4110 need_find_last_extent = true;
4111 }
(...)
The second assertion implicitly expects that the last key before the path
release still exists, because the surrounding while loop only stops after
we have found that key. When this assertion fails it produces a stack like
this:
[139590.037075] assertion failed: ret == 0, file: fs/btrfs/tree-log.c, line: 4107
[139590.037406] ------------[ cut here ]------------
[139590.037707] kernel BUG at fs/btrfs/ctree.h:3546!
[139590.038034] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[139590.038340] CPU: 1 PID: 31841 Comm: fsstress Tainted: G W 5.0.0-btrfs-next-46 #1
(...)
[139590.039354] RIP: 0010:assfail.constprop.24+0x18/0x1a [btrfs]
(...)
[139590.040397] RSP: 0018:ffffa27f48f2b9b0 EFLAGS: 00010282
[139590.040730] RAX: 0000000000000041 RBX: ffff897c635d92c8 RCX: 0000000000000000
[139590.041105] RDX: 0000000000000000 RSI: ffff897d36a96868 RDI: ffff897d36a96868
[139590.041470] RBP: ffff897d1b9a0708 R08: 0000000000000000 R09: 0000000000000000
[139590.041815] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000013
[139590.042159] R13: 0000000000000227 R14: ffff897cffcbba88 R15: 0000000000000001
[139590.042501] FS: 00007f2efc8dee80(0000) GS:ffff897d36a80000(0000) knlGS:0000000000000000
[139590.042847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[139590.043199] CR2: 00007f8c064935e0 CR3: 0000000232252002 CR4: 00000000003606e0
[139590.043547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[139590.043899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[139590.044250] Call Trace:
[139590.044631] copy_items+0xa3f/0x1000 [btrfs]
[139590.045009] ? generic_bin_search.constprop.32+0x61/0x200 [btrfs]
[139590.045396] btrfs_log_inode+0x7b3/0xd70 [btrfs]
[139590.045773] btrfs_log_inode_parent+0x2b3/0xce0 [btrfs]
[139590.046143] ? do_raw_spin_unlock+0x49/0xc0
[139590.046510] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[139590.046872] btrfs_sync_file+0x3b6/0x440 [btrfs]
[139590.047243] btrfs_file_write_iter+0x45b/0x5c0 [btrfs]
[139590.047592] __vfs_write+0x129/0x1c0
[139590.047932] vfs_write+0xc2/0x1b0
[139590.048270] ksys_write+0x55/0xc0
[139590.048608] do_syscall_64+0x60/0x1b0
[139590.048946] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[139590.049287] RIP: 0033:0x7f2efc4be190
(...)
[139590.050342] RSP: 002b:00007ffe743243a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[139590.050701] RAX: ffffffffffffffda RBX: 0000000000008d58 RCX: 00007f2efc4be190
[139590.051067] RDX: 0000000000008d58 RSI: 00005567eca0f370 RDI: 0000000000000003
[139590.051459] RBP: 0000000000000024 R08: 0000000000000003 R09: 0000000000008d60
[139590.051863] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000003
[139590.052252] R13: 00000000003d3507 R14: 00005567eca0f370 R15: 0000000000000000
(...)
[139590.055128] ---[ end trace 193f35d0215cdeeb ]---
So fix this race between a full ranged fsync and writeback of adjacent
ranges by flushing all delalloc and waiting for all ordered extents to
complete before logging the inode. This is the simplest way to solve the
problem because currently the full fsync path does not deal with ranges
at all (it assumes a full range from 0 to LLONG_MAX) and it always needs
to look at adjacent ranges for hole detection. For use cases of ranged
fsyncs this can make a few fsyncs slower but on the other hand it can
make some following fsyncs to other ranges do less work or no need to do
anything at all. A full fsync is rare anyway and happens only once after
loading/creating an inode and once after less common operations such as a
shrinking truncate.
This is an issue that exists for a long time, and was often triggered by
generic/127, because it does mmap'ed writes and msync (which triggers a
ranged fsync). Adding support for the tree checker to detect overlapping
extents (next patch in the series) and trigger a WARN() when such cases
are found, and then calling btrfs_check_leaf_full() at the end of
btrfs_insert_file_extent() made the issue much easier to detect. Running
btrfs/072 with that change to the tree checker and making fsstress open
files always with O_SYNC made it much easier to trigger the issue (as
triggering it with generic/127 is very rare).
CC: stable@vger.kernel.org # 3.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebb929060a upstream.
When we are doing a full fsync (bit BTRFS_INODE_NEEDS_FULL_SYNC set) of a
file that has holes and has file extent items spanning two or more leafs,
we can end up falling to back to a full transaction commit due to a logic
bug that leads to failure to insert a duplicate file extent item that is
meant to represent a hole between the last file extent item of a leaf and
the first file extent item in the next leaf. The failure (EEXIST error)
leads to a transaction commit (as most errors when logging an inode do).
For example, we have the two following leafs:
Leaf N:
-----------------------------------------------
| ..., ..., ..., (257, FILE_EXTENT_ITEM, 64K) |
-----------------------------------------------
The file extent item at the end of leaf N has a length of 4Kb,
representing the file range from 64K to 68K - 1.
Leaf N + 1:
-----------------------------------------------
| (257, FILE_EXTENT_ITEM, 72K), ..., ..., ... |
-----------------------------------------------
The file extent item at the first slot of leaf N + 1 has a length of
4Kb too, representing the file range from 72K to 76K - 1.
During the full fsync path, when we are at tree-log.c:copy_items() with
leaf N as a parameter, after processing the last file extent item, that
represents the extent at offset 64K, we take a look at the first file
extent item at the next leaf (leaf N + 1), and notice there's a 4K hole
between the two extents, and therefore we insert a file extent item
representing that hole, starting at file offset 68K and ending at offset
72K - 1. However we don't update the value of *last_extent, which is used
to represent the end offset (plus 1, non-inclusive end) of the last file
extent item inserted in the log, so it stays with a value of 68K and not
with a value of 72K.
Then, when copy_items() is called for leaf N + 1, because the value of
*last_extent is smaller then the offset of the first extent item in the
leaf (68K < 72K), we look at the last file extent item in the previous
leaf (leaf N) and see it there's a 4K gap between it and our first file
extent item (again, 68K < 72K), so we decide to insert a file extent item
representing the hole, starting at file offset 68K and ending at offset
72K - 1, this insertion will fail with -EEXIST being returned from
btrfs_insert_file_extent() because we already inserted a file extent item
representing a hole for this offset (68K) in the previous call to
copy_items(), when processing leaf N.
The -EEXIST error gets propagated to the fsync callback, btrfs_sync_file(),
which falls back to a full transaction commit.
Fix this by adjusting *last_extent after inserting a hole when we had to
look at the next leaf.
Fixes: 4ee3fad34a ("Btrfs: fix fsync after hole punching when using no-holes feature")
Cc: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72bd2323ec upstream.
Currently when we fail to COW a path at btrfs_update_root() we end up
always aborting the transaction. However all the current callers of
btrfs_update_root() are able to deal with errors returned from it, many do
end up aborting the transaction themselves (directly or not, such as the
transaction commit path), other BUG_ON() or just gracefully cancel whatever
they were doing.
When syncing the fsync log, we call btrfs_update_root() through
tree-log.c:update_log_root(), and if it returns an -ENOSPC error, the log
sync code does not abort the transaction, instead it gracefully handles
the error and returns -EAGAIN to the fsync handler, so that it falls back
to a transaction commit. Any other error different from -ENOSPC, makes the
log sync code abort the transaction.
So remove the transaction abort from btrfs_update_log() when we fail to
COW a path to update the root item, so that if an -ENOSPC failure happens
we avoid aborting the current transaction and have a chance of the fsync
succeeding after falling back to a transaction commit.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203413
Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b90883c56 upstream.
When a file's compression property is set as zlib or zstd but leave
the compression mount option not be set, that means btrfs will try
to compress the file with default compression level. But in
btrfs_compress_pages(), it calls get_workspace() with level = 0.
This will return a workspace with a wrong compression level.
For zlib, the compression level in the workspace will be 0
(that means "store only"). And for zstd, the compression in the
workspace will be 1, not the default level 3.
How to reproduce:
mkfs -t btrfs /dev/sdb
mount /dev/sdb /mnt/
mkdir /mnt/zlib
btrfs property set /mnt/zlib/ compression zlib
dd if=/dev/zero of=/mnt/zlib/compression-friendly-file-10M bs=1M count=10
sync
btrfs-debugfs -f /mnt/zlib/compression-friendly-file-10M
btrfs-debugfs output:
* before:
...
(258 9961472): ram 524288 disk 1106247680 disk_size 524288
file: ... extents 20 disk size 10485760 logical size 10485760 ratio 1.00
* after:
...
(258 10354688): ram 131072 disk 14217216 disk_size 4096
file: ... extents 80 disk size 327680 logical size 10485760 ratio 32.00
The steps for zstd are similar, but need to put a debugging message to
show the level of the return workspace in zstd_get_workspace().
This commit adds a check of the compression level before getting a
workspace by set_level().
CC: stable@vger.kernel.org # 5.1+
Signed-off-by: Johnny Chang <johnnyc@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8fca955057 upstream.
If we have an error writing out a delalloc range in
btrfs_punch_hole_lock_range we'll unlock the inode and then goto
out_only_mutex, where we will again unlock the inode. This is bad,
don't do this.
Fixes: f27451f229 ("Btrfs: add support for fallocate's zero range operation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a5ec83d6a upstream.
Commit 4d207133e9 changed the types of the statistic values in struct
gfs2_lkstats from s64 to u64. Because of that, what should be a signed
value in gfs2_update_stats turned into an unsigned value. When shifted
right, we end up with a large positive value instead of a small negative
value, which results in an incorrect variance estimate.
Fixes: 4d207133e9 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a98d9ae937 upstream.
DMA allocations that can't sleep may return non-remapped addresses, but
we do not properly handle them in the mmap and get_sgtable methods.
Resolve non-vmalloc addresses using virt_to_page to handle this corner
case.
Cc: <stable@vger.kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2eed9b588 upstream.
The following commit
7290d58095 ("module: use relative references for __ksymtab entries")
updated the ksymtab handling of some KASLR capable architectures
so that ksymtab entries are emitted as pairs of 32-bit relative
references. This reduces the size of the entries, but more
importantly, it gets rid of statically assigned absolute
addresses, which require fixing up at boot time if the kernel
is self relocating (which takes a 24 byte RELA entry for each
member of the ksymtab struct).
Since ksymtab entries are always part of the same module as the
symbol they export, it was assumed at the time that a 32-bit
relative reference is always sufficient to capture the offset
between a ksymtab entry and its target symbol.
Unfortunately, this is not always true: in the case of per-CPU
variables, a per-CPU variable's base address (which usually differs
from the actual address of any of its per-CPU copies) is allocated
in the vicinity of the ..data.percpu section in the core kernel
(i.e., in the per-CPU reserved region which follows the section
containing the core kernel's statically allocated per-CPU variables).
Since we randomize the module space over a 4 GB window covering
the core kernel (based on the -/+ 4 GB range of an ADRP/ADD pair),
we may end up putting the core kernel out of the -/+ 2 GB range of
32-bit relative references of module ksymtab entries that refer to
per-CPU variables.
So reduce the module randomization range a bit further. We lose
1 bit of randomization this way, but this is something we can
tolerate.
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52f476a323 upstream.
Jeff discovered that performance improves from ~375K iops to ~519K iops
on a simple psync-write fio workload when moving the location of 'struct
page' from the default PMEM location to DRAM. This result is surprising
because the expectation is that 'struct page' for dax is only needed for
third party references to dax mappings. For example, a dax-mapped buffer
passed to another system call for direct-I/O requires 'struct page' for
sending the request down the driver stack and pinning the page. There is
no usage of 'struct page' for first party access to a file via
read(2)/write(2) and friends.
However, this "no page needed" expectation is violated by
CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
check_heap_object() helper routine assumes the buffer is backed by a
slab allocator (DRAM) page and applies some checks. Those checks are
invalid, dax pages do not originate from the slab, and redundant,
dax_iomap_actor() has already validated that the I/O is within bounds.
Specifically that routine validates that the logical file offset is
within bounds of the file, then it does a sector-to-pfn translation
which validates that the physical mapping is within bounds of the block
device.
Bypass additional hardened usercopy overhead and call the 'no check'
versions of the copy_{to,from}_iter operations directly.
Fixes: 0aed55af88 ("x86, uaccess: introduce copy_from_iter_flushcache...")
Cc: <stable@vger.kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9bcd3e333 upstream.
Current logic does not allow VCPU to be loaded onto CPU with
APIC ID 255. This should be allowed since the host physical APIC ID
field in the AVIC Physical APIC table entry is an 8-bit value,
and APIC ID 255 is valid in system with x2APIC enabled.
Instead, do not allow VCPU load if the host APIC ID cannot be
represented by an 8-bit value.
Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
instead of AVIC_MAX_PHYSICAL_ID_COUNT.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 654f1f13ea upstream.
When assigning kvm irqfd we didn't check the irqchip mode but we allow
KVM_IRQFD to succeed with all the irqchip modes. However it does not
make much sense to create irqfd even without the kernel chips. Let's
provide a arch-dependent helper to check whether a specific irqfd is
allowed by the arch. At least for x86, it should make sense to check:
- when irqchip mode is NONE, all irqfds should be disallowed, and,
- when irqchip mode is SPLIT, irqfds that are with resamplefd should
be disallowed.
For either of the case, previously we'll silently ignore the irq or
the irq ack event if the irqchip mode is incorrect. However that can
cause misterious guest behaviors and it can be hard to triage. Let's
fail KVM_IRQFD even earlier to detect these incorrect configurations.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Radim Krčmář <rkrcmar@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bf7eac8d6 upstream.
Pankaj reports that starting with commit ad428cdb52 "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.
Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.
Fixes: ad428cdb52 ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b2ca371b1 upstream.
Without this check a snapshot is taken whenever a bucket's max is hit,
rather than only when the global max is hit, as it should be.
Before:
In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (347), then do
a second run and note the max again.
In this case, the max in the second run (39) is below the max in the
first run, but since we haven't cleared the histogram, the first max
is still in the histogram and is higher than any other max, so it
should still be the max for the snapshot. It isn't however - the
value should still be 347 after the second run.
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 347
triggered by event with key: { next_pid: 2144 }
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2148 } hitcount: 199
max: 16 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2150 } hitcount: 1326
max: 39 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2149 } hitcount: 1983
max: 130 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/0
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 39
triggered by event with key: { next_pid: 2150 }
After:
In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (375), then do
a second run and note the max again.
In this case, the max in the second run is still 375, the highest in
any bucket, as it should be.
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5
{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }
# cyclictest -p 80 -n -s -t 2 -D 2
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 2101 } hitcount: 199
max: 49 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5
{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2
{ next_pid: 2103 } hitcount: 1325
max: 74 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4
{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
{ next_pid: 2102 } hitcount: 1981
max: 84 next_prio: 19 next_comm: cyclictest
prev_pid: 12 prev_prio: 120 prev_comm: kworker/0:1
Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }
Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com
Cc: stable@vger.kernel.org
Fixes: a3785b7eca ("tracing: Add hist trigger snapshot() action")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec0970e0a1 upstream.
The iproc host eMMC/SD controller hold time does not meet the
specification in the HS50 mode. This problem can be mitigated
by disabling the HISPD bit; thus forcing the controller output
data to be driven on the falling clock edges rather than the
rising clock edges.
Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
the change does not produce merge conflicts backporting to older kernel
versions. In reality, the timing bug existed since the driver was first
introduced but there is no need for this driver to be supported in kernel
versions that old.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Trac Hoang <trac.hoang@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7dfa695af upstream.
The iproc host eMMC/SD controller hold time does not meet the
specification in the HS50 mode. This problem can be mitigated
by disabling the HISPD bit; thus forcing the controller output
data to be driven on the falling clock edges rather than the
rising clock edges.
This change applies only to the Cygnus platform.
Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
the change does not produce merge conflicts backporting to older kernel
versions. In reality, the timing bug existed since the driver was first
introduced but there is no need for this driver to be supported in kernel
versions that old.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Trac Hoang <trac.hoang@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 009b30ac74 upstream.
The kernel self-tests picked up an issue with CTR mode:
alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
after 3 increments it should wrap around to 0.
In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.
In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.
In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 32-bit additions. Thus the IV would instead become
FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
Use vadduqm.
This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!
Fixes: 5c380d623e ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1354400b2 upstream.
The "hmac(sha3-224-generic)" algorithm has a descsize of 368 bytes,
which is greater than HASH_MAX_DESCSIZE (360) which is only enough for
sha3-224-generic. The check in shash_prepare_alg() doesn't catch this
because the HMAC template doesn't set descsize on the algorithms, but
rather sets it on each individual HMAC transform.
This causes a stack buffer overflow when SHASH_DESC_ON_STACK() is used
with hmac(sha3-224-generic).
Fix it by increasing HASH_MAX_DESCSIZE to the real maximum. Also add a
sanity check to hmac_init().
This was detected by the improved crypto self-tests in v5.2, by loading
the tcrypt module with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y enabled. I
didn't notice this bug when I ran the self-tests by requesting the
algorithms via AF_ALG (i.e., not using tcrypt), probably because the
stack layout differs in the two cases and that made a difference here.
KASAN report:
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:359 [inline]
BUG: KASAN: stack-out-of-bounds in shash_default_import+0x52/0x80 crypto/shash.c:223
Write of size 360 at addr ffff8880651defc8 by task insmod/3689
CPU: 2 PID: 3689 Comm: insmod Tainted: G E 5.1.0-10741-g35c99ffa20edd #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x86/0xc5 lib/dump_stack.c:113
print_address_description+0x7f/0x260 mm/kasan/report.c:188
__kasan_report+0x144/0x187 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
check_memory_region_inline mm/kasan/generic.c:185 [inline]
check_memory_region+0x137/0x190 mm/kasan/generic.c:191
memcpy+0x37/0x50 mm/kasan/common.c:125
memcpy include/linux/string.h:359 [inline]
shash_default_import+0x52/0x80 crypto/shash.c:223
crypto_shash_import include/crypto/hash.h:880 [inline]
hmac_import+0x184/0x240 crypto/hmac.c:102
hmac_init+0x96/0xc0 crypto/hmac.c:107
crypto_shash_init include/crypto/hash.h:902 [inline]
shash_digest_unaligned+0x9f/0xf0 crypto/shash.c:194
crypto_shash_digest+0xe9/0x1b0 crypto/shash.c:211
generate_random_hash_testvec.constprop.11+0x1ec/0x5b0 crypto/testmgr.c:1331
test_hash_vs_generic_impl+0x3f7/0x5c0 crypto/testmgr.c:1420
__alg_test_hash+0x26d/0x340 crypto/testmgr.c:1502
alg_test_hash+0x22e/0x330 crypto/testmgr.c:1552
alg_test.part.7+0x132/0x610 crypto/testmgr.c:4931
alg_test+0x1f/0x40 crypto/testmgr.c:4952
Fixes: b68a7ec1e9 ("crypto: hash - Remove VLA usage")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8acf608e60 upstream.
This reverts commit 20bd1d026a.
This patch introduced regressions for devices that come online in
read-only state and subsequently switch to read-write.
Given how the partition code is currently implemented it is not
possible to persist the read-only flag across a device revalidate
call. This may need to get addressed in the future since it is common
for user applications to proactively call BLKRRPART.
Reverting this commit will re-introduce a regression where a
device-initiated revalidate event will cause the admin state to be
forgotten. A separate patch will address this issue.
Fixes: 20bd1d026a ("scsi: sd: Keep disk read-only when re-reading partition")
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a80c4ec10e upstream.
After commit:
672ff6cff8 ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
my AMD guests started #GPing like this:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:x86_perf_event_update+0x3b/0xa0
with Code: pointing to RDPMC. It is RDPMC because the guest has the
hardware watchdog CONFIG_HARDLOCKUP_DETECTOR_PERF enabled which uses
perf. Instrumenting kvm_pmu_rdpmc() some, showed that it fails due to:
if (!pmu->version)
return 1;
which the above commit added. Since AMD's PMU leaves the version at 0,
that causes the #GP injection into the guest.
Set pmu->version arbitrarily to 1 and move it above the non-applicable
struct kvm_pmu members.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: kvm@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Fixes: 672ff6cff8 ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66f61c9288 upstream.
Commit 11988499e6 ("KVM: x86: Skip EFER vs. guest CPUID checks for
host-initiated writes", 2019-04-02) introduced a "return false" in a
function returning int, and anyway set_efer has a "nonzero on error"
conventon so it should be returning 1.
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 11988499e6 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82a25b027c upstream.
We didn't wait for outstanding direct IO during truncate in nojournal
mode (as we skip orphan handling in that case). This can lead to fs
corruption or stale data exposure if truncate ends up freeing blocks
and these get reallocated before direct IO finishes. Fix the condition
determining whether the wait is necessary.
CC: stable@vger.kernel.org
Fixes: 1c9114f9c0 ("ext4: serialize unlocked dio reads with truncate")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee0ed02ca9 upstream.
It is possible that unlinked inode enters ext4_setattr() (e.g. if
somebody calls ftruncate(2) on unlinked but still open file). In such
case we should not delete the inode from the orphan list if truncate
fails. Note that this is mostly a theoretical concern as filesystem is
corrupted if we reach this path anyway but let's be consistent in our
orphan handling.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 693713cbdb upstream.
User Mode Linux does not have access to the ip or sp fields of the pt_regs,
and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
and int3_emulate_call() instructions from UML, as it doesn't need them
anyway.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dc2011398 upstream.
A fallthrough in switch/case was introduced in f627caf55b ("fbdev:
sm712fb: fix crashes and garbled display during DPMS modesetting"),
due to my copy-paste error, which would cause the memory clock frequency
for SM720 to be programmed to SM712.
Since it only reprograms the clock to a different frequency, it's only
a benign issue without visible side-effect, so it also evaded Sudip
Mukherjee's code review and regression tests. scripts/checkpatch.pl
also failed to discover the issue, possibly due to nested switch
statements.
This issue was found by Stephen Rothwell by building linux-next with
-Wimplicit-fallthrough.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f627caf55b ("fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting")
Signed-off-by: Yifeng Li <tomli@tomli.me>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbb58e291c upstream.
The main 3.3V regulator sources a series of additional regulators.
This patch adds a small delay, so when the 3.3V regulator comes
on it delays a bit before the subsequent regulators can come on.
This reduces the inrush current a bit on the external DC power
supply to help prevent a situation where the sourcing power supply
cannot source enough current and overloads and the kit fails to
start.
Fixes: 1c207f911f ("ARM: dts: imx: Add support for Logic PD i.MX6QD EVM")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7aedca8750 upstream.
Some USB peripherals draw more power, and the sourcing regulator
take a little time to turn on. This patch fixes an issue where
some devices occasionally do not get detected, because the power
isn't quite ready when communication starts, so we add a bit
of a delay.
Fixes: 1c207f911f ("ARM: dts: imx: Add support for Logic PD i.MX6QD EVM")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10995c0491 upstream.
Commit d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after
merge_reloc_roots()") expands the life span of root->reloc_root.
This breaks certain checs of fs_info->reloc_ctl. Before that commit, if
we have a root with valid reloc_root, then it's ensured to have
fs_info->reloc_ctl.
But now since reloc_root doesn't always mean a valid fs_info->reloc_ctl,
such check is unreliable and can cause the following NULL pointer
dereference:
BUG: unable to handle kernel NULL pointer dereference at 00000000000005c1
IP: btrfs_reloc_pre_snapshot+0x20/0x50 [btrfs]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 10379 Comm: snapperd Not tainted
Call Trace:
create_pending_snapshot+0xd7/0xfc0 [btrfs]
create_pending_snapshots+0x8e/0xb0 [btrfs]
btrfs_commit_transaction+0x2ac/0x8f0 [btrfs]
btrfs_mksubvol+0x561/0x570 [btrfs]
btrfs_ioctl_snap_create_transid+0x189/0x190 [btrfs]
btrfs_ioctl_snap_create_v2+0x102/0x150 [btrfs]
btrfs_ioctl+0x5c9/0x1e60 [btrfs]
do_vfs_ioctl+0x90/0x5f0
SyS_ioctl+0x74/0x80
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fd7cdab8467
Fix it by explicitly checking fs_info->reloc_ctl other than using the
implied root->reloc_root.
Fixes: d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3d964673b upstream.
As Stepan Golosunov points out, there is a small mistake in the
get_timespec64() function in the kernel. It was originally added under the
assumption that CONFIG_64BIT_TIME would get enabled on all 32-bit and
64-bit architectures, but when the conversion was done, it was only turned
on for 32-bit ones.
The effect is that the get_timespec64() function never clears the upper
half of the tv_nsec field for 32-bit tasks in compat mode. Clearing this is
required for POSIX compliant behavior of functions that pass a 'timespec'
structure with a 64-bit tv_sec and a 32-bit tv_nsec, plus uninitialized
padding.
The easiest fix for linux-5.1 is to just make the Kconfig symbol
unconditional, as it was originally intended. As a follow-up, the #ifdef
CONFIG_64BIT_TIME can be removed completely..
Note: for native 32-bit mode, no change is needed, this works as
designed and user space should never need to clear the upper 32
bits of the tv_nsec field, in or out of the kernel.
Fixes: 00bf25d693 ("y2038: use time32 syscall names on 32-bit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: libc-alpha@sourceware.org
Cc: linux-api@vger.kernel.org
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Lukasz Majewski <lukma@denx.de>
Cc: Stepan Golosunov <stepan@golosunov.pp.ru>
Link: https://lore.kernel.org/lkml/20190422090710.bmxdhhankurhafxq@sghpc.golosunov.pp.ru/
Link: https://lkml.kernel.org/r/20190429131951.471701-1-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50b045a8c0 upstream.
One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".
In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.
[0] http://vger.kernel.org/bpfconf.html
Fixes: 29ba732acb ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384e ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6110222c6 upstream.
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e547ff3f80 upstream.
For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b777eee88 upstream.
In commit 376991db4b ("driver core: Postpone DMA tear-down until after
devres release"), we changed the ordering of tearing down the device DMA
ops and releasing all the device's resources; this was because the DMA ops
should be maintained until we release the device's managed DMA memories.
However, we have seen another crash on an arm64 system when a
device driver probe fails:
hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0 pfn:313f5
page:ffff7e0000c4fd40 count:1 mapcount:0
mapping:0000000000000000 index:0x0
flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted
5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
RC0 - V1.12.01 01/29/2019
Call trace:
dump_backtrace+0x0/0x118
show_stack+0x14/0x1c
dump_stack+0xa4/0xc8
bad_page+0xe4/0x13c
free_pages_check_bad+0x4c/0xc0
__free_pages_ok+0x30c/0x340
__free_pages+0x30/0x44
__dma_direct_free_pages+0x30/0x38
dma_direct_free+0x24/0x38
dma_free_attrs+0x9c/0xd8
dmam_release+0x20/0x28
release_nodes+0x17c/0x220
devres_release_all+0x34/0x54
really_probe+0xc4/0x2c8
driver_probe_device+0x58/0xfc
device_driver_attach+0x68/0x70
__driver_attach+0x94/0xdc
bus_for_each_dev+0x5c/0xb4
driver_attach+0x20/0x28
bus_add_driver+0x14c/0x200
driver_register+0x6c/0x124
__pci_register_driver+0x48/0x50
sas_v3_pci_driver_init+0x20/0x28
do_one_initcall+0x40/0x25c
kernel_init_freeable+0x2b8/0x3c0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0 pfn:313f6
page:ffff7e0000c4fd80 count:1 mapcount:0
mapping:0000000000000000 index:0x0
[ 89.322983] flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
The crash occurs for the same reason.
In this case, on the really_probe() failure path, we are still clearing
the DMA ops prior to releasing the device's managed memories.
This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 941acd566b upstream.
On imx8mq B0 chip, AHB/SDMA clock ratio 2:1 can't be supported,
since SDMA clock ratio has to be increased to 250Mhz, AHB can't reach
to 500Mhz, so use 1:1 instead.
To limit this change to the imx8mq for now this patch also adds an
im8mq-sdma compatible string.
Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca>
Acked-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Cc: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2176a1dfb upstream.
The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.
Note: fixed warning reported by kbuild test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51b86f9a8d upstream.
Commit 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM
target interface") incorrectly removed code from
__send_changing_extent_only() that is required to impose a per-target IO
boundary on IO that exceeds max_io_len_target_boundary(). Otherwise
"special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
where allowed.
Fix this by restoring the max_io_len_target_boundary() limit in
__send_changing_extent_only()
Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Michael Lass <bevan@bi-co.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbd84f3365 upstream.
Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af34 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d0 ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 940bc47178 upstream.
Commit b592211c33 ("dm mpath: fix attached_handler_name leak and
dangling hw_handler_name pointer") fixed a memory leak for the case
where setup_scsi_dh() returns failure. But setup_scsi_dh may return
success and not "use" attached_handler_name if the
retain_attached_hwhandler flag is not set on the map. As setup_scsi_sh
properly "steals" the pointer by nullifying it, freeing it
unconditionally in parse_path() is safe.
Fixes: b592211c33 ("dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer")
Cc: stable@vger.kernel.org
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f41fcf788 upstream.
The dm_early_create() function (which deals with "dm-mod.create=" kernel
command line option) calls dm_hash_insert() who gets an extra reference
to the md object.
In case of failure, this reference wasn't being released, causing
dm_destroy() to hang, thus hanging the whole boot process.
Fix this by calling __hash_remove() in the error path.
Fixes: 6bbc923dfc ("dm: add support to directly boot to a mapped device")
Cc: stable@vger.kernel.org
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30bba430dd upstream.
When we use separate devices for data and metadata, dm-integrity would
incorrectly calculate the size of the metadata device as if it had
512-byte block size - and it would refuse activation with larger block
size and smaller metadata device.
Fix this so that it takes actual block size into account, which fixes
the following reported issue:
https://gitlab.com/cryptsetup/cryptsetup/issues/450
Fixes: 356d9d52e1 ("dm integrity: allow separate metadata device")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a1cd7238f upstream.
The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.
Also use rate limiting and debug level for info about used crypto API
implementaton. This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).
Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81bc6d150a upstream.
When the target line contains an invalid device, delay_ctr() will call
delay_dtr() with NULL workqueue. Attempting to destroy the NULL
workqueue causes a crash.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e890c1ab1 upstream.
dm-init should allow up to DM_MAX_{DEVICES,TARGETS} for devices/targets,
and not DM_MAX_{DEVICES,TARGETS} - 1.
Fix the checks and also fix the error message when the number of devices
is surpassed.
Fixes: 6bbc923dfc ("dm: add support to directly boot to a mapped device")
Cc: stable@vger.kernel.org
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7aedf75ff7 upstream.
The function blkdev_report_zones() returns success even if no zone
information is reported (empty report). Empty zone reports can only
happen if the report start sector passed exceeds the device capacity.
The conditions for this to happen are either a bug in the caller code,
or, a change in the device that forced the low level driver to change
the device capacity to a value that is lower than the report start
sector. This situation includes a failed disk revalidation resulting in
the disk capacity being changed to 0.
If this change happens while dm-zoned is in its initialization phase
executing dmz_init_zones(), this function may enter an infinite loop
and hang the system. To avoid this, add a check to disallow empty zone
reports and bail out early. Also fix the function dmz_update_zone() to
make sure that the report for the requested zone was correctly obtained.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Shaun Tancheff <shaun@tancheff.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e28adc3bf3 upstream.
Add missing dm_bitset_cursor_next() to properly advance the bitset
cursor.
Otherwise, the discarded state of all blocks is set according to the
discarded state of the first block.
Fixes: ae4a46a1f6 ("dm cache metadata: use bitset cursor api to load discard bitset")
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ec73791a6 upstream.
Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
(conventional PCI on primary side, PCIe on downstream side), the Retrain
Link bit needs to be cleared manually to allow the link training to
complete successfully.
If it is not cleared manually, the link training is continuously restarted
and no devices below the PCI-to-PCIe bridge can be accessed. That means
drivers for devices below the bridge will be loaded but won't work and may
even crash because the driver is only reading 0xffff.
See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
PI7C9X130.
Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected
devices set this bit.
Note that pcie_retrain_link() lives in aspm.c because that's currently the
only place we use it, but this erratum is not specific to ASPM, and we may
retrain links for other reasons in the future.
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
[bhelgaas: apply regardless of CONFIG_PCIEASPM]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be20bbcb0a upstream.
Reestablish the PCIe link very early in the resume process in case it
went down to prevent PCI accesses from hanging the bus. Such accesses
can happen early in the PCI resume process, as early as the
SUSPEND_RESUME_NOIRQ step, thus the link must be reestablished in the
driver resume_noirq() callback.
Fixes: e015f88c36 ("PCI: rcar: Add support for R-Car H3 to pcie-rcar")
Signed-off-by: Kazufumi Ikeda <kaz-ikeda@xc.jp.nec.com>
Signed-off-by: Gaku Inami <gaku.inami.xw@bp.renesas.com>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
[lorenzo.pieralisi@arm.com: reformatted commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: stable@vger.kernel.org
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6302bf3ef7 upstream.
Two functions allocate a host bridge: devm_pci_alloc_host_bridge() and
pci_alloc_host_bridge(). At the moment, only the unmanaged one initializes
the PCIe feature bits, which prevents from using features such as hotplug
or AER on some systems, when booting with device tree. Make the
initialization code common.
Fixes: 02bfeb4842 ("PCI/portdrv: Simplify PCIe feature permission checking")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.17+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0547c81bf upstream.
On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode. The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:
1. Boot up the laptop normally in Hybrid Graphics mode
2. Make sure nouveau is loaded and that the GPU is awake
3. Allow the Nvidia GPU to runtime suspend itself after being idle
4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
5. If nouveau loads up properly, reboot the machine again and go back to
step 2 until you reproduce the issue
This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot. This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:
nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
This causes a timeout trying to bring up the GR ctx:
nouveau 0000:01:00.0: timeout
WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
...
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
The GPU never manages to recover. Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:
irq 16: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<000000007faa9e99>] i801_isr [i2c_i801]
Disabling IRQ #16
...
serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
This causes the touchpad and sometimes other things to get disabled.
Since this happens without nouveau, we can't fix this problem from nouveau
itself.
Add a PCI quirk for the specific P50 variant of this GPU. Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected). Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot. Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f627caf55b upstream.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), blanking the display
or starting the X server will crash and freeze the system, or garble the
display.
Experiments showed this problem can mostly be solved by adjusting the
order of register writes. Also, sm712fb failed to consider the difference
of clock frequency when unblanking the display, and programs the clock for
SM712 to SM720.
Fix them by adjusting the order of register writes, and adding an
additional check for SM720 for programming the clock frequency.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ed7d2ccb7 upstream.
Loongson MIPS netbooks use 1024x600 LCD panels, which is the original
target platform of this driver, but nearly all old x86 laptops have
1024x768. Lighting 768 panels using 600's timings would partially
garble the display. Since it's not possible to distinguish them reliably,
we change the default to 768, but keep 600 as-is on MIPS.
Further, earlier laptops, such as IBM Thinkpad 240X, has a 800x600 LCD
panel, this driver would probably garbled those display. As we don't
have one for testing, the original behavior of the driver is kept as-is,
but the problem has been documented is the comments.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6053d3a479 upstream.
In order to support the 1024x600 panel on Yeeloong Loongson MIPS
laptop, the original 1024x768-16 table was modified to 1024x600-16,
without leaving the original. It causes problem on x86 laptop as
the 1024x768-16 support was still claimed but not working.
Fix it by introducing the 1024x768-16 mode.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e0e59993d upstream.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), running fbtest or X
will crash the machine instantly, because the VRAM/framebuffer is not
mapped correctly.
On SM712, the framebuffer starts at the beginning of address space, but
SM720's framebuffer starts at the 1 MiB offset from the beginning. However,
sm712fb fails to take this into account, as a result, writing to the
framebuffer will destroy all the registers and kill the system immediately.
Another problem is the driver assumes 8 MiB of VRAM for SM720, but some
SM720 system, such as this IBM Thinkpad, only has 4 MiB of VRAM.
Fix this problem by removing the hardcoded VRAM size, adding a function to
query the amount of VRAM from register MCR76 on SM720, and adding proper
framebuffer offset.
Please note that the memory map may have additional problems on Big-Endian
system, which is not available for testing by myself. But I highly suspect
that the original code is also broken on Big-Endian machines for SM720, so
at least we are not making the problem worse. More, the driver also assumed
SM710/SM712 has 4 MiB of VRAM, but it has a 2 MiB version as well, and used
in earlier laptops, such as IBM Thinkpad 240X, the driver would probably
crash on them. I've never seen one of those machines and cannot fix it, but
I have documented these problems in the comments.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec1587d507 upstream.
When the machine is booted in VGA mode, loading sm712fb would cause
a glitch of random pixels shown on the screen. To prevent it from
happening, we first clear the entire framebuffer, and we also need
to stop calling smtcfb_setmode() during initialization, the fbdev
layer will call it for us later when it's ready.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8069053880 upstream.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
sm712fb framebuffer driver would cause a white screen of death on
the next POST, presumably the proper timings for the LCD panel was
not reprogrammed properly by the BIOS.
Experiments showed a few CRTC Scratch Registers, including CRT3D,
CRT3E and CRT3F may be used internally by BIOS as some flags. CRT3B is
a hardware testing register, we shouldn't mess with it. CRT3C has
blanking signal and line compare control, which is not needed for this
driver.
Stop writing to CR3B-CR3F (a.k.a CRT3B-CRT3F) registers. Even if these
registers don't have side-effect on other systems, writing to them is
also highly questionable.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcf9070595 upstream.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), the amount of Video
RAM is not detected correctly by the xf86-video-siliconmotion driver.
This is because sm712fb overwrites the GPR71 Scratch Pad Register, which
is set by BIOS on x86 and used to indicate amount of VRAM.
Other Scratch Pad Registers, including GPR70/74/75, don't have the same
side-effect, but overwriting to them is still questionable, as they are
not related to modesetting.
Stop writing to SR70/71/74/75 (a.k.a GPR70/71/74/75).
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5481115e25 upstream.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
sm712fb framebuffer driver would cause the role of brightness up/down
button to swap.
Experiments showed the FPR30 register caused this behavior. Moreover,
even if this register don't have side-effect on other systems, over-
writing it is also highly questionable, since it was originally
configurated by the motherboard manufacturer by hardwiring pull-down
resistors to indicate the type of LCD panel. We should not mess with
it.
Stop writing to the SR30 (a.k.a FPR30) register.
Signed-off-by: Yifeng Li <tomli@tomli.me>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Teddy Wang <teddy.wang@siliconmotion.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8585539df upstream.
The following commit:
38ac0287b7 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
updated the EFI framebuffer code to use memory mappings for the linear
framebuffer that are permitted by the memory attributes described by the
EFI memory map for the particular region, if the framebuffer happens to
be covered by the EFI memory map (which is typically only the case for
framebuffers in shared memory). This is required since non-x86 systems
may require cacheable attributes for memory mappings that are shared
with other masters (such as GPUs), and this information cannot be
described by the Graphics Output Protocol (GOP) EFI protocol itself,
and so we rely on the EFI memory map for this.
As reported by James, this breaks some x86 systems:
[ 1.173368] efifb: probing for efifb
[ 1.173386] efifb: abort, cannot remap video memory 0x1d5000 @ 0xcf800000
[ 1.173395] Trying to free nonexistent resource <00000000cf800000-00000000cf9d4bff>
[ 1.173413] efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
The problem turns out to be that the memory map entry that describes the
framebuffer has no memory attributes listed at all, and so we end up with
a mem_flags value of 0x0.
So work around this by ensuring that the memory map entry's attribute field
has a sane value before using it to mask the set of usable attributes.
Reported-by: James Hilliard <james.hilliard1@gmail.com>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 38ac0287b7 ("fbdev/efifb: Honour UEFI memory map attributes when ...")
Link: http://lkml.kernel.org/r/20190516213159.3530-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a28fc94c9 upstream.
This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have showed up in 4.20 but wasn't noticed
until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional
change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by
its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.
This specific ordering was actually introduced by:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that has not been
properly zapped.
But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.
I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.
Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.
This was also reported in the following kernel bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=203123
There are some reports that this commit triggered this bug:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.
[ mingo: Minor edits to the changelog and the patch. ]
Reported-by: Richard Biener <rguenther@suse.de>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b6599a9d8 upstream.
The sample timestamp is updated to ensure that the timestamp represents
the time of the sample and not a branch that the decoder is still
walking towards. The sample timestamp is updated when the decoder
returns, but the decoder does not return for non-taken branches. Update
the sample timestamp then also.
Note that commit 3f04d98e97 ("perf intel-pt: Improve sample
timestamp") was also a stable fix and appears, for example, in v4.4
stable tree as commit a4ebb58fd1 ("perf intel-pt: Improve sample
timestamp").
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 3f04d98e97 ("perf intel-pt: Improve sample timestamp")
Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61b6e08dc8 upstream.
The decoder uses its current timestamp in samples. Usually that is a
timestamp that has already passed, but in some cases it is a timestamp
for a branch that the decoder is walking towards, and consequently
hasn't reached.
The intel_pt_sample_time() function decides which is which, but was not
handling TNT packets exactly correctly.
In the case of TNT, the timestamp applies to the first branch, so the
decoder must first walk to that branch.
That means intel_pt_sample_time() should return true for TNT, and this
patch makes that change. However, if the first branch is a non-taken
branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
for subsequent taken branches in the same TNT packet.
To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
distinguish the cases.
Note that commit 3f04d98e97 ("perf intel-pt: Improve sample
timestamp") was also a stable fix and appears, for example, in v4.4
stable tree as commit a4ebb58fd1 ("perf intel-pt: Improve sample
timestamp").
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 3f04d98e97 ("perf intel-pt: Improve sample timestamp")
Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ba8fa20e2 upstream.
The timestamp used to determine if an instruction sample is made, is an
estimate based on the number of instructions since the last known
timestamp. A consequence is that it might go backwards, which results in
extra samples. Change it so that a sample is only made when the
timestamp goes forwards.
Note this does not affect a sampling period of 0 or sampling periods
specified as a count of instructions.
Example:
Before:
$ perf script --itrace=i10us
ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 10 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 8 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 6 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 4 instructions:u: 7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222728: 16423 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222734: 12731 instructions:u: 7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
...
After:
$ perf script --itrace=i10us
ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222728: 16479 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
...
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Fixes: f4aa081949 ("perf tools: Add Intel PT decoder")
Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b906c056b6 upstream.
Multiplying the Memory Controller clock rate by the tick count results
in an integer overflow and in result the truncated tick value is being
programmed into hardware, such that the GR3D memory client performance is
reduced by two times.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbe08bcbbe upstream.
When reading only part of the id file, the ppos isn't tracked correctly.
This is taken care by simple_read_from_buffer.
Reading a single byte, and then the next byte would result EOF.
While this seems like not a big deal, this breaks abstractions that
reads information from files unbuffered. See for example
https://github.com/golang/go/issues/29399
This code was mentioned as problematic in
commit cd458ba9d5
("tracing: Do not (ab)use trace_seq in event_id_read()")
An example C code that show this bug is:
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2)
return 1;
int fd = open(argv[1], O_RDONLY);
char c;
read(fd, &c, 1);
printf("First %c\n", c);
read(fd, &c, 1);
printf("Second %c\n", c);
}
Then run with, e.g.
sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id
You'll notice you're getting the first character twice, instead of the
first two characters in the id file.
Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com
Cc: Orit Wasserman <orit.was@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 23725aeeab ("ftrace: provide an id file for each event")
Signed-off-by: Elazar Leibovich <elazar@lightbitslabs.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43a0541e31 upstream.
Both Tegra30 and Tegra114 have 4 ASID's and the corresponding bitfield of
the TLB_FLUSH register differs from later Tegra generations that have 128
ASID's.
In a result the PTE's are now flushed correctly from TLB and this fixes
problems with graphics (randomly failing tests) on Tegra30.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 259799ea5a upstream.
Use gen_rtx_set instead of gen_rtx_SET. The former is a wrapper macro
that handles the difference between GCC versions implementing
the latter.
This fixes the following error on my system with g++ 5.4.0 as the host
compiler
HOSTCXX -fPIC scripts/gcc-plugins/arm_ssp_per_task_plugin.o
scripts/gcc-plugins/arm_ssp_per_task_plugin.c:42:14: error: macro "gen_rtx_SET" requires 3 arguments, but only 2 given
mask)),
^
scripts/gcc-plugins/arm_ssp_per_task_plugin.c: In function ‘unsigned int arm_pertask_ssp_rtl_execute()’:
scripts/gcc-plugins/arm_ssp_per_task_plugin.c:39:20: error: ‘gen_rtx_SET’ was not declared in this scope
emit_insn_before(gen_rtx_SET
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Fixes: 189af46571 ("ARM: smp: add support for per-task stack canaries")
Cc: stable@vger.kernel.org
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3428030da0 upstream.
Generalize the helper ovl_open_maybe_copy_up() and use it to copy up file
with data before FS_IOC_SETFLAGS ioctl.
The FS_IOC_SETFLAGS ioctl is a bit of an odd ball in vfs, which probably
caused the confusion. File may be open O_RDONLY, but ioctl modifies the
file. VFS does not call mnt_want_write_file() nor lock inode mutex, but
fs-specific code for FS_IOC_SETFLAGS does. So ovl_ioctl() calls
mnt_want_write_file() for the overlay file, and fs-specific code calls
mnt_want_write_file() for upper fs file, but there was no call for
ovl_want_write() for copy up duration which prevents overlayfs from copying
up on a frozen upper fs.
Fixes: dab5ca8fd9 ("ovl: add lsattr/chattr support")
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9de5be06d0 upstream.
Writepage requests were cropped to i_size & 0xffffffff, which meant that
mmaped writes to any file larger than 4G might be silently discarded.
Fix by storing the file size in a properly sized variable (loff_t instead
of size_t).
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Fixes: 6eaf4782eb ("fuse: writepages: crop secondary requests")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit babc250e27 upstream.
Rendering calls may be done simultaneously from the workqueue,
dlfb_ops_write, dlfb_ops_ioctl, dlfb_ops_set_par and dlfb_dpy_deferred_io.
The code is robust enough so that it won't crash on concurrent rendering.
However, concurrent rendering may cause display corruption if the same
pixel is simultaneously being rendered. In order to avoid this corruption,
this patch adds a mutex around the rendering calls.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: <stable@vger.kernel.org>
[b.zolnierkie: replace "dlfb:" with "uldfb:" in the patch summary]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b11f9d843 upstream.
If a framebuffer device is used as a console, the rendering calls
(copyarea, fillrect, imageblit) may be done with the console spinlock
held. On udlfb, these function call dlfb_handle_damage that takes a
blocking semaphore before acquiring an URB.
In order to fix the bug, this patch changes the calls copyarea, fillrect
and imageblit to offload USB work to a workqueue.
A side effect of this patch is 3x improvement in console scrolling speed
because the device doesn't have to be updated after each copyarea call.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb90339213 upstream.
This patch fixes definition of several clock gate and select register
that is wrong for rk3328 referring to the TRM and vendor kernel.
Also use correct number of softrst registers.
Fix clock definition for:
- clk_crypto
- aclk_h265
- pclk_h265
- aclk_h264
- hclk_h264
- aclk_axisram
- aclk_gmac
- aclk_usb3otg
Fixes: fe3511ad8a ("clk: rockchip: add clock controller for rk3328")
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40db569d67 upstream.
There are wrongly set parenthesis in the code that are resulting in a
wrong configuration being programmed for PLLM. The original fix was made
by Danny Huang in the downstream kernel. The patch was tested on Nyan Big
Tegra124 chromebook, PLLM rate changing works correctly now and system
doesn't lock up after changing the PLLM rate due to EMC scaling.
Cc: <stable@vger.kernel.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-By: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1029c9bc0 upstream.
If we fail to find a good deviceid while trying to pnfs instead of
propogating an error back fallback to doing IO to the MDS. Currently,
code with fals the IO with EINVAL.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 8d40b0f148 ("NFS filelayout:call GETDEVICEINFO after pnfs_layout_process completes"
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f02f3755db upstream.
stat command with soft mount never return after server is stopped.
When alloc a new client, the state of the client will be set to
NFS4CLNT_LEASE_EXPIRED.
When the server is stopped, the state manager will work, and accord
the state to recover. But the state is NFS4CLNT_LEASE_EXPIRED, it
will drain the slot table and lead other task to wait queue, until
the client recovered. Then the stat command is hung.
When discover server trunking, the client will renew the lease,
but check the client state, it lead the client state corruption.
So, we need to call state manager to recover it when detect server
ip trunking.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b79656ed44 upstream.
Systemd triggers the following warning during IPoIB device load:
mlx5_core 0000:00:0c.0 ib0: "systemd-udevd" wants to know my dev_id.
Should it look at dev_port instead?
See Documentation/ABI/testing/sysfs-class-net for more info.
This is caused due to user space attempt to differentiate old systems
without dev_port and new systems with dev_port. In case dev_port will be
zero, the systemd will try to read dev_id instead.
There is no need to print a warning in such case, because it is valid
situation and it is needed to ensure systemd compatibility with old
kernels.
Link: https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L358
Cc: <stable@vger.kernel.org> # 4.19
Fixes: f6350da41d ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 107927fa59 upstream.
In imx_media_create_csi_of_links(), the 'struct v4l2_fwnode_link' must
be cleared for each endpoint iteration, otherwise if the remote port
has no "reg" property, link.remote_port will not be reset to zero.
This was discovered on the i.MX53 SMD board, since the OV5642 connects
directly to ipu1_csi0 and has a single source port with no "reg"
property.
Fixes: 621b08eabc ("media: staging/imx: remove static media link arrays")
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 904371f90b upstream.
On i.MX6, the nearest upstream entity to the CSI can only be the
CSI video muxes or the Synopsys DW MIPI CSI-2 receiver.
However the i.MX53 has no CSI video muxes or a MIPI CSI-2 receiver.
So allow for the nearest upstream entity to the CSI to be something
other than those.
Fixes: bf3cfaa712 ("media: staging/imx: get CSI bus type from nearest
upstream entity")
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63604a143f upstream.
I previously added an RC_CORE dependency here, but missed the corner
case of CONFIG_VIDEO_SECO_CEC=y with CONFIG_RC_CORE=m, which still
causes a link error:
drivers/media/platform/seco-cec/seco-cec.o: In function `secocec_probe':
seco-cec.c:(.text+0x1b8): undefined reference to `devm_rc_allocate_device'
seco-cec.c:(.text+0x2e8): undefined reference to `devm_rc_register_device'
drivers/media/platform/seco-cec/seco-cec.o: In function `secocec_irq_handler':
seco-cec.c:(.text+0xa2c): undefined reference to `rc_keydown'
Refine the dependency to disallow building the RC subdriver in this case.
This is the same logic we apply in other drivers like it.
Fixes: f27dd0ad68 ("media: seco-cec: fix RC_CORE dependency")
Cc: <stable@vger.kernel.org> # 5.1
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 933c132084 upstream.
After removal of clock_start() from before soc_camera_init_i2c() in
soc_camera_probe() by commit 9aea470b39 ("[media] soc-camera: switch
I2C subdevice drivers to use v4l2-clk") introduced in v3.11, the ov6650
driver could no longer probe the sensor successfully because its clock
was no longer turned on in advance. The issue was initially worked
around by adding that missing clock_start() equivalent to OMAP1 camera
interface driver - the only user of this sensor - but a propoer fix
should be rather implemented in the sensor driver code itself.
Fix the issue by inserting a delay between the clock is turned on and
the sensor I2C registers are read for the first time.
Tested on Amstrad Delta with now out of tree but still locally
maintained omap1_camera host driver.
Fixes: 9aea470b39 ("[media] soc-camera: switch I2C subdevice drivers to use v4l2-clk")
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a54b2e002 upstream.
Change strcat to strncpy in the "None" case to fix a buffer overflow
when cinode->oplock is reset to 0 by another thread accessing the same
cinode. It is never valid to append "None" to any other message.
Consolidate multiple writes to cinode->oplock to reduce raciness.
Signed-off-by: Christoph Probst <kernel@probst.it>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d69cb728e7 upstream.
For SMB1 oplock breaks we would grab one credit while sending the PDU
but we would never relese the credit back since we will never receive a
response to this from the server. Eventuallt this would lead to a hang
once all credits are leaked.
Fix this by defining a new flag CIFS_NO_SRV_RSP which indicates that there
is no server response to this command and thus we need to add any credits back
immediately after sending the PDU.
CC: Stable <stable@vger.kernel.org> #v5.0+
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1a0ba8f77 upstream.
The ACEPC T8 and T11 mini PCs contain quite generic names in the sys_vendor
and product_name DMI strings, without this patch brcmfmac will try to load:
"brcmfmac43455-sdio.Default string-Default string.txt" as nvram file which
is way too generic.
The DMI strings on which we are matching are somewhat generic too, but
"To be filled by O.E.M." is less common then "Default string" and the
system-sku and bios-version strings are pretty unique. Beside the DMI
strings we also check the wifi-module chip-id and revision. I'm confident
that the combination of all this is unique.
Both the T8 and T11 use the same wifi-module, this commit adds DMI
quirks for both mini PCs pointing to brcmfmac43455-sdio.acepc-t8.txt .
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1690852
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e0eaf239f upstream.
Currently, the pages that are allocated for the single mode of MSC are not
mapped into the device's dma space and the code is incorrectly using
*_to_phys() in place of a dma address. This fails with IOMMU enabled and
is otherwise bad practice.
Fix the single mode buffer allocation to map the pages into the device's
DMA space.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: ba82664c13 ("intel_th: Add Memory Storage Unit driver")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5467a68cbf upstream.
For lockless accesses to dentries we don't have pinned we rely
(among other things) upon having an RCU delay between dropping
the last reference and actually freeing the memory.
On the other hand, for things like pipes and sockets we neither
do that kind of lockless access, nor want to deal with the
overhead of an RCU delay every time a socket gets closed.
So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
made sure it would happen. We tried to avoid setting it unless
we knew we need it. Unfortunately, that had led to recurring
class of bugs, in which we missed the need to set it.
We only really need it for dentries that are created by
d_alloc_pseudo(), so let's not bother with trying to be smart -
just make having an RCU delay the default. The ones that do
*not* get it set the replacement flag (DCACHE_NORCU) and we'd
better use that sparingly. d_alloc_pseudo() is the only
such user right now.
FWIW, the race that finally prompted that switch had been
between __lock_parent() of immediate subdirectory of what's
currently the root of a disconnected tree (e.g. from
open-by-handle in progress) racing with d_splice_alias()
elsewhere picking another alias for the same inode, either
on outright corrupted fs image, or (in case of open-by-handle
on NFS) that subdirectory having been just moved on server.
It's not easy to hit, so the sky is not falling, but that's
not the first race on similar missed cases and the logics
for settinf DCACHE_RCUACCESS has gotten ridiculously
convoluted.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed4d0a4ea1 upstream.
The on-disk value is little endian and we need to convert it to
native endian before storing the value in the in-core structure.
Fixes: 7564beda19 ("md-cluster/raid10: support add disk under grow mode")
Cc: <stable@vger.kernel.org> # 4.20+
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bc034d353 upstream.
This reverts commit 5a409b4f56.
This patch has two problems.
1/ it make multiple calls to submit_bio() from inside a make_request_fn.
The bios thus submitted will be queued on current->bio_list and not
submitted immediately. As the bios are allocated from a mempool,
this can theoretically result in a deadlock - all the pool of requests
could be in various ->bio_list queues and a subsequent mempool_alloc
could block waiting for one of them to be released.
2/ It aims to handle a case when there are many concurrent flush requests.
It handles this by submitting many requests in parallel - all of which
are identical and so most of which do nothing useful.
It would be more efficient to just send one lower-level request, but
allow that to satisfy multiple upper-level requests.
Fixes: 5a409b4f56 ("MD: fix lock contention for flush bios")
Cc: <stable@vger.kernel.org> # v4.19+
Tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6b50160a0 upstream.
__GFP_HIGHMEM is disabled if dax is enabled on brd, however
dax support for brd has been removed since commit (7a862fbbde
"brd: remove dax support"), so restore __GFP_HIGHMEM in
brd_insert_page().
Also remove the no longer applicable comments about DAX and highmem.
Cc: stable@vger.kernel.org
Fixes: 7a862fbbde ("brd: remove dax support")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51e0f22781 upstream.
Commit 7bd1d4093c ("stm class: Introduce an abstraction for System Trace
Module devices") naively calculates the channel bitmap size in 64-bit
chunks regardless of the size of underlying unsigned long, making the
bitmap half as big on a 32-bit system. This leads to an out of bounds
access with the upper half of the bitmap.
Fix this by using BITS_TO_LONGS. While at it, convert to using
struct_size() for the total size calculation of the master struct.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 7bd1d4093c ("stm class: Introduce an abstraction for System Trace Module devices")
Reported-by: Mulu He <muluhe@codeaurora.org>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee496da4c3 upstream.
Number of free masters is not set correctly in stm
free path. Fix this by properly adding the number
of output channels before setting them to 0 in
stm_output_disclaim().
Currently it is equivalent to doing nothing since
master->nr_free is incremented by 0.
Fixes: 7bd1d4093c ("stm class: Introduce an abstraction for System Trace Module devices")
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: stable@vger.kernel.org # v4.4
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1829dda0e8 upstream.
LEVEL is a very common word, and now after many years it suddenly
clashed with another LEVEL define in the DRBD code.
Rename it to PA_ASM_LEVEL instead.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d19a12906e upstream.
When making the text sections writeable with set_kernel_text_rw(1),
include all text sections including those in the __init section.
Otherwise functions marked with __meminit will stay read-only.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 4.20+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44224bdb99 upstream.
The pdtlb and pitlb instructions are strongly ordered. The asms invoking
these instructions should be compiler memory barriers to ensure the
compiler doesn't reorder memory operations around these instructions.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
CC: stable@vger.kernel.org # v4.20+
Fixes: 3847dab774 ("parisc: Add alternative coding infrastructure")
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70b464918e upstream.
During several error paths in the function
regulator_set_voltage_unlocked() the value of 'ret' can take on negative
error values. However, in calls that go through the 'goto out' statement,
this return value is lost and return 0 is used instead, indicating a
'pass'.
There are several cases where this function should legitimately return a
fail instead of a pass: one such case includes constraints check during
voltage selection in the call to regulator_check_voltage(), which can
have -EINVAL for the case when an unsupported voltage is incorrectly
requested. In that case, -22 is expected as the return value, not 0.
Fixes: 9243a195be ("regulator: core: Change voltage setting path")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7e2d94b3d upstream.
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909
("blk-mq: Fix a use-after-free") fixes this issue exactly.
However, that commit introduces another issue. Before 45a9c9d909,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().
We have invented ways for addressing this kind of issue before, such as:
8dc765d438 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f3 ("blk-mq: quiesce queue before freeing queue")
But still can't cover all cases, recently James reports another such
kind of issue:
https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.
Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.
This approach follows typical design pattern wrt. kobject's release handler.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reported-by: James Smart <james.smart@broadcom.com>
Fixes: 45a9c9d909 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8f0916c6dc ]
ethtool user spaces needs to know ring count via ETHTOOL_GRXRINGS when
executing (ethtool -x) which is retrieved via ethtool get_rxnfc callback,
in mlx5 this callback is disabled when CONFIG_MLX5_EN_RXNFC=n.
This patch allows only ETHTOOL_GRXRINGS command on mlx5e_get_rxnfc() when
CONFIG_MLX5_EN_RXNFC is disabled, so ethtool -x will continue working.
Fixes: fe6d86b3c3 ("net/mlx5e: Add CONFIG_MLX5_EN_RXNFC for ethtool rx nfc")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bad861f31b ]
mlxfw can be compiled as external module while mlx5_core can be
builtin, in such case mlx5 will act like mlxfw is disabled.
Since mlxfw is just a service library for mlx* drivers,
imply it in mlx5_core to make it always reachable if it was enabled.
Fixes: 3ffaabecd1 ("net/mlx5e: Support the flash device ethtool callback")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c979c445a8 ]
Flow destination comparison has an inaccuracy: code see no
difference between same vf ports, which belong to different pfs.
Example: If start ping from VF0 (PF1) to VF1 (PF1) and mirror
all traffic to VF0 (PF2), icmp reply to VF0 (PF1) and mirrored
flow to VF0 (PF2) would be determined as same destination. It lead
to creating flow handler with rule nodes, which not added to node
tree. When later driver try to delete this flow rules we got
kernel crash.
Add comparison of vhca_id field to avoid this.
Fixes: 1228e912c9 ("net/mlx5: Consider encapsulation properties when comparing destinations")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf83c8fdcd ]
For all representors added firmware version info to show in
ethtool driver info.
For uplink representor, because only it is tied to the pci device
sysfs, added pci bus info.
Fixes: ff9b85de5d ("net/mlx5e: Add some ethtool port control entries to the uplink rep netdev")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 12d5cbf89a ]
When flow_rule_match_XYZ() functions were first introduced,
flow_rule_match_cvlan() for inner vlan is missing.
In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan()
is just called, which is wrong because it obtains outer vlan information by
FLOW_DISSECTOR_KEY_VLAN.
This commit fixes this by changing to call flow_rule_match_cvlan() after
it's added.
Fixes: 8f2566225a ("flow_offload: add flow_rule and flow_match structures and use them")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1436c8036 ]
Prevent reading unsupported slave address from SFP EEPROM by testing
Diagnostic Monitoring Type byte in EEPROM. Read only page zero of
EEPROM, in case this byte is zero.
If some SFP transceiver does not support Digital Optical Monitoring
(DOM), reading SFP EEPROM slave address 0x51 could return an error.
Availability of DOM support is verified by reading from zero page
Diagnostic Monitoring Type byte describing how diagnostic monitoring is
implemented by transceiver. If bit 6 of this byte is set, it indicates
that digital diagnostic monitoring has been implemented. Otherwise it is
not and transceiver could fail to reply to transaction for slave address
0x51 [1010001X (A2h)], which is used to access measurements page.
Such issue has been observed when reading cable MCP2M00-xxxx,
MCP7F00-xxxx, and few others.
Fixes: 2ea109039c ("mlxsw: spectrum: Add support for access cable info via ethtool")
Fixes: 4400081b63 ("mlxsw: spectrum: Fix EEPROM access in case of SFP/SFP+")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c52ecff7e6 ]
Old Mellanox silicons, like switchx-2, switch-ib do not support reading
QSFP modules temperature through MTMP register. Attempt to access this
register on systems equipped with the this kind of silicon will cause
initialization flow failure.
Test for hardware resource capability is added in order to distinct
between old and new silicon - old silicons do not have such capability.
Fixes: 6a79507cfe ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Fixes: 5c42eaa07b ("mlxsw: core: Extend hwmon interface with QSFP module temperature attributes")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 532b0f7ece ]
Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit 7e27e8d613
("tipc: switch order of device registration to fix a crash")
Because sock_create_kern(net, AF_TIPC, ...) is called by
tipc_topsrv_create_listener() in the initialization process
of tipc_net_ops, tipc_socket_init() must be execute before that.
I move tipc_socket_init() into function tipc_init_net().
Fixes: 7e27e8d613
("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Kang Zhou <zhoukang7@huawei.com>
Reviewed-by: Suanming Mou <mousuanming@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ac03046ece ]
When the socket is released, we should free all packets
queued in the per-socket list in order to avoid a memory
leak.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e27e8d613 ]
When tipc is loaded while many processes try to create a TIPC socket,
a crash occurs:
PANIC: Unable to handle kernel paging request at virtual
address "dfff20000000021d"
pc : tipc_sk_create+0x374/0x1180 [tipc]
lr : tipc_sk_create+0x374/0x1180 [tipc]
Exception class = DABT (current EL), IL = 32 bits
Call trace:
tipc_sk_create+0x374/0x1180 [tipc]
__sock_create+0x1cc/0x408
__sys_socket+0xec/0x1f0
__arm64_sys_socket+0x74/0xa8
...
This is due to race between sock_create and unfinished
register_pernet_device. tipc_sk_insert tries to do
"net_generic(net, tipc_net_id)".
but tipc_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(AF_TIPC, ...)
and one process doing module removal.
Fixes: a62fbccecd ("tipc: make subscriber server support net namespace")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Xiaogang Wang <wangxiaogang3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit feadc4b6cf ]
Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when
iflink == ifindex.
In some cases, a device can be created in a different netns with the
same ifindex as its parent. That device will not dump its IFLA_LINK
attribute, which can confuse some userspace software that expects it.
For example, if the last ifindex created in init_net and foo are both
8, these commands will trigger the issue:
ip link add parent type dummy # ifindex 9
ip link add link parent netns foo type macvlan # ifindex 9 in ns foo
So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump,
always put the IFLA_LINK attribute as well.
Thanks to Dan Winship for analyzing the original OpenShift bug down to
the missing netlink attribute.
v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said
add Nicolas' ack
v3: change Fixes tag
fix subject typo, spotted by Edward Cree
Analyzed-by: Dan Winship <danw@redhat.com>
Fixes: d8a5ec6727 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 185ce5c38e ]
Zerocopy skbs without completion notification were added for packet
sockets with PACKET_TX_RING user buffers. Those signal completion
through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
added only to avoid premature notification after clone or orphan, by
triggering a copy on these paths for these packets.
The mechanism had to define a special "no-uarg" mode because packet
sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
for a different pointer.
Before deferencing skb_uarg(skb), verify that it is a real pointer.
Fixes: 5cd8d46ea1 ("packet: copy user buffers before orphan or clone")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7c04b05c9 ]
When host is under high stress, it is very possible thread
running netdev_wait_allrefs() returns from msleep(250)
10 seconds late.
This leads to these messages in the syslog :
[...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0
If the device refcount is zero, the wait is over.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0fe9f173d6 ]
Jiri reported that with a kernel built with CONFIG_FIXED_PHY=y,
CONFIG_NET_DSA=m and CONFIG_NET_DSA_LOOP=m, we would not get to a
functional state where the mock-up driver is registered. Turns out that
we are not descending into drivers/net/dsa/ unconditionally, and we
won't be able to link-in dsa_loop_bdinfo.o which does the actual mock-up
mdio device registration.
Reported-by: Jiri Pirko <jiri@resnulli.us>
Fixes: 40013ff20b ("net: dsa: Fix functional dsa-loop dependency on FIXED_PHY")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Tested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 61fb0d0168 ]
At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
for finding all percpu routes and set their ->from pointer
to NULL, so that fib6_ref can reach its expected value (1).
The problem right now is that other cpus can still catch the
route being deleted, since there is no rcu grace period
between the route deletion and call to fib6_drop_pcpu_from()
This can leak the fib6 and associated resources, since no
notifier will take care of removing the last reference(s).
I decided to add another boolean (fib6_destroying) instead
of reusing/renaming exception_bucket_flushed to ease stable backports,
and properly document the memory barriers used to implement this fix.
This patch has been co-developped with Wei Wang.
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Wei Wang <weiwan@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 510e2ceda0 ]
When inserting route cache into the exception table, the key is
generated with both src_addr and dest_addr with src addr routing.
However, current logic always assumes the src_addr used to generate the
key is a /128 host address. This is not true in the following scenarios:
1. When the route is a gateway route or does not have next hop.
(rt6_is_gw_or_nonexthop() == false)
2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
This means, when looking for a route cache in the exception table, we
have to do the lookup twice: first time with the passed in /128 host
address, second time with the src_addr stored in fib6_info.
This solves the pmtu discovery issue reported by Mikael Magnusson where
a route cache with a lower mtu info is created for a gateway route with
src addr. However, the lookup code is not able to find this route cache.
Fixes: 2b760fcf5c ("ipv6: hook up exception table to store dst cache")
Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
Bisected-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a42010cdc upstream.
Define the gup_fast_permitted to check against the asce_limit of the
mm attached to the current task, then replace the s390 specific gup
code with the generic implementation in mm/gup.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1874a0c28 upstream.
Change the way how pgd_offset, p4d_offset, pud_offset and pmd_offset
walk the page tables. pgd_offset now always calculates the index for
the top-level page table and adds it to the pgd, this is either a
segment table offset for a 2-level setup, a region-3 offset for 3-levels,
region-2 offset for 4-levels, or a region-1 offset for a 5-level setup.
The other three functions p4d_offset, pud_offset and pmd_offset will
only add the respective offset if they dereference the passed pointer.
With the new way of walking the page tables a sequence like this from
mm/gup.c now works:
pgdp = pgd_offset(current->mm, addr);
pgd = READ_ONCE(*pgdp);
p4dp = p4d_offset(&pgd, addr);
p4d = READ_ONCE(*p4dp);
pudp = pud_offset(&p4d, addr);
pud = READ_ONCE(*pudp);
pmdp = pmd_offset(&pud, addr);
pmd = READ_ONCE(*pmdp);
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4703ce11c upstream.
Users have reported intermittent occurrences of DIMM initialization
failures due to duplicate allocations of address capacity detected in
the labels, or errors of the form below, both have the same root cause.
nd namespace1.4: failed to track label: 0
WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm]
Call Trace:
? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
uuid_store+0x17e/0x190 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x57/0xd0
do_syscall_64+0x60/0x210
Unfortunately those reports were typically with a busy parallel
namespace creation / destruction loop making it difficult to see the
components of the bug. However, Jane provided a simple reproducer using
the work-in-progress sub-section implementation.
When ndctl is reconfiguring a namespace it may take an existing defunct
/ disabled namespace and reconfigure it with a new uuid and other
parameters. Critically namespace_update_uuid() takes existing address
resources and renames them for the new namespace to use / reconfigure as
it sees fit. The bug is that this rename only happens in the resource
tracking tree. Existing labels with the old uuid are not reaped leading
to a scenario where multiple active labels reference the same span of
address range.
Teach namespace_update_uuid() to flag any references to the old uuid for
reaping at the next label update attempt.
Cc: <stable@vger.kernel.org>
Fixes: bf9bccc14c ("libnvdimm: pmem label sets and namespace instantiation")
Link: https://github.com/pmem/ndctl/issues/91
Reported-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2f8ae0e4c upstream.
syncconfig is responsible for keeping auto.conf up-to-date, so if it
fails for any reason, the build must be terminated immediately.
However, since commit 9390dff66a ("kbuild: invoke syncconfig if
include/config/auto.conf.cmd is missing"), Kbuild continues running
even after syncconfig fails.
You can confirm this by intentionally making syncconfig error out:
# diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
# index 08ba146..307b9de 100644
# --- a/scripts/kconfig/confdata.c
# +++ b/scripts/kconfig/confdata.c
# @@ -1023,6 +1023,9 @@ int conf_write_autoconf(int overwrite)
# FILE *out, *tristate, *out_h;
# int i;
#
# + if (overwrite)
# + return 1;
# +
# if (!overwrite && is_present(autoconf_name))
# return 0;
Then, syncconfig fails, but Make would not stop:
$ make -s mrproper allyesconfig defconfig
$ make
scripts/kconfig/conf --syncconfig Kconfig
*** Error during sync of the configuration.
make[2]: *** [scripts/kconfig/Makefile;69: syncconfig] Error 1
make[1]: *** [Makefile;557: syncconfig] Error 2
make: *** [include/config/auto.conf.cmd] Deleting file 'include/config/tristate.conf'
make: Failed to remake makefile 'include/config/auto.conf'.
SYSTBL arch/x86/include/generated/asm/syscalls_32.h
SYSHDR arch/x86/include/generated/asm/unistd_32_ia32.h
SYSHDR arch/x86/include/generated/asm/unistd_64_x32.h
SYSTBL arch/x86/include/generated/asm/syscalls_64.h
[ continue running ... ]
The reason is in the behavior of a pattern rule with multi-targets.
%/auto.conf %/auto.conf.cmd %/tristate.conf: $(KCONFIG_CONFIG)
$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
GNU Make knows this rule is responsible for making all the three files
simultaneously. As far as examined, auto.conf.cmd is the target in
question when this rule is invoked. It is probably because auto.conf.cmd
is included below the inclusion of auto.conf.
The inclusion of auto.conf is mandatory, while that of auto.conf.cmd
is optional. GNU Make does not care about the failure in the process
of updating optional include files.
I filed this issue (https://savannah.gnu.org/bugs/?56301) in case this
behavior could be improved somehow in future releases of GNU Make.
Anyway, it is quite easy to fix our Makefile.
Given that auto.conf is already a mandatory include file, there is no
reason to stick auto.conf.cmd optional. Make it mandatory as well.
Cc: linux-stable <stable@vger.kernel.org> # 5.0+
Fixes: 9390dff66a ("kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
[commented out diff above to keep patch happy - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63a9de02d upstream.
Displaying the session id in /proc/fs/cifs/DebugData
is needed in order to correlate Linux client information
with network and server traces for many common support
scenarios. Turned out to be very important for debugging.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee66e453db upstream.
...now that VMX's preemption timer, i.e. the hv_timer, also adjusts its
programmed time based on lapic_timer_advance_ns. Without the delay, a
guest can see a timer interrupt arrive before the requested time when
KVM is using the hv_timer to emulate the guest's interrupt.
Fixes: c5ce8235cf ("KVM: VMX: Optimize tscdeadline timer latency")
Cc: <stable@vger.kernel.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11988499e6 upstream.
KVM allows userspace to violate consistency checks related to the
guest's CPUID model to some degree. Generally speaking, userspace has
carte blanche when it comes to guest state so long as jamming invalid
state won't negatively affect the host.
Currently this is seems to be a non-issue as most of the interesting
EFER checks are missing, e.g. NX and LME, but those will be added
shortly. Proactively exempt userspace from the CPUID checks so as not
to break userspace.
Note, the efer_reserved_bits check still applies to userspace writes as
that mask reflects the host's capabilities, e.g. KVM shouldn't allow a
guest to run with NX=1 if it has been disabled in the host.
Fixes: d80174745b ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ddc920457 upstream.
kvm_dirty_bitmap_bytes() will return the size of the dirty bitmap of
the memslot rather than the size of bitmap passed over from the ioctl.
Here for KVM_CLEAR_DIRTY_LOG we should only copy exactly the size of
bitmap that covers kvm_clear_dirty_log.num_pages.
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 2a31b9db15
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f93f7ede08 upstream.
The RDPMC-exiting control is dependent on the existence of the RDPMC
instruction itself, i.e. is not tied to the "Architectural Performance
Monitoring" feature. For all intents and purposes, the control exists
on all CPUs with VMX support since RDPMC also exists on all VCPUs with
VMX supported. Per Intel's SDM:
The RDPMC instruction was introduced into the IA-32 Architecture in
the Pentium Pro processor and the Pentium processor with MMX technology.
The earlier Pentium processors have performance-monitoring counters, but
they must be read with the RDMSR instruction.
Because RDPMC-exiting always exists, KVM requires the control and refuses
to load if it's not available. As a result, hiding the PMU from a guest
breaks nested virtualization if the guest attemts to use KVM.
While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit
check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization
for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE
checks, but before the counter referenced in ECX is checked for validity.
In other words, the original KVM behavior of injecting a #GP was correct,
and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP
when the unit test guest (L3 in this case) executes RDPMC without
RDPMC-exiting set in the unit test host (L2).
This reverts commit e51bfdb687.
Fixes: e51bfdb687 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU")
Reported-by: David Hill <hilld@binarystorm.net>
Cc: Saar Amar <saaramar@microsoft.com>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d52154bb0 upstream.
When failing from creating cache jbd2_inode_cache, we will destroy the
previously created cache jbd2_handle_cache twice. This patch fixes
this by moving each cache initialization/destruction to its own
separate, individual function.
Signed-off-by: Chengguang Xu <cgxu519@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56df90b631 upstream.
Add patch for realtek codec in Lenovo B50-70 that fixes inverted
internal microphone channel.
Device IdeaPad Y410P has the same PCI SSID as Lenovo B50-70,
but first one is about fix the noise and it didn't seem help in a
later kernel version.
So I replaced IdeaPad Y410P device description with B50-70 and apply
inverted microphone fix.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1524215
Signed-off-by: Michał Wadowski <wadosm@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dad3197da7 upstream.
Dell platform with ALC298.
system enter to runtime suspend. Headphone had noise.
Let Headset Mic not shutup will solve this issue.
[ Fixed minor coding style issues by tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 891afcf246 upstream.
A mistake was made in the identification of the four variants of the
System76 Gazelle (gaze14). This patch corrects the PCI ID of the
17-inch, GTX 1660 Ti variant from 0x8560 to 0x8551. This patch also
adds the correct fixups for the 15-inch and 17-inch GTX 1650 variants
with PCI IDs 0x8560 and 0x8561.
Tests were done on all four variants ensuring full audio capability.
Fixes: 80a5052db7 ("ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14)")
Signed-off-by: Jeremy Soller <jeremy@system76.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c1d0e3631 upstream.
Handling of aborted journal is a special code path different from
standard ext4_error() one and it can call panic() as well. Commit
1dc1097ff6 ("ext4: avoid panic during forced reboot") forgot to update
this path so fix that omission.
Fixes: 1dc1097ff6 ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 5.1
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08fc98a4d6 upstream.
The buffer_head (frames[0].bh) and it's corresping page can be
potentially free'd once brelse() is done inside the for loop
but before the for loop exits in dx_release(). It can be free'd
in another context, when the page cache is flushed via
drop_caches_sysctl_handler(). This results into below data abort
when accessing info->indirect_levels in dx_release().
Unable to handle kernel paging request at virtual address ffffffc17ac3e01e
Call trace:
dx_release+0x70/0x90
ext4_htree_fill_tree+0x2d4/0x300
ext4_readdir+0x244/0x6f8
iterate_dir+0xbc/0x160
SyS_getdents64+0x94/0x174
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57a0da28ce upstream.
Unaligned AIO must be serialized because the zeroing of partial blocks
of unaligned AIO can result in data corruption in case it's overlapping
another in flight IO.
Currently we wait for all unwritten extents before we submit unaligned
AIO which protects data in case of unaligned AIO is following overlapping
IO. However if a unaligned AIO is followed by overlapping aligned AIO we
can still end up corrupting data.
To fix this, we must make sure that the unaligned AIO is the only IO in
flight by waiting for unwritten extents conversion not just before the
IO submission, but right after it as well.
This problem can be reproduced by xfstest generic/538
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 592acbf168 upstream.
This commit zeroes out the unused memory region in the buffer_head
corresponding to the extent metablock after writing the extent header
and the corresponding extent node entries.
This is done to prevent random uninitialized data from getting into
the filesystem when the extent block is synced.
This fixes CVE-2019-11833.
Signed-off-by: Sriram Rajagopalan <sriramr@arista.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f91253a3d0 upstream.
The Linux kernel will auto-disables all boot consoles whenever it
gets a preferred real console.
Currently on RISC-V systems, if we have a real console which is not
RISCV SBI console then boot consoles (such as earlycon=sbi) are not
auto-disabled when a real console (ttyS0 or ttySIF0) is available.
This results in duplicate prints at boot-time after kernel starts
using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
parameter was passed by bootloader.
The reason for above issue is that RISCV SBI console always adds
itself as preferred console which is causing other real consoles
to be not used as preferred console.
Ideally "console=" kernel parameter passed by bootloaders should
be the one selecting a preferred real console.
This patch fixes above issue by not forcing RISCV SBI console as
preferred console.
Fixes: afa6b1ccfa ("tty: New RISC-V SBI console driver")
Cc: stable@vger.kernel.org
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec084de929 upstream.
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu(). Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.
VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day...
BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
evict+0xb3/0x180
iput+0x1b0/0x230
inode_switch_wbs_work_fn+0x3c0/0x6a0
worker_thread+0x4e/0x490
? process_one_work+0x410/0x410
kthread+0xe6/0x100
ret_from_fork+0x39/0x50
Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish. And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.
Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60fce36afa upstream.
syzbot reported the following error from a tree with a head commit of
baf76f0c58 ("slip: make slhc_free() silently accept an error pointer")
BUG: unable to handle kernel paging request at ffffea0003348000
#PF error: [normal kernel read fault]
PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline]
RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline]
RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579
Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff
ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49
c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff
RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246
RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000
RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001
RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000
R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000
FS: 00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0
Call Trace:
fast_isolate_around mm/compaction.c:1243 [inline]
fast_isolate_freepages mm/compaction.c:1418 [inline]
isolate_freepages mm/compaction.c:1438 [inline]
compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550
There is no reproducer and it is difficult to hit -- 1 crash every few
days. The issue is very similar to the fix in commit 6b0868c820
("mm/compaction.c: correct zone boundary handling when resetting pageblock
skip hints"). When isolating free pages around a target pageblock, the
boundary handling is off by one and can stray into the next pageblock.
Triggering the syzbot error requires that the end of pageblock is section
or zone aligned, and that the next section is unpopulated.
A more subtle consequence of the bug is that pageblocks were being
improperly used as migration targets which potentially hurts fragmentation
avoidance in the long-term one page at a time.
A debugging patch revealed that it's definitely possible to stray outside
of a pageblock which is not intended. While syzbot cannot be used to
verify this patch, it was confirmed that the debugging warning no longer
triggers with this patch applied. It has also been confirmed that the THP
allocation stress tests are not degraded by this patch.
Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net
Fixes: e332f741a8 ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0672d22a19 upstream.
Commit 6d4cd041f0 ("net: phy: at803x: disable delay only for RGMII mode")
exposed an issue on imx DTS files using AR8031/AR8035 PHYs.
The end result is that the boards can no longer obtain an IP address
via UDHCP, for example.
Quoting Andrew Lunn:
"The problem here is, all the DTs were broken since day 0. However,
because the PHY driver was also broken, nobody noticed and it
worked. Now that the PHY driver has been fixed, all the bugs in the
DTs now become an issue"
To fix this problem, the phy-mode property needs to be "rgmii-id", which
has the following meaning as per
Documentation/devicetree/bindings/net/ethernet.txt:
"RGMII with internal RX and TX delays provided by the PHY, the MAC should
not add the RX or TX delays in this case)"
Tested on imx6-sabresd, imx6sx-sdb and imx7d-pico boards with
successfully restored networking.
Based on the initial submission from Steve Twiss for the
imx6qdl-sabresd.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Baruch Siach <baruch@tkos.co.il>
Tested-by: Soeren Moch <smoch@web.de>
Tested-by: Steve Twiss <stwiss.opensource@diasemi.com>
Tested-by: Adam Thomson <Adam.Thomson@diasemi.com>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Cc: "George G. Davis" <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55be8658c7 upstream.
According to ipmi spec, block number is a number that is incremented,
starting with 0, for each new block of message data returned using the
middle transaction.
Here, the 'blocknum' is data[0] which always starts from zero(0) and
'ssif_info->multi_pos' starts from 1.
So, we need to add +1 to blocknum while comparing with multi_pos.
Fixes: 7d6380cd40 ("ipmi:ssif: Fix handling of multi-part return messages").
Reported-by: Kiran Kolukuluru <kirank@ami.com>
Signed-off-by: Kamlakant Patel <kamlakantp@marvell.com>
Message-Id: <1556106615-18722-1-git-send-email-kamlakantp@marvell.com>
[Also added a debug log if the block numbers don't match.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d73236383e upstream.
This is required for SSIF to work.
There was no way to know if the interface being added was SI
or SSIF from the platform data, but that was required so the
i2c-addr is only added for SSIF interfaces. So add a field
for that.
Also rework the logic a bit so that ipmi-type is not set
for SSIF interfaces, as it is not necessary for that.
Fixes: 3cd83bac48 ("ipmi: Consolidate the adding of platform devices")
Reported-by: Kamlakant Patel <kamlakantp@marvell.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 5.1
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bee2addc0 upstream.
In journal_reclaim() ja->cur_idx of each cache will be update to
reclaim available journal buckets. Variable 'int n' is used to count how
many cache is successfully reclaimed, then n is set to c->journal.key
by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache()
loop will write the jset data onto each cache.
The problem is, if all jouranl buckets on each cache is full, the
following code in journal_reclaim(),
529 for_each_cache(ca, c, iter) {
530 struct journal_device *ja = &ca->journal;
531 unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
532
533 /* No space available on this device */
534 if (next == ja->discard_idx)
535 continue;
536
537 ja->cur_idx = next;
538 k->ptr[n++] = MAKE_PTR(0,
539 bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
540 ca->sb.nr_this_dev);
541 }
542
543 bkey_init(k);
544 SET_KEY_PTRS(k, n);
If there is no available bucket to reclaim, the if() condition at line
534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS()
will set KEY_PTRS field of c->journal.key to 0.
Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in
journal_write_unlocked() the journal data is written in following loop,
649 for (i = 0; i < KEY_PTRS(k); i++) {
650-671 submit journal data to cache device
672 }
If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data
won't be written to cache device here. If system crahed or rebooted
before bkeys of the lost journal entries written into btree nodes, data
corruption will be reported during bcache reload after rebooting the
system.
Indeed there is only one cache in a cache set, there is no need to set
KEY_PTRS field in journal_reclaim() at all. But in order to keep the
for_each_cache() logic consistent for now, this patch fixes the above
problem by not setting 0 KEY_PTRS of journal key, if there is no bucket
available to reclaim.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62d54f3a7f upstream.
Send operates on read only trees and expects them to never change while it
is using them. This is part of its initial design, and this expection is
due to two different reasons:
1) When it was introduced, no operations were allowed to modifiy read-only
subvolumes/snapshots (including defrag for example).
2) It keeps send from having an impact on other filesystem operations.
Namely send does not need to keep locks on the trees nor needs to hold on
to transaction handles and delay transaction commits. This ends up being
a consequence of the former reason.
However the deduplication feature was introduced later (on September 2013,
while send was introduced in July 2012) and it allowed for deduplication
with destination files that belong to read-only trees (subvolumes and
snapshots).
That means that having a send operation (either full or incremental) running
in parallel with a deduplication that has the destination inode in one of
the trees used by the send operation, can result in tree nodes and leaves
getting freed and reused while send is using them. This problem is similar
to the problem solved for the root nodes getting freed and reused when a
snapshot is made against one tree that is currenly being used by a send
operation, fixed in commits [1] and [2]. These commits explain in detail
how the problem happens and the explanation is valid for any node or leaf
that is not the root of a tree as well. This problem was also discussed
and explained recently in a thread [3].
The problem is very easy to reproduce when using send with large trees
(snapshots) and just a few concurrent deduplication operations that target
files in the trees used by send. A stress test case is being sent for
fstests that triggers the issue easily. The most common error to hit is
the send ioctl return -EIO with the following messages in dmesg/syslog:
[1631617.204075] BTRFS error (device sdc): did not find backref in send_root. inode=63292, offset=0, disk_byte=5228134400 found extent=5228134400
[1631633.251754] BTRFS error (device sdc): parent transid verify failed on 32243712 wanted 24 found 27
The first one is very easy to hit while the second one happens much less
frequently, except for very large trees (in that test case, snapshots
with 100000 files having large xattrs to get deep and wide trees).
Less frequently, at least one BUG_ON can be hit:
[1631742.130080] ------------[ cut here ]------------
[1631742.130625] kernel BUG at fs/btrfs/ctree.c:1806!
[1631742.131188] invalid opcode: 0000 [#6] SMP DEBUG_PAGEALLOC PTI
[1631742.131726] CPU: 1 PID: 13394 Comm: btrfs Tainted: G B D W 5.0.0-rc8-btrfs-next-45 #1
[1631742.132265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[1631742.133399] RIP: 0010:read_node_slot+0x122/0x130 [btrfs]
(...)
[1631742.135061] RSP: 0018:ffffb530021ebaa0 EFLAGS: 00010246
[1631742.135615] RAX: ffff93ac8912e000 RBX: 000000000000009d RCX: 0000000000000002
[1631742.136173] RDX: 000000000000009d RSI: ffff93ac564b0d08 RDI: ffff93ad5b48c000
[1631742.136759] RBP: ffffb530021ebb7d R08: 0000000000000001 R09: ffffb530021ebb7d
[1631742.137324] R10: ffffb530021eba70 R11: 0000000000000000 R12: ffff93ac87d0a708
[1631742.137900] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[1631742.138455] FS: 00007f4cdb1528c0(0000) GS:ffff93ad76a80000(0000) knlGS:0000000000000000
[1631742.139010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1631742.139568] CR2: 00007f5acb3d0420 CR3: 000000012be3e006 CR4: 00000000003606e0
[1631742.140131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1631742.140719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1631742.141272] Call Trace:
[1631742.141826] ? do_raw_spin_unlock+0x49/0xc0
[1631742.142390] tree_advance+0x173/0x1d0 [btrfs]
[1631742.142948] btrfs_compare_trees+0x268/0x690 [btrfs]
[1631742.143533] ? process_extent+0x1070/0x1070 [btrfs]
[1631742.144088] btrfs_ioctl_send+0x1037/0x1270 [btrfs]
[1631742.144645] _btrfs_ioctl_send+0x80/0x110 [btrfs]
[1631742.145161] ? trace_sched_stick_numa+0xe0/0xe0
[1631742.145685] btrfs_ioctl+0x13fe/0x3120 [btrfs]
[1631742.146179] ? account_entity_enqueue+0xd3/0x100
[1631742.146662] ? reweight_entity+0x154/0x1a0
[1631742.147135] ? update_curr+0x20/0x2a0
[1631742.147593] ? check_preempt_wakeup+0x103/0x250
[1631742.148053] ? do_vfs_ioctl+0xa2/0x6f0
[1631742.148510] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[1631742.148942] do_vfs_ioctl+0xa2/0x6f0
[1631742.149361] ? __fget+0x113/0x200
[1631742.149767] ksys_ioctl+0x70/0x80
[1631742.150159] __x64_sys_ioctl+0x16/0x20
[1631742.150543] do_syscall_64+0x60/0x1b0
[1631742.150931] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[1631742.151326] RIP: 0033:0x7f4cd9f5add7
(...)
[1631742.152509] RSP: 002b:00007ffe91017708 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[1631742.152892] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f4cd9f5add7
[1631742.153268] RDX: 00007ffe91017790 RSI: 0000000040489426 RDI: 0000000000000007
[1631742.153633] RBP: 0000000000000007 R08: 00007f4cd9e79700 R09: 00007f4cd9e79700
[1631742.153999] R10: 00007f4cd9e799d0 R11: 0000000000000202 R12: 0000000000000003
[1631742.154365] R13: 0000555dfae53020 R14: 0000000000000000 R15: 0000000000000001
(...)
[1631742.156696] ---[ end trace 5dac9f96dcc3fd6b ]---
That BUG_ON happens because while send is using a node, that node is COWed
by a concurrent deduplication, gets freed and gets reused as a leaf (because
a transaction commit happened in between), so when it attempts to read a
slot from the extent buffer, at ctree.c:read_node_slot(), the extent buffer
contents were wiped out and it now matches a leaf (which can even belong to
some other tree now), hitting the BUG_ON(level == 0).
Fix this concurrency issue by not allowing send and deduplication to run
in parallel if both operate on the same readonly trees, returning EAGAIN
to user space and logging an exlicit warning in dmesg/syslog.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be6821f82c3cc36e026f5afd10249988852b35ea
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2
[3] https://lore.kernel.org/linux-btrfs/CAL3q7H7iqSEEyFaEtpRZw3cp613y+4k2Q8b4W7mweR3tZA05bQ@mail.gmail.com/
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfc61c3626 upstream.
When finding out which inodes have references on a particular extent, done
by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both
v1 and v2) ioctl and from scrub we use the transaction join API to grab a
reference on the currently running transaction, since in order to give
accurate results we need to inspect the delayed references of the currently
running transaction.
However, if there is currently no running transaction, the join operation
will create a new transaction. This is inefficient as the transaction will
eventually be committed, doing unnecessary IO and introducing a potential
point of failure that will lead to a transaction abort due to -ENOSPC, as
recently reported [1].
That's because the join, creates the transaction but does not reserve any
space, so when attempting to update the root item of the root passed to
btrfs_join_transaction(), during the transaction commit, we can end up
failling with -ENOSPC. Users of a join operation are supposed to actually
do some filesystem changes and reserve space by some means, which is not
the case of iterate_extent_inodes(), it is a read-only operation for all
contextes from which it is called.
The reported [1] -ENOSPC failure stack trace is the following:
heisenberg kernel: ------------[ cut here ]------------
heisenberg kernel: BTRFS: Transaction aborted (error -28)
heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
heisenberg kernel: Call Trace:
heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs]
heisenberg kernel: ? _cond_resched+0x15/0x30
heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs]
heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs]
heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs]
heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
heisenberg kernel: kthread+0x112/0x130
heisenberg kernel: ? kthread_bind+0x30/0x30
heisenberg kernel: ret_from_fork+0x35/0x40
heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
So fix that by using the attach API, which does not create a transaction
when there is currently no running transaction.
[1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03628cdbc6 upstream.
During fiemap, for regular extents (non inline) we need to check if they
are shared and if they are, set the shared bit. Checking if an extent is
shared requires checking the delayed references of the currently running
transaction, since some reference might have not yet hit the extent tree
and be only in the in-memory delayed references.
However we were using a transaction join for this, which creates a new
transaction when there is no transaction currently running. That means
that two more potential failures can happen: creating the transaction and
committing it. Further, if no write activity is currently happening in the
system, and fiemap calls keep being done, we end up creating and
committing transactions that do nothing.
In some extreme cases this can result in the commit of the transaction
created by fiemap to fail with ENOSPC when updating the root item of a
subvolume tree because a join does not reserve any space, leading to a
trace like the following:
heisenberg kernel: ------------[ cut here ]------------
heisenberg kernel: BTRFS: Transaction aborted (error -28)
heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
heisenberg kernel: Call Trace:
heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs]
heisenberg kernel: ? _cond_resched+0x15/0x30
heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs]
heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs]
heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs]
heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
heisenberg kernel: kthread+0x112/0x130
heisenberg kernel: ? kthread_bind+0x30/0x30
heisenberg kernel: ret_from_fork+0x35/0x40
heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do
a transaction join to avoid the overhead of creating a new transaction (if
there is currently no running transaction) and introducing a potential
point of failure when the new transaction gets committed, instead use a
transaction attach to grab a handle for the currently running transaction
if any.
Reported-by: Christoph Anton Mitterer <calestyo@scientia.net>
Link: https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/
Fixes: afce772e87 ("btrfs: fix check_shared for fiemap ioctl")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f89d5de86 upstream.
When we set a subvolume to read-only mode we do not flush dellaloc for any
of its inodes (except if the filesystem is mounted with -o flushoncommit),
since it does not affect correctness for any subsequent operations - except
for a future send operation. The send operation will not be able to see the
delalloc data since the respective file extent items, inode item updates,
backreferences, etc, have not hit yet the subvolume and extent trees.
Effectively this means data loss, since the send stream will not contain
any data from existing delalloc. Another problem from this is that if the
writeback starts and finishes while the send operation is in progress, we
have the subvolume tree being being modified concurrently which can result
in send failing unexpectedly with EIO or hitting runtime errors, assertion
failures or hitting BUG_ONs, etc.
Simple reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ btrfs subvolume create /mnt/sv
$ xfs_io -f -c "pwrite -S 0xea 0 108K" /mnt/sv/foo
$ btrfs property set /mnt/sv ro true
$ btrfs send -f /tmp/send.stream /mnt/sv
$ od -t x1 -A d /mnt/sv/foo
0000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea
*
0110592
$ umount /mnt
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ btrfs receive -f /tmp/send.stream /mnt
$ echo $?
0
$ od -t x1 -A d /mnt/sv/foo
0000000
# ---> empty file
Since this a problem that affects send only, fix it in send by flushing
dellaloc for all the roots used by the send operation before send starts
to process the commit roots.
This is a problem that affects send since it was introduced (commit
31db9f7c23 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive"))
but backporting it to older kernels has some dependencies:
- For kernels between 3.19 and 4.20, it depends on commit 3cd24c6980
("btrfs: use tagged writepage to mitigate livelock of snapshot") because
the function btrfs_start_delalloc_snapshot() does not exist before that
commit. So one has to either pick that commit or replace the calls to
btrfs_start_delalloc_snapshot() in this patch with calls to
btrfs_start_delalloc_inodes().
- For kernels older than 3.19 it also requires commit e5fa8f865b
("Btrfs: ensure send always works on roots without orphans") because
it depends on the function ensure_commit_roots_uptodate() which that
commits introduced.
- No dependencies for 5.0+ kernels.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 3.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2d1b3aae3 upstream.
Up until now trimming the freespace was done irrespective of what the
arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
will be entirely ignored. Fix it by correctly handling those paramter.
This requires breaking if the found freespace extent is after the end of
the passed range as well as completing trim after trimming
fstrim_range::len bytes.
Fixes: 499f377f49 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 537f38f019 upstream.
If a an eb fails to be read for whatever reason - it's corrupted on disk
and parent transid/key validations fail or IO for eb pages fail then
this buffer must be removed from the buffer cache. Currently the code
calls free_extent_buffer if an error occurs. Unfortunately this doesn't
achieve the desired behavior since btrfs_find_create_tree_block returns
with eb->refs == 2.
On the other hand free_extent_buffer will only decrement the refs once
leaving it added to the buffer cache radix tree. This enables later
code to look up the buffer from the cache and utilize it potentially
leading to a crash.
The correct way to free the buffer is call free_extent_buffer_stale.
This function will correctly call atomic_dec explicitly for the buffer
and subsequently call release_extent_buffer which will decrement the
final reference thus correctly remove the invalid buffer from buffer
cache. This change affects only newly allocated buffers since they have
eb->refs == 2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
Reported-by: Jungyeon <jungyeon@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 448de471cd upstream.
[BUG]
When reading a file from a fuzzed image, kernel can panic like:
BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3500!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
Call Trace:
btrfs_lookup_csum+0x52/0x150 [btrfs]
__btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
btrfs_submit_bio_hook+0x103/0x170 [btrfs]
submit_one_bio+0x59/0x80 [btrfs]
extent_read_full_page+0x58/0x80 [btrfs]
generic_file_read_iter+0x2f6/0x9d0
__vfs_read+0x14d/0x1a0
vfs_read+0x8d/0x140
ksys_read+0x52/0xc0
do_syscall_64+0x60/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
The fuzzed image has a corrupted leaf whose first key doesn't match its
parent:
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
...
key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
leaf 29761536 flags 0x1(WRITTEN) backref revision 1
fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
range start 8798638964736 end 8798641262592 length 2297856
When reading the above tree block, we have extent_buffer->refs = 2 in
the context:
- initial one from __alloc_extent_buffer()
alloc_extent_buffer()
|- __alloc_extent_buffer()
|- atomic_set(&eb->refs, 1)
- one being added to fs_info->buffer_radix
alloc_extent_buffer()
|- check_buffer_tree_ref()
|- atomic_inc(&eb->refs)
So if even we call free_extent_buffer() in read_tree_block or other
similar situation, we only decrease the refs by 1, it doesn't reach 0
and won't be freed right now.
The staled eb and its corrupted content will still be kept cached.
Furthermore, we have several extra cases where we either don't do first
key check or the check is not proper for all callers:
- scrub
We just don't have first key in this context.
- shared tree block
One tree block can be shared by several snapshot/subvolume trees.
In that case, the first key check for one subvolume doesn't apply to
another.
So for the above reasons, a corrupted extent buffer can sneak into the
buffer cache.
[FIX]
Call verify_level_key in read_block_for_search to do another
verification. For that purpose the function is exported.
Due to above reasons, although we can free corrupted extent buffer from
cache, we still need the check in read_block_for_search(), for scrub and
shared tree blocks.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50b29d8f03 upstream.
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as
I assume was intended, all other options were blown away leading to
_ext4_show_options() output being incorrect.
Fixes: 1e381f60da ("ext4: do not allow journal_opts for fs w/o journal")
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 310a997fd7 upstream.
It is never possible, that number of block groups decreases,
since only online grow is supported.
But after a growing occured, we have to zero inode tables
for just created new block groups.
Fixes: 19c5246d25 ("ext4: add new online resize interface")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c380ab4b7 upstream.
The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty.
However, the reference is dropped again if error occurs during
ext4_handle_dirty_metadata, which may result in use-after-free bugs.
Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31562b954b upstream.
The sanity check in mb_find_extent() only checked that returned extent
does not extend past blocksize * 8, however it should not extend past
EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group <
blocksize * 8 and the tail of the bitmap is not properly filled by 1s
which happened e.g. when ancient kernels have grown the filesystem.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 742b06b562 upstream.
We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
before unmounting ext4 filesystem.
The typical chain of events leading to the BUG:
jbd2_write_superblock
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh));
The block device is removed and all the pages are invalidated. JBD2
was trying to write journal superblock to the block device which is
no longer present.
Fix this by checking the journal superblock's buffer head prior to
submitting.
Reported-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75ddbc1fb1 upstream.
Previously, in the userspace, it was possible to use the "setterm" command
from util-linux to blank the VT console by default, using the following
command.
According to the man page,
> The force option keeps the screen blank even if a key is pressed.
It was implemented by calling TIOCL_BLANKSCREEN.
case BLANKSCREEN:
ioctlarg = TIOCL_BLANKSCREEN;
if (ioctl(STDIN_FILENO, TIOCLINUX, &ioctlarg))
warn(_("cannot force blank"));
break;
However, after Linux 4.12, this command ceased to work anymore, which is
unexpected. By inspecting the kernel source, it shows that the issue was
triggered by the side-effect from commit a4199f5eb8 ("tty: Disable
default console blanking interval").
The console blanking is implemented by function do_blank_screen() in vt.c:
"blank_state" will be initialized to "blank_normal_wait" in con_init() if
AND ONLY IF ("blankinterval" > 0). If "blankinterval" is 0, "blank_state"
will be "blank_off" (== 0), and a call to do_blank_screen() will always
abort, even if a forced blanking is required from the user by calling
TIOCL_BLANKSCREEN, the console won't be blanked.
This behavior is unexpected from a user's point-of-view, since it's not
mentioned in any documentation. The setterm man page suggests it will
always work, and the kernel comments in uapi/linux/tiocl.h says
> /* keep screen blank even if a key is pressed */
> #define TIOCL_BLANKSCREEN 14
To fix it, we simply remove the "blank_state != blank_off" check, as
pointed out by Nicolas Pitre, this check doesn't logically make sense
and it's safe to remove.
Suggested-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Fixes: a4199f5eb8 ("tty: Disable default console blanking interval")
Signed-off-by: Yifeng Li <tomli@tomli.me>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64d14c6fe0 upstream.
When the gpio-addr-flash.c driver was merged with physmap-core.c the
code to store the current gpio_values was lost. This meant that once a
gpio was asserted it was never de-asserted. Fix this by storing the
current offset in gpio_values like the old driver used to.
Fixes: commit ba32ce95cb ("mtd: maps: Merge gpio-addr-flash.c into physmap-core.c")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b75ebeea6 upstream.
It was observed that reads crossing 4K address boundary are failing.
This limitation is mentioned in Intel documents:
Intel(R) 9 Series Chipset Family Platform Controller Hub (PCH) Datasheet:
"5.26.3 Flash Access
Program Register Access:
* Program Register Accesses are not allowed to cross a 4 KB boundary..."
Enhanced Serial Peripheral Interface (eSPI)
Interface Base Specification (for Client and Server Platforms):
"5.1.4 Address
For other memory transactions, the address may start or end at any byte
boundary. However, the address and payload length combination must not
cross the naturally aligned address boundary of the corresponding Maximum
Payload Size. It must not cross a 4 KB address boundary."
Avoid this by splitting an operation crossing the boundary into two
operations.
Fixes: 8afda8b26d ("spi-nor: Add support for Intel SPI serial flash controller")
Cc: stable@vger.kernel.org
Reported-by: Romain Porte <romain.porte@nokia.com>
Tested-by: Pascal Fabreges <pascal.fabreges@nokia.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b4814a945 upstream.
Mismatch between what is found in the Datasheets for DA9063 and DA9063L
provided by Dialog Semiconductor, and the register names provided in the
MFD registers file. The changes are for the OTP (one-time-programming)
control registers. The two naming errors are OPT instead of OTP, and
COUNT instead of CONT (i.e. control).
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f844b61db upstream.
I noticed that recently multiple systems (chromebooks) couldn't wake
from S0ix using LID or Keyboard after updating to a newer kernel. I
bisected and it turned up commit f941d3e41d ("ACPI: EC / PM: Disable
non-wakeup GPEs for suspend-to-idle"). I checked that the issue got
fixed if that commit was reverted.
I debugged and found that although PNP0C0D:00 (representing the LID)
is wake capable and should wakeup the system per the code in
acpi_wakeup_gpe_init() and in drivers/acpi/button.c:
localhost /sys # cat /proc/acpi/wakeup
Device S-state Status Sysfs node
LID0 S4 *enabled platform:PNP0C0D:00
CREC S5 *disabled platform:GOOG0004:00
*disabled platform:cros-ec-dev.1.auto
*disabled platform:cros-ec-accel.0
*disabled platform:cros-ec-accel.1
*disabled platform:cros-ec-gyro.0
*disabled platform:cros-ec-ring.0
*disabled platform:cros-usbpd-charger.2.auto
*disabled platform:cros-usbpd-logger.3.auto
D015 S3 *enabled i2c:i2c-ELAN0000:00
PENH S3 *enabled platform:PRP0001:00
XHCI S3 *enabled pci:0000:00:14.0
GLAN S4 *disabled
WIFI S3 *disabled pci:0000:00:14.3
localhost /sys #
On debugging, I found that its corresponding GPE is not being enabled.
The particular GPE's "gpe_register_info->enable_for_wake" does not
have any bits set when acpi_enable_all_wakeup_gpes() comes around to
use it. I looked at code and could not find any other code path that
should set the bits in "enable_for_wake" bitmask for the wake enabled
devices for s2idle. [I do see that it happens for S3 in
acpi_sleep_prepare()].
Thus I used the same call to enable the GPEs for wake enabled devices,
and verified that this fixes the regression I was seeing on multiple
of my devices.
[ rjw: The problem is that commit f941d3e41d ("ACPI: EC / PM:
Disable non-wakeup GPEs for suspend-to-idle") forgot to add
the acpi_enable_wakeup_devices() call for s2idle along with
acpi_enable_all_wakeup_gpes(). ]
Fixes: f941d3e41d ("ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203579
Signed-off-by: Rajat Jain <rajatja@google.com>
[ rjw: Subject & changelog ]
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3f3ce049f upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds
rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm() failing fork()
---- ---
task = mm->owner
mm->owner = NULL;
free(task)
if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure
case, exactly like it always happens for the regular exit(2) path. That
is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm()
(left side above) effective to avoid a use after free when dereferencing
the task structure.
An alternate possible fix would be to defer the delivery of the
userfaultfd contexts to the monitor until after fork() is guaranteed to
succeed. Such a change would require more changes because it would
create a strict ordering dependency where the uffd methods would need to
be called beyond the last potentially failing branch in order to be
safe. This solution as opposed only adds the dependency to common code
to set mm->owner to NULL and to free the task struct that was pointed by
mm->owner with RCU, if fork ends up failing. The userfaultfd methods
can still be called anywhere during the fork runtime and the monitor
will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes: 893e26e61d ("userfaultfd: non-cooperative: Add fork() event")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: zhong jiang <zhongjiang@huawei.com>
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e091eab028 upstream.
In some cases, ocfs2_iget() reads the data of inode, which has been
deleted for some reason. That will make the system panic. So We should
judge whether this inode has been deleted, and tell the caller that the
inode is a bad inode.
For example, the ocfs2 is used as the backed of nfs, and the client is
nfsv3. This issue can be reproduced by the following steps.
on the nfs server side,
..../patha/pathb
Step 1: The process A was scheduled before calling the function fh_verify.
Step 2: The process B is removing the 'pathb', and just completed the call
to function dput. Then the dentry of 'pathb' has been deleted from the
dcache, and all ancestors have been deleted also. The relationship of
dentry and inode was deleted through the function hlist_del_init. The
following is the call stack.
dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
At this time, the inode is still in the dcache.
Step 3: The process A call the function ocfs2_get_dentry, which get the
inode from dcache. Then the refcount of inode is 1. The following is the
call stack.
nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
Step 4: Dirty pages are flushed by bdi threads. So the inode of 'patha'
is evicted, and this directory was deleted. But the inode of 'pathb'
can't be evicted, because the refcount of the inode was 1.
Step 5: The process A keep running, and call the function
reconnect_path(in exportfs_decode_fh), which call function
ocfs2_get_parent of ocfs2. Get the block number of parent
directory(patha) by the name of ... Then read the data from disk by the
block number. But this inode has been deleted, so the system panic.
Process A Process B
1. in nfsd3_proc_getacl |
2. | dput
3. fh_to_dentry(ocfs2_get_dentry) |
4. bdi flush dirty cache |
5. ocfs2_iget |
[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
Invalid dinode #580640: OCFS2_VALID_FL not set
[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
after error
[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G W
4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
[283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/21/2015
[283465.549657] 0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
ffffffffa0514758
[283465.550392] 000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
0000000000000008
[283465.551056] ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
ffff88005df9f000
[283465.551710] Call Trace:
[283465.552516] [<ffffffff816e839c>] dump_stack+0x63/0x81
[283465.553291] [<ffffffff816e62d3>] panic+0xcb/0x21b
[283465.554037] [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
[283465.554882] [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
[283465.555768] [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230
[ocfs2]
[283465.556683] [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
[283465.557408] [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20
[ocfs2]
[283465.557973] [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60
[ocfs2]
[283465.558525] [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
[283465.559082] [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
[283465.559622] [<ffffffff81297c05>] reconnect_path+0xb5/0x300
[283465.560156] [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
[283465.560708] [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[283465.561262] [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
[283465.561932] [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
[283465.562862] [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
[283465.563697] [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
[283465.564510] [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
[283465.565358] [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30
[sunrpc]
[283465.566272] [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
[283465.567155] [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
[283465.568020] [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
[283465.568962] [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
[283465.570112] [<ffffffff810a622b>] kthread+0xcb/0xf0
[283465.571099] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
[283465.572114] [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
[283465.573156] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b426bac66 upstream.
hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently. The key for shared and private mappings is
different. Shared keys off address_space and file index. Private keys
off mm and virtual address. Consider a private mappings of a populated
hugetlbfs file. A fault will map the page from the file and if needed
do a COW to map a writable page.
Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages. It uses the address_space file index key. However, private
mappings will use a different key and could race with this code to map
the file page. This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped. A sample stack is:
page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").
Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings. This
results in potentially more hash collisions. However, this should not
be the common case.
Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3499efbeed upstream.
During power management suspend the driver need to prepare the device
for the power down operation and as a last indication write to the
HOST_POWER_DOWN_EN register which signals to the hardware that
The ccree is ready for power down.
Signed-off-by: Ofir Drang <ofir.drang@arm.com>
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3df82b468 upstream.
We were computing the size of the import buffer based on the digest size
but the 318 and 224 byte variants use 512 and 256 bytes internal state
sizes respectfully, thus causing the import buffer to overrun.
Fix it by using the right sizes.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4b22bf51b upstream.
We were handling chained scattergather lists with specialized code
needlessly as the regular sg APIs handle them just fine. The code
handling this also had an (unused) code path with a use-before-init
error, flagged by Coverity.
Remove all special handling of chained sg and leave their handling
to the regular sg APIs.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8968c67a82 upstream.
Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.
Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01c8327667 upstream.
In resume from S3, HDAC HDMI codec driver dapm event callback may be
operated before HDMI codec driver turns on the display audio power
domain because of the contest between display driver and hdmi codec driver.
This patch adds the device_link between soc card device (consumer) and
hdmi codec device (supplier) to make sure the sequence is always correct.
Signed-off-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a46eb52322 upstream.
The current algorithm allows 3 types of transfers, 16bit, 32bit and
burst. According to Realtek, 16bit transfers have a special restriction
in that it is restricted to the memory region of
0x18020000 ~ 0x18021000. This region is the memory location of the I2C
registers. The current algorithm does not uphold this restriction and
therefore fails to complete writes.
Since this has been broken for some time it likely no one is using it.
Better to simply disable the 16 bit writes. This will allow users to
properly load firmware over SPI without data corruption.
Signed-off-by: Curtis Malainey <cujomalainey@chromium.org>
Reviewed-by: Ben Zhang <benzh@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ecb2795c08 upstream.
The max98090 driver defines 3 DAPM muxes; one for the right line output
(LINMOD Mux), one for the left headphone mixer source (MIXHPLSEL Mux)
and one for the right headphone mixer source (MIXHPRSEL Mux). The same
bit is used for the mux as well as the DAPM enable, and although the mux
can be correctly configured, after playback has completed, the mux will
be reset during the disable phase. This is preventing the state of these
muxes from being saved and restored correctly on system reboot. Fix this
by marking these muxes as SND_SOC_NOPM.
Note this has been verified this on the Tegra124 Nyan Big which features
the MAX98090 codec.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80a5052db7 upstream.
On the System76 Gazelle (gaze14), there is a headset microphone input
attached to 0x1a that does not have a jack detect. In order to get it
working, the pin configuration needs to be set correctly, and the
ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC fixup needs to be applied. This is
identical to the patch already applied for the System76 Darter Pro
(darp5).
Signed-off-by: Jeremy Soller <jeremy@system76.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 607ca3bd22 upstream.
Let EAPD turn on after set pin output.
[ NOTE: This change is supposed to reduce the possible click noises at
(runtime) PM resume. The functionality should be same (i.e. the
verbs are executed correctly) no matter which order is, so this
should be safe to apply for all codecs -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f641e26a6 upstream.
On the machines with AMD GPU or Nvidia GPU, we often meet this issue:
after s3, there are 4 HDMI/DP audio devices in the gnome-sound-setting
even there is no any monitors plugged.
When this problem happens, we check the /proc/asound/cardX/eld#N.M, we
will find the monitor_present=1, eld_valid=0.
The root cause is BIOS or GPU driver makes the PRESENCE valid even no
monitor plugged, and of course the driver will not get the valid
eld_data subsequently.
In this situation, we should not report the jack_plugged event, to do
so, let us change the function hdmi_present_sense_via_verbs(). In this
function, it reads the pin_sense via snd_hda_pin_sense(), after
calling this function, the jack_dirty is 0, and before exiting
via_verbs(), we change the shadow pin_sense according to both
monitor_present and eld_valid, then in the snd_hda_jack_report_sync(),
since the jack_dirty is still 0, it will report jack event according
to this modified shadow pin_sense.
After this change, the driver will not report Jack_is_plugged event
through hdmi_present_sense_via_verbs() if monitor_present is 1 and
eld_valid is 0.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c2e6728c2 upstream.
The driver will check the monitor presence when resuming from suspend,
starting poll or interrupt triggers. In these 3 situations, the
jack_dirty will be set to 1 first, then the hda_jack.c reads the
pin_sense from register, after reading the register, the jack_dirty
will be set to 0. But hdmi_repoll_work() is enabled in these 3
situations, It will read the pin_sense a couple of times subsequently,
since the jack_dirty is 0 now, It does not read the register anymore,
instead it uses the shadow pin_sense which is read at the first time.
It is meaningless to check the shadow pin_sense a couple of times,
we need to read the register to check the real plugging state, so
we set the jack_dirty to 1 in the hdmi_repoll_work().
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb5173594d upstream.
In parse_audio_selector_unit(), the string array 'namelist' is allocated
through kmalloc_array(), and each string pointer in this array, i.e.,
'namelist[]', is allocated through kmalloc() in the following for loop.
Then, a control instance 'kctl' is created by invoking snd_ctl_new1(). If
an error occurs during the creation process, the string array 'namelist',
including all string pointers in the array 'namelist[]', should be freed,
before the error code ENOMEM is returned. However, the current code does
not free 'namelist[]', resulting in memory leaks.
To fix the above issue, free all string pointers 'namelist[]' in a loop.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a49a619e7 upstream.
Some time ago, a fix was done for the sdhci-acpi driver, refer
commit 6e1c7d6103 ("mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO
hangs"). The same issue was not expected to affect the sdhci-pci driver,
but there have been reports to the contrary, so make the same hardware
setting change.
This patch applies to v5.0+ but before that backports will be required.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92cd1667d5 upstream.
ddr_signaling is set to true for DDR50 and DDR52 modes but is
not set back to false for other modes. This programs incorrect
host clock when mode change happens from DDR52/DDR50 to other
SDR or HS modes like incase of mmc_retune where it switches
from HS400 to HS DDR and then from HS DDR to HS mode and then
to HS200.
This patch fixes the ddr_signaling to set properly for non DDR
modes.
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: stable@vger.kernel.org # v4.20 +
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67476656fe upstream.
This move the dependency to DEV_DAX_PMEM_COMPAT such that only
if DEV_DAX_PMEM is built as module we can allow the compat support.
This allows to test the new code easily in a emulation setup where we
often build things without module support.
Cc: <stable@vger.kernel.org>
Fixes: 730926c3b0 ("device-dax: Add /sys/class/dax backwards compatibility")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a8108b705 upstream.
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected
by this despite unconditionally accessing walk.iv. However this is more
subtle than desired, and unconditionally accessing walk.iv has caused a
real problem in other algorithms. Thus, update xts-aes-neonbs to start
checking the return value of skcipher_walk_virt().
Fixes: 1abee99eaf ("crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 767f015ea0 upstream.
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
arm32 xts-aes-neonbs doesn't set an alignmask, so currently it isn't
affected by this despite unconditionally accessing walk.iv. However
this is more subtle than desired, and it was actually broken prior to
the alignmask being removed by commit cc477bf645 ("crypto: arm/aes -
replace bit-sliced OpenSSL NEON code"). Thus, update xts-aes-neonbs to
start checking the return value of skcipher_walk_virt().
Fixes: e4e7f10bfc ("ARM: add support for bit sliced AES using NEON instructions")
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 418cd20e4d upstream.
Commit 307244452d ("crypto: caam - generate hash keys in-place")
fixed ahash implementation in caam/jr driver such that user-provided key
buffer is not DMA mapped, since it's not guaranteed to be DMAable.
Apply a similar fix for caam/qi2 driver.
Cc: <stable@vger.kernel.org> # v4.20+
Fixes: 3f16f6c9d6 ("crypto: caam/qi2 - add support for ahash algorithms")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5965dc7452 upstream.
Commits c19650d6ea ("crypto: caam - fix DMA mapping of stack memory")
and 65055e2108 ("crypto: caam - fix hash context DMA unmap size")
fixed the ahash implementation in caam/jr driver such that req->result
is not DMA-mapped (since it's not guaranteed to be DMA-able).
Apply a similar fix for ahash implementation in caam/qi2 driver.
Cc: <stable@vger.kernel.org> # v4.20+
Fixes: 3f16f6c9d6 ("crypto: caam/qi2 - add support for ahash algorithms")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f699594d43 upstream.
GCM instances can be created by either the "gcm" template, which only
allows choosing the block cipher, e.g. "gcm(aes)"; or by "gcm_base",
which allows choosing the ctr and ghash implementations, e.g.
"gcm_base(ctr(aes-generic),ghash-generic)".
However, a "gcm_base" instance prevents a "gcm" instance from being
registered using the same implementations. Nor will the instance be
found by lookups of "gcm". This can be used as a denial of service.
Moreover, "gcm_base" instances are never tested by the crypto
self-tests, even if there are compatible "gcm" tests.
The root cause of these problems is that instances of the two templates
use different cra_names. Therefore, fix these problems by making
"gcm_base" instances set the same cra_name as "gcm" instances, e.g.
"gcm(aes)" instead of "gcm_base(ctr(aes-generic),ghash-generic)".
This requires extracting the block cipher name from the name of the ctr
algorithm. It also requires starting to verify that the algorithms are
really ctr and ghash, not something else entirely. But it would be
bizarre if anyone were actually using non-gcm-compatible algorithms with
gcm_base, so this shouldn't break anyone in practice.
Fixes: d00aa19b50 ("[CRYPTO] gcm: Allow block cipher parameter")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 580e295178 upstream.
The arm64 gcm-aes-ce algorithm is failing the extra crypto self-tests
following my patches to test the !may_use_simd() code paths, which
previously were untested. The problem is that in the !may_use_simd()
case, an odd number of AES blocks can be processed within each step of
the skcipher_walk. However, the skcipher_walk is being done with a
"stride" of 2 blocks and is advanced by an even number of blocks after
each step. This causes the encryption to produce the wrong ciphertext
and authentication tag, and causes the decryption to incorrectly fail.
Fix it by only processing an even number of blocks per step.
Fixes: c2b24c36e0 ("crypto: arm64/aes-gcm-ce - fix scatterwalk API violation")
Fixes: 71e52c278c ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dec3d0b107 upstream.
The ->digest() method of crct10dif-pclmul reads the current CRC value
from the shash_desc context. But this value is uninitialized, causing
crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses
crypto_shash_update(), not crypto_shash_digest(). Likewise,
crypto_shash_digest() is not yet tested by the crypto self-tests because
those only test the ahash API which only uses shash init/update/final.
Fixes: 0b95a7f857 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Cc: <stable@vger.kernel.org> # v3.11+
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 307508d107 upstream.
The ->digest() method of crct10dif-generic reads the current CRC value
from the shash_desc context. But this value is uninitialized, causing
crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses
crypto_shash_update(), not crypto_shash_digest(). Likewise,
crypto_shash_digest() is not yet tested by the crypto self-tests because
those only test the ahash API which only uses shash init/update/final.
This bug was detected by my patches that improve testmgr to fuzz
algorithms against their generic implementation.
Fixes: 2d31e518a4 ("crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework")
Cc: <stable@vger.kernel.org> # v3.11+
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcaca01a42 upstream.
skcipher_walk_done() assumes it's a bug if, after the "slow" path is
executed where the next chunk of data is processed via a bounce buffer,
the algorithm says it didn't process all bytes. Thus it WARNs on this.
However, this can happen legitimately when the message needs to be
evenly divisible into "blocks" but isn't, and the algorithm has a
'walksize' greater than the block size. For example, ecb-aes-neonbs
sets 'walksize' to 128 bytes and only supports messages evenly divisible
into 16-byte blocks. If, say, 17 message bytes remain but they straddle
scatterlist elements, the skcipher_walk code will take the "slow" path
and pass the algorithm all 17 bytes in the bounce buffer. But the
algorithm will only be able to process 16 bytes, triggering the WARN.
Fix this by just removing the WARN_ON(). Returning -EINVAL, as the code
already does, is the right behavior.
This bug was detected by my patches that improve testmgr to fuzz
algorithms against their generic implementation.
Fixes: b286d8b1a6 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcf7b48212 upstream.
The original assembly imported from OpenSSL has two copy-paste
errors in handling CTR mode. When dealing with a 2 or 3 block tail,
the code branches to the CBC decryption exit path, rather than to
the CTR exit path.
This leads to corruption of the IV, which leads to subsequent blocks
being corrupted.
This can be detected with libkcapi test suite, which is available at
https://github.com/smuellerDD/libkcapi
Reported-by: Ondrej Mosnáček <omosnacek@gmail.com>
Fixes: 5c380d623e ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5a2aeb8b2 upstream.
Currently, we free the psp_master if the PLATFORM_INIT fails during the
SEV FW probe. If psp_master is freed then driver does not invoke the PSP
FW. As per SEV FW spec, there are several commands (PLATFORM_RESET,
PLATFORM_STATUS, GET_ID etc) which can be executed in the UNINIT state
We should not free the psp_master when PLATFORM_INIT fails.
Fixes: 200664d523 ("crypto: ccp: Add SEV support")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Gary Hook <gary.hook@amd.com>
Cc: stable@vger.kernel.org # 4.19.y
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a1faa4a43 upstream.
CCM instances can be created by either the "ccm" template, which only
allows choosing the block cipher, e.g. "ccm(aes)"; or by "ccm_base",
which allows choosing the ctr and cbcmac implementations, e.g.
"ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
However, a "ccm_base" instance prevents a "ccm" instance from being
registered using the same implementations. Nor will the instance be
found by lookups of "ccm". This can be used as a denial of service.
Moreover, "ccm_base" instances are never tested by the crypto
self-tests, even if there are compatible "ccm" tests.
The root cause of these problems is that instances of the two templates
use different cra_names. Therefore, fix these problems by making
"ccm_base" instances set the same cra_name as "ccm" instances, e.g.
"ccm(aes)" instead of "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
This requires extracting the block cipher name from the name of the ctr
and cbcmac algorithms. It also requires starting to verify that the
algorithms are really ctr and cbcmac using the same block cipher, not
something else entirely. But it would be bizarre if anyone were
actually using non-ccm-compatible algorithms with ccm_base, so this
shouldn't break anyone in practice.
Fixes: 4a49b499df ("[CRYPTO] ccm: Added CCM mode")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e27f38f1f upstream.
If the rfc7539 template is instantiated with specific implementations,
e.g. "rfc7539(chacha20-generic,poly1305-generic)" rather than
"rfc7539(chacha20,poly1305)", then the implementation names end up
included in the instance's cra_name. This is incorrect because it then
prevents all users from allocating "rfc7539(chacha20,poly1305)", if the
highest priority implementations of chacha20 and poly1305 were selected.
Also, the self-tests aren't run on an instance allocated in this way.
Fix it by setting the instance's cra_name from the underlying
algorithms' actual cra_names, rather than from the requested names.
This matches what other templates do.
Fixes: 71ebc4d1b2 ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539")
Cc: <stable@vger.kernel.org> # v4.2+
Cc: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7aceaaef04 upstream.
The arm64 implementations of ChaCha and XChaCha are failing the extra
crypto self-tests following my patches to test the !may_use_simd() code
paths, which previously were untested. The problem is as follows:
When !may_use_simd(), the arm64 NEON implementations fall back to the
generic implementation, which uses the skcipher_walk API to iterate
through the src/dst scatterlists. Due to how the skcipher_walk API
works, walk.stride is set from the skcipher_alg actually being used,
which in this case is the arm64 NEON algorithm. Thus walk.stride is
5*CHACHA_BLOCK_SIZE, not CHACHA_BLOCK_SIZE.
This unnecessarily large stride shouldn't cause an actual problem.
However, the generic implementation computes round_down(nbytes,
walk.stride). round_down() assumes the round amount is a power of 2,
which 5*CHACHA_BLOCK_SIZE is not, so it gives the wrong result.
This causes the following case in skcipher_walk_done() to be hit,
causing a WARN() and failing the encryption operation:
if (WARN_ON(err)) {
/* unexpected case; didn't process all bytes */
err = -EINVAL;
goto finish;
}
Fix it by rounding down to CHACHA_BLOCK_SIZE instead of walk.stride.
(Or we could replace round_down() with rounddown(), but that would add a
slow division operation every time, which I think we should avoid.)
Fixes: 2fe55987b2 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aec286cd36 upstream.
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
Fix this in the LRW template by checking the return value of
skcipher_walk_virt().
This bug was detected by my patches that improve testmgr to fuzz
algorithms against their generic implementation. When the extra
self-tests were run on a KASAN-enabled kernel, a KASAN use-after-free
splat occured during lrw(aes) testing.
Fixes: c778f96bf3 ("crypto: lrw - Optimize tweak computation")
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edaf28e996 upstream.
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
salsa20-generic doesn't set an alignmask, so currently it isn't affected
by this despite unconditionally accessing walk.iv. However this is more
subtle than desired, and it was actually broken prior to the alignmask
being removed by commit b62b3db76f ("crypto: salsa20-generic - cleanup
and convert to skcipher API").
Since salsa20-generic does not update the IV and does not need any IV
alignment, update it to use req->iv instead of walk.iv.
Fixes: 2407d60872 ("[CRYPTO] salsa20: Salsa20 stream cipher")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e92e1717e upstream.
Currently, crypto4xx CFB and OFB AES ciphers are
failing testmgr's test vectors.
|cfb-aes-ppc4xx encryption overran dst buffer on test vector 3, cfg="in-place"
|ofb-aes-ppc4xx encryption overran dst buffer on test vector 1, cfg="in-place"
This is because of a very subtile "bug" in the hardware that
gets indirectly mentioned in 18.1.3.5 Encryption/Decryption
of the hardware spec:
the OFB and CFB modes for AES are listed there as operation
modes for >>> "Block ciphers" <<<. Which kind of makes sense,
but we would like them to be considered as stream ciphers just
like the CTR mode.
To workaround this issue and stop the hardware from causing
"overran dst buffer" on crypttexts that are not a multiple
of 16 (AES_BLOCK_SIZE), we force the driver to use the scatter
buffers as the go-between.
As a bonus this patch also kills redundant pd_uinfo->num_gd
and pd_uinfo->num_sd setters since the value has already been
set before.
Cc: stable@vger.kernel.org
Fixes: f2a13e7cba ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB offloads")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25baaf8e2c upstream.
Commit 8efd972ef9 ("crypto: testmgr - support checking skcipher output IV")
caused the crypto4xx driver to produce the following error:
| ctr-aes-ppc4xx encryption test failed (wrong output IV)
| on test vector 0, cfg="in-place"
This patch fixes this by reworking the crypto4xx_setkey_aes()
function to:
- not save the iv for ECB (as per 18.2.38 CRYP0_SA_CMD_0:
"This bit mut be cleared for DES ECB mode or AES ECB mode,
when no IV is used.")
- instruct the hardware to save the generated IV for all
other modes of operations that have IV and then supply
it back to the callee in pretty much the same way as we
do it for cbc-aes already.
- make it clear that the DIR_(IN|OUT)BOUND is the important
bit that tells the hardware to encrypt or decrypt the data.
(this is cosmetic - but it hopefully prevents me from
getting confused again).
- don't load any bogus hash when we don't use any hash
operation to begin with.
Cc: stable@vger.kernel.org
Fixes: f2a13e7cba ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB offloads")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6690e86be8 upstream.
Effectively reverts commit:
2c7577a758 ("sched/x86_64: Don't save flags on context switch")
Specifically because SMAP uses FLAGS.AC which invalidates the claim
that the kernel has clean flags.
In particular; while preemption from interrupt return is fine (the
IRET frame on the exception stack contains FLAGS) it breaks any code
that does synchonous scheduling, including preempt_enable().
This has become a significant issue ever since commit:
5b24a7a2aa ("Add 'unsafe' user access functions for batched accesses")
provided for means of having 'normal' C code between STAC / CLAC,
exposing the FLAGS.AC state. So far this hasn't led to trouble,
however fix it before it comes apart.
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Fixes: 5b24a7a2aa ("Add 'unsafe' user access functions for batched accesses")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d263119387 upstream.
Currently, compat tasks running on arm64 can allocate memory up to
TASK_SIZE_32 (UL(0x100000000)).
This means that mmap() allocations, if we treat them as returning an
array, are not compliant with the sections 6.5.8 of the C standard
(C99) which states that: "If the expression P points to an element of
an array object and the expression Q points to the last element of the
same array object, the pointer expression Q+1 compares greater than P".
Redefine TASK_SIZE_32 to address the issue.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[will: fixed typo in comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75a19a0202 upstream.
When executing clock_gettime(), either in the vDSO or via a system call,
we need to ensure that the read of the counter register occurs within
the seqlock reader critical section. This ensures that updates to the
clocksource parameters (e.g. the multiplier) are consistent with the
counter value and therefore avoids the situation where time appears to
go backwards across multiple reads.
Extend the vDSO logic so that the seqlock critical section covers the
read of the counter register as well as accesses to the data page. Since
reads of the counter system registers are not ordered by memory barrier
instructions, introduce dependency ordering from the counter read to a
subsequent memory access so that the seqlock memory barriers apply to
the counter access in both the vDSO and the system call paths.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f08cae2f28 upstream.
The file offset argument to the arm64 sys_mmap() implementation is
scaled from bytes to pages by shifting right by PAGE_SHIFT.
Unfortunately, the offset is passed in as a signed 'off_t' type and
therefore large offsets (i.e. with the top bit set) are incorrectly
sign-extended by the shift. This has been observed to cause false mmap()
failures when mapping GPU doorbells on an arm64 server part.
Change the type of the file offset argument to sys_mmap() from 'off_t'
to 'unsigned long' so that the shifting scales the value as expected.
Cc: <stable@vger.kernel.org>
Signed-off-by: Boyang Zhou <zhouby_cn@126.com>
[will: rewrote commit message]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9274c78305 upstream.
The ACEPC T8 and T11 Cherry Trail Z8350 mini PCs use an AXP288 and as PCs,
rather then portables, they does not have a battery. Still for some
reason the AXP288 not only thinks there is a battery, it actually
thinks it is discharging while the PC is running, slowly going to
0% full, causing userspace to shutdown the system due to the battery
being critically low after a while.
This commit adds the ACEPC T8 and T11 to the axp288 fuel-gauge driver
blacklist, so that we stop reporting bogus battery readings on this device.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1690852
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3422ad5f8 upstream.
Currently there is no check on platform_get_irq() return value
in case it fails, hence never actually reporting any errors and
causing unexpected behavior when using such value as argument
for function regmap_irq_get_virq().
Fix this by adding a proper check, a message reporting any errors
and returning *pirq*
Addresses-Coverity-ID: 1443940 ("Improper use of negative value")
Fixes: 843735b788 ("power: axp288_charger: axp288 charger driver")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 629266bf72 upstream.
The call to of_get_next_child returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with warnings like:
arch/arm/mach-exynos/firmware.c:201:2-8: ERROR: missing of_node_put;
acquired a node pointer with refcount incremented on line 193,
but without a corresponding object release within this function.
Cc: stable@vger.kernel.org
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bda9482e7 upstream.
Direct commands (DCMDs) are an optional feature of eMMC 5.1's command
queue engine (CQE). The Arasan eMMC 5.1 controller uses the CQHCI,
which exposes a control register bit to enable the feature.
The current implementation sets this bit unconditionally.
This patch allows to suppress the feature activation,
by specifying the property disable-cqe-dcmd.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 84362d79f4 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b23e1a3e8 upstream.
The name of CODEC input widget to which microphone is connected through
the "Headphone" jack is "IN12" not "IN1". This fixes microphone support
on Odroid XU3.
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34dc822574 upstream.
Add missing audio routing entry for the capture stream, this change
is required to fix audio recording on Odroid XU3/XU3-Lite.
Fixes: 885b005d23 ("ARM: dts: exynos: Add support for secondary DAI to Odroid XU3")
Cc: stable@vger.kernel.org
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3e35357cd upstream.
David Bauer reported that the VDSL modem (attached via PCIe)
on his AVM Fritz!Box 7530 was complaining about not having
enough space in the BAR. A closer inspection of the old
qcom-ipq40xx.dtsi pulled from the GL-iNet repository listed:
| qcom,pcie@80000 {
| compatible = "qcom,msm_pcie";
| reg = <0x80000 0x2000>,
| <0x99000 0x800>,
| <0x40000000 0xf1d>,
| <0x40000f20 0xa8>,
| <0x40100000 0x1000>,
| <0x40200000 0x100000>,
| <0x40300000 0xd00000>;
| reg-names = "parf", "phy", "dm_core", "elbi",
| "conf", "io", "bars";
Matching the reg-names with the listed reg leads to
<0xd00000> as the size for the "bars".
Cc: stable@vger.kernel.org
BugLink: https://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg45212.html
Reported-by: David Bauer <mail@david-bauer.net>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Andy Gross <agross@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3eec13b8f upstream.
When using direct commands (DCMDs) on an RK3399, we get spurious
CQE completion interrupts for the DCMD transaction slot (#31):
[ 931.196520] ------------[ cut here ]------------
[ 931.201702] mmc1: cqhci: spurious TCN for tag 31
[ 931.206906] WARNING: CPU: 0 PID: 1433 at /usr/src/kernel/drivers/mmc/host/cqhci.c:725 cqhci_irq+0x2e4/0x490
[ 931.206909] Modules linked in:
[ 931.206918] CPU: 0 PID: 1433 Comm: irq/29-mmc1 Not tainted 4.19.8-rt6-funkadelic #1
[ 931.206920] Hardware name: Theobroma Systems RK3399-Q7 SoM (DT)
[ 931.206924] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 931.206927] pc : cqhci_irq+0x2e4/0x490
[ 931.206931] lr : cqhci_irq+0x2e4/0x490
[ 931.206933] sp : ffff00000e54bc80
[ 931.206934] x29: ffff00000e54bc80 x28: 0000000000000000
[ 931.206939] x27: 0000000000000001 x26: ffff000008f217e8
[ 931.206944] x25: ffff8000f02ef030 x24: ffff0000091417b0
[ 931.206948] x23: ffff0000090aa000 x22: ffff8000f008b000
[ 931.206953] x21: 0000000000000002 x20: 000000000000001f
[ 931.206957] x19: ffff8000f02ef018 x18: ffffffffffffffff
[ 931.206961] x17: 0000000000000000 x16: 0000000000000000
[ 931.206966] x15: ffff0000090aa6c8 x14: 0720072007200720
[ 931.206970] x13: 0720072007200720 x12: 0720072007200720
[ 931.206975] x11: 0720072007200720 x10: 0720072007200720
[ 931.206980] x9 : 0720072007200720 x8 : 0720072007200720
[ 931.206984] x7 : 0720073107330720 x6 : 00000000000005a0
[ 931.206988] x5 : ffff00000860d4b0 x4 : 0000000000000000
[ 931.206993] x3 : 0000000000000001 x2 : 0000000000000001
[ 931.206997] x1 : 1bde3a91b0d4d900 x0 : 0000000000000000
[ 931.207001] Call trace:
[ 931.207005] cqhci_irq+0x2e4/0x490
[ 931.207009] sdhci_arasan_cqhci_irq+0x5c/0x90
[ 931.207013] sdhci_irq+0x98/0x930
[ 931.207019] irq_forced_thread_fn+0x2c/0xa0
[ 931.207023] irq_thread+0x114/0x1c0
[ 931.207027] kthread+0x128/0x130
[ 931.207032] ret_from_fork+0x10/0x20
[ 931.207035] ---[ end trace 0000000000000002 ]---
The driver shows this message only for the first spurious interrupt
by using WARN_ONCE(). Changing this to WARN() shows, that this is
happening quite frequently (up to once a second).
Since the eMMC 5.1 specification, where CQE and CQHCI are specified,
does not mention that spurious TCN interrupts for DCMDs can be simply
ignored, we must assume that using this feature is not working reliably.
The current implementation uses DCMD for REQ_OP_FLUSH only, and
I could not see any performance/power impact when disabling
this optional feature for RK3399.
Therefore this patch disables DCMDs for RK3399.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Fixes: 84362d79f4 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: stable@vger.kernel.org
[the corresponding code changes are queued for 5.2 so doing that as well]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 798689e451 upstream.
This patch fixes IO domain voltage setting that is related to
audio_gpio3d4a_ms (bit 1) of GRF_IO_VSEL.
This is because RockPro64 schematics P.16 says that regulator
supplies 3.0V power to APIO5_VDD. So audio_gpio3d4a_ms bit should
be clear (means 3.0V). Power domain map is saying different thing
(supplies 1.8V) but I believe P.16 is actual connectings.
Fixes: e4f3fb4909 ("arm64: dts: rockchip: add initial dts support for Rockpro64")
Cc: stable@vger.kernel.org
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6f393bc93 upstream.
When a function falls through to the next function due to a compiler
bug, objtool prints some obscure warnings. For example:
drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x95: return with modified stack frame
drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x0: stack state mismatch: cfa1=7+32 cfa2=7+8
Instead it should be printing:
drivers/regulator/core.o: warning: objtool: regulator_supply_is_couple() falls through to next function regulator_count_voltages()
This used to work, but was broken by the following commit:
13810435b9 ("objtool: Support GCC 8's cold subfunctions")
The padding nops at the end of a function aren't actually part of the
function, as defined by the symbol table. So the 'func' variable in
validate_branch() is getting cleared to NULL when a padding nop is
encountered, breaking the fallthrough detection.
If the current instruction doesn't have a function associated with it,
just consider it to be part of the previously detected function by not
overwriting the previous value of 'func'.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Fixes: 13810435b9 ("objtool: Support GCC 8's cold subfunctions")
Link: http://lkml.kernel.org/r/546d143820cd08a46624ae8440d093dd6c902cae.1557766718.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a9e9bcb45b ]
During my rwsem testing, it was found that after a down_read(), the
reader count may occasionally become 0 or even negative. Consequently,
a writer may steal the lock at that time and execute with the reader
in parallel thus breaking the mutual exclusion guarantee of the write
lock. In other words, both readers and writer can become rwsem owners
simultaneously.
The current reader wakeup code does it in one pass to clear waiter->task
and put them into wake_q before fully incrementing the reader count.
Once waiter->task is cleared, the corresponding reader may see it,
finish the critical section and do unlock to decrement the count before
the count is incremented. This is not a problem if there is only one
reader to wake up as the count has been pre-incremented by 1. It is
a problem if there are more than one readers to be woken up and writer
can steal the lock.
The wakeup was actually done in 2 passes before the following v4.9 commit:
70800c3c0c ("locking/rwsem: Scan the wait_list for readers only once")
To fix this problem, the wakeup is now done in two passes
again. In the first pass, we collect the readers and count them.
The reader count is then fully incremented. In the second pass, the
waiter->task is then cleared and they are put into wake_q to be woken
up later.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Fixes: 70800c3c0c ("locking/rwsem: Scan the wait_list for readers only once")
Link: http://lkml.kernel.org/r/20190428212557.13482-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0916878da3 upstream.
For a single device mount using a zoned block device, the zone
information for the device is stored in the sbi->devs single entry
array and sbi->s_ndevs is set to 1. This differs from a single device
mount using a regular block device which does not allocate sbi->devs
and sets sbi->s_ndevs to 0.
However, sbi->s_devs == 0 condition is used throughout the code to
differentiate a single device mount from a multi-device mount where
sbi->s_ndevs is always larger than 1. This results in problems with
single zoned block device volumes as these are treated as multi-device
mounts but do not have the start_blk and end_blk information set. One
of the problem observed is skipping of zone discard issuing resulting in
write commands being issued to full zones or unaligned to a zone write
pointer.
Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
regular block device mount) and sbi->s_ndevs == 1 (single zoned block
device mount) in the same manner. This is done by introducing the
helper function f2fs_is_multi_device() and using this helper in place
of direct tests of sbi->s_ndevs value, improving code readability.
Fixes: 7bb3a371d1 ("f2fs: Fix zoned block device support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 340d455699 upstream.
When we hot-remove a device, usually the host sends us a PCI_EJECT message,
and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
When we execute the quick hot-add/hot-remove test, the host may not send
us the PCI_EJECT message if the guest has not fully finished the
initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
host, so it's potentially unsafe to only depend on the
pci_destroy_slot() in hv_eject_device_work() because the code path
create_root_hv_pci_bus()
-> hv_pci_assign_slots()
is not called in this case. Note: in this case, the host still sends the
guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
In the quick hot-add/hot-remove test, we can have such a race before
the code path
pci_devices_present_work()
-> new_pcichild_device()
adds the new device into the hbus->children list, we may have already
received the PCI_EJECT message, and since the tasklet handler
hv_pci_onchannelcallback()
may fail to find the "hpdev" by calling
get_pcichild_wslot(hbus, dev_message->wslot.slot)
hv_pci_eject_device() is not called; Later, by continuing execution
create_root_hv_pci_bus()
-> hv_pci_assign_slots()
creates the slot and the PCI_BUS_RELATIONS message with
bus_rel->device_count == 0 removes the device from hbus->children, and
we end up being unable to remove the slot in
hv_pci_remove()
-> hv_pci_remove_slots()
Remove the slot in pci_devices_present_work() when the device
is removed to address this race.
pci_devices_present_work() and hv_eject_device_work() run in the
singled-threaded hbus->wq, so there is not a double-remove issue for the
slot.
We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
to the workqueue, because we need the hv_pci_onchannelcallback()
synchronously call hv_pci_eject_device() to poll the channel
ringbuffer to work around the "hangs in hv_compose_msi_msg()" issue
fixed in commit de0aa7b2f9 ("PCI: hv: Fix 2 hang issues in
hv_compose_msi_msg()")
Fixes: a15f2c08c7 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: rewritten commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15becc2b56 upstream.
When we unload the pci-hyperv host controller driver, the host does not
send us a PCI_EJECT message.
In this case we also need to make sure the sysfs PCI slot directory is
removed, otherwise a command on a slot file eg:
"cat /sys/bus/pci/slots/2/address"
will trigger a
"BUG: unable to handle kernel paging request"
and, if we unload/reload the driver several times we would end up with
stale slot entries in PCI slot directories in /sys/bus/pci/slots/
root@localhost:~# ls -rtl /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb 7 10:49 2
drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2
Add the missing code to remove the PCI slot and fix the current
behaviour.
Fixes: a15f2c08c7 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: reformatted the log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05f151a73e upstream.
When a device is created in new_pcichild_device(), hpdev->refs is set
to 2 (i.e. the initial value of 1 plus the get_pcichild()).
When we hot remove the device from the host, in a Linux VM we first call
hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
then schedules a work of hv_eject_device_work(), so hpdev->refs becomes
3 (let's ignore the paired get/put_pcichild() in other places). But in
hv_eject_device_work(), currently we only call put_pcichild() twice,
meaning the 'hpdev' struct can't be freed in put_pcichild().
Add one put_pcichild() to fix the memory leak.
The device can also be removed when we run "rmmod pci-hyperv". On this
path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
hpdev->refs is 2, and we do correctly call put_pcichild() twice in
pci_devices_present_work().
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: commit log rework]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5266e58d6c upstream.
Set RI in the default kernel's MSR so that the architected way of
detecting unrecoverable machine check interrupts has a chance to work.
This is inline with the MSR setup of the rest of booke powerpc
architectures configured here.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3f3072db6 upstream.
Without restoring the IAMR after idle, execution prevention on POWER9
with Radix MMU is overwritten and the kernel can freely execute
userspace without faulting.
This is necessary when returning from any stop state that modifies
user state, as well as hypervisor state.
To test how this fails without this patch, load the lkdtm driver and
do the following:
$ echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
which won't fault, then boot the kernel with powersave=off, where it
will fault. Applying this patch will fix this.
Fixes: 3b10d0095a ("powerpc/mm/radix: Prevent kernel execution of user space")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b4010af981 ]
We have valid scenarios where ETHTOOL_LINK_MODE_Pause_BIT doesn't
need to be supported. Therefore extend the first check to check
for rx_pause being set.
See also phy_set_asym_pause:
rx=0 and tx=1: advertise asym pause only
rx=0 and tx=0: stop advertising both pause modes
The fixed commit isn't wrong, it's just the one that introduced the
linkmode bitmaps.
Fixes: 3c1bcc8614 ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9871a9e47a ]
When a queue(tfile) is detached through __tun_detach(), we move the
last enabled tfile to the position where detached one sit but don't
NULL out last position. We expect to synchronize the datapath through
tun->numqueues. Unfortunately, this won't work since we're lacking
sufficient mechanism to order or synchronize the access to
tun->numqueues.
To fix this, NULL out the last position during detaching and check
RCU protected tfile against NULL instead of checking tun->numqueues in
datapath.
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: weiyongjun (A) <weiyongjun1@huawei.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: c8d68e6be1 ("tuntap: multiqueue support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a35d310f03 ]
We need check if tun->numqueues is zero (e.g for the persist device)
before trying to use it for modular arithmetic.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 96f84061620c6("tun: add eBPF based queue selection method")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff6ab32bd4 ]
VRF netdev mtu isn't typically set and have an mtu of 65536. When the
link of a tunnel is set, the tunnel mtu is changed from 1480 to the link
mtu minus tunnel header. In the case of VRF netdev is the link, then the
tunnel mtu becomes 65516. So, fix it by not setting the tunnel mtu in
this case.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 873017af77 ]
With NET_ADMIN enabled in container, a normal user could be mapped to
root and is able to change the real device's rx filter via ioctl on
vlan, which would affect the other ptp process on host. Fix it by
disabling SIOCSHWTSTAMP in container.
Fixes: a6111d3c93 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff946833b7 ]
commit 517d7c79bd ("tipc: fix hanging poll() for stream sockets")
introduced a regression for clients using non-blocking sockets.
After the commit, we send EPOLLOUT event to the client even in
TIPC_CONNECTING state. This causes the subsequent send() to fail
with ENOTCONN, as the socket is still not in TIPC_ESTABLISHED state.
In this commit, we:
- improve the fix for hanging poll() by replacing sk_data_ready()
with sk_state_change() to wake up all clients.
- revert the faulty updates introduced by commit 517d7c79bd
("tipc: fix hanging poll() for stream sockets").
Fixes: 517d7c79bd ("tipc: fix hanging poll() for stream sockets")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c7e0d6cca8 ]
calling connect(AF_UNSPEC) on an already connected TCP socket is an
established way to disconnect() such socket. After commit 68741a8ada
("selinux: Fix ltp test connect-syscall failure") it no longer works
and, in the above scenario connect() fails with EAFNOSUPPORT.
Fix the above falling back to the generic/old code when the address family
is not AF_INET{4,6}, but leave the SCTP code path untouched, as it has
specific constraints.
Fixes: 68741a8ada ("selinux: Fix ltp test connect-syscall failure")
Reported-by: Tom Deseyn <tdeseyn@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5afcd14cfc ]
The old MIPS implementation of dma_cache_sync() didn't use the dev argument,
but commit c9eb6172c3 ("dma-mapping: turn dma_cache_sync into a
dma_map_ops method") changed that, so we now need to set dev.parent.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0504453139 ]
Current order in open:
-> Enable interrupts (macb_init_hw)
-> Enable NAPI
-> Start PHY
Sequence of RX handling:
-> RX interrupt occurs
-> Interrupt is cleared and interrupt bits disabled in handler
-> NAPI is scheduled
-> In NAPI, RX budget is processed and RX interrupts are re-enabled
With the above, on QEMU or fixed link setups (where PHY state doesn't
matter), there's a chance macb RX interrupt occurs before NAPI is
enabled. This will result in NAPI being scheduled before it is enabled.
Fix this macb open by changing the order.
Fixes: ae1f2a56d2 ("net: macb: Added support for many RX queues")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d4c26eb6e7 ]
When adding more MAC addresses to a dwmac-sun8i interface, the device goes
directly in promiscuous mode.
This is due to IFF_UNICAST_FLT missing flag.
So since the hardware support unicast filtering, let's add IFF_UNICAST_FLT.
Fixes: 9f93ac8d40 ("net-next: stmmac: Add dwmac-sun8i")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 19e4e76806 ]
inet_iif should be used for the raw socket lookup. inet_iif considers
rt_iif which handles the case of local traffic.
As it stands, ping to a local address with the '-I <dev>' option fails
ever since ping was changed to use SO_BINDTODEVICE instead of
cmsg + IP_PKTINFO.
IPv6 works fine.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e9919a24d3 ]
With commit 153380ec4b ("fib_rules: Added NLM_F_EXCL support to
fib_nl_newrule") we now able to check if a rule already exists. But this
only works with iproute2. For other tools like libnl, NetworkManager,
it still could add duplicate rules with only NLM_F_CREATE flag, like
[localhost ~ ]# ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
100000: from 192.168.7.5 lookup 5
100000: from 192.168.7.5 lookup 5
As it doesn't make sense to create two duplicate rules, let's just return
0 if the rule exists.
Fixes: 153380ec4b ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 17170e6570 ]
Fix issue with the entry indexing in the sg frame cleanup code being
off-by-1. This problem showed up when doing some basic iperf tests and
manifested in traffic coming to a halt.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bdfad5aec1 ]
Currently error return from kobject_init_and_add() is not followed by a
call to kobject_put(). This means there is a memory leak. We currently
set p to NULL so that kfree() may be called on it as a noop, the code is
arguably clearer if we move the kfree() up closer to where it is
called (instead of after goto jump).
Remove a goto label 'err1' and jump to call to kobject_put() in error
return from kobject_init_and_add() fixing the memory leak. Re-name goto
label 'put_back' to 'err1' now that we don't use err1, following current
nomenclature (err1, err2 ...). Move call to kfree out of the error
code at bottom of function up to closer to where memory was allocated.
Add comment to clarify call to kfree().
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a9b8a2b39c ]
There's currently a problem with toggling arp_validate on and off with an
active-backup bond. At the moment, you can start up a bond, like so:
modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0
Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
echo 1 > /sys/class/net/bond0/bonding/arp_validate
Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
arp_validate off again, the link falls flat on it's face:
echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!
The problem lies in bond_options.c, where passing in arp_validate=0
results in bond->recv_probe getting set to NULL. This flies directly in
the face of commit 3fe68df97c, which says we need to set recv_probe =
bond_arp_recv, even if we're not using arp_validate. Said commit fixed
this in bond_option_arp_interval_set, but missed that we can get to that
same state in bond_option_arp_validate_set as well.
One solution would be to universally set recv_probe = bond_arp_recv here
as well, but I don't think bond_option_arp_validate_set has any business
touching recv_probe at all, and that should be left to the arp_interval
code, so we can just make things much tidier here.
Fixes: 3fe68df97c ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f4fd7c579 upstream.
Changing state from check_state_check_result to
check_state_compute_result not only is unsafe but also doesn't
appear to serve a valid purpose. A raid6 check should only be
pushing out extra writes if doing repair and a mis-match occurs.
The stripe dev management will already try and do repair writes
for failing sectors.
This patch makes the raid6 check_state_check_result handling
work more like raid5's. If somehow too many failures for a
check, just quit the check operation for the stripe. When any
checks pass, don't try and use check_state_compute_result for
a purpose it isn't needed for and is unsafe for. Just mark the
stripe as in sync for passing its parity checks and let the
stripe dev read/write code and the bad blocks list do their
job handling I/O errors.
Repro steps from Xiao:
These are the steps to reproduce this problem:
1. redefined OPT_MEDIUM_ERR_ADDR to 12000 in scsi_debug.c
2. insmod scsi_debug.ko dev_size_mb=11000 max_luns=1 num_tgts=1
3. mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sde1 /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
sde is the disk created by scsi_debug
4. echo "2" >/sys/module/scsi_debug/parameters/opts
5. raid-check
It panic:
[ 4854.730899] md: data-check of RAID array md127
[ 4854.857455] sd 5:0:0:0: [sdr] tag#80 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.859246] sd 5:0:0:0: [sdr] tag#80 Sense Key : Medium Error [current]
[ 4854.860694] sd 5:0:0:0: [sdr] tag#80 Add. Sense: Unrecovered read error
[ 4854.862207] sd 5:0:0:0: [sdr] tag#80 CDB: Read(10) 28 00 00 00 2d 88 00 04 00 00
[ 4854.864196] print_req_error: critical medium error, dev sdr, sector 11656 flags 0
[ 4854.867409] sd 5:0:0:0: [sdr] tag#100 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.869469] sd 5:0:0:0: [sdr] tag#100 Sense Key : Medium Error [current]
[ 4854.871206] sd 5:0:0:0: [sdr] tag#100 Add. Sense: Unrecovered read error
[ 4854.872858] sd 5:0:0:0: [sdr] tag#100 CDB: Read(10) 28 00 00 00 2e e0 00 00 08 00
[ 4854.874587] print_req_error: critical medium error, dev sdr, sector 12000 flags 4000
[ 4854.876456] sd 5:0:0:0: [sdr] tag#101 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.878552] sd 5:0:0:0: [sdr] tag#101 Sense Key : Medium Error [current]
[ 4854.880278] sd 5:0:0:0: [sdr] tag#101 Add. Sense: Unrecovered read error
[ 4854.881846] sd 5:0:0:0: [sdr] tag#101 CDB: Read(10) 28 00 00 00 2e e8 00 00 08 00
[ 4854.883691] print_req_error: critical medium error, dev sdr, sector 12008 flags 4000
[ 4854.893927] sd 5:0:0:0: [sdr] tag#166 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.896002] sd 5:0:0:0: [sdr] tag#166 Sense Key : Medium Error [current]
[ 4854.897561] sd 5:0:0:0: [sdr] tag#166 Add. Sense: Unrecovered read error
[ 4854.899110] sd 5:0:0:0: [sdr] tag#166 CDB: Read(10) 28 00 00 00 2e e0 00 00 10 00
[ 4854.900989] print_req_error: critical medium error, dev sdr, sector 12000 flags 0
[ 4854.902757] md/raid:md127: read error NOT corrected!! (sector 9952 on sdr1).
[ 4854.904375] md/raid:md127: read error NOT corrected!! (sector 9960 on sdr1).
[ 4854.906201] ------------[ cut here ]------------
[ 4854.907341] kernel BUG at drivers/md/raid5.c:4190!
raid5.c:4190 above is this BUG_ON:
handle_parity_checks6()
...
BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
Cc: <stable@vger.kernel.org> # v3.16+
OriginalAuthor: David Jeffery <djeffery@redhat.com>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: David Jeffy <djeffery@redhat.com>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84242b82d8 upstream.
Add missing break statement in order to prevent the code from falling
through to case 0x1025, and erroneously setting rtlhal->oem_id to
RT_CID_819X_ACER when rtlefuse->eeprom_svid is equal to 0x10EC and
none of the cases in switch (rtlefuse->eeprom_smid) match.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: 238ad2ddf3 ("rtlwifi: rtl8723ae: Clean up the hardware info routine")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b583201fa upstream.
It was reported on OpenWrt bug tracking system[1], that several users
are affected by the endless reboot of their routers if they configure
5GHz interface with channel 44 or 48.
The reboot loop is caused by the following excessive number of WARN_ON
messages:
WARNING: CPU: 0 PID: 0 at backports-4.19.23-1/net/mac80211/rx.c:4516
ieee80211_rx_napi+0x1fc/0xa54 [mac80211]
as the messages are being correctly emitted by the following guard:
case RX_ENC_LEGACY:
if (WARN_ON(status->rate_idx >= sband->n_bitrates))
as the rate_idx is in this case erroneously set to 251 (0xfb). This fix
simply converts previously used magic number to proper constant and
guards against substraction which is leading to the currently observed
underflow.
1. https://bugs.openwrt.org/index.php?do=details&task_id=2218
Fixes: 854783444b ("mwl8k: properly set receive status rate index on 5 GHz receive")
Cc: <stable@vger.kernel.org>
Tested-by: Eubert Bao <bunnier@gmail.com>
Reported-by: Eubert Bao <bunnier@gmail.com>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f5edd58d0 upstream.
Fix two long-standing bugs which could potentially lead to memory
corruption or leave the port throttled until it is reopened (on weakly
ordered systems), respectively, when read-URB completion races with
unthrottle().
First, the URB must not be marked as free before processing is complete
to prevent it from being submitted by unthrottle() on another CPU.
CPU 1 CPU 2
================ ================
complete() unthrottle()
process_urb();
smp_mb__before_atomic();
set_bit(i, free); if (test_and_clear_bit(i, free))
submit_urb();
Second, the URB must be marked as free before checking the throttled
flag to prevent unthrottle() on another CPU from failing to observe that
the URB needs to be submitted if complete() sees that the throttled flag
is set.
CPU 1 CPU 2
================ ================
complete() unthrottle()
set_bit(i, free); throttled = 0;
smp_mb__after_atomic(); smp_mb();
if (throttled) if (test_and_clear_bit(i, free))
return; submit_urb();
Note that test_and_clear_bit() only implies barriers when the test is
successful. To handle the case where the URB is still in use an explicit
barrier needs to be added to unthrottle() for the second race condition.
Fixes: d83b405383 ("USB: serial: add support for multiple read urbs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf4f2ad6b8 upstream.
Userspace can make host function calls, called hgcm-calls through the
/dev/vboxguest device.
In this case we should not accept all hgcm-function-parameter-types, some
are only valid for in kernel calls.
This commit adds proper hgcm-function-parameter-type validation to the
ioctl for doing a hgcm-call from userspace.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4db61c2a16 upstream.
There are two problems with WARN_ON() here. One: It is not ratelimited.
Two: We don't see which adapter was used when trying to transfer
something when already suspended. Implement a custom ratelimit once per
adapter and use dev_WARN there. This fixes both issues. Drawback is that
we don't see if multiple drivers are trying to transfer with the same
adapter while suspended. They need to be discovered one after the other
now. This is better than a high CPU load because a really broken driver
might try to resend endlessly.
Fixes: 9ac6cb5fbb ("i2c: add suspended flag and accessors for i2c adapters")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dd3fcb0ab upstream.
When running without USERNS or PIDNS the seccomp test would hang since
it was waiting forever for the child to trigger the user notification
since it seems the glibc() abort handler makes a call to getpid(),
which would trap again. This changes the getpid filter to getppid, and
makes sure ASSERTs execute to stop from spawning the listener.
Reported-by: Shuah Khan <shuah@kernel.org>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org # > 5.0
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b88c504921 upstream.
The occ's extended status is checked and shown as sysfs attributes. But
the code was incorrectly checking the "status" bits.
Fix it by checking the "ext_status" bits.
Cc: stable@vger.kernel.org
Fixes: df04ced684 ("hwmon (occ): Add sysfs attributes for additional OCC data")
Signed-off-by: Lei YU <mine260309@gmail.com>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7db839fcc upstream.
Some AMD based ThinkPads have a firmware bug that calling
"GBDC" will cause Bluetooth on Intel wireless cards blocked.
Probe these models by DMI match and disable Bluetooth subdriver
if specified Intel wireless card exist.
Cc: stable <stable@vger.kernel.org> # 4.14+
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea01668f9f upstream
Adjust the last two rows in the table that display possible values when
MDS mitigation is enabled. They both were slightly innacurate.
In addition, convert the table of possible values and their descriptions
to a list-table. The simple table format uses the top border of equals
signs to determine cell width which resulted in the first column being
far too wide in comparison to the second column that contained the
majority of the text.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e672f8bf71 upstream
Updated the documentation for a new CVE-2019-11091 Microarchitectural Data
Sampling Uncacheable Memory (MDSUM) which is a variant of
Microarchitectural Data Sampling (MDS). MDS is a family of side channel
attacks on internal buffers in Intel CPUs.
MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS. There are no new code changes expected for MDSUM.
The existing mitigation for MDS applies to MDSUM as well.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2c3c94788 upstream
This code is only for CPUs which are affected by MSBDS, but are *not*
affected by the other two MDS issues.
For such CPUs, enabling the mds_idle_clear mitigation is enough to
mitigate SMT.
However if user boots with 'mds=off' and still has SMT enabled, we should
not report that SMT is mitigated:
$cat /sys//devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated
But rather:
Vulnerable; SMT vulnerable
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190412215118.294906495@localhost.localdomain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c3658b201 upstream
arch_smt_update() now has a dependency on both Spectre v2 and MDS
mitigations. Move its initial call to after all the mitigation decisions
have been made.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 65fd4cb65b upstream
Move L!TF to a separate directory so the MDS stuff can be added at the
side. Otherwise the all hardware vulnerabilites have their own top level
entry. Should have done that right away.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22dd836508 upstream
In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.
Introduce an internal mitigation mode VMWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.
That said: Virtual Machines Will Eventually Receive Vaccine
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a4b06d391 upstream
Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc1241700a upstream
Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.
This is the minimal straight forward initial implementation which just
provides an always on/off mode. The command line parameter is:
mds=[full|off]
This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.
The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07f07f55a2 upstream
Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.
Add the actual invocations to the different halt/mwait variants which
covers all usage sites. mwaitx is not patched as it's not available on
Intel CPUs.
The buffer clear is only invoked before entering the C-State to prevent
that stale data from the idling CPU is spilled to the Hyper-Thread sibling
after the Store buffer got repartitioned and all entries are available to
the non idle sibling.
When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. Now CPU which returned from idle could be
speculatively exposed to contents of the sibling, but the buffers are
flushed either on exit to user space or on VMENTER.
When later on conditional buffer clearing is implemented on top of this,
then there is no action required either because before returning to user
space the context switch will set the condition flag which causes a flush
on the return to user path.
Note, that the buffer clearing on idle is only sensible on CPUs which are
solely affected by MSBDS and not any other variant of MDS because the other
MDS variants cannot be mitigated when SMT is enabled, so the buffer
clearing on idle would be a window dressing exercise.
This intentionally does not handle the case in the acpi/processor_idle
driver which uses the legacy IO port interface for C-State transitions for
two reasons:
- The acpi/processor_idle driver was replaced by the intel_idle driver
almost a decade ago. Anything Nehalem upwards supports it and defaults
to that new driver.
- The legacy IO port interface is likely to be used on older and therefore
unaffected CPUs or on systems which do not receive microcode updates
anymore, so there is no point in adding that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 650b68a062 upstream
CPUs which are affected by L1TF and MDS mitigate MDS with the L1D Flush on
VMENTER when updated microcode is installed.
If a CPU is not affected by L1TF or if the L1D Flush is not in use, then
MDS mitigation needs to be invoked explicitly.
For these cases, follow the host mitigation state and invoke the MDS
mitigation before VMENTER.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04dcbdb805 upstream
Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() and do_nmi() right before actually returning.
Add documentation which kernel to user space transition this covers and
explain why some corner cases are not mitigated.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a9e529272 upstream
The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by
clearing the affected CPU buffers. The mechanism for clearing the buffers
uses the unused and obsolete VERW instruction in combination with a
microcode update which triggers a CPU buffer clear when VERW is executed.
Provide a inline function with the assembly magic. The argument of the VERW
instruction must be a memory operand as documented:
"MD_CLEAR enumerates that the memory-operand variant of VERW (for
example, VERW m16) has been extended to also overwrite buffers affected
by MDS. This buffer overwriting functionality is not guaranteed for the
register operand variant of VERW."
Documentation also recommends to use a writable data segment selector:
"The buffer overwriting occurs regardless of the result of the VERW
permission check, as well as when the selector is null or causes a
descriptor load segment violation. However, for lowest latency we
recommend using a selector that indicates a valid writable data
segment."
Add x86 specific documentation about MDS and the internal workings of the
mitigation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4dbbd147 upstream
X86_FEATURE_MD_CLEAR is a new CPUID bit which is set when microcode
provides the mechanism to invoke a flush of various exploitable CPU buffers
by invoking the VERW instruction.
Hand it through to guests so they can adjust their mitigations.
This also requires corresponding qemu changes, which are available
separately.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e261f209c3 upstream
This bug bit is set on CPUs which are only affected by Microarchitectural
Store Buffer Data Sampling (MSBDS) and not by any other MDS variant.
This is important because the Store Buffers are partitioned between
Hyper-Threads so cross thread forwarding is not possible. But if a thread
enters or exits a sleep state the store buffer is repartitioned which can
expose data from one thread to the other. This transition can be mitigated.
That means that for CPUs which are only affected by MSBDS SMT can be
enabled, if the CPU is not affected by other SMT sensitive vulnerabilities,
e.g. L1TF. The XEON PHI variants fall into that category. Also the
Silvermont/Airmont ATOMs, but for them it's not really relevant as they do
not support SMT, but mark them for completeness sake.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed5194c273 upstream
Microarchitectural Data Sampling (MDS), is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:
- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.
MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.
MLDPS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.
All variants have the same mitigation for single CPU thread case (SMT off),
so the kernel can treat them as one MDS issue.
Add the basic infrastructure to detect if the current CPU is affected by
MDS.
[ tglx: Rewrote changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36ad35131a upstream
The CPU vulnerability whitelists have some overlap and there are more
whitelists coming along.
Use the driver_data field in the x86_cpu_id struct to denote the
whitelisted vulnerabilities and combine all whitelists into one.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8eabc3731 upstream
Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.
Clean it up.
[ Josh Poimboeuf: Fix tools build ]
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03110a5cb2 upstream.
Our futex implementation makes use of LDXR/STXR loops to perform atomic
updates to user memory from atomic context. This can lead to latency
problems if we end up spinning around the LL/SC sequence at the expense
of doing something useful.
Rework our futex atomic operations so that we return -EAGAIN if we fail
to update the futex word after 128 attempts. The core futex code will
reschedule if necessary and we'll try again later.
Cc: <stable@kernel.org>
Fixes: 6170a97460 ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b4f4bc9cb upstream.
Some futex() operations, including FUTEX_WAKE_OP, require the kernel to
perform an atomic read-modify-write of the futex word via the userspace
mapping. These operations are implemented by each architecture in
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
are called in atomic context with the relevant hash bucket locks held.
Although these routines may return -EFAULT in response to a page fault
generated when accessing userspace, they are expected to succeed (i.e.
return 0) in all other cases. This poses a problem for architectures
that do not provide bounded forward progress guarantees or fairness of
contended atomic operations and can lead to starvation in some cases.
In these problematic scenarios, we must return back to the core futex
code so that we can drop the hash bucket locks and reschedule if
necessary, much like we do in the case of a page fault.
Allow architectures to return -EAGAIN from their implementations of
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
will cause the core futex code to reschedule if necessary and return
back to the architecture code later on.
Cc: <stable@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 476c7e1d34 upstream.
The problem here is that addr can be I3C_BROADCAST_ADDR (126). That
means we're shifting by (126 * 2) % 64 which is 60. The
I3C_ADDR_SLOT_STATUS_MASK is an enum which is an unsigned int in GCC
so shifts greater than 31 are undefined.
Fixes: 3a379bbcea ("i3c: Add core I3C infrastructure")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0efa3334d6 upstream.
Currently in sst_dsp_new() if we get an error return from sst_dma_new()
we just print an error message and then still complete the function
successfully. This means that we are trying to run without sst->dma
properly set up, which will result in NULL pointer dereference when
sst->dma is later used. This was happening for me in
sst_dsp_dma_get_channel():
struct sst_dma *dma = dsp->dma;
...
dma->ch = dma_request_channel(mask, dma_chan_filter, dsp);
This resulted in:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: sst_dsp_dma_get_channel+0x4f/0x125 [snd_soc_sst_firmware]
Fix this by adding proper error handling for the case where we fail to
set up DMA.
This change only affects Haswell and Broadwell systems. Baytrail
systems explicilty opt-out of DMA via sst->pdata->resindex_dma_base
being set to -1.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Cc: stable@vger.kernel.org
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ae62a4209 upstream.
This is the UAS version of
747668dbc0
usb-storage: Set virt_boundary_mask to avoid SG overflows
We are not as likely to be vulnerable as storage, as it is unlikelier
that UAS is run over a controller without native support for SG,
but the issue exists.
The issue has been existing since the inception of the driver.
Fixes: 115bb1ffa5 ("USB: Add UAS driver")
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62611abc8f upstream.
The code path for Macs goes through bcm_apple_get_resources(), which
skips over the code that sets up the regulator supplies. As a result,
the call to regulator_bulk_enable() / regulator_bulk_disable() results
in a NULL pointer dereference.
This was reported on the kernel.org Bugzilla, bug 202963.
Unbreak Broadcom Bluetooth support on Intel Macs by checking if the
supplies were set up before enabling or disabling them.
The same does not need to be done for the clocks, as the common clock
framework API checks for NULL pointers.
Fixes: 75d11676dc ("Bluetooth: hci_bcm: Add support for regulator supplies")
Cc: <stable@vger.kernel.org> # 5.0.x
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Imre Kaloz <kaloz@openwrt.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba8f5289f7 upstream.
l2cap_le_flowctl_init was reseting the tx_credits which works only for
outgoing connection since that set the tx_credits on the response, for
incoming connections that was not the case which leaves the channel
without any credits causing it to be suspended.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1616a5ac9 upstream.
Struct ca is copied from userspace. It is not checked whether the "name"
field is NULL terminated, which allows local users to obtain potentially
sensitive information from kernel stack memory, via a HIDPCONNADD command.
This vulnerability is similar to CVE-2011-1079.
Signed-off-by: Young Xiao <YangX92@hotmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2137490f21 upstream.
This patch fixes issue reported by some of the customers, who discovered
that after cable pull scenario the devices disappear and path seems to
remain in blocked state. Once the device reappears, driver does not seem to
update path to online. This issue appears because of the defer flag
creating race condition where the same session reappears. This patch fixes
this issue by indicating SCSI-ML of device lost when
qlt_free_session_done() is called from qlt_unreg_sess().
Fixes: 41dc529a46 ("qla2xxx: Improve RSCN handling in driver")
Signed-off-by: Quinn Tran <qtran@marvell.com>
Cc: stable@vger.kernel.org #4.19
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cbdae10bf upstream.
Commit e6f77540c0 ("scsi: qla2xxx: Fix an integer overflow in sysfs
code") incorrectly set 'optrom_region_size' to 'start+size', which can
overflow option-rom boundaries when 'start' is non-zero. Continue setting
optrom_region_size to the proper adjusted value of 'size'.
Fixes: e6f77540c0 ("scsi: qla2xxx: Fix an integer overflow in sysfs code")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Vasquez <andrewv@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7f7b6f38a upstream.
Change snprintf to scnprintf. There are generally two cases where using
snprintf causes problems.
1) Uses of size += snprintf(buf, SIZE - size, fmt, ...)
In this case, if snprintf would have written more characters than what the
buffer size (SIZE) is, then size will end up larger than SIZE. In later
uses of snprintf, SIZE - size will result in a negative number, leading
to problems. Note that size might already be too large by using
size = snprintf before the code reaches a case of size += snprintf.
2) If size is ultimately used as a length parameter for a copy back to user
space, then it will potentially allow for a buffer overflow and information
disclosure when size is greater than SIZE. When the size is used to index
the buffer directly, we can have memory corruption. This also means when
size = snprintf... is used, it may also cause problems since size may become
large. Copying to userspace is mitigated by the HARDENED_USERCOPY kernel
configuration.
The solution to these issues is to use scnprintf which returns the number of
characters actually written to the buffer, so the size variable will never
exceed SIZE.
Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a84014e1db upstream.
When enabling ARCH_SUNXI from allnoconfig, SUNXI_SRAM is enabled, but
not REGMAP_MMIO, so the kernel fails to link with an undefined reference
to __devm_regmap_init_mmio_clk. Select REGMAP_MMIO, as suggested in
drivers/base/regmap/Kconfig.
This creates the following dependency loop:
drivers/of/Kconfig:68: symbol OF_IRQ depends on IRQ_DOMAIN
kernel/irq/Kconfig:63: symbol IRQ_DOMAIN is selected by REGMAP
drivers/base/regmap/Kconfig:7: symbol REGMAP default is visible depending on REGMAP_MMIO
drivers/base/regmap/Kconfig:39: symbol REGMAP_MMIO is selected by SUNXI_SRAM
drivers/soc/sunxi/Kconfig:4: symbol SUNXI_SRAM is selected by USB_MUSB_SUNXI
drivers/usb/musb/Kconfig:63: symbol USB_MUSB_SUNXI depends on GENERIC_PHY
drivers/phy/Kconfig:7: symbol GENERIC_PHY is selected by PHY_BCM_NS_USB3
drivers/phy/broadcom/Kconfig:29: symbol PHY_BCM_NS_USB3 depends on MDIO_BUS
drivers/net/phy/Kconfig:12: symbol MDIO_BUS default is visible depending on PHYLIB
drivers/net/phy/Kconfig:181: symbol PHYLIB is selected by ARC_EMAC_CORE
drivers/net/ethernet/arc/Kconfig:18: symbol ARC_EMAC_CORE is selected by ARC_EMAC
drivers/net/ethernet/arc/Kconfig:24: symbol ARC_EMAC depends on OF_IRQ
To fix the circular dependency, make USB_MUSB_SUNXI select GENERIC_PHY
instead of depending on it. This matches the use of GENERIC_PHY by all
but two other drivers.
Cc: <stable@vger.kernel.org> # 4.19
Fixes: 5828729beb ("soc: sunxi: export a regmap for EMAC clock reg on A64")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8afd03486 upstream.
Commit 48402cee68 ("ACPI / LPSS: Resume BYT/CHT I2C controllers from
resume_noirq") makes acpi_lpss_{suspend_late,resume_early}() bail early
on BYT/CHT as resume_from_noirq is set.
This means that on resume from hibernate dw_i2c_plat_resume() doesn't get
called by the restore_early callback, acpi_lpss_resume_early(). Instead it
should be called by the restore_noirq callback matching how things are done
when resume_from_noirq is set and we are doing a regular resume.
Change the restore_noirq callback to acpi_lpss_resume_noirq so that
dw_i2c_plat_resume() gets properly called when resume_from_noirq is set
and we are resuming from hibernate.
Likewise also change the poweroff_noirq callback so that
dw_i2c_plat_suspend gets called properly.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202139
Fixes: 48402cee68 ("ACPI / LPSS: Resume BYT/CHT I2C controllers from resume_noirq")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: 4.20+ <stable@vger.kernel.org> # 4.20+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8db8256345 upstream.
The frequency calculation was based on the current(max) frequency of the
CPU. However for low frequency, the value used was already the parent
frequency divided by a factor of 2.
Instead of using this frequency, this fix directly get the frequency from
the parent clock.
Fixes: 92ce45fb87 ("cpufreq: Add DVFS support for Armada 37xx")
Cc: <stable@vger.kernel.org>
Reported-by: Christian Neubert <christian.neubert.86@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 447ccb4e08 upstream.
The of_device_id table needs to be registered as module alias in order
for automatic module loading to pick the kernel module based on the
DeviceTree compatible. So add MODULE_DEVICE_TABLE() to make this happen.
Fixes: e13d757279 ("iio: adc: Add QCOM SPMI PMIC5 ADC driver")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 747668dbc0 upstream.
The USB subsystem has always had an unusual requirement for its
scatter-gather transfers: Each element in the scatterlist (except the
last one) must have a length divisible by the bulk maxpacket size.
This is a particular issue for USB mass storage, which uses SG lists
created by the block layer rather than setting up its own.
So far we have scraped by okay because most devices have a logical
block size of 512 bytes or larger, and the bulk maxpacket sizes for
USB 2 and below are all <= 512. However, USB 3 has a bulk maxpacket
size of 1024. Since the xhci-hcd driver includes native SG support,
this hasn't mattered much. But now people are trying to use USB-3
mass storage devices with USBIP, and the vhci-hcd driver currently
does not have full SG support.
The result is an overflow error, when the driver attempts to implement
an SG transfer of 63 512-byte blocks as a single
3584-byte (7 blocks) transfer followed by seven 4096-byte (8 blocks)
transfers. The device instead sends 31 1024-byte packets followed by
a 512-byte packet, and this overruns the first SG buffer.
Ideally this would be fixed by adding better SG support to vhci-hcd.
But for now it appears we can work around the problem by
asking the block layer to respect the maxpacket limitation, through
the use of the virt_boundary_mask.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Seth Bollinger <Seth.Bollinger@digi.com>
Tested-by: Seth Bollinger <Seth.Bollinger@digi.com>
CC: Ming Lei <tom.leiming@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 764478f411 upstream.
Fix two long-standing bugs which could potentially lead to memory
corruption or leave the port throttled until it is reopened (on weakly
ordered systems), respectively, when read-URB completion races with
unthrottle().
First, the URB must not be marked as free before processing is complete
to prevent it from being submitted by unthrottle() on another CPU.
CPU 1 CPU 2
================ ================
complete() unthrottle()
process_urb();
smp_mb__before_atomic();
set_bit(i, free); if (test_and_clear_bit(i, free))
submit_urb();
Second, the URB must be marked as free before checking the throttled
flag to prevent unthrottle() on another CPU from failing to observe that
the URB needs to be submitted if complete() sees that the throttled flag
is set.
CPU 1 CPU 2
================ ================
complete() unthrottle()
set_bit(i, free); throttled = 0;
smp_mb__after_atomic(); smp_mb();
if (throttled) if (test_and_clear_bit(i, free))
return; submit_urb();
Note that test_and_clear_bit() only implies barriers when the test is
successful. To handle the case where the URB is still in use an explicit
barrier needs to be added to unthrottle() for the second race condition.
Also note that the first race was fixed by 36e59e0d70 ("cdc-acm: fix
race between callback and unthrottle") back in 2015, but the bug was
reintroduced a year later.
Fixes: 1aba579f3c ("cdc-acm: handle read pipe errors")
Fixes: 088c64f812 ("USB: cdc-acm: re-write read processing")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 804dbee1e4 upstream.
The F81232 will use interrupt worker to handle MSR change.
This patch will fix the issue that interrupt work should stop
in close() and suspend().
This also fixes line-status events being disabled after a suspend cycle
until the port is re-opened.
Signed-off-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
[ johan: amend commit message ]
Fixes: 87fe5adcd8 ("USB: f81232: implement read IIR/MSR with endpoint")
Cc: stable <stable@vger.kernel.org> # 4.1
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47830c1127 upstream.
Since moving the message buffers off the stack, the dynamically
allocated get-prop-descriptor request buffer is incorrectly sized due to
using the pointer rather than request-struct size when creating the
operation.
Fortunately, the pointer size is always larger than this one-byte
request, but this could still cause trouble on the remote end due to the
unexpected message size.
Fixes: 9d15134d06 ("greybus: power_supply: rework get descriptors")
Cc: stable <stable@vger.kernel.org> # 4.9
Cc: Rui Miguel Silva <rui.silva@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Rui Miguel Silva <rmfrfs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0996bc297 upstream.
Building lib/ubsan.c with gcc-9 results in a ton of nasty warnings like
this one:
lib/ubsan.c warning: conflicting types for built-in function
‘__ubsan_handle_negate_overflow’; expected ‘void(void *, void *)’ [-Wbuiltin-declaration-mismatch]
The kernel's declarations of __ubsan_handle_*() often uses 'unsigned
long' types in parameters while GCC these parameters as 'void *' types,
hence the mismatch.
Fix this by using 'void *' to match GCC's declarations.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Fixes: c6d308534a ("UBSAN: run-time undefined behavior sanity checker")
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull perf fixes from Ingo Molnar:
"I'd like to apologize for this very late pull request: I was dithering
through the week whether to send the fixes, and then yesterday Jiri's
crash fix for a regression introduced in this cycle clearly marked
perf/urgent as 'must merge now'.
Most of the commits are tooling fixes, plus there's three kernel fixes
via four commits:
- race fix in the Intel PEBS code
- fix an AUX bug and roll back a previous attempt
- fix AMD family 17h generic HW cache-event perf counters
The largest diffstat contribution comes from the AMD fix - a new event
table is introduced, which is a fairly low risk change but has a large
linecount"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix race in intel_pmu_disable_event()
perf/x86/intel/pt: Remove software double buffering PMU capability
perf/ring_buffer: Fix AUX software double buffering
perf tools: Remove needless asm/unistd.h include fixing build in some places
tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv
tools build: Add -ldl to the disassembler-four-args feature test
perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet
perf cs-etm: Don't check cs_etm_queue::prev_packet validity
perf report: Report OOM in status line in the GTK UI
perf bench numa: Add define for RUSAGE_THREAD if not present
tools lib traceevent: Change tag string for error
perf annotate: Fix build on 32 bit for BPF annotation
tools uapi x86: Sync vmx.h with the kernel
perf bpf: Return value with unlocking in perf_env__find_btf()
MAINTAINERS: Include vendor specific files under arch/*/events/*
perf/x86/amd: Update generic hardware cache events for Family 17h
Pull scheduler fix from Ingo Molnar:
"Fix a kobject memory leak in the cpufreq code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cpufreq: Fix kobject memleak
Pull x86 fix from Ingo Molnar:
"Disable function tracing during early SME setup to fix a boot crash on
SME-enabled kernels running distro kernels (some of which have
function tracing enabled)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
Pull vfs fixes from Al Viro:
- a couple of ->i_link use-after-free fixes
- regression fix for wrong errno on absent device name in mount(2)
(this cycle stuff)
- ancient UFS braino in large GID handling on Solaris UFS images (bogus
cut'n'paste from large UID handling; wrong field checked to decide
whether we should look at old (16bit) or new (32bit) field)
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
Abort file_remove_privs() for non-reg. files
[fix] get rid of checking for absent device name in vfs_get_tree()
apparmorfs: fix use-after-free on symlink traversal
securityfs: fix use-after-free on symlink traversal
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:perf_prepare_sample+0x8c/0x530
Call Trace:
<NMI>
perf_event_output_forward+0x2a/0x80
__perf_event_overflow+0x51/0xe0
handle_pmi_common+0x19e/0x240
intel_pmu_handle_irq+0xad/0x170
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x69/0x110
default_do_nmi+0x3e/0x100
do_nmi+0x11a/0x180
end_repeat_nmi+0x16/0x1a
RIP: 0010:native_write_msr+0x6/0x20
...
</NMI>
intel_pmu_disable_event+0x98/0xf0
x86_pmu_stop+0x6e/0xb0
x86_pmu_del+0x46/0x140
event_sched_out.isra.97+0x7e/0x160
...
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
x86_pmu_stop
test_bit(hwc->idx, cpuc->active_mask)
intel_pmu_disable_event(event)
{
...
intel_pmu_pebs_disable(event);
...
EVENT OVERFLOW -> <NMI>
intel_pmu_handle_irq
handle_pmi_common
TEST PASSES -> test_bit(bit, cpuc->active_mask))
perf_event_overflow
perf_prepare_sample
{
...
if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
data->callchain = perf_callchain(event, regs);
CRASH -> size += data->callchain->nr;
}
</NMI>
...
x86_pmu_disable_event(event)
}
__clear_bit(hwc->idx, cpuc->active_mask);
Fixing this by disabling the event itself before setting
off the PEBS bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull powerpc fix from Michael Ellerman:
"One regression fix.
Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes
under load on some machines depending on memory layout.
Thanks to Christophe Leroy"
* tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX
Pull KVM fixes from Paolo Bonzini:
- PPC and ARM bugfixes from submaintainers
- Fix old Windows versions on AMD (recent regression)
- Fix old Linux versions on processors without EPT
- Fixes for LAPIC timer optimizations
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: nVMX: Fix size checks in vmx_set_nested_state
KVM: selftests: make hyperv_cpuid test pass on AMD
KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointer
KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size
x86/kvm/mmu: reset MMU context when 32-bit guest switches PAE
KVM: x86: Whitelist port 0x7e for pre-incrementing %rip
Documentation: kvm: fix dirty log ioctl arch lists
KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit
KVM: arm/arm64: Don't emulate virtual timers on userspace ioctls
kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
KVM: lapic: Convert guest TSC to host time domain if necessary
KVM: lapic: Allow user to disable adaptive tuning of timer advancement
KVM: lapic: Track lapic timer advance per vCPU
KVM: lapic: Disable timer advancement if adaptive tuning goes haywire
x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short
KVM: PPC: Book3S: Protect memslots while validating user address
KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exit
KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIs
...
Pull i2c fixes from Wolfram Sang:
"I2C driver bugfixes and a MAINTAINERS update for you"
* 'i2c/for-current-fixed' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Prevent runtime suspend of adapter when Host Notify is required
i2c: synquacer: fix enumeration of slave devices
MAINTAINERS: friendly takeover of i2c-gpio driver
i2c: designware: ratelimit 'transfer when suspended' errors
i2c: imx: correct the method of getting private data in notifier_call
Pull drm fix from Dave Airlie:
"Just a single qxl revert"
* tag 'drm-fixes-2019-05-03' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/qxl: drop prime import/export callbacks"
Pull clk fixes from Stephen Boyd:
"Two fixes for the NKMP clks on Allwinner SoCs, a locking fix for
clkdev where we forgot to hold a lock while iterating a list that can
change, and finally a build fix that adds some stubs for clk APIs that
are used by devfreq drivers on platforms without the clk APIs"
* tag 'clk-fixes-for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Add missing stubs for a few functions
clkdev: Hold clocks_mutex while iterating clocks list
clk: sunxi-ng: nkmp: Explain why zero width check is needed
clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
Pull sound fixes from Takashi Iwai:
"A few stable fixes at this round.
The USB Line6 audio fixes are a bit large, but they are rather trivial
and pretty much device-specific, so should be safe to apply at this
late stage. Ditto for other HD-audio quirks"
* tag 'sound-5.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Apply the fixup for ASUS Q325UAR
ALSA: line6: use dynamic buffers
ALSA: hda/realtek - Fixed Dell AIO speaker noise
ALSA: hda/realtek - Add new Dell platform for headset mode
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
tools UAPI:
Arnaldo Carvalho de Melo:
- Sync x86's vmx.h with the kernel.
- Copy missing unistd.h headers for arc, hexagon and riscv, fixing
a reported build regression on the ARC 32-bit architecture.
perf bench numa:
Arnaldo Carvalho de Melo:
- Add define for RUSAGE_THREAD if not present, fixing the build on the
ARC architecture when only zlib and libnuma are present.
perf BPF:
Arnaldo Carvalho de Melo:
- The disassembler-four-args feature test needs -ldl on distros such as
Mageia 7.
Bo YU:
- Fix unlocking on success in perf_env__find_btf(), detected with
the coverity tool.
libtraceevent:
Leo Yan:
- Change misleading hard coded 'trace-cmd' string in error messages.
ARM hardware tracing:
Leo Yan:
- Always allocate memory for cs_etm_queue::prev_packet, fixing a segfault
when processing CoreSight perf data.
perf annotate:
Thadeu Lima de Souza Cascardo:
- Fix build on 32 bit for BPF.
perf report:
Thomas Richter:
- Report OOM in status line in the GTK UI.
core libs:
- Remove needless asm/unistd.h that, used with sys/syscall.h ended
up redefining the syscalls defines in environments such as the
ARC arch when using uClibc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since those were introduced in:
c8ce48f065 ("asm-generic: Make time32 syscall numbers optional")
But when the asm-generic/unistd.h was sync'ed with tools/ in:
1a787fc5ba ("tools headers uapi: Sync copy of asm-generic/unistd.h with the kernel sources")
I forgot to copy the files for the architectures that define
__ARCH_WANT_TIME32_SYSCALLS, so the perf build was breaking there, as
reported by Vineet Gupta for the ARC architecture.
After updating my ARC container to use the glibc based toolchain + cross
building libnuma, zlib and elfutils, I finally managed to reproduce the
problem and verify that this now is fixed and will not regress as will
be tested before each pull req sent upstream.
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
CC: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20190426193531.GC28586@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Backlund reported that the perf build was failing on the Mageia 7
distro, that is because it uses:
cat /tmp/build/perf/feature/test-disassembler-four-args.make.output
/usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin':
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243:
undefined reference to `dlopen'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271:
undefined reference to `dlsym'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256:
undefined reference to `dlclose'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246:
undefined reference to `dlerror'
as we allow dynamic linking and loading
Mageia 7 uses these linker flags:
$ rpm --eval %ldflags
-Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags
So add -ldl to this feature LDFLAGS.
Reported-by: Thomas Backlund <tmb@mageia.org>
Tested-by: Thomas Backlund <tmb@mageia.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20190501173158.GC21436@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.
If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.
As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.
Reported-by: Robert Walker <robert.walker@arm.com>
Suggested-by: Robert Walker <robert.walker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Robert Walker <robert.walker@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 7100b12cf4 ("perf cs-etm: Generate branch sample for exception packet")
Fixes: 24fff5eb2b ("perf cs-etm: Avoid stale branch samples when flush packet")
Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An -ENOMEM error is not reported in the GTK GUI. Instead this error
message pops up on the screen:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1
Processing events... [974K/3M]
Error:failed to process sample
0xf4198 [0x8]: failed to process type: 68
However when I use the same perf.data file with --stdio it works:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \
| head -12
# Total Lost Samples: 0
#
# Samples: 76K of event 'cycles'
# Event count (approx.): 99056160000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .........
#
8.81% find [kernel.kallsyms] [k] ftrace_likely_update
8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update
8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update
2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_update
The sample precentage is a bit low.....
The GUI always fails in the FINISHED_ROUND event (68) and does not
indicate the reason why.
When happened is the following. Perf report calls a lot of functions and
down deep when a FINISHED_ROUND event is processed, these functions are
called:
perf_session__process_event()
+ perf_session__process_user_event()
+ process_finished_round()
+ ordered_events__flush()
+ __ordered_events__flush()
+ do_flush()
+ ordered_events__deliver_event()
+ perf_session__deliver_event()
+ machine__deliver_event()
+ perf_evlist__deliver_event()
+ process_sample_event()
+ hist_entry_iter_add() --> only called in GUI case!!!
+ hist_iter__report__callback()
+ symbol__inc_addr_sample()
Now this functions runs out of memory and
returns -ENOMEM. This is reported all the way up
until function
perf_session__process_event() returns to its caller, where -ENOMEM is
changed to -EINVAL and processing stops:
if ((skip = perf_session__process_event(session, event, head)) < 0) {
pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
head, event->header.size, event->header.type);
err = -EINVAL;
goto out_err;
}
This occurred in the FINISHED_ROUND event when it has to process some
10000 entries and ran out of memory.
This patch indicates the root cause and displays it in the status line
of ther perf report GUI.
Output before (on GUI status line):
0xf4198 [0x8]: failed to process type: 68
Output after:
0xf4198 [0x8]: failed to process type: 68 [not enough memory]
Committer notes:
the 'skip' variable needs to be initialized to -EINVAL, so that when the
size is less than sizeof(struct perf_event_attr) we avoid this valid
compiler warning:
util/session.c: In function ‘perf_session__process_events’:
util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
err = skip;
~~~~^~~~~~
util/session.c:1874:6: note: ‘skip’ was declared here
s64 skip;
^~~~
cc1: all warnings being treated as errors
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20190423105303.61683-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The traceevent lib is used by the perf tool, and when executing
perf test -v 6
it outputs error log on the ARM64 platform:
running test 33 '*:*'trace-cmd: No such file or directory
[...]
trace-cmd: Invalid argument
The trace event parsing code originally came from trace-cmd so it keeps
the tag string "trace-cmd" for errors, this easily introduces the
impression that the perf tool launches trace-cmd command for trace event
parsing, but in fact the related parsing is accomplished by the
traceevent lib.
This patch changes the tag string to "libtraceevent" so that we can
avoid confusion and let users to more easily connect the error with
traceevent lib.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190424013802.27569-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
2b27924bb1 ("KVM: nVMX: always use early vmcs check when EPT is disabled")
That causes this object in the tools/perf build process to be rebuilt:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
But it isn't using VMX_ABORT_ prefixed constants, so no change in
behaviour.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/n/tip-bjbo3zc0r8i8oa0udpvftya6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing.
2) Missing length check for esp4 UDP encap, from Sabrina Dubroca.
3) Fix byte order of RX STBC access in mac80211, from Johannes Berg.
4) Inifnite loop in bpftool map create, from Alban Crequy.
5) Register mark fix in ebpf verifier after pkt/null checks, from Paul
Chaignon.
6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric
Dumazet.
7) Buffer overrun in marvell phy driver, from Andrew Lunn.
8) Several crash and statistics handling fixes to bnxt_en driver, from
Michael Chan and Vasundhara Volam.
9) Several fixes to the TLS layer from Jakub Kicinski (copying negative
amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk
NULL deref, etc).
10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet.
11) PID/UID checks on ipv6 flow labels are inverted, from Willem de
Bruijn.
12) Use after free in l2tp, from Eric Dumazet.
13) IPV6 route destroy races, also from Eric Dumazet.
14) SCTP state machine can erroneously run recursively, fix from Xin
Long.
15) Adjust AF_PACKET msg_name length checks, add padding bytes if
necessary. From Willem de Bruijn.
16) Preserve skb_iif, so that forwarded packets have consistent values
even if fragmentation is involved. From Shmulik Ladkani.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
udp: fix GRO packet of death
ipv6: A few fixes on dereferencing rt->from
rds: ib: force endiannes annotation
selftests: fib_rule_tests: print the result and return 1 if any tests failed
ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
net/tls: avoid NULL pointer deref on nskb->sk in fallback
selftests: fib_rule_tests: Fix icmp proto with ipv6
packet: validate msg_namelen in send directly
packet: in recvmsg msg_name return at least sizeof sockaddr_ll
sctp: avoid running the sctp state machine recursively
stmmac: pci: Fix typo in IOT2000 comment
Documentation: fix netdev-FAQ.rst markup warning
ipv6: fix races in ip6_dst_destroy()
l2ip: fix possible use-after-free
appletalk: Set error code if register_snap_client failed
net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc
rxrpc: Fix net namespace cleanup
ipv6/flowlabel: wait rcu grace period before put_pid()
vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach
tcp: add sanity tests in tcp_add_backlog()
...
Pull io_uring fixes from Jens Axboe:
"This is mostly io_uring fixes/tweaks. Most of these were actually done
in time for the last -rc, but I wanted to ensure that everything
tested out great before including them. The code delta looks larger
than it really is, as it's mostly just comment additions/changes.
Outside of the comment additions/changes, this is mostly removal of
unnecessary barriers. In all, this pull request contains:
- Tweak to how we handle errors at submission time. We now post a
completion event if the error occurs on behalf of an sqe, instead
of returning it through the system call. If the error happens
outside of a specific sqe, we return the error through the system
call. This makes it nicer to use and makes the "normal" use case
behave the same as the offload cases. (me)
- Fix for a missing req reference drop from async context (me)
- If an sqe is submitted with RWF_NOWAIT, don't punt it to async
context. Return -EAGAIN directly, instead of using it as a hint to
do async punt. (Stefan)
- Fix notes on barriers (Stefan)
- Remove unnecessary barriers (Stefan)
- Fix potential double free of memory in setup error (Mark)
- Further improve sq poll CPU validation (Mark)
- Fix page allocation warning and leak on buffer registration error
(Mark)
- Fix iov_iter_type() for new no-ref flag (Ming)
- Fix a case where dio doesn't honor bio no-page-ref (Ming)"
* tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
io_uring: avoid page allocation warnings
iov_iter: fix iov_iter_type
block: fix handling for BIO_NO_PAGE_REF
io_uring: drop req submit reference always in async punt
io_uring: free allocated io_memory once
io_uring: fix SQPOLL cpu validation
io_uring: have submission side sqe errors post a cqe
io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
io_uring: remove unnecessary barrier after incrementing dropped counter
io_uring: remove unnecessary barrier before reading SQ tail
io_uring: remove unnecessary barrier after updating SQ head
io_uring: remove unnecessary barrier before reading cq head
io_uring: remove unnecessary barrier before wq_has_sleeper
io_uring: fix notes on barriers
io_uring: fix handling SQEs requesting NOWAIT
Multiple users have reported their Synaptics touchpad has stopped
working between v4.20.1 and v4.20.2 when using SMBus interface.
The culprit for this appeared to be commit c5eb119007 ("PCI / PM: Allow
runtime PM without callback functions") that fixed the runtime PM for
i2c-i801 SMBus adapter. Those Synaptics touchpad are using i2c-i801
for SMBus communication and testing showed they are able to get back
working by preventing the runtime suspend of adapter.
Normally when i2c-i801 SMBus adapter transmits with the client it resumes
before operation and autosuspends after.
However, if client requires SMBus Host Notify protocol, what those
Synaptics touchpads do, then the host adapter must not go to runtime
suspend since then it cannot process incoming SMBus Host Notify commands
the client may send.
Fix this by keeping I2C/SMBus adapter active in case client requires
Host Notify.
Reported-by: Keijo Vaara <ferdasyn@rocketmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203297
Fixes: c5eb119007 ("PCI / PM: Allow runtime PM without callback functions")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Keijo Vaara <ferdasyn@rocketmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The I2C host driver for SynQuacer fails to populate the of_node and
ACPI companion fields of the struct i2c_adapter it instantiates,
resulting in enumeration of the subordinate I2C bus to fail.
Fixes: 0d676a6c43 ("i2c: add support for Socionext SynQuacer I2C controller")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
I haven't heard from Haavard in years despite putting him to the CC list for
i2c-gpio related mails. Since I was doing the work on this driver for a while
now, let me take official maintainership, so it will be more clear to users.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Haavard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed. Specifically:
L1 Data Cache
The data cache access counter remains the same on Family 17h.
For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).
For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."
L1 Instruction Cache
PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.
For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.
LL - Last Level Cache
Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
on Family 17h, where the last level cache is L3. L3 counters
can be accessed using the existing AMD Uncore driver.
Data TLB
On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired. This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.
Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk. That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:
42,550,869 L1-dcache-loads
41,591,860 dTLB-loads
4,802 dTLB-load-misses # 0.01% of all dTLB cache hits
7,283,682 L1-dcache-stores
7,912,392 dTLB-stores
310 dTLB-store-misses
On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.
So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks. This
results in a much more accurate view of data TLB performance:
60,961,781 L1-dcache-loads
4,601 dTLB-loads
963 dTLB-load-misses # 20.93% of all dTLB cache hits
Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.
Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context, it can be misinterpreted as data cache
hits because the figures are similar (at least on Intel), and (b) not
all those loads (technically accesses) technically "hit" at that
hardware level. "% of all dTLB accesses" would be more clear/accurate.
Instruction TLB
On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.
For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".
For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.
Branch Predictor (BPU)
PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.
Node Level Events
Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.
Reference:
"Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
Released 7/17/2018, Publication #56255, Revision 3.03:
https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf
[ mingo: tidied up the line breaks. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed1542d ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two problems with dev_err() here. One: It is not ratelimited.
Two: We don't see which driver tried to transfer something with a
suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
is that we don't see if multiple drivers are trying to transfer while
suspended. They need to be discovered one after the other now. This is
better than a high CPU load because a really broken driver might try to
resend endlessly.
Link: https://bugs.archlinux.org/task/62391
Fixes: 2751541555 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reported-by: skidnik <skidnik@gmail.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: skidnik <skidnik@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pull PCI fixes from Bjorn Helgaas:
"I apologize for sending these so late in the cycle. We went back and
forth about how to deal with the unexpected logging of intentional
link state changes and finally decided to just config them off by
default.
PCI fixes:
- Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe)
- Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex
Williamson)
- Add Kconfig option for Link Bandwidth notification messages (Keith
Busch)"
* tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/LINK: Add Kconfig option (default off)
PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management
PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
Pull MTD fix from Richard Weinberger:
"A single regression fix for the marvell nand driver"
* tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: Clean the controller state before each operation
e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth
notification") added dmesg logging whenever a link changes speed or width
to a state that is considered degraded. Unfortunately, it cannot
differentiate signal integrity-related link changes from those
intentionally initiated by an endpoint driver, including drivers that may
live in userspace or VMs when making use of vfio-pci. Some GPU drivers
actively manage the link state to save power, which generates a stream of
messages like this:
vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
Since we can't distinguish the intentional changes from the signal
integrity issues, leave the reporting turned off by default. Add a Kconfig
option to turn it on if desired.
Fixes: e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth notification")
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff. Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().
Fixes: 252e211e90
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
syzbot was able to crash host by sending UDP packets with a 0 payload.
TCP does not have this issue since we do not aggregate packets without
payload.
Since dev_gro_receive() sets gso_size based on skb_gro_len(skb)
it seems not worth trying to cope with padded packets.
BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889
CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133
skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline]
call_gro_receive include/linux/netdevice.h:2349 [inline]
udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414
udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478
inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510
dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581
napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843
tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
call_write_iter include/linux/fs.h:1866 [inline]
do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
do_iter_write fs/read_write.c:957 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:938
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
do_writev+0x15e/0x370 fs/read_write.c:1037
__do_sys_writev fs/read_write.c:1110 [inline]
__se_sys_writev fs/read_write.c:1107 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441cc0
Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0
RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0
RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668
R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9
R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 5143:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_kmalloc mm/kasan/common.c:497 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3393 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
mm_alloc+0x1d/0xd0 kernel/fork.c:1030
bprm_mm_init fs/exec.c:363 [inline]
__do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791
do_execveat_common fs/exec.c:1865 [inline]
do_execve fs/exec.c:1882 [inline]
__do_sys_execve fs/exec.c:1958 [inline]
__se_sys_execve fs/exec.c:1953 [inline]
__x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 5351:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
__cache_free mm/slab.c:3499 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3765
__mmdrop+0x238/0x320 kernel/fork.c:677
mmdrop include/linux/sched/mm.h:49 [inline]
finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746
context_switch kernel/sched/core.c:2880 [inline]
__schedule+0x81b/0x1cc0 kernel/sched/core.c:3518
preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745
retint_kernel+0x1b/0x2d
arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline]
kmem_cache_free+0xab/0x260 mm/slab.c:3766
anon_vma_chain_free mm/rmap.c:134 [inline]
unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401
free_pgtables+0x1af/0x2f0 mm/memory.c:394
exit_mmap+0x2d1/0x530 mm/mmap.c:3144
__mmput kernel/fork.c:1046 [inline]
mmput+0x15f/0x4c0 kernel/fork.c:1067
exec_mmap fs/exec.c:1046 [inline]
flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279
load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864
search_binary_handler fs/exec.c:1656 [inline]
search_binary_handler+0x17f/0x570 fs/exec.c:1634
exec_binprm fs/exec.c:1698 [inline]
__do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818
do_execveat_common fs/exec.c:1865 [inline]
do_execve fs/exec.c:1882 [inline]
__do_sys_execve fs/exec.c:1958 [inline]
__se_sys_execve fs/exec.c:1953 [inline]
__x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff88808893f7c0
which belongs to the cache mm_struct of size 1496
The buggy address is located 600 bytes to the right of
1496-byte region [ffff88808893f7c0, ffff88808893fd98)
The buggy address belongs to the page:
page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0
raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power supply fixes from Sebastian Reichel:
"Two more fixes for the 5.1 cycle.
One division by zero fix in a specific driver and one core workaround
for bad userspace behaviour from systemd regarding uevents. IMHO this
can be considered to be a userspace bug, but the debug messages are
useless anyways
- cpcap-battery: fix a division by zero
- core: fix systemd issue due to log messages produced by uevent"
* tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
power: supply: cpcap-battery: Fix division by zero
It is a followup after the fix in
commit 9c69a13205 ("route: Avoid crash from dereferencing NULL rt->from")
rt6_do_redirect():
1. NULL checking is needed on rt->from because a parallel
fib6_info delete could happen that sets rt->from to NULL.
(e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
2. fib6_info_hold() is not enough. Same reason as (1).
Meaning, holding dst->__refcnt cannot ensure
rt->from is not NULL or rt->from->fib6_ref is not 0.
Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
is already doing, this patch chooses to extend the rcu section
to keep "from" dereference-able after checking for NULL.
inet6_rtm_getroute():
1. NULL checking is also needed on rt->from for a similar reason.
Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
Fixes: a68886a691 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the endiannes is being handled correctly as indicated by the comment
above the offending line - sparse was unhappy with the missing annotation
as be64_to_cpu() expects a __be64 argument. To mitigate this annotation
all involved variables are changed to a consistent __le64 and the
conversion to uint64_t delayed to the call to rds_cong_map_updated().
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARC fixes from Vineet Gupta:
"A few minor fixes for ARC.
- regression in memset if line size !64
- avoid panic if PAE and IOC"
* tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: memset: fix build with L1_CACHE_SHIFT != 6
ARC: [hsdk] Make it easier to add PAE40 region to DTB
ARC: PAE40: don't panic and instead turn off hw ioc
The Interrupt Message Number in the PCIe Capabilities register (PCIe r4.0,
sec 7.5.3.2) indicates which MSI/MSI-X vector is shared by interrupts
related to the PCIe Capability, including Link Bandwidth Management and
Link Autonomous Bandwidth Interrupts (Link Control, 7.5.3.7), Command
Completed and Hot-Plug Interrupts (Slot Control, 7.5.3.10), and the PME
Interrupt (Root Control, 7.5.3.12).
pcie_message_numbers() checked whether we want to enable PME or Hot-Plug
interrupts but neglected to check for Link Bandwidth Management, so if we
only wanted the Bandwidth Management interrupts, it decided we didn't need
any vectors at all. Then pcie_port_enable_irq_vec() tried to reallocate
zero vectors, which failed, resulting in fallback to INTx.
On some systems, e.g., an X79-based workstation, that INTx seems broken or
not handled correctly, so we got spurious IRQ16 interrupts for Bandwidth
Management events.
Change pcie_message_numbers() so that if we want Link Bandwidth Management
interrupts, we use the shared MSI/MSI-X vector from the PCIe Capabilities
register.
Fixes: e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth notification")
Link: https://lore.kernel.org/lkml/155597243666.19387.1205950870601742062.stgit@gimli.home
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull ACPI fix from Rafael Wysocki:
"Revert a recent ACPICA change that caused initialization to fail on
systems with Thunderbolt docking stations connected at the init time"
* tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Clear status of GPEs before enabling them"
The 'extent_type' variable does seem to be reliably initialized, but
it's _very_ non-obvious, since there's a "goto next" case that jumps
over the normal initialization. That will then always trigger the
"start >= extent_end" test, which will end up never falling through to
the use of that variable.
But the code is certainly not obvious, and the compiler warning looks
reasonable. Make 'extent_type' an int, and initialize it to an invalid
negative value, which seems to be the common pattern in other places.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.
But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like
warning: array subscript 1 is outside array bounds of ‘u8[1]’
when we then access more than one byte from those variables.
Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.
Signed-off-by: Linus Torvalds <torvalds@-linux-foundation.org>
I'm not sure what made gcc warn about this code now. The 'ret' variable
does end up initialized in all cases, but it's definitely not obvious,
so the compiler is quite reasonable to warn about this.
So just add initialization to make it all much more obvious both to
compilers and to humans.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already did this for clang, but now gcc has that warning too. Yes,
yes, the address may be unaligned. And that's kind of the point.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously, during fragmentation after forwarding, skb->skb_iif isn't
preserved, i.e. 'ip_copy_metadata' does not copy skb_iif from given
'from' skb.
As a result, ip_do_fragment's creates fragments with zero skb_iif,
leading to inconsistent behavior.
Assume for example an eBPF program attached at tc egress (post
forwarding) that examines __sk_buff->ingress_ifindex:
- the correct iif is observed if forwarding path does not involve
fragmentation/refragmentation
- a bogus iif is observed if forwarding path involves
fragmentation/refragmentatiom
Fix, by preserving skb_iif during 'ip_copy_metadata'.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In io_sqe_buffer_register() we allocate a number of arrays based on the
iov_len from the user-provided iov. While we limit iov_len to SZ_1G,
we can still attempt to allocate arrays exceeding MAX_ORDER.
On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and
iov_len = SZ_1G, we'll calculate that nr_pages = 262145. When we try to
allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320
bytes, which is greater than 4MiB. This results in SLUB warning that
we're trying to allocate greater than MAX_ORDER, and failing the
allocation.
Avoid this by using kvmalloc() for allocations dependent on the
user-provided iov_len. At the same time, fix a leak of imu->bvec when
registration fails.
Full splat from before this patch:
WARNING: CPU: 1 PID: 2314 at mm/page_alloc.c:4595 __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 2314 Comm: syz-executor326 Not tainted 5.1.0-rc7-dirty #4
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x110/0x190 lib/dump_stack.c:113
panic+0x384/0x68c kernel/panic.c:214
__warn+0x2bc/0x2c0 kernel/panic.c:571
report_bug+0x228/0x2d8 lib/bug.c:186
bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
el1_dbg+0x18/0x8c
__alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
alloc_pages_current+0x164/0x278 mm/mempolicy.c:2132
alloc_pages include/linux/gfp.h:509 [inline]
kmalloc_order+0x20/0x50 mm/slab_common.c:1231
kmalloc_order_trace+0x30/0x2b0 mm/slab_common.c:1243
kmalloc_large include/linux/slab.h:480 [inline]
__kmalloc+0x3dc/0x4f0 mm/slub.c:3791
kmalloc_array include/linux/slab.h:670 [inline]
io_sqe_buffer_register fs/io_uring.c:2472 [inline]
__io_uring_register fs/io_uring.c:2962 [inline]
__do_sys_io_uring_register fs/io_uring.c:3008 [inline]
__se_sys_io_uring_register fs/io_uring.c:2990 [inline]
__arm64_sys_io_uring_register+0x9e0/0x1bc8 fs/io_uring.c:2990
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
SMP: stopping secondary CPUs
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
CPU features: 0x002,23000438
Memory Limit: none
Rebooting in 1 seconds..
Fixes: edafccee56 ("io_uring: add support for pre-mapped user IO buffers")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
update_chksum() accesses nskb->sk before it has been set
by complete_skb(), move the init up.
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent commit returns an error if icmp is used as the ip-proto for
IPv6 fib rules. Update fib_rule_tests to send ipv6-icmp instead of icmp.
Fixes: 5e1a99eae8 ("ipv4: Add ICMPv6 support when parse route ipproto")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet sockets in datagram mode take a destination address. Verify its
length before passing to dev_hard_header.
Prior to 2.6.14-rc3, the send code ignored sll_halen. This is
established behavior. Directly compare msg_namelen to dev->addr_len.
Change v1->v2: initialize addr in all paths
Fixes: 6b8d95f179 ("packet: validate address length if non-zero")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet send checks that msg_name is at least sizeof sockaddr_ll.
Packet recv must return at least this length, so that its output
can be passed unmodified to packet send.
This ceased to be true since adding support for lladdr longer than
sll_addr. Since, the return value uses true address length.
Always return at least sizeof sockaddr_ll, even if address length
is shorter. Zero the padding bytes.
Change v1->v2: do not overwrite zeroed padding again. use copy_len.
Fixes: 0fb375fb9b ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag
is stored into iter->type.
However, iov_iter_type() doesn't consider the new added flag, fix
it by masking this flag in iov_iter_type().
Fixes: 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 399254aaf4 ("block: add BIO_NO_PAGE_REF flag") introduces
BIO_NO_PAGE_REF, and once this flag is set for one bio, all pages
in the bio won't be get/put during IO.
However, if one bio is submitted via __blkdev_direct_IO_simple(),
even though BIO_NO_PAGE_REF is set, pages still may be put.
Fixes this issue by avoiding to put pages if BIO_NO_PAGE_REF is
set.
Fixes: 399254aaf4 ("block: add BIO_NO_PAGE_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we don't end up actually calling submit in io_sq_wq_submit_work(),
we still need to drop the submit reference to the request. If we
don't, then we can leak the request. This can happen if we race
with ring shutdown while flushing the workqueue for requests that
require use of the mm_struct.
Fixes: e65ef56db4 ("io_uring: use regular request ref counts")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_sq_offload_start(), we call cpu_possible() on an unbounded cpu
value from userspace. On v5.1-rc7 on arm64 with
CONFIG_DEBUG_PER_CPU_MAPS, this results in a splat:
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
There was an attempt to fix this in commit:
917257daa0 ("io_uring: only test SQPOLL cpu after we've verified it")
... by adding a check after the cpu value had been limited to NR_CPU_IDS
using array_index_nospec(). However, this left an unbound check at the
start of the function, for which the warning still fires.
Let's fix this correctly by checking that the cpu value is bound by
nr_cpu_ids before passing it to cpu_possible(). Note that only
nr_cpu_ids of a cpumask are guaranteed to exist at runtime, and
nr_cpu_ids can be significantly smaller than NR_CPUs. For example, an
arm64 defconfig has NR_CPUS=256, while my test VM has 4 vCPUs.
Following the intent from the commit message for 917257daa0, the
check is moved under the SQ_AFF branch, which is the only branch where
the cpu values is consumed. The check is performed before bounding the
value with array_index_nospec() so that we don't silently accept bogus
cpu values from userspace, where array_index_nospec() would force these
values to 0.
I suspect we can remove the array_index_nospec() call entirely, but I've
conservatively left that in place, updated to use nr_cpu_ids to match
the prior check.
Tested on arm64 with the Syzkaller reproducer:
https://syzkaller.appspot.com/bug?extid=cd714a07c6de2bc34293https://syzkaller.appspot.com/x/repro.syz?x=15d8b397200000
Full splat from before this patch:
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_check include/linux/cpumask.h:128 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_test_cpu include/linux/cpumask.h:344 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_sq_offload_start fs/io_uring.c:2244 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_create fs/io_uring.c:2864 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 27601 Comm: syz-executor.0 Not tainted 5.1.0-rc7 #3
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x110/0x190 lib/dump_stack.c:113
panic+0x384/0x68c kernel/panic.c:214
__warn+0x2bc/0x2c0 kernel/panic.c:571
report_bug+0x228/0x2d8 lib/bug.c:186
bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
el1_dbg+0x18/0x8c
cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
cpumask_check include/linux/cpumask.h:128 [inline]
cpumask_test_cpu include/linux/cpumask.h:344 [inline]
io_sq_offload_start fs/io_uring.c:2244 [inline]
io_uring_create fs/io_uring.c:2864 [inline]
io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
__do_sys_io_uring_setup fs/io_uring.c:2929 [inline]
__se_sys_io_uring_setup fs/io_uring.c:2926 [inline]
__arm64_sys_io_uring_setup+0x50/0x70 fs/io_uring.c:2926
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
SMP: stopping secondary CPUs
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
CPU features: 0x002,23000438
Memory Limit: none
Rebooting in 1 seconds..
Fixes: 917257daa0 ("io_uring: only test SQPOLL cpu after we've verified it")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Simplied the logic
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ying triggered a call trace when doing an asconf testing:
BUG: scheduling while atomic: swapper/12/0/0x10000100
Call Trace:
<IRQ> [<ffffffffa4375904>] dump_stack+0x19/0x1b
[<ffffffffa436fcaf>] __schedule_bug+0x64/0x72
[<ffffffffa437b93a>] __schedule+0x9ba/0xa00
[<ffffffffa3cd5326>] __cond_resched+0x26/0x30
[<ffffffffa437bc4a>] _cond_resched+0x3a/0x50
[<ffffffffa3e22be8>] kmem_cache_alloc_node+0x38/0x200
[<ffffffffa423512d>] __alloc_skb+0x5d/0x2d0
[<ffffffffc0995320>] sctp_packet_transmit+0x610/0xa20 [sctp]
[<ffffffffc098510e>] sctp_outq_flush+0x2ce/0xc00 [sctp]
[<ffffffffc098646c>] sctp_outq_uncork+0x1c/0x20 [sctp]
[<ffffffffc0977338>] sctp_cmd_interpreter.isra.22+0xc8/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc099443d>] sctp_primitive_ASCONF+0x3d/0x50 [sctp]
[<ffffffffc0977384>] sctp_cmd_interpreter.isra.22+0x114/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc097b3a4>] sctp_assoc_bh_rcv+0xf4/0x1b0 [sctp]
[<ffffffffc09840f1>] sctp_inq_push+0x51/0x70 [sctp]
[<ffffffffc099732b>] sctp_rcv+0xa8b/0xbd0 [sctp]
As it shows, the first sctp_do_sm() running under atomic context (NET_RX
softirq) invoked sctp_primitive_ASCONF() that uses GFP_KERNEL flag later,
and this flag is supposed to be used in non-atomic context only. Besides,
sctp_do_sm() was called recursively, which is not expected.
Vlad tried to fix this recursive call in Commit c078669340 ("sctp: Fix
oops when sending queued ASCONF chunks") by introducing a new command
SCTP_CMD_SEND_NEXT_ASCONF. But it didn't work as this command is still
used in the first sctp_do_sm() call, and sctp_primitive_ASCONF() will
be called in this command again.
To avoid calling sctp_do_sm() recursively, we send the next queued ASCONF
not by sctp_primitive_ASCONF(), but by sctp_sf_do_prm_asconf() in the 1st
sctp_do_sm() directly.
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ReST underline warning:
./Documentation/networking/netdev-FAQ.rst:135: WARNING: Title underline too short.
Q: I made changes to only a few patches in a patch series should I resend only those changed?
--------------------------------------------------------------------------------------------
Fixes: ffa9125373 ("Documentation: networking: Update netdev-FAQ regarding patches")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we only post a cqe if we get an error OUTSIDE of submission.
For submission, we return the error directly through io_uring_enter().
This is a bit awkward for applications, and it makes more sense to
always post a cqe with an error, if the error happens on behalf of an
sqe.
This changes submission behavior a bit. io_uring_enter() returns -ERROR
for an error, and > 0 for number of sqes submitted. Before this change,
if you wanted to submit 8 entries and had an error on the 5th entry,
io_uring_enter() would return 4 (for number of entries successfully
submitted) and rewind the sqring. The application would then have to
peek at the sqring and figure out what was wrong with the head sqe, and
then skip it itself. With this change, we'll return 5 since we did
consume 5 sqes, and the last sqe (with the error) will result in a cqe
being posted with the error.
This makes the logic easier to handle in the application, and it cleans
up the submission part.
Suggested-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull fsnotify fix from Jan Kara:
"A fix of user trigerable NULL pointer dereference syzbot has recently
spotted.
The problem was introduced in this merge window so no CC stable is
needed"
* tag 'fsnotify_for_v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
KVM/ARM fixes for 5.1, take #2:
- Don't try to emulate timers on userspace access
- Fix unaligned huge mappings, again
- Properly reset a vcpu that fails to reset(!)
- Properly retire pending LPIs on reset
- Fix computation of emulated CNTP_TVAL
Enlightened VMCS is only supported on Intel CPUs but the test shouldn't
fail completely.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot->npages,
or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log->num_pages to be both in range and aligned.
To allow this case, allow log->num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.
Reported-by: Peter Xu <peterx@redhat.com>
Fixes: 98938aa8ed ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 47c42e6b41 ("KVM: x86: fix handling of role.cr4_pae and rename it
to 'gpte_size'") introduced a regression: 32-bit PAE guests stopped
working. The issue appears to be: when guest switches (enables) PAE we need
to re-initialize MMU context (set context->root_level, do
reset_rsvds_bits_mask(), ...) but init_kvm_tdp_mmu() doesn't do that
because we threw away is_pae(vcpu) flag from mmu role. Restore it to
kvm_mmu_extended_role (as we now don't need it in base role) to fix
the issue.
Fixes: 47c42e6b41 ("KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM's recent bug fix to update %rip after emulating I/O broke userspace
that relied on the previous behavior of incrementing %rip prior to
exiting to userspace. When running a Windows XP guest on AMD hardware,
Qemu may patch "OUT 0x7E" instructions in reaction to the OUT itself.
Because KVM's old behavior was to increment %rip before exiting to
userspace to handle the I/O, Qemu manually adjusted %rip to account for
the OUT instruction.
Arguably this is a userspace bug as KVM requires userspace to re-enter
the kernel to complete instruction emulation before taking any other
actions. That being said, this is a bit of a grey area and breaking
userspace that has worked for many years is bad.
Pre-increment %rip on OUT to port 0x7e before exiting to userspace to
hack around the issue.
Fixes: 45def77ebf ("KVM: x86: update %rip after emulating IO")
Reported-by: Simon Becherer <simon@becherer.de>
Reported-and-tested-by: Iakov Karpov <srid@rkmail.ru>
Reported-by: Gabriele Balducci <balducci@units.it>
Reported-by: Antti Antinoja <reader@fennosys.fi>
Cc: stable@vger.kernel.org
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kalle Valo says:
====================
wireless-drivers fixes for 5.1
Third set of fixes for 5.1.
iwlwifi
* fix an oops when creating debugfs entries
* fix bug when trying to capture debugging info while in rfkill
* prevent potential uninitialized memory dumps into debugging logs
* fix some initialization parameters for AX210 devices
* fix an oops with non-MSIX devices
* fix an oops when we receive a packet with bogus lengths
* fix a bug that prevented 5350 devices from working
* fix a small merge damage from the previous series
mwifiex
* fig regression with resume on SDIO
ath10k
* fix locking problem with crashdump
* fix warnings during suspend and resume
Also note that this pull conflicts with net-next. And I want to emphasie
that it's really net-next, so when you pull this to net tree it should
go without conflicts. Stephen reported the conflict here:
https://lkml.kernel.org/r/20190429115338.5decb50b@canb.auug.org.au
In iwlwifi oddly commit 154d4899e4 adds the IS_ERR_OR_NULL() in
wireless-drivers but commit c9af7528c3 removes the whole check in
wireless-drivers-next. The fix is easy, just drop the whole check for
mvmvif->dbgfs_dir in iwlwifi/mvm/debugfs-vif.c, it's unneeded anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull USB fixes from Greg KH:
"Here are some small USB fixes for a bunch of warnings/errors that the
syzbot has been finding with it's new-found ability to stress-test the
USB layer.
All of these are tiny, but fix real issues, and are marked for stable
as well. All of these have had lots of testing in linux-next as well"
* tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: w1 ds2490: Fix bug caused by improper use of altsetting array
USB: yurex: Fix protection fault after device removal
usb: usbip: fix isoc packet num validation in get_pipe
USB: core: Fix bug caused by duplicate interface PM usage counter
USB: dummy-hcd: Fix failure to give back unlinked URBs
USB: core: Fix unterminated string returned by usb_string()
There is no operation to order with afterwards, and removing the flag is
not critical in any way.
There will always be a "race condition" where the application will
trigger IORING_ENTER_SQ_WAKEUP when it isn't actually needed.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
smp_store_release in io_commit_sqring already orders the store to
dropped before the update to SQ head.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The memory operations before reading cq head are unrelated and we
don't care about their order.
Document that the control dependency in combination with READ_ONCE and
WRITE_ONCE forms a barrier we need.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
wq_has_sleeper has a full barrier internally. The smp_rmb barrier in
io_uring_poll synchronizes with it.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The application reading the CQ ring needs a barrier to pair with the
smp_store_release in io_commit_cqring, not the barrier after it.
Also a write barrier *after* writing something (but not *before*
writing anything interesting) doesn't order anything, so an smp_wmb()
after writing SQ tail is not needed.
Additionally consider reading SQ head and writing CQ tail in the notes.
Also add some clarifications how the various other fields in the ring
buffers are used.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Not all request types set REQ_F_FORCE_NONBLOCK when they needed async
punting; reverse logic instead and set REQ_F_NOWAIT if request mustn't
be punted.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Merged with my previous patch for this.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull selinux fix from Paul Moore:
"One small patch for the stable folks to fix a problem when building
against the latest glibc.
I'll be honest and say that I'm not really thrilled with the idea of
sending this up right now, but Greg is a little annoyed so here I
figured I would at least send this"
* tag 'selinux-pr-20190429' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: use kernel linux/socket.h for genheaders and mdp
If register_snap_client fails in atalk_init,
error code should be set, otherwise it will
triggers NULL pointer dereference while unloading
module.
Fixes: 9804501fa1 ("appletalk: Fix potential NULL pointer dereference in unregister_snap_client")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "fs->location" is a u32 that comes from the user in ethtool_set_rxnfc().
We can't pass unclamped values to test_bit() or it results in an out of
bounds access beyond the end of the bitmap.
Fixes: 7318166cac ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rxrpc_destroy_all_calls(), there are two phases: (1) make sure the
->calls list is empty, emitting error messages if not, and (2) wait for the
RCU cleanup to happen on outstanding calls (ie. ->nr_calls becomes 0).
To avoid taking the call_lock, the function prechecks ->calls and if empty,
it returns to avoid taking the lock - this is wrong, however: it still
needs to go and do the second phase and wait for ->nr_calls to become 0.
Without this, the rxrpc_net struct may get deallocated before we get to the
RCU cleanup for the last calls. This can lead to:
Slab corruption (Not tainted): kmalloc-16k start=ffff88802b178000, len=16384
050: 6b 6b 6b 6b 6b 6b 6b 6b 61 6b 6b 6b 6b 6b 6b 6b kkkkkkkkakkkkkkk
Note the "61" at offset 0x58. This corresponds to the ->nr_calls member of
struct rxrpc_net (which is >9k in size, and thus allocated out of the 16k
slab).
Fix this by flipping the condition on the if-statement, putting the locked
section inside the if-body and dropping the return from there. The
function will then always go on to wait for the RCU cleanup on outstanding
calls.
Fixes: 2baec2c3f8 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-04-30
1) Fix an out-of-bound array accesses in __xfrm_policy_unlink.
From YueHaibing.
2) Reset the secpath on failure in the ESP GRO handlers
to avoid dereferencing an invalid pointer on error.
From Myungho Jung.
3) Add and revert a patch that tried to add rcu annotations
to netns_xfrm. From Su Yanjun.
4) Wait for rcu callbacks before freeing xfrm6_tunnel_spi_kmem.
From Su Yanjun.
5) Fix forgotten vti4 ipip tunnel deregistration.
From Jeremy Sowden:
6) Remove some duplicated log messages in vti4.
From Jeremy Sowden.
7) Don't use IPSEC_PROTO_ANY when flushing states because
this will flush only IPsec portocol speciffic states.
IPPROTO_ROUTING states may remain in the lists when
doing net exit. Fix this by replacing IPSEC_PROTO_ANY
with zero. From Cong Wang.
8) Add length check for UDP encapsulation to fix "Oversized IP packet"
warnings on receive side. From Sabrina Dubroca.
9) Fix xfrm interface lookup when the interface is associated to
a vrf layer 3 master device. From Martin Willi.
10) Reload header pointers after pskb_may_pull() in _decode_session4(),
otherwise we may read from uninitialized memory.
11) Update the documentation about xfrm[46]_gc_thresh, it
is not used anymore after the flowcache removal.
From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no route to an IPv6 dest addr, skb_dst(skb) points
to loopback dev in the case of that the IP6CB(skb)->iif is
enslaved to a vrf. This causes Ip6InNoRoutes to be incremented on the
loopback dev. This also causes the lookup to fail on icmpv6_send() and
the dest unreachable to not sent and Ip6OutNoRoutes gets incremented on
the loopback dev.
To reproduce:
* Gateway configuration:
ip link add dev vrf_258 type vrf table 258
ip link set dev enp0s9 master vrf_258
ip addr add 66:1/64 dev enp0s9
ip -6 route add unreachable default metric 8192 table 258
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.enp0s9.forwarding=1
* Sender configuration:
ip addr add 66::2/64 dev enp0s9
ip -6 route add default via 66::1
and ping 67::1 for example from the sender.
Fix this by counting on the original netdev and reset the skb dst to
force a fresh lookup.
v2: Fix typo of destination address in the repro steps.
v3: Simplify the loopback check (per David Ahern) and use reverse
Christmas tree format (per David Miller).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard and Bruno both reported that my commit added a bug,
and Bruno was able to determine the problem came when a segment
wih a FIN packet was coalesced to a prior one in tcp backlog queue.
It turns out the header prediction in tcp_rcv_established()
looks back to TCP headers in the packet, not in the metadata
(aka TCP_SKB_CB(skb)->tcp_flags)
The fast path in tcp_rcv_established() is not supposed to
handle a FIN flag (it does not call tcp_fin())
Therefore we need to make sure to propagate the FIN flag,
so that the coalesced packet does not go through the fast path,
the same than a GRO packet carrying a FIN flag.
While we are at it, make sure we do not coalesce packets with
RST or SYN, or if they do not have ACK set.
Many thanks to Richard and Bruno for pinpointing the bad commit,
and to Richard for providing a first version of the fix.
Fixes: 4f693b55c3 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
A request for a flowlabel fails in process or user exclusive mode must
fail if the caller pid or uid does not match. Invert the test.
Previously, the test was unsafe wrt PID recycling, but indeed tested
for inequality: fl1->owner != fl->owner
Fixes: 4f82f45730 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
ieee802154 for net 2019-04-25
An update from ieee802154 for your *net* tree.
Another fix from Kangjie Lu to ensure better checking regmap updates in the
mcr20a driver. Nothing else I have pending for the final release.
If there are any problems let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull seccomp fixes from Kees Cook:
"Syzbot found a use-after-free bug in seccomp due to flags that should
not be allowed to be used together.
Tycho fixed this, I updated the self-tests, and the syzkaller PoC has
been running for several days without triggering KASan (before this
fix, it would reproduce). These patches have also been in -next for
almost a week, just to be sure.
- Add logic for making some seccomp flags exclusive (Tycho)
- Update selftests for exclusivity testing (Kees)"
* tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Make NEW_LISTENER and TSYNC flags exclusive
selftests/seccomp: Prepare for exclusive seccomp flags
This doesn't really do anything, but at least we now parse teh
ZERO_PAGE() address argument so that we'll catch the most obvious errors
in usage next time they'll happen.
See commit 6a5c5d26c4 ("rdma: fix build errors on s390 and MIPS due to
bad ZERO_PAGE use") what happens when we don't have any use of the macro
argument at all.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When compiling genheaders and mdp from a newer host kernel, the
following error happens:
In file included from scripts/selinux/genheaders/genheaders.c:18:
./security/selinux/include/classmap.h:238:2: error: #error New
address family defined, please update secclass_map. #error New
address family defined, please update secclass_map. ^~~~~
make[3]: *** [scripts/Makefile.host:107:
scripts/selinux/genheaders/genheaders] Error 1 make[2]: ***
[scripts/Makefile.build:599: scripts/selinux/genheaders] Error 2
make[1]: *** [scripts/Makefile.build:599: scripts/selinux] Error 2
make[1]: *** Waiting for unfinished jobs....
Instead of relying on the host definition, include linux/socket.h in
classmap.h to have PF_MAX.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: manually merge in mdp.c, subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Johannes Berg says:
====================
* fix use-after-free in mac80211 TXQs
* fix RX STBC byte order
* fix debugfs rename crashing due to ERR_PTR()
* fix missing regulatory notification
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ath10k_mac_vif_chan() always returns an error for the given vif
during system-wide resume which reliably triggers two WARN_ON()s
in ath10k_bss_info_changed() and they are not particularly
useful in that code path, so drop them.
Tested: QCA6174 hw3.2 PCI with WLAN.RM.2.0-00180-QCARMSWPZ-1
Tested: QCA6174 hw3.2 SDIO with WLAN.RMH.4.4.1-00007-QCARMSWP-1
Fixes: cd93b83ad9 ("ath10k: support for multicast rate control")
Fixes: f279294e9e ("ath10k: add support for configuring management packet rate")
Cc: stable@vger.kernel.org
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Tested-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Commit 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE
polling") introduced a regression where we try to sleep (grab a mutex)
in an atomic context:
[ 233.602619] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
[ 233.602626] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 233.602636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.1.0-rc2 #4
[ 233.602642] Hardware name: Google Scarlet (DT)
[ 233.602647] Call trace:
[ 233.602663] dump_backtrace+0x0/0x11c
[ 233.602672] show_stack+0x20/0x28
[ 233.602681] dump_stack+0x98/0xbc
[ 233.602690] ___might_sleep+0x154/0x16c
[ 233.602696] __might_sleep+0x78/0x88
[ 233.602704] mutex_lock+0x2c/0x5c
[ 233.602717] ath10k_pci_diag_read_mem+0x68/0x21c [ath10k_pci]
[ 233.602725] ath10k_pci_diag_read32+0x48/0x74 [ath10k_pci]
[ 233.602733] ath10k_pci_dump_registers+0x5c/0x16c [ath10k_pci]
[ 233.602741] ath10k_pci_fw_crashed_dump+0xb8/0x548 [ath10k_pci]
[ 233.602749] ath10k_pci_napi_poll+0x60/0x128 [ath10k_pci]
[ 233.602757] net_rx_action+0x140/0x388
[ 233.602766] __do_softirq+0x1b0/0x35c
[...]
ath10k_pci_fw_crashed_dump() is called from NAPI contexts, and firmware
memory dumps are retrieved using the diag memory interface.
A simple reproduction case is to run this on QCA6174A /
WLAN.RM.4.4.1-00132-QCARMSWP-1, which happens to be a way to b0rk the
firmware:
dd if=/sys/kernel/debug/ieee80211/phy0/ath10k/mem_value bs=4K count=1
of=/dev/null
(NB: simulated firmware crashes, via debugfs, don't trigger firmware
dumps.)
The fix is to move the crash-dump into a workqueue context, and avoid
relying on 'data_lock' for most mutual exclusion. We only keep using it
here for protecting 'fw_crash_counter', while the rest of the coredump
buffers are protected by a new 'dump_mutex'.
I've tested the above with simulated firmware crashes (debugfs 'reset'
file), real firmware crashes (the 'dd' command above), and a variety of
reboot and suspend/resume configurations on QCA6174A.
Reported here:
http://lkml.kernel.org/linux-wireless/20190325202706.GA68720@google.com
Fixes: 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE polling")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
KVM_GET_DIRTY_LOG is implemented by all architectures, not just x86,
and KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is additionally implemented by
arm, arm64, and mips.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
file_remove_privs() might be called for non-regular files, e.g.
blkdev inode. There is no reason to do its job on things
like blkdev inodes, pipes, or cdevs. Hence, abort if
file does not refer to a regular inode.
AV: more to the point, for devices there might be any number of
inodes refering to given device. Which one to strip the permissions
from, even if that made any sense in the first place? All of them
will be observed with contents modified, after all.
Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf
Spinczyk)
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It has no business being there, it's checked by relevant ->get_tree()
as it is *and* it returns the wrong error for no reason whatsoever.
Fixes: f3a09c9201 "introduce fs_context methods"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
happen that it sees mark not fully initialized or mark that is already
detached from the object list. In these cases mark->connector
can be NULL leading to NULL ptr dereference. Fix the problem by
being careful when reading mark->connector and check it for being NULL.
Also use WRITE_ONCE when writing the mark just to prevent compiler from
doing something stupid.
Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
Fixes: 77115225ac ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Jan Kara <jack@suse.cz>
Pull ARM fixes from Russell King:
"A small number of ARM fixes
- Fix function tracer and unwinder dependencies so that we don't end
up building kernels that will crash
- Fix ARMv7M nommu initialisation (missing register initialisation)
- Fix EFI decompressor entry (ensuring barrier instructions are
enabled prior to use)"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled
ARM: fix function graph tracer and unwinder dependencies
Pull powerpc fixes from Michael Ellerman:
"A one-liner to make our Radix MMU support depend on HUGETLB_PAGE. We
use some of the hugetlb inlines (eg. pud_huge()) when operating on the
linear mapping and if they're compiled into empty wrappers we can
corrupt memory.
Then two fixes to our VFIO IOMMU code. The first is not a regression
but fixes the locking to avoid a user-triggerable deadlock.
The second does fix a regression since rc1, and depends on the first
fix. It makes it possible to run guests with large amounts of memory
again (~256GB).
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm_iommu: Allow pinning large regions
powerpc/mm_iommu: Fix potential deadlock
powerpc/mm/radix: Make Radix require HUGETLB_PAGE
Pull block fixes from Jens Axboe:
"A set of io_uring fixes that should go into this release. In
particular, this contains:
- The mutex lock vs ctx ref count fix (me)
- Removal of a dead variable (me)
- Two race fixes (Stefan)
- Ring head/tail condition fix for poll full SQ detection (Stefan)"
* tag 'for-linus-20190428' of git://git.kernel.dk/linux-block:
io_uring: remove 'state' argument from io_{read,write} path
io_uring: fix poll full SQ detection
io_uring: fix race condition when sq threads goes sleeping
io_uring: fix race condition reading SQ entries
io_uring: fail io_uring_register(2) on a dying io_uring instance
Pull rdma fixes from Jason Gunthorpe:
"One core bug fix and a few driver ones
- FRWR memory registration for hfi1/qib didn't work with with some
iovas causing a NFSoRDMA failure regression due to a fix in the NFS
side
- A command flow error in mlx5 allowed user space to send a corrupt
command (and also smash the kernel stack we've since learned)
- Fix a regression and some bugs with device hot unplug that was
discovered while reviewing Andrea's patches
- hns has a failure if the user asks for certain QP configurations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Bugfix for mapping user db
RDMA/ucontext: Fix regression with disassociate
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
RDMA/mlx5: Do not allow the user to write to the clock page
IB/mlx5: Fix scatter to CQE in DCT QP creation
IB/rdmavt: Fix frwr memory registration
Pull dmaengine fixes from Vinod Koul:
- fix for wrong register use in mediatek driver
- fix in sh driver for glitch is tx_status and treating 0 a valid
residue for cyclic
- fix in bcm driver for using right memory allocation flag
* tag 'dmaengine-fix-5.1-rc7' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: mediatek-cqdma: fix wrong register usage in mtk_cqdma_start
dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status
dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
dmaengine: bcm2835: Avoid GFP_KERNEL in device_prep_slave_sg
The line6 driver uses a lot of USB buffers off of the stack, which is
not allowed on many systems, causing the driver to crash on some of
them. Fix this up by dynamically allocating the buffers with kmalloc()
which allows for proper DMA-able memory.
Reported-by: Christo Gouws <gouws.christo@gmail.com>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Christo Gouws <gouws.christo@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fourth batch of patches intended for v5.1
* Fix an oops when we receive a packet with bogus lengths;
* Fix a bug that prevented 5350 devices from working;
* Fix a small merge damage from the previous series;
When I rebased Greg's patch, I accidentally left the old if block that
was already there. Remove it.
Fixes: 154d4899e4 ("iwlwifi: mvm: properly check debugfs dentry before using it")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We introduced a bug that prevented this old device from
working. The driver would simply not be able to complete
the INIT flow while spewing this warning:
CSR addresses aren't configured
WARNING: CPU: 0 PID: 819 at drivers/net/wireless/intel/iwlwifi/pcie/drv.c:917
iwl_pci_probe+0x160/0x1e0 [iwlwifi]
Cc: stable@vger.kernel.org # v4.18+
Fixes: a8cbb46f83 ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Fixes: c8f1b51e50 ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We don't check for the validity of the lengths in the packet received
from the firmware. If the MPDU length received in the rx descriptor
is too short to contain the header length and the crypt length
together, we may end up trying to copy a negative number of bytes
(headlen - hdrlen < 0) which will underflow and cause us to try to
copy a huge amount of data. This causes oopses such as this one:
BUG: unable to handle kernel paging request at ffff896be2970000
PGD 5e201067 P4D 5e201067 PUD 5e205067 PMD 16110d063 PTE 8000000162970161
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 1824 Comm: irq/134-iwlwifi Not tainted 4.19.33-04308-geea41cf4930f #1
Hardware name: [...]
RIP: 0010:memcpy_erms+0x6/0x10
Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3
0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
RSP: 0018:ffffa4630196fc60 EFLAGS: 00010287
RAX: ffff896be2924618 RBX: ffff896bc8ecc600 RCX: 00000000fffb4610
RDX: 00000000fffffff8 RSI: ffff896a835e2a38 RDI: ffff896be2970000
RBP: ffffa4630196fd30 R08: ffff896bc8ecc600 R09: ffff896a83597000
R10: ffff896bd6998400 R11: 000000000200407f R12: ffff896a83597050
R13: 00000000fffffff8 R14: 0000000000000010 R15: ffff896a83597038
FS: 0000000000000000(0000) GS:ffff896be8280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff896be2970000 CR3: 000000005dc12002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
iwl_mvm_rx_mpdu_mq+0xb51/0x121b [iwlmvm]
iwl_pcie_rx_handle+0x58c/0xa89 [iwlwifi]
iwl_pcie_irq_rx_msix_handler+0xd9/0x12a [iwlwifi]
irq_thread_fn+0x24/0x49
irq_thread+0xb0/0x122
kthread+0x138/0x140
ret_from_fork+0x1f/0x40
Fix that by checking the lengths for correctness and trigger a warning
to show that we have received wrong data.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Currently, the UDP GRO code path does bad things on some edge
conditions - Aggregation can happen even on packet with different
lengths.
Fix the above by rewriting the 'complete' condition for GRO
packets. While at it, note explicitly that we allow merging the
first packet per burst below gso_size.
Reported-by: Sean Tong <seantong114@gmail.com>
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
net/tls: fix data copies in tls_device_reencrypt()
This series fixes the tls_device_reencrypt() which is broken
if record starts in the frags of the message skb.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragments may contain data from other records so we have to account
for that when we calculate the destination and max length of copy we
can perform. Note that 'offset' is the offset within the message,
so it can't be passed as offset within the frag..
Here skb_store_bits() would have realised the call is wrong and
simply not copy data.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no guarantee the record starts before the skb frags.
If we don't check for this condition copy amount will get
negative, leading to reads and writes to random memory locations.
Familiar hilarity ensues.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fixes from Dmitry Torokhov:
"Just a couple of fixups for Synaptics RMI4 driver and allowing
snvs_pwrkey to be selected on more boards"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - write config register values to the right offset
Input: synaptics-rmi4 - fix possible double free
Input: snvs_pwrkey - make it depend on ARCH_MXC
Michael Chan says:
====================
bnxt_en: Misc. bug fixes.
6 miscellaneous bug fixes covering several issues in error code paths,
a setup issue for statistics DMA, and an improvement for setting up
multicast address filters.
Please queue these for stable as well.
Patch #5 (bnxt_en: Fix statistics context reservation logic) is for the
most recent 5.0 stable only. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rx_pkt(), if the driver encounters BD errors, it will recycle
the buffers and jump to the end where the uninitailized variable "len"
is referenced. Fix it by adding a new jump label that will skip
the length update. This is the most correct fix since the length
may not be valid when we get this type of error.
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an earlier commit that fixes the number of stats contexts to
reserve for the RDMA driver, we added a function parameter to pass in
the number of stats contexts to all the relevant functions. The passed
in parameter should have been used to set the enables field of the
firmware message.
Fixes: 780baad44f ("bnxt_en: Reserve 1 stat_ctx for RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver determines that extended TX port statistics are not supported
or allocation of the data structure fails, make sure to pass 0 TX stats
size to firmware to disable it. The firmware returned TX stats size should
also be set to 0 for consistency. This will prevent
bnxt_get_ethtool_stats() from accessing the NULL TX stats pointer in
case there is mismatch between firmware and driver.
Fixes: 36e53349b6 ("bnxt_en: Add additional extended port statistics.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we encounter errors during open and proceed to clean up,
bnxt_hwrm_ring_free() may crash if the rings we try to free have never
been allocated. bnxt_cp_ring_for_rx() or bnxt_cp_ring_for_tx()
may reference pointers that have not been allocated.
Fix it by checking for valid fw_ring_id first before calling
bnxt_cp_ring_for_rx() or bnxt_cp_ring_for_tx().
Fixes: 2c61d2117e ("bnxt_en: Add helper functions to get firmware CP ring ID.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the bnxt_init_one() error path, short FW command request memory
is not freed. This patch fixes it.
Fixes: e605db801b ("bnxt_en: Support for Short Firmware Message")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver builds a list of multicast addresses and sends it to the
firmware when the driver's ndo_set_rx_mode() is called. In rare
cases, the firmware can fail this call if internal resources to
add multicast addresses are exhausted. In that case, we should
try the call again by setting the ALL_MCAST flag which is more
guaranteed to succeed.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Ingo Molnar:
- Fix an early boot crash in the RSDP parsing code by effectively
turning off the parsing call - we ran out of time but want to fix the
regression. The more involved fix is being worked on.
- Fix a crash that can trigger in the kmemlek code.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix a crash with kmemleak_scan()
x86/boot: Disable RSDP parsing temporarily
Pull scheduler fix from Ingo Molnar:
"Fix a division by zero bug that can trigger in the NUMA placement
code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Fix a possible divide-by-zero
Pull perf fix from Ingo Molnar:
"A cstate event enumeration fix for Kaby/Coffee Lake CPUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
The not-so-recent change to move VMX's VM-Exit handing to a dedicated
"function" unintentionally exposed KVM to a speculative attack from the
guest by executing a RET prior to stuffing the RSB. Make RSB stuffing
happen immediately after VM-Exit, before any unpaired returns.
Alternatively, the VM-Exit path could postpone full RSB stuffing until
its current location by stuffing the RSB only as needed, or by avoiding
returns in the VM-Exit path entirely, but both alternatives are beyond
ugly since vmx_vmexit() has multiple indirect callers (by way of
vmx_vmenter()). And putting the RSB stuffing immediately after VM-Exit
makes it much less likely to be re-broken in the future.
Note, the cost of PUSH/POP could be avoided in the normal flow by
pairing the PUSH RAX with the POP RAX in __vmx_vcpu_run() and adding an
a POP to nested_vmx_check_vmentry_hw(), but such a weird/subtle
dependency is likely to cause problems in the long run, and PUSH/POP
will take all of a few cycles, which is peanuts compared to the number
of cycles required to fill the RSB.
Fixes: 453eafbe65 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This way, slhc_free() accepts what slhc_init() returns, whether that is
an error or not.
In particular, the pattern in sl_alloc_bufs() is
slcomp = slhc_init(16, 16);
...
slhc_free(slcomp);
for the error handling path, and rather than complicate that code, just
make it ok to always free what was returned by the init function.
That's what the code used to do before commit 4ab42d78e3 ("ppp, slip:
Validate VJ compression slot parameters completely") when slhc_init()
just returned NULL for the error case, with no actual indication of the
details of the error.
Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com
Fixes: 4ab42d78e3 ("ppp, slip: Validate VJ compression slot parameters completely")
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
mm/page_alloc.c: fix never set ALLOC_NOFRAGMENT flag
mm/page_alloc.c: avoid potential NULL pointer dereference
mm, page_alloc: always use a captured page regardless of compaction result
mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
lib/test_vmalloc.c: do not create cpumask_t variable on stack
lib/Kconfig.debug: fix build error without CONFIG_BLOCK
zram: pass down the bvec we need to read into in the work struct
mm/memory_hotplug.c: drop memory device reference after find_memory_block()
Currently any changed config register values don't take effect, as the
function to write them back is called with the wrong register offset.
Fixes: ff8f83708b (Input: synaptics-rmi4 - add support for 2D
sensors and F11)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull arm64 fixes from Catalin Marinas:
- keep the tail of an unaligned initrd reserved
- adjust ftrace_make_call() to deal with the relative nature of PLTs
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/module: ftrace: deal with place relative nature of PLTs
arm64: mm: Ensure tail of unaligned initrd is reserved
Pull tracing fixes from Steven Rostedt:
"Three tracing fixes:
- Use "nosteal" for ring buffer splice pages
- Memory leak fix in error path of trace_pid_write()
- Fix preempt_enable_no_resched() (use preempt_enable()) in ring
buffer code"
* tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
trace: Fix preempt_enable_no_resched() abuse
tracing: Fix a memory leak by early error exit in trace_pid_write()
tracing: Fix buffer_ref pipe ops
Pull GPIO fixes from Linus Walleij:
"Not much to say about them, regular fixes:
- Fix a bug on the errorpath of gpiochip_add_data_with_key()
- IRQ type setting on the spreadtrum GPIO driver"
* tag 'gpio-v5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix gpiochip_add_data_with_key() error path
gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
Pull btrfs fix from David Sterba:
"One patch to fix a crash in io submission path, due to memory
allocation errors.
In short, the multipage bio work that landed in 5.1 caused larger bios
that in turn require larger temporary memory for checksums. The patch
is a workaround, we're going to rework the allocation so it does not
require the vmalloc fallback.
It took a while to identify that it's caused by patches in 5.1 and not
a patchset that did some changes in error handling in the code. I've
tested it on various memory/cpu combinations, it could hit OOM but
does not crash.
The timestamp of the patch is less than a day due to updates in the
changelog, tests were running meanwhile"
* tag 'for-5.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Switch memory allocations in async csum calculation path to kvmalloc
Pull cifs fixes from Steve French:
"Three small SMB3 fixes (all for stable as well): two leaks and a
rename bug"
* tag '5.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix page reference leak with readv/writev
cifs: do not attempt cifs operation on smb2+ rename error
cifs: fix memory leak in SMB2_read
ac.preferred_zoneref->zone passed to alloc_flags_nofragment() can be NULL.
'zone' pointer unconditionally derefernced in alloc_flags_nofragment().
Bail out on NULL zone to avoid potential crash. Currently we don't see
any crashes only because alloc_flags_nofragment() has another bug which
allows compiler to optimize away all accesses to 'zone'.
Link: http://lkml.kernel.org/r/20190423120806.3503-1-aryabinin@virtuozzo.com
Fixes: 6bb154504f ("mm, page_alloc: spread allocations across zones before introducing fragmentation")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During the development of commit 5e1f0f098b ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that if a captured page was available after compaction that it was
consistent with the final state of compaction. The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.
However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page. It's
highly unlikely but Li Wang did report the following warning on s390
occuring when testing OOM handling. Note that the warning is slightly
edited for clarity.
WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc
ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod
qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log
dm_mod
CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1
This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context. If a serious
programming error was introduced, it is highly likely to be caught by
prep_new_page() instead.
Link: http://lkml.kernel.org/r/20190419085133.GH18914@techsingularity.net
Fixes: 5e1f0f098b ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Li Wang <liwang@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka reported that commit 1c30844d2d ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") "broke"
memory management on parisc.
The machine is not NUMA but the DISCONTIG model creates three pgdats
even though it's a UMA machine for the following ranges
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
Mikulas reported:
With the patch 1c30844d2, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basiscally, it limits cache size to 1GiB.
For example, if I run:
# dd if=/dev/sda of=/dev/null bs=1M count=2048
- with the proper kernel, there should be "Buffers - 2GiB"
when this command finishes. With the patch 1c30844d2, buffers
will consume just 1GiB or slightly more, because the kernel was
incorrectly reclaiming them.
The page allocator and reclaim makes assumptions that pgdats really
represent NUMA nodes and zones represent ranges and makes decisions on
that basis. Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.
DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM so there is no need to be fancy, this patch simply disables
watermark boosting by default on DISCONTIGMEM.
Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
marvell_get_sset_count() returns how many statistics counters there
are. If the PHY supports fibre, there are 3, otherwise two.
marvell_get_strings() does not make this distinction, and always
returns 3 strings. This then often results in writing past the end
of the buffer for the strings.
Fixes: 2170fef78a ("Marvell phy: add field to get errors from fiber link.")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocating the next family->id it makes more sense to use
idr_alloc_cyclic to avoid re-using a previously used family->id as much
as possible.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.
To fix this issue, free the allocated buffer when an error is encountered.
Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
Fixes: f4d34a87e9 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This fixes multiple issues in buffer_pipe_buf_ops:
- The ->steal() handler must not return zero unless the pipe buffer has
the only reference to the page. But generic_pipe_buf_steal() assumes
that every reference to the pipe is tracked by the page's refcount,
which isn't true for these buffers - buffer_pipe_buf_get(), which
duplicates a buffer, doesn't touch the page's refcount.
Fix it by using generic_pipe_buf_nosteal(), which refuses every
attempted theft. It should be easy to actually support ->steal, but the
only current users of pipe_buf_steal() are the virtio console and FUSE,
and they also only use it as an optimization. So it's probably not worth
the effort.
- The ->get() and ->release() handlers can be invoked concurrently on pipe
buffers backed by the same struct buffer_ref. Make them safe against
concurrency by using refcount_t.
- The pointers stored in ->private were only zeroed out when the last
reference to the buffer_ref was dropped. As far as I know, this
shouldn't be necessary anyway, but if we do it, let's always do it.
Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e631 ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Canonical way to fetch sk_user_data from an encap_rcv() handler called
from UDP stack in rcu protected section is to use rcu_dereference_sk_user_data(),
otherwise compiler might read it multiple times.
Fixes: d00fa9adc5 ("il2tp: fix races with tunnel socket close")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-04-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) the bpf verifier fix to properly mark registers in all stack frames, from Paul.
2) preempt_enable_no_resched->preempt_enable fix, from Peter.
3) other misc fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drm/imx: fix DP CSC handling
- Fix the DP color space conversion matrix setup to avoid bugs where
disabling the overlay plane while both primary and overlay plane are
routed via the CSC unit would not reconfigure the CSC routing
properly, leaving the display in a nonworking state, or the CSC
setting from a previously set mode would be left behind, causing
wrong colors when reenabling the display in certain configurations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1556183136.2271.3.camel@pengutronix.de
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The first test case, for pointer null checks, is equivalent to the
following pseudo-code. It checks that the verifier does not complain on
line 6 and recognizes that ptr isn't null.
1: ptr = bpf_map_lookup_elem(map, &key);
2: ret = subprog(ptr) {
3: return ptr != NULL;
4: }
5: if (ret)
6: value = *ptr;
The second test case, for packet bound checks, is equivalent to the
following pseudo-code. It checks that the verifier does not complain on
line 7 and recognizes that the packet is at least 1 byte long.
1: pkt_end = ctx.pkt_end;
2: ptr = ctx.pkt + 8;
3: ret = subprog(ptr, pkt_end) {
4: return ptr <= pkt_end;
5: }
6: if (ret)
7: value = *(u8 *)ctx.pkt;
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In case of a null check on a pointer inside a subprog, we should mark all
registers with this pointer as either safe or unknown, in both the current
and previous frames. Currently, only spilled registers and registers in
the current frame are marked. Packet bound checks in subprogs have the
same issue. This patch fixes it to mark registers in previous frames as
well.
A good reproducer for null checks looks as follow:
1: ptr = bpf_map_lookup_elem(map, &key);
2: ret = subprog(ptr) {
3: return ptr != NULL;
4: }
5: if (ret)
6: value = *ptr;
With the above, the verifier will complain on line 6 because it sees ptr
as map_value_or_null despite the null check in subprog 1.
Note that this patch fixes another resulting bug when using
bpf_sk_release():
1: sk = bpf_sk_lookup_tcp(...);
2: subprog(sk) {
3: if (sk)
4: bpf_sk_release(sk);
5: }
6: if (!sk)
7: return 0;
8: return 1;
In the above, mark_ptr_or_null_regs will warn on line 6 because it will
try to free the reference state, even though it was already freed on
line 3.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
"bpftool map create" has an infinite loop on "while (argc)". The error
case is missing.
Symptoms: when forgetting to type the keyword 'type' in front of 'hash':
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
(infinite loop, taking all the CPU)
^C
After the patch:
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
Error: unknown arg hash
Fixes: 0b592b5a01 ("tools: bpftool: add map create command")
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix sparse warning:
arch/mips/net/ebpf_jit.c:196:5: warning:
symbol 'ebpf_to_mips_reg' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As the comment notes, the return codes for TSYNC and NEW_LISTENER
conflict, because they both return positive values, one in the case of
success and one in the case of error. So, let's disallow both of these
flags together.
While this is technically a userspace break, all the users I know
of are still waiting on me to land this feature in libseccomp, so I
think it'll be safe. Also, at present my use case doesn't require
TSYNC at all, so this isn't a big deal to disallow. If someone
wanted to support this, a path forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN,
but the use cases are so different I don't see it really happening.
Finally, it's worth noting that this does actually fix a UAF issue: at the
end of seccomp_set_mode_filter(), we have:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);
But if ret > 0 because TSYNC raced, we'll install the listener fd and then
free the filter out from underneath it, causing a UAF when the task closes
it or dies. This patch also switches the condition to be simply if (ret),
so that if someone does add the flag mentioned above, they won't have to
remember to fix this too.
Reported-by: syzbot+b562969adb2e04af3442@syzkaller.appspotmail.com
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
CC: stable@vger.kernel.org # v5.0+
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Fix a similar endless event loop as was done in commit
8dcf32175b ("i2c: prevent endless uevent loop with
CONFIG_I2C_DEBUG_CORE"):
The culprit is the dev_dbg printk in the i2c uevent handler. If
this is activated (for instance by CONFIG_I2C_DEBUG_CORE) it results
in an endless loop with systemd-journald.
This happens if user-space scans the system log and reads the uevent
file to get information about a newly created device, which seems
fair use to me. Unfortunately reading the "uevent" file uses the
same function that runs for creating the uevent for a new device,
generating the next syslog entry
Both CONFIG_I2C_DEBUG_CORE and CONFIG_POWER_SUPPLY_DEBUG were reported
in https://bugs.freedesktop.org/show_bug.cgi?id=76886 but only former
seems to have been fixed. Drop debug prints as it was done in I2C
subsystem to resolve the issue.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Since the migration of the driver to stop using the legacy
->select_chip() hook, there is nothing deselecting the target anymore,
thus the selection is not forced at the next access. Ensure the ND_RUN
bit and the interrupts are always in a clean state.
Cc: Daniel Mack <daniel@zonque.org>
Cc: stable@vger.kernel.org
Fixes: b25251414f ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.
So then if in numa_get_avg_runtime() the expression:
'now - p->last_task_numa_placement'
ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.
[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cj.chengjian@huawei.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ceph fixes from Ilya Dryomov:
"dentry name handling fixes from Jeff and a memory leak fix from Zheng.
Both are old issues, marked for stable"
* tag 'ceph-for-5.1-rc7' of git://github.com/ceph/ceph-client:
ceph: fix ci->i_head_snapc leak
ceph: handle the case where a dentry has been renamed on outstanding req
ceph: ensure d_name stability in ceph_dentry_hash()
ceph: only use d_name directly when parent is locked
Pull crypto fixes from Herbert Xu:
"This fixes a bug in xts and lrw where they may sleep in an atomic
context"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: lrw - Fix atomic sleep when walking skcipher
crypto: xts - Fix atomic sleep when walking skcipher
Compilation fails if any of undeclared clk_set_*() functions are in use
and CONFIG_HAVE_CLK=n.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
When the maximum send wr delivered by the user is zero, the qp does not
have a sq.
When allocating the sq db buffer to store the user sq pi pointer and map
it to the kernel mode, max_send_wr is used as the trigger condition, while
the kernel does not consider the max_send_wr trigger condition when
mapmping db. It will cause sq record doorbell map fail and create qp fail.
The failed print information as follows:
hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000
hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff
hns3 0000:7d:00.1: sq record doorbell map failed!
hns3 0000:7d:00.1: Create RC QP failed
Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a VCPU never runs before a guest exists, but we set timer registers
up via ioctls, the associated hrtimer might never get cancelled.
Since we moved vcpu_load/put into the arch-specific implementations and
only have load/put for KVM_RUN, we won't ever have a scheduled hrtimer
for emulating a timer when modifying the timer state via an ioctl from
user space. All we need to do is make sure that we pick up the right
state when we load the timer state next time userspace calls KVM_RUN
again.
We also do not need to worry about this interacting with the bg_timer,
because if we were in WFI from the guest, and somehow ended up in a
kvm_arm_timer_set_reg, it means that:
1. the VCPU thread has received a signal,
2. we have called vcpu_load when being scheduled in again,
3. we have called vcpu_put when we returned to userspace for it to issue
another ioctl
And therefore will not have a bg_timer programmed and the event is
treated as a spurious wakeup from WFI if userspace decides to run the
vcpu again even if there are not virtual interrupts.
This fixes stray virtual timer interrupts triggered by an expiring
hrtimer, which happens after a failed live migration, for instance.
Fixes: bee038a674 ("KVM: arm/arm64: Rework the timer code to use a timer_map")
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reported-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Recent multi-page biovec rework allowed creation of bios that can span
large regions - up to 128 megabytes in the case of btrfs. OTOH btrfs'
submission path currently allocates a contiguous array to store the
checksums for every bio submitted. This means we can request up to
(128mb / BTRFS_SECTOR_SIZE) * 4 bytes + 32bytes of memory from kmalloc.
On busy systems with possibly fragmented memory said kmalloc can fail
which will trigger BUG_ON due to improper error handling IO submission
context in btrfs.
Until error handling is improved or bios in btrfs limited to a more
manageable size (e.g. 1m) let's use kvmalloc to fallback to vmalloc for
such large allocations. There is no hard requirement that the memory
allocated for checksums during IO submission has to be contiguous, but
this is a simple fix that does not require several non-contiguous
allocations.
For small writes this is unlikely to have any visible effect since
kmalloc will still satisfy allocation requests as usual. For larger
requests the code will just fallback to vmalloc.
We've performed evaluation on several workload types and there was no
significant difference kmalloc vs kvmalloc.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The commit fc3a2fcaa1 ("mwifiex: use atomic bitops to represent
adapter status variables") had a fairly straightforward bug in it. It
contained this bit of diff:
- if (!adapter->is_suspended) {
+ if (test_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags)) {
As you can see the patch missed the "!" when converting to the atomic
bitops. This meant that the resume hasn't done anything at all since
that commit landed and suspend/resume for mwifiex SDIO cards has been
totally broken.
After fixing this mwifiex suspend/resume appears to work again, at
least with the simple testing I've done.
Fixes: fc3a2fcaa1 ("mwifiex: use atomic bitops to represent adapter status variables")
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it and clear the feature bitmap
while we're at it.
Signed-off-by: Andrew Jones <drjones@redhat.com>
[maz: Simplified patch, completed commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The syzkaller USB fuzzer spotted a slab-out-of-bounds bug in the
ds2490 driver. This bug is caused by improper use of the altsetting
array in the usb_interface structure (the array's entries are not
always stored in numerical order), combined with a naive assumption
that all interfaces probed by the driver will have the expected number
of altsettings.
The bug can be fixed by replacing references to the possibly
non-existent intf->altsetting[alt] entry with the guaranteed-to-exist
intf->cur_altsetting entry.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+d65f673b847a1a96cdba@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The syzkaller USB fuzzer found a general-protection-fault bug in the
yurex driver. The fault occurs when a device has been unplugged; the
driver's interrupt-URB handler logs an error message referring to the
device by name, after the device has been unregistered and its name
deallocated.
This problem is caused by the fact that the interrupt URB isn't
cancelled until the driver's private data structure is released, which
can happen long after the device is gone. The cure is to make sure
that the interrupt URB is killed before yurex_disconnect() returns;
this is exactly the sort of thing that usb_poison_urb() was meant for.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+2eb9121678bdb36e6d57@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the validation of number_of_packets in get_pipe to compare the
number of packets to a fixed maximum number of packets allowed, set to
be 1024. This number was chosen due to it being used by other drivers as
well, for example drivers/usb/host/uhci-q.c
Background/reason:
The get_pipe function in stub_rx.c validates the number of packets in
isochronous mode and aborts with an error if that number is too large,
in order to prevent malicious input from possibly triggering large
memory allocations. This was previously done by checking whether
pdu->u.cmd_submit.number_of_packets is bigger than the number of packets
that would be needed for pdu->u.cmd_submit.transfer_buffer_length bytes
if all except possibly the last packet had maximum length, given by
usb_endpoint_maxp(epd) * usb_endpoint_maxp_mult(epd). This leads to an
error if URBs with packets shorter than the maximum possible length are
submitted, which is allowed according to
Documentation/driver-api/usb/URB.rst and occurs for example with the
snd-usb-audio driver.
Fixes: c6688ef9f2 ("usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input")
Signed-off-by: Malte Leip <malte@leip.net>
Cc: stable <stable@vger.kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When ddc-i2c-bus property is used, a NULL pointer dereference is reported:
[ 31.041669] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 31.041671] pgd = 4d3c16f6
[ 31.041673] [00000008] *pgd=00000000
[ 31.041678] Internal error: Oops: 5 [#1] SMP ARM
[ 31.041711] Hardware name: Rockchip (Device Tree)
[ 31.041718] PC is at i2c_transfer+0x8/0xe4
[ 31.041721] LR is at drm_scdc_read+0x54/0x84
[ 31.041723] pc : [<c073273c>] lr : [<c05926c4>] psr: 280f0013
[ 31.041725] sp : edffdad0 ip : 5ccb5511 fp : 00000058
[ 31.041727] r10: 00000780 r9 : edf91608 r8 : c11b0f48
[ 31.041728] r7 : 00000438 r6 : 00000000 r5 : 00000000 r4 : 00000000
[ 31.041730] r3 : edffdae7 r2 : 00000002 r1 : edffdaec r0 : 00000000
[ 31.041908] [<c073273c>] (i2c_transfer) from [<c05926c4>] (drm_scdc_read+0x54/0x84)
[ 31.041913] [<c05926c4>] (drm_scdc_read) from [<c0592858>] (drm_scdc_set_scrambling+0x30/0xbc)
[ 31.041919] [<c0592858>] (drm_scdc_set_scrambling) from [<c05cc0f4>] (dw_hdmi_update_power+0x1440/0x1610)
[ 31.041926] [<c05cc0f4>] (dw_hdmi_update_power) from [<c05cc574>] (dw_hdmi_bridge_enable+0x2c/0x70)
[ 31.041932] [<c05cc574>] (dw_hdmi_bridge_enable) from [<c05aed48>] (drm_bridge_enable+0x24/0x34)
[ 31.041938] [<c05aed48>] (drm_bridge_enable) from [<c0591060>] (drm_atomic_helper_commit_modeset_enables+0x114/0x220)
[ 31.041943] [<c0591060>] (drm_atomic_helper_commit_modeset_enables) from [<c05c3fe0>] (rockchip_atomic_helper_commit_tail_rpm+0x28/0x64)
hdmi->i2c may not be set when ddc-i2c-bus property is used in device tree.
Fix this by using hdmi->ddc as the i2c adapter when calling drm_scdc_*().
Also report that SCDC is not supported when there is no DDC bus.
Fixes: 264fce6cc2 ("drm/bridge: dw-hdmi: Add SCDC and TMDS Scrambling support")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/VE1PR03MB59031814B5BCAB2152923BDAAC210@VE1PR03MB5903.eurprd03.prod.outlook.com
The err_remove_chip block is too coarse, and may perform cleanup that
must not be done. E.g. if of_gpiochip_add() fails, of_gpiochip_remove()
is still called, causing:
OF: ERROR: Bad of_node_put() on /soc/gpio@e6050000
CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted 5.1.0-rc2-koelsch+ #407
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
[<c020ec74>] (unwind_backtrace) from [<c020ae58>] (show_stack+0x10/0x14)
[<c020ae58>] (show_stack) from [<c07c1224>] (dump_stack+0x7c/0x9c)
[<c07c1224>] (dump_stack) from [<c07c5a80>] (kobject_put+0x94/0xbc)
[<c07c5a80>] (kobject_put) from [<c0470420>] (gpiochip_add_data_with_key+0x8d8/0xa3c)
[<c0470420>] (gpiochip_add_data_with_key) from [<c0473738>] (gpio_rcar_probe+0x1d4/0x314)
[<c0473738>] (gpio_rcar_probe) from [<c052fca8>] (platform_drv_probe+0x48/0x94)
and later, if a GPIO consumer tries to use a GPIO from a failed
controller:
WARNING: CPU: 0 PID: 1 at lib/refcount.c:156 kobject_get+0x38/0x4c
refcount_t: increment on 0; use-after-free.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc2-koelsch+ #407
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
[<c020ec74>] (unwind_backtrace) from [<c020ae58>] (show_stack+0x10/0x14)
[<c020ae58>] (show_stack) from [<c07c1224>] (dump_stack+0x7c/0x9c)
[<c07c1224>] (dump_stack) from [<c0221580>] (__warn+0xd0/0xec)
[<c0221580>] (__warn) from [<c02215e0>] (warn_slowpath_fmt+0x44/0x6c)
[<c02215e0>] (warn_slowpath_fmt) from [<c07c58fc>] (kobject_get+0x38/0x4c)
[<c07c58fc>] (kobject_get) from [<c068b3ec>] (of_node_get+0x14/0x1c)
[<c068b3ec>] (of_node_get) from [<c0686f24>] (of_find_node_by_phandle+0xc0/0xf0)
[<c0686f24>] (of_find_node_by_phandle) from [<c0686fbc>] (of_phandle_iterator_next+0x68/0x154)
[<c0686fbc>] (of_phandle_iterator_next) from [<c0687fe4>] (__of_parse_phandle_with_args+0x40/0xd0)
[<c0687fe4>] (__of_parse_phandle_with_args) from [<c0688204>] (of_parse_phandle_with_args_map+0x100/0x3ac)
[<c0688204>] (of_parse_phandle_with_args_map) from [<c0471240>] (of_get_named_gpiod_flags+0x38/0x380)
[<c0471240>] (of_get_named_gpiod_flags) from [<c046f864>] (gpiod_get_from_of_node+0x24/0xd8)
[<c046f864>] (gpiod_get_from_of_node) from [<c0470aa4>] (devm_fwnode_get_index_gpiod_from_child+0xa0/0x144)
[<c0470aa4>] (devm_fwnode_get_index_gpiod_from_child) from [<c05f425c>] (gpio_keys_probe+0x418/0x7bc)
[<c05f425c>] (gpio_keys_probe) from [<c052fca8>] (platform_drv_probe+0x48/0x94)
Fix this by splitting the cleanup block, and adding a missing call to
gpiochip_irqchip_remove().
Fixes: 28355f8196 ("gpio: defer probe if pinctrl cannot be found")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Remove the check for IOMMU presence since it was considered a
layer violation.
This means we have no reliable way to destinguish between coherent
hardware IOMMU DMA address translations and incoherent SWIOTLB DMA
address translations, which we can't handle. So always presume the
former. This means that if anybody forces SWIOTLB without also setting
the vmw_force_coherent=1 vmwgfx option, driver operation will fail,
like it will on most other graphics drivers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull networking fixes from David Miller:
"Just the usual assortment of small'ish fixes:
1) Conntrack timeout is sometimes not initialized properly, from
Alexander Potapenko.
2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
undefined behavior. From ZhangXiaoxu.
3) des1 field of descriptor in stmmac driver is initialized with the
wrong variable. From Yue Haibing.
4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
Schimmel.
5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.
6) Fallback refcount fix in TLS code, from Jakub Kicinski.
7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.
8) Fix recursive locking in team driver, from Hangbin Liu.
9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski.
10) Don't use napi_alloc_frag() outside of softiq context of socionext
driver, from Ilias Apalodimas.
11) MAC address increment overflow in ncsi, from Tao Ren.
12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.
13) ipv4_link_failure has to validate the headers that are actually
there because RAW sockets can pass in arbitrary garbage, from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add sanity checks in ipv4_link_failure()
net/rose: fix unbound loop in rose_loopback_timer()
rxrpc: fix race condition in rxrpc_input_packet()
net: rds: exchange of 8K and 1M pool
net: vrf: Fix operation not supported when set vrf mac
net/ncsi: handle overflow when incrementing mac address
net: socionext: replace napi_alloc_frag with the netdev variant on init
net: atheros: fix spelling mistake "underun" -> "underrun"
spi: ST ST95HF NFC: declare missing of table
spi: Micrel eth switch: declare missing of table
net: stmmac: move stmmac_check_ether_addr() to driver probe
netfilter: fix nf_l4proto_log_invalid to log invalid packets
netfilter: never get/set skb->tstamp
netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
dt-bindings: add an explanation for internal phy-mode
net/tls: don't leak IV and record seq when offload fails
net/tls: avoid potential deadlock in tls_set_device_offload_rx()
selftests/net: correct the return value for run_afpackettests
team: fix possible recursive locking when add slaves
...
Pull LED update from Jacek Anaszewski:
"A single change to MAINTAINERS:
We announce a new LED reviewer - Dan Murphy"
* tag 'leds-for-5.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: LEDs: Add designated reviewer for LED subsystem
Add a designated reviewer for the LED subsystem as there
are already two maintainers assigned.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Before the commit 490ea5967b ("RDS: IB: move FMR code to its own file"),
when the dirty_count is greater than 9/10 of max_items of 8K pool,
1M pool is used, Vice versa. After the commit 490ea5967b ("RDS: IB: move
FMR code to its own file"), the above is removed. When we make the
following tests.
Server:
rds-stress -r 1.1.1.16 -D 1M
Client:
rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
The following will appear.
"
connecting to 1.1.1.16:4000
negotiated options, tasks will start in 2 seconds
Starting up..header from 1.1.1.166:4001 to id 4001 bogus
..
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us
cpu %
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
...
"
So this exchange between 8K and 1M pool is added back.
Fixes: commit 490ea5967b ("RDS: IB: move FMR code to its own file")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vrf device is not able to change mac address now because lack of
ndo_set_mac_address. Complete this in case some apps need to do
this.
Reported-by: Hui Wang <wanghui104@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
regmap_update_bits could fail and deserves a check.
The patch adds the checks and if it fails, returns its error
code upstream.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
The RMI4 function structure has been released in rmi_register_function
if error occurs. However, it will be released again in the function
rmi_create_function, which may result in a double-free bug.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When this code was consolidated the intention was that the VMA would
become backed by anonymous zero pages after the zap_vma_pte - however this
very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
bits to transform the VMA into an anonymous VMA. Since the vm_ops was
removed this broke.
Now userspace gets a SIGBUS if it touches the vma after disassociation.
Instead of converting the VMA to anonymous provide a fault handler that
puts a zero'd page into the VMA when user-space touches it after
disassociation.
Cc: stable@vger.kernel.org
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The SNVS power key is not only used on i.MX6SX and i.MX7D, it is also
used by i.MX6UL and NXP's latest ARMv8 based i.MX8M series SOC. So
update the config dependency to use ARCH_MXC, and add the COMPILE_TEST
too.
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.
Cc: stable@vger.kernel.org
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.
Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1 ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
A pointer to crtc was missing, resulting in the following build error:
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: sparse: incorrect type in argument 1 (different base types)
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: expected struct drm_crtc *crtc
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: got struct drm_crtc_state *state
drivers/gpu/drm/vc4/vc4_crtc.c:1045:39: sparse: sparse: not enough arguments for function vc4_crtc_destroy_state
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/2b6ed5e6-81b0-4276-8860-870b54ca3262@linux.intel.com
Fixes: d08106796a ("drm/vc4: Fix memory leak during gpu reset.")
Cc: <stable@vger.kernel.org> # v4.6+
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Third batch of iwlwifi fixes intended for v5.1
* Fix an oops when creating debugfs entries;
* Fix bug when trying to capture debugging info while in rfkill;
* Prevent potential uninitialized memory dumps into debugging logs;
* Fix some initialization parameters for AX210 devices;
* Fix an oops with non-MSIX devices;
Add two Dell platform for headset mode.
[ Note: this is a further correction / addition of the previous
pin-based quirks for Dell machines; another entry for ALC236 with
the d-mic pin 0x12 and an entry for ALC295 -- tiwai ]
Fixes: b26e36b7ef ("ALSA: hda/realtek - add two more pin configuration sets to quirk table")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The first kmemleak_scan() call after boot would trigger the crash below
because this callpath:
kernel_init
free_initmem
mem_encrypt_free_decrypted_mem
free_init_pages
unmaps memory inside the .bss when DEBUG_PAGEALLOC=y.
kmemleak_init() will register the .data/.bss sections and then
kmemleak_scan() will scan those addresses and dereference them looking
for pointer references. If free_init_pages() frees and unmaps pages in
those sections, kmemleak_scan() will crash if referencing one of those
addresses:
BUG: unable to handle kernel paging request at ffffffffbd402000
CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
RIP: 0010:scan_block
Call Trace:
scan_gray_list
kmemleak_scan
kmemleak_scan_thread
kthread
ret_from_fork
Since kmemleak_free_part() is tolerant to unknown objects (not tracked
by kmemleak), it is fine to call it from free_init_pages() even if not
all address ranges passed to this function are known to kmemleak.
[ bp: Massage. ]
Fixes: b3f0907c71 ("x86/mm: Add .bss..decrypted section to hold shared variables")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190423165811.36699-1-cai@lca.pw
Previously BMC's MAC address is calculated by simply adding 1 to the
last byte of network controller's MAC address, and it produces incorrect
result when network controller's MAC address ends with 0xFF.
The problem can be fixed by calling eth_addr_inc() function to increment
MAC address; besides, the MAC address is also validated before assigning
to BMC.
Fixes: cb10c7c0df ("net/ncsi: Add NCSI Broadcom OEM command")
Signed-off-by: Tao Ren <taoren@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm regression fixes from Dave Airlie:
"We interrupt your regularly scheduled drm fixes for a regression
special.
The first is for a fix in i915 that had unexpected side effects
fallout in the userspace X.org modesetting driver where X would no
longer start. I got tired of the nitpicking and issued a large hammer
on it. The X.org driver is buggy, but blackscreen regressions are
worse.
The second was an oversight that myself and Gerd should have noticed
better, Gerd is trying to fix this properly, but the regression is too
large to leave, even if the original behaviour is bad in some cases,
it's clearly bad to break a bunch of working use cases.
I'll likely have a regular fixes pull later, but I really wanted to
highlight these"
* tag 'drm-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/virtio: drop prime import/export callbacks"
Revert "drm/i915/fbdev: Actually configure untiled displays"
The netdev variant is usable on any context since it disables interrupts.
The napi variant of the call should only be used within softirq context.
Replace napi_alloc_frag on driver init with the correct netdev_alloc_frag
call
Changes since v1:
- Adjusted commit message
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
Fixes: 4acb20b462 ("net: socionext: different approach on DMA")
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does more harm than good, as it breaks both Xwayland and
gnome-shell with X11.
Xwayland requires DRI3 & DRI3 requires PRIME.
X11 crash for obscure double-free reason which are hard to debug
(starting X11 by hand doesn't trigger the crash).
I don't see an apparent problem implementing those stub prime
functions, they may return an error at run-time, and it seems to be
handled fine by GNOME at least.
This reverts commit b318e3ff7c.
[airlied:
This broke userspace for virtio-gpus, and regressed things from DRI3 to DRI2.
This brings back the original problem, but it's better than regressions.]
Fixes: b318e3ff7c ("drm/virtio: drop prime import/export callbacks")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit d179b88deb.
This commit is documented to break userspace X.org modesetting driver in certain configurations.
The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v3.19+
Pull nfsd bugfixes from Bruce Fields:
"Fix miscellaneous nfsd bugs, in NFSv4.1 callbacks, NFSv4.1
lock-notification callbacks, NFSv3 readdir encoding, and the
cache/upcall code"
* tag 'nfsd-5.1-1' of git://linux-nfs.org/~bfields/linux:
nfsd: wake blocked file lock waiters before sending callback
nfsd: wake waiters blocked on file_lock before deleting it
nfsd: Don't release the callback slot unless it was actually held
nfsd/nfsd3_proc_readdir: fix buffer count and page pointers
sunrpc: don't mark uninitialised items as VALID.
Pull syscall numbering updates from Arnd Bergmann:
"arch: add pidfd and io_uring syscalls everywhere
This comes a bit late, but should be in 5.1 anyway: we want the newly
added system calls to be synchronized across all architectures in the
release.
I hope that in the future, any newly added system calls can be added
to all architectures at the same time, and tested there while they are
in linux-next, avoiding dependencies between the architecture
maintainer trees and the tree that contains the new system call"
* tag 'syscalls-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: add pidfd and io_uring syscalls everywhere
It's possible for us to issue a lookup to revalidate a dentry
concurrently with a rename. If done in the right order, then we could
end up processing dentry info in the reply that no longer reflects the
state of the dentry.
If req->r_dentry->d_name differs from the one in the trace, then just
ignore the trace in the reply. We only need to do this however if the
parent's i_rwsem is not held.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ben reported tripping the BUG_ON in create_request_message during some
performance testing. Analysis of the vmcore showed that the length of
the r_dentry->d_name string changed after we allocated the buffer, but
before we encoded it.
build_dentry_path returns pointers to d_name in the common case of
non-snapped dentries, but this optimization isn't safe unless the parent
directory is locked. When it isn't, have the code make a copy of the
d_name while holding the d_lock.
Cc: stable@vger.kernel.org
Reported-by: Ben England <bengland@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add missing <of_device_id> table for SPI driver relying on SPI
device match since compatible is in a DT binding or in a DTS.
Before this patch:
modinfo drivers/nfc/st95hf/st95hf.ko | grep alias
alias: spi:st95hf
After this patch:
modinfo drivers/nfc/st95hf/st95hf.ko | grep alias
alias: spi:st95hf
alias: of:N*T*Cst,st95hfC*
alias: of:N*T*Cst,st95hf
Reported-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Daniel Gomez <dagmcr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing <of_device_id> table for SPI driver relying on SPI
device match since compatible is in a DT binding or in a DTS.
Before this patch:
modinfo drivers/net/phy/spi_ks8995.ko | grep alias
alias: spi:ksz8795
alias: spi:ksz8864
alias: spi:ks8995
After this patch:
modinfo drivers/net/phy/spi_ks8995.ko | grep alias
alias: spi:ksz8795
alias: spi:ksz8864
alias: spi:ks8995
alias: of:N*T*Cmicrel,ksz8795C*
alias: of:N*T*Cmicrel,ksz8795
alias: of:N*T*Cmicrel,ksz8864C*
alias: of:N*T*Cmicrel,ksz8864
alias: of:N*T*Cmicrel,ks8995C*
alias: of:N*T*Cmicrel,ks8995
Reported-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Daniel Gomez <dagmcr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EFI stub is entered with the caches and MMU enabled by the
firmware, and once the stub is ready to hand over to the decompressor,
we clean and disable the caches.
The cache clean routines use CP15 barrier instructions, which can be
disabled via SCTLR. Normally, when using the provided cache handling
routines to enable the caches and MMU, this bit is enabled as well.
However, but since we entered the stub with the caches already enabled,
this routine is not executed before we call the cache clean routines,
resulting in undefined instruction exceptions if the firmware never
enabled this bit.
So set the bit explicitly in the EFI entry code, but do so in a way that
guarantees that the resulting code can still run on v6 cores as well
(which are guaranteed to have CP15 barriers enabled)
Cc: <stable@vger.kernel.org> # v4.9+
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When CONFIG_ARM_MPU is not defined, the base address of v7M SCB register
is not initialized with correct value. This prevents enabling I/D caches
when the L1 cache poilcy is applied in kernel.
Fixes: 3c24121039 ("ARM: 8756/1: NOMMU: Postpone MPU activation till __after_proc_init")
Signed-off-by: Tigran Tadevosyan <tigran.tadevosyan@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Naresh Kamboju recently reported that the function-graph tracer crashes
on ARM. The function-graph tracer assumes that the kernel is built with
frame pointers.
We explicitly disabled the function-graph tracer when building Thumb2,
since the Thumb2 ABI doesn't have frame pointers.
We recently changed the way the unwinder method was selected, which
seems to have made it more likely that we can end up with the function-
graph tracer enabled but without the kernel built with frame pointers.
Fix up the function graph tracer dependencies so the option is not
available when we have no possibility of having frame pointers, and
adjust the dependencies on the unwinder option to hide the non-frame
pointer unwinder options if the function-graph tracer is enabled.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Since commit 222b5f0441 ("drm/sched: Refactor ring mirror list
handling."), drm_sched_hw_job_reset is no longer there, so let's adjust
the doc comment accordingly.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since commit 09bb839434 we don't use the state argument for any sort
of on-stack caching in the io read and write path. Remove the stale
and unused argument from them, and bubble it up to __io_submit_sqe()
and down to io_prep_rw().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In order to make sure that the plane color space gets reset correctly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Initialize the flow input colorspaces to unknown and reset to that value
when the channel gets disabled. This avoids the state getting mixed up
with a previous mode.
Also keep the CSC settings for the background flow intact when disabling
the foreground flow.
Root-caused-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Another bodge for the ftrace PLT code: plt_entries_equal() now takes
the place relative nature of the ADRP/ADD based PLT entries into
account, which means that a struct trampoline instance on the stack
is no longer equal to the same set of opcodes in the module struct,
given that they don't point to the same place in memory anymore.
Work around this by using memcmp() in the ftrace PLT handling code.
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We need to dereference the directory to get its parent to
be able to rename it, so it's clearly not safe to try to
do this with ERR_PTR() pointers. Skip in this case.
It seems that this is most likely what was causing the
report by syzbot, but I'm not entirely sure as it didn't
come with a reproducer this time.
Cc: stable@vger.kernel.org
Reported-by: syzbot+4ece1a28b8f4730547c9@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit c82c06ce43d3("cfg80211: Notify all User Hints To self managed wiphys")
notified all new user hints to self managed wiphy's after device registration.
But it didn't do this for anything other than cell base hints done before
registration.
This needs to be done during wiphy registration of a self managed device also,
so that the previous user settings are retained.
Fixes: c82c06ce43 ("cfg80211: Notify all User Hints To self managed wiphys")
Signed-off-by: Sriram R <srirrama@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The txq of vif is added to active_txqs list for ATF TXQ scheduling
in the function ieee80211_queue_skb(), but it was not properly removed
before freeing the txq object. It was causing use after free of the txq
objects from the active_txqs list, result was kernel panic
due to invalid memory access.
Fix kernel invalid memory access by properly removing txq object
from active_txqs list before free the object.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the event that the start address of the initrd is not aligned, but
has an aligned size, the base + size will not cover the entire initrd
image and there is a chance that the kernel will corrupt the tail of the
image.
By aligning the end of the initrd to a page boundary and then
subtracting the adjusted start address the memblock reservation will
cover all pages that contains the initrd.
Fixes: c756c592e4 ("arm64: Utilize phys_initrd_start/phys_initrd_size")
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The tx_status poll in the rcar_dmac driver reads the status register
which indicates which chunk is busy (DMACHCRB). Afterwards the point
inside the chunk is read from DMATCRB. It is possible that the chunk
has changed between the two reads. The result is a non-monotonous
increase of the residue. Fix this by introducing a 'safe read' logic.
Fixes: 73a47bd0da ("dmaengine: rcar-dmac: use TCRB instead of TCR for residue")
Signed-off-by: Achim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The commit af19b7ce76 ("mmc: bcm2835: Avoid possible races on
data requests") introduces a possible circular locking dependency,
which is triggered by swapping to the sdhost interface.
So instead of reintroduce the race condition again, we could also
avoid this situation by using GFP_NOWAIT for the allocation of the
DMA buffer descriptors.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: af19b7ce76 ("mmc: bcm2835: Avoid possible races on data requests")
Link: http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008615.html
Signed-off-by: Vinod Koul <vkoul@kernel.org>
stmmac_check_ether_addr() checks the MAC address and assigns one in
driver open(). In many cases when we create slave netdevice, the dev
addr is inherited from master but the master dev addr maybe NULL at
that time, so move this call to driver probe so that address is
always valid.
Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Tested-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: Sneh Shah <snehshah@codeaurora.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Add a selftest for icmp packet too big errors with conntrack, from
Florian Westphal.
2) Validate inner header in ICMP error message does not lie to us
in conntrack, also from Florian.
3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko.
4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov.
5) Use a hash to expose conntrack and expectation ID, from Florian Westphal.
6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter.
7) Fix broken ICMP ID randomization with NAT, also from Florian.
8) Remove WARN_ON in ebtables compat that is reached via syzkaller,
from Florian Westphal.
9) Fix broken timestamps since fb420d5d91 ("tcp/fq: move back to
CLOCK_MONOTONIC"), from Florian.
10) Fix logging of invalid packets in conntrack, from Andrei Vagin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a blocked NFS lock is "awoken" we send a callback to the server and
then wake any hosts waiting on it. If a client attempts to get a lock
and then drops off the net, we could end up waiting for a long time
until we end up waking locks blocked on that request.
So, wake any other waiting lock requests before sending the callback.
Do this by calling locks_delete_block in a new "prepare" phase for
CB_NOTIFY_LOCK callbacks.
URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
Fixes: 16306a61d3 ("fs/locks: always delete_block after waiting.")
Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After a blocked nfsd file_lock request is deleted, knfsd will send a
callback to the client and then free the request. Commit 16306a61d3
("fs/locks: always delete_block after waiting.") changed it such that
locks_delete_block is always called on a request after it is awoken,
but that patch missed fixing up blocked nfsd request handling.
Call locks_delete_block on the block to wake up any locks still blocked
on the nfsd lock request before freeing it. Some of its callers already
do this however, so just remove those calls.
URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
Fixes: 16306a61d3 ("fs/locks: always delete_block after waiting.")
Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull MIPS fixes from Paul Burton:
"A couple more MIPS fixes:
- Fix indirect syscall tracing & seccomp filtering for big endian
MIPS64 kernels, which previously loaded the syscall number
incorrectly & would always use zero.
- Fix performance counter IRQ setup for Atheros/ath79 SoCs, allowing
perf to function on those systems.
And not really a fix, but a useful addition:
- Add a Broadcom mailing list to the MAINTAINERS entry for BMIPS
systems to allow relevant engineers to track patch submissions"
* tag 'mips_fixes_5.1_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: perf: ath79: Fix perfcount IRQ assignment
MIPS: scall64-o32: Fix indirect syscall number load
MAINTAINERS: BMIPS: Add internal Broadcom mailing list
io_uring_poll shouldn't signal EPOLLOUT | EPOLLWRNORM if the queue is
full; the old check would always signal EPOLLOUT | EPOLLWRNORM (unless
there were U32_MAX - 1 entries in the SQ queue).
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reading the SQ tail needs to come after setting IORING_SQ_NEED_WAKEUP in
flags; there is no cheap barrier for ordering a store before a load, a
full memory barrier is required.
Userspace needs a full memory barrier between updating SQ tail and
checking for the IORING_SQ_NEED_WAKEUP too.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A read memory barrier is required between reading SQ tail and reading
the actual data belonging to the SQ entry.
Userspace needs a matching write barrier between writing SQ entries and
updating SQ tail (using smp_store_release to update tail will do).
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have multiple threads doing io_uring_register(2) on an io_uring
fd, then we can potentially try and kill the percpu reference while
someone else has already killed it.
Prevent this race by failing io_uring_register(2) if the ref is marked
dying. This is safe since we're inside the io_uring mutex.
Fixes: b19062a567 ("io_uring: fix possible deadlock between io_uring_{enter,register}")
Reported-by: syzbot <syzbot+10d25e23199614b7721f@syzkaller.appspotmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It doesn't log a packet if sysctl_log_invalid isn't equal to protonum
OR sysctl_log_invalid isn't equal to IPPROTO_RAW. This sentence is
always true. I believe we need to replace OR to AND.
Cc: Florian Westphal <fw@strlen.de>
Fixes: c4f3db1595 ("netfilter: conntrack: add and use nf_l4proto_log_invalid")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
setting net.netfilter.nf_conntrack_timestamp=1 breaks xmit with fq
scheduler. skb->tstamp might be "refreshed" using ktime_get_real(),
but fq expects CLOCK_MONOTONIC.
This patch removes all places in netfilter that check/set skb->tstamp:
1. To fix the bogus "start" time seen with conntrack timestamping for
outgoing packets, never use skb->tstamp and always use current time.
2. In nfqueue and nflog, only use skb->tstamp for incoming packets,
as determined by current hook (prerouting, input, forward).
3. xt_time has to use system clock as well rather than skb->tstamp.
We could still use skb->tstamp for prerouting/input/foward, but
I see no advantage to make this conditional.
Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_DECNET_ROUTE_FWMARK was removed in commit 47dcf0cb10 ("[NET]: Rethink mark field in struct flowi")
Since nothing replace it (and nothindg need to replace it, simply remove
it from documentation.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When working on the Allwinner internal PHY, the first work was to use
the "internal" mode, but some answer was made my mail on what are really
internal mean for PHY.
This patch write that in the doc.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When device refuses the offload in tls_set_device_offload_rx()
it calls tls_sw_free_resources_rx() to clean up software context
state.
Unfortunately, tls_sw_free_resources_rx() does not free all
the state tls_set_sw_offload() allocated - it leaks IV and
sequence number buffers. All other code paths which lead to
tls_sw_release_resources_rx() (which tls_sw_free_resources_rx()
calls) free those right before the call.
Avoid the leak by moving freeing of iv and rec_seq into
tls_sw_release_resources_rx().
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device supports offload, but offload fails tls_set_device_offload_rx()
will call tls_sw_free_resources_rx() which (unhelpfully) releases
and reacquires the socket lock.
For a small fix release and reacquire the device_offload_lock.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The run_afpackettests will be marked as passed regardless the return
value of those sub-tests in the script:
--------------------
running psock_tpacket test
--------------------
[FAIL]
selftests: run_afpackettests [PASS]
Fix this by changing the return value for each tests.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NFS client bugfix from Trond Myklebust:
"Fix a regression in which an RPC call can be tagged with an error
despite the transmission being successful"
* tag 'nfs-for-5.1-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Ignore queue transmission errors on successful transmission
Pull SCSI fixes from James Bottomley:
"Three minor fixes: two obvious ones in drivers and a fix to the SG_IO
path to correctly return status on error"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: aic7xxx: fix EISA support
Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
scsi: core: set result when the command cannot be dispatched
Pull block fixes from Jens Axboe:
"A set of small fixes that should go into this series. This contains:
- Removal of unused queue member (Hou)
- Overflow bvec fix (Ming)
- Various little io_uring tweaks (me)
- kthread parking
- Only call cpu_possible() for verified CPU
- Drop unused 'file' argument to io_file_put()
- io_uring_enter vs io_uring_register deadlock fix
- CQ overflow fix
- BFQ internal depth update fix (me)"
* tag 'for-linus-20190420' of git://git.kernel.dk/linux-block:
block: make sure that bvec length can't be overflow
block: kill all_q_node in request_queue
io_uring: fix CQ overflow condition
io_uring: fix possible deadlock between io_uring_{enter,register}
io_uring: drop io_file_put() 'file' argument
bfq: update internal depth state when queue depth changes
io_uring: only test SQPOLL cpu after we've verified it
io_uring: park SQPOLL thread if it's percpu
Pill i3c fixes from Boris Brezillon:
- fix the random PID check
- fix the disable controller logic in the designware driver
- fix I3C entry in MAINTAINERS
* tag 'i3c/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
MAINTAINERS: Fix the I3C entry
i3c: dw: Fix dw_i3c_master_disable controller by using correct mask
i3c: Fix the verification of random PID
Pull sound fixes from Takashi Iwai:
"Two core fixes for long-standing bugs for the races at concurrent
device creation and deletion that were (unsurprisingly) spotted by
syzkaller with usb-fuzzer.
The rest are usual small HD-audio fixes"
* tag 'sound-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - add two more pin configuration sets to quirk table
ALSA: core: Fix card races between register and disconnect
ALSA: info: Fix racy addition/deletion of nodes
ALSA: hda: Initialize power_state field properly
Pull perf fixes from Ingo Molnar:
"Misc fixes:
- various tooling fixes
- kretprobe fixes
- kprobes annotation fixes
- kprobes error checking fix
- fix the default events for AMD Family 17h CPUs
- PEBS fix
- AUX record fix
- address filtering fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Avoid kretprobe recursion bug
kprobes: Mark ftrace mcount handler functions nokprobe
x86/kprobes: Verify stack frame on kretprobe
perf/x86/amd: Add event map for AMD Family 17h
perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()
perf tools: Fix map reference counting
perf evlist: Fix side band thread draining
perf tools: Check maps for bpf programs
perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info()
tools include uapi: Sync sound/asound.h copy
perf top: Always sample time to satisfy needs of use of ordered queuing
perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user)
tools lib traceevent: Fix missing equality check for strcmp
perf stat: Disable DIR_FORMAT feature for 'perf stat record'
perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view
perf header: Fix lock/unlock imbalances when processing BPF/BTF info
perf/x86: Fix incorrect PEBS_REGS
perf/ring_buffer: Fix AUX record suppression
perf/core: Fix the address filtering fix
kprobes: Fix error check when reusing optimized probes
Pull x86 fixes from Ingo Molnar:
"Misc fixes all over the place: a console spam fix, section attributes
fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot
quirk, boot options related warnings fix, an LTO fix, a deadlock fix
and an RDT fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
x86/cpu/bugs: Use __initconst for 'const' init data
x86/mm/KASLR: Fix the size of the direct mapping section
x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness"
x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info"
x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T
x86/mm: Prevent bogus warnings with "noexec=off"
x86/build/lto: Fix truncated .bss with -fdata-sections
x86/speculation: Prevent deadlock on ssb_state::lock
x86/resctrl: Do not repeat rdtgroup mode initialization
Pull scheduler fixes from Ingo Molnar:
"A deadline scheduler warning/race fix, and a cfs_period_us quota
calculation workaround where the real fix looks too involved to merge
immediately"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Correctly handle active 0-lag timers
sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
Pull locking fixes from Ingo Molnar:
"A lockdep warning fix and a script execution fix when atomics are
generated"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomics: Don't assume that scripts are executable
locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again
Pull cgroup fix from Tejun Heo:
"A patch to fix a RCU imbalance error in the devices cgroup
configuration error path"
* 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: fix RCU imbalance in error case
Pull percpu fixlet from Dennis Zhou:
"This stops printing the base address of percpu memory on
initialization"
* 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: stop printing kernel addresses
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-04-19
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.7:
('net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query')
For -stable v4.19:
('net/mlx5e: Fix the max MTU check in case of XDP')
For -stable v5.0:
('net/mlx5e: Fix use-after-free after xdp_return_frame')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If we add a bond device which is already the master of the team interface,
we will hold the team->lock in team_add_slave() first and then request the
lock in team_set_mac_address() again. The functions are called like:
- team_add_slave()
- team_port_add()
- team_port_enter()
- team_modeop_port_enter()
- __set_port_dev_addr()
- dev_set_mac_address()
- bond_set_mac_address()
- dev_set_mac_address()
- team_set_mac_address
Although team_upper_dev_link() would check the upper devices but it is
called too late. Fix it by adding a checking before processing the slave.
v2: Do not split the string in netdev_err()
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Querying EEPROM high pages data for SFP module is currently
not supported by our driver and yet queried, resulting in
invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module will
limit the reading for page 0 only and prevent invalid FW queries.
Fixes: bb64143eee ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
This commit fixes the calculations and adds a brief explanation for the
formula used.
Fixes: a26a5bdf3e ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
xdp_return_frame releases the frame. It leads to releasing the page, so
it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
the memory access to precede the return of the frame.
Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull Allwinner clk fixes from Maxime Ripard:
- Some fixes for odd cases of the NKMP clocks
* tag 'clk-fixes-for-5.1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: nkmp: Explain why zero width check is needed
clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
Pull tty/serial fixes from Greg KH:
"Here are five small fixes for some tty/serial/vt issues that have been
reported.
The vt one has been around for a while, it is good to finally get that
resolved. The others fix a build warning that showed up in 5.1-rc1,
and resolve a problem in the sh-sci driver.
Note, the second patch for build warning fix for the sc16is7xx driver
was just applied to the tree, as it resolves a problem with the
previous patch to try to solve the issue. It has not shown up in
linux-next yet, unlike all of the other patches, but it has passed
0-day testing and everyone seems to agree that it is correct"
* tag 'tty-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
sc16is7xx: put err_spi and err_i2c into correct #ifdef
vt: fix cursor when clearing the screen
sc16is7xx: move label 'err_spi' to correct section
serial: sh-sci: Fix HSCIF RX sampling point adjustment
serial: sh-sci: Fix HSCIF RX sampling point calculation
The syzkaller fuzzer reported a bug in the USB hub driver which turned
out to be caused by a negative runtime-PM usage counter. This allowed
a hub to be runtime suspended at a time when the driver did not expect
it. The symptom is a WARNING issued because the hub's status URB is
submitted while it is already active:
URB 0000000031fb463e submitted while active
WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363
The negative runtime-PM usage count was caused by an unfortunate
design decision made when runtime PM was first implemented for USB.
At that time, USB class drivers were allowed to unbind from their
interfaces without balancing the usage counter (i.e., leaving it with
a positive count). The core code would take care of setting the
counter back to 0 before allowing another driver to bind to the
interface.
Later on when runtime PM was implemented for the entire kernel, the
opposite decision was made: Drivers were required to balance their
runtime-PM get and put calls. In order to maintain backward
compatibility, however, the USB subsystem adapted to the new
implementation by keeping an independent usage counter for each
interface and using it to automatically adjust the normal usage
counter back to 0 whenever a driver was unbound.
This approach involves duplicating information, but what is worse, it
doesn't work properly in cases where a USB class driver delays
decrementing the usage counter until after the driver's disconnect()
routine has returned and the counter has been adjusted back to 0.
Doing so would cause the usage counter to become negative. There's
even a warning about this in the USB power management documentation!
As it happens, this is exactly what the hub driver does. The
kick_hub_wq() routine increments the runtime-PM usage counter, and the
corresponding decrement is carried out by hub_event() in the context
of the hub_wq work-queue thread. This work routine may sometimes run
after the driver has been unbound from its interface, and when it does
it causes the usage counter to go negative.
It is not possible for hub_disconnect() to wait for a pending
hub_event() call to finish, because hub_disconnect() is called with
the device lock held and hub_event() acquires that lock. The only
feasible fix is to reverse the original design decision: remove the
duplicate interface-specific usage counter and require USB drivers to
balance their runtime PM gets and puts. As far as I know, all
existing drivers currently do this.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I've discovered following discrepancy in the bindings/net/ethernet.txt
documentation, where it states following:
- nvmem-cells: phandle, reference to an nvmem node for the MAC address;
- nvmem-cell-names: string, should be "mac-address" if nvmem is to be..
which is actually misleading and confusing. There are only two ethernet
drivers in the tree, cadence/macb and davinci which supports this
properties.
This nvmem-cell* properties were introduced in commit 9217e566bd
("of_net: Implement of_get_nvmem_mac_address helper"), but
commit afa64a72b8 ("of: net: kill of_get_nvmem_mac_address()")
forget to properly clean up this parts.
So this patch fixes the documentation by moving the nvmem-cell*
properties at the appropriate places. While at it, I've removed unused
include as well.
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Fixes: afa64a72b8 ("of: net: kill of_get_nvmem_mac_address()")
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
mm/kmemleak.c: fix unused-function warning
init: initialize jump labels before command line option parsing
kernel/watchdog_hld.c: hard lockup message should end with a newline
kcov: improve CONFIG_ARCH_HAS_KCOV help text
mm: fix inactive list balancing between NUMA nodes and cgroups
mm/hotplug: treat CMA pages as unmovable
proc: fixup proc-pid-vm test
proc: fix map_files test on F29
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
mm: swapoff: shmem_unuse() stop eviction without igrab()
mm: swapoff: take notice of completion sooner
mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
mm: swapoff: shmem_find_swap_entries() filter out other types
slab: store tagged freelist for off-slab slabmgmt
Pull staging and IIO fixes from Greg KH:
"Here is a bunch of IIO driver fixes, and some smaller staging driver
fixes, for 5.1-rc6. The IIO fixes were delayed due to my vacation, but
all resolve a number of reported issues and have been in linux-next
for a few weeks with no reported issues.
The other staging driver fixes are all tiny, resolving some reported
issues in the comedi and most drivers, as well as some erofs fixes.
All of these patches have been in linux-next with no reported issues"
* tag 'staging-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (24 commits)
staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
staging: comedi: ni_usb6501: Fix use of uninitialized mutex
staging: erofs: fix unexpected out-of-bound data access
staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
staging: comedi: vmk80xx: Fix use of uninitialized semaphore
staging: most: core: use device description as name
iio: core: fix a possible circular locking dependency
iio: ad_sigma_delta: select channel when reading register
iio: pms7003: select IIO_TRIGGERED_BUFFER
iio: cros_ec: Fix the maths for gyro scale calculation
iio: adc: xilinx: prevent touching unclocked h/w on remove
iio: adc: xilinx: fix potential use-after-free on probe
iio: adc: xilinx: fix potential use-after-free on remove
iio: dac: mcp4725: add missing powerdown bits in store eeprom
io: accel: kxcjk1013: restore the range after resume.
iio:chemical:bme680: Fix SPI read interface
iio:chemical:bme680: Fix, report temperature in millidegrees
iio: chemical: fix missing Kconfig block for sgp30
iio: adc: at91: disable adc channel interrupt in timeout case
iio: gyro: mpu3050: fix chip ID reading
...
Pull char/misc fixes from Greg KH:
"Here are four small misc driver fixes for 5.1-rc6.
Nothing major at all, they fix up a Kconfig issues, a SPDX invalid
license tag, and two tiny bugfixes.
All have been in linux-next for a while with no reported issues"
* tag 'char-misc-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
drivers: power: supply: goldfish_battery: Fix bogus SPDX identifier
extcon: ptn5150: fix COMPILE_TEST dependencies
misc: fastrpc: add checked value for dma_set_mask
habanalabs: remove low credit limit of DMA #0
bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as,
when one bio is splitted in the middle of one bvec via bio_split(),
and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of
remained bio. And the remained bio's bvec may be re-submitted to fs
layer via ITER_IBVEC, such as loop and nvme-loop.
So we have to make sure that every bvec's offset is less than
PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop,
nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC.
This patch fixes this issue reported by Zhang Yi When running nvme/011.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 6dc4f100c1 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull input updates from Dmitry Torokhov:
- several new key mappings for HID
- a host of new ACPI IDs used to identify Elan touchpads in Lenovo
laptops
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
HID: input: add mapping for "Toggle Display" key
HID: input: add mapping for "Full Screen" key
HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
HID: input: add mapping for Expose/Overview key
HID: input: fix mapping of aspect ratio key
[media] doc-rst: switch to new names for Full Screen/Aspect keys
Input: document meanings of KEY_SCREEN and KEY_ZOOM
Input: elan_i2c - add hardware ID for multiple Lenovo laptops
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf top:
Jiri Olsa:
- Fix 'perf top --pid', it needs PERF_SAMPLE_TIME since we switched to using
a different thread to sort the events and then even for just a single
thread we now need timestamps.
BPF:
Jiri Olsa:
- Fix bpf_prog and btf lookup functions failure path to to properly return
NULL.
- Fix side band thread draining, used to process PERF_RECORD_BPF_EVENT
metadata records.
core:
Jiri Olsa:
- Fix map lookup by name to get a refcount when the name is already in
the tree. Found
Song Liu:
- Fix __map__is_kmodule() by taking into account recently added BPF
maps.
UAPI:
Arnaldo Carvalho de Melo:
- Sync sound/asound.h copy
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a module option, or core kernel argument, toggles a static-key it
requires jump labels to be initialized early. While x86, PowerPC, and
ARM64 arrange for jump_label_init() to be called before parse_args(),
ARM does not.
Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
[<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
[<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
[<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
[<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
(page_alloc_shuffle+0x12c/0x1ac)
[<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
[<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
[<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
Move the fallback call to jump_label_init() to occur before
parse_args().
The redundant calls to jump_label_init() in other archs are left intact
in case they have static key toggling use cases that are even earlier
than option parsing.
Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Guenter Roeck <groeck@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During !CONFIG_CGROUP reclaim, we expand the inactive list size if it's
thrashing on the node that is about to be reclaimed. But when cgroups
are enabled, we suddenly ignore the node scope and use the cgroup scope
only. The result is that pressure bleeds between NUMA nodes depending
on whether cgroups are merely compiled into Linux. This behavioral
difference is unexpected and undesirable.
When the refault adaptivity of the inactive list was first introduced,
there were no statistics at the lruvec level - the intersection of node
and memcg - so it was better than nothing.
But now that we have that infrastructure, use lruvec_page_state() to
make the list balancing decision always NUMA aware.
[hannes@cmpxchg.org: fix bisection hole]
Link: http://lkml.kernel.org/r/20190417155241.GB23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412144438.2645-1-hannes@cmpxchg.org
Fixes: 2a2e48854d ("mm: vmscan: fix IO/refault regression in cache workingset transition")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
has_unmovable_pages() is used by allocating CMA and gigantic pages as
well as the memory hotplug. The later doesn't know how to offline CMA
pool properly now, but if an unused (free) CMA page is encountered, then
has_unmovable_pages() happily considers it as a free memory and
propagates this up the call chain. Memory offlining code then frees the
page without a proper CMA tear down which leads to an accounting issues.
Moreover if the same memory range is onlined again then the memory never
gets back to the CMA pool.
State after memory offline:
# grep cma /proc/vmstat
nr_free_cma 205824
# cat /sys/kernel/debug/cma/cma-kvm_cma/count
209920
Also, kmemleak still think those memory address are reserved below but
have already been used by the buddy allocator after onlining. This
patch fixes the situation by treating CMA pageblocks as unmovable except
when has_unmovable_pages() is called as part of CMA allocation.
Offlined Pages 4096
kmemleak: Cannot insert 0xc000201f7d040008 into the object search tree (overlaps existing)
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
create_object+0x344/0x380
__kmalloc_node+0x3ec/0x860
kvmalloc_node+0x58/0x110
seq_read+0x41c/0x620
__vfs_read+0x3c/0x70
vfs_read+0xbc/0x1a0
ksys_read+0x7c/0x140
system_call+0x5c/0x70
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xc000201cc8000000 (size 13757317120):
kmemleak: comm "swapper/0", pid 0, jiffies 4294937297
kmemleak: min_count = -1
kmemleak: count = 0
kmemleak: flags = 0x5
kmemleak: checksum = 0
kmemleak: backtrace:
cma_declare_contiguous+0x2a4/0x3b0
kvm_cma_reserve+0x11c/0x134
setup_arch+0x300/0x3f8
start_kernel+0x9c/0x6e8
start_here_common+0x1c/0x4b0
kmemleak: Automatic memory scanning thread ended
[cai@lca.pw: use is_migrate_cma_page() and update commit log]
Link: http://lkml.kernel.org/r/20190416170510.20048-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190413002623.8967-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The igrab() in shmem_unuse() looks good, but we forgot that it gives no
protection against concurrent unmounting: a point made by Konstantin
Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae
("tmpfs: fix race between umount and swapoff"). The current 5.1-rc
swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
Self-destruct in 5 seconds. Have a nice day..." followed by GPF.
Once again, give up on using igrab(); but don't go back to making such
heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
the new design, and I expect could deadlock inside shmem_swapin_page().
Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem-
specific inode, and shmem_evict_inode() wait for that to go down to 0.
Call it "stop_eviction" rather than "swapoff_busy" because it can be put
to use for others later (huge tmpfs patches expect to use it).
That simplifies shmem_unuse(), protecting it from both unlink and
unmount; and in practice lets it locate all the swap in its first try.
But do not rely on that: there's still a theoretical case, when
shmem_writepage() might have been preempted after its get_swap_page(),
before making the swap entry visible to swapoff.
[hughd@google.com: remove incorrect list_del()]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils
Fixes: b56a2d8af9 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vineeth Pillai <vpillai@digitalocean.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SWAP_UNUSE_MAX_TRIES 3 appeared to work well in earlier testing, but
further testing has proved it to be a source of unnecessary swapoff
EBUSY failures (which can then be followed by unmount EBUSY failures).
When mmget_not_zero() or shmem's igrab() fails, there is an mm exiting
or inode being evicted, freeing up swap independent of try_to_unuse().
Those typically completed much sooner than the old quadratic swapoff,
but now it's more common that swapoff may need to wait for them.
It's possible to move those cases from init_mm.mmlist and shmem_swaplist
to separate "exiting" swaplists, and try_to_unuse() then wait for those
lists to be emptied; but we've not bothered with that in the past, and
don't want to risk missing some other forgotten case. So just revert to
cycling around until the swap is gone, without any retries limit.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081256170.1523@eggly.anvils
Fixes: b56a2d8af9 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vineeth Pillai <vpillai@digitalocean.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 51dedad06b ("kasan, slab: make freelist stored without tags")
calls kasan_reset_tag() for off-slab slab management object leading to
freelist being stored non-tagged.
However, cache_grow_begin() calls alloc_slabmgmt() which calls
kmem_cache_alloc_node() assigns a tag for the address and stores it in
the shadow address. As the result, it causes endless errors below
during boot due to drain_freelist() -> slab_destroy() ->
kasan_slab_free() which compares already untagged freelist against the
stored tag in the shadow address.
Since off-slab slab management object freelist is such a special case,
just store it tagged. Non-off-slab management object freelist is still
stored untagged which has not been assigned a tag and should not cause
any other troubles with this inconsistency.
BUG: KASAN: double-free or invalid-free in slab_destroy+0x84/0x88
Pointer tag: [ff], memory tag: [99]
CPU: 0 PID: 1376 Comm: kworker/0:4 Tainted: G W 5.1.0-rc3+ #8
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018
Workqueue: cgroup_destroy css_killed_work_fn
Call trace:
print_address_description+0x74/0x2a4
kasan_report_invalid_free+0x80/0xc0
__kasan_slab_free+0x204/0x208
kasan_slab_free+0xc/0x18
kmem_cache_free+0xe4/0x254
slab_destroy+0x84/0x88
drain_freelist+0xd0/0x104
__kmem_cache_shrink+0x1ac/0x224
__kmemcg_cache_deactivate+0x1c/0x28
memcg_deactivate_kmem_caches+0xa0/0xe8
memcg_offline_kmem+0x8c/0x3d4
mem_cgroup_css_offline+0x24c/0x290
css_killed_work_fn+0x154/0x618
process_one_work+0x9cc/0x183c
worker_thread+0x9b0/0xe38
kthread+0x374/0x390
ret_from_fork+0x10/0x18
Allocated by task 1625:
__kasan_kmalloc+0x168/0x240
kasan_slab_alloc+0x18/0x20
kmem_cache_alloc_node+0x1f8/0x3a0
cache_grow_begin+0x4fc/0xa24
cache_alloc_refill+0x2f8/0x3e8
kmem_cache_alloc+0x1bc/0x3bc
sock_alloc_inode+0x58/0x334
alloc_inode+0xb8/0x164
new_inode_pseudo+0x20/0xec
sock_alloc+0x74/0x284
__sock_create+0xb0/0x58c
sock_create+0x98/0xb8
__sys_socket+0x60/0x138
__arm64_sys_socket+0xa4/0x110
el0_svc_handler+0x2c0/0x47c
el0_svc+0x8/0xc
Freed by task 1625:
__kasan_slab_free+0x114/0x208
kasan_slab_free+0xc/0x18
kfree+0x1a8/0x1e0
single_release+0x7c/0x9c
close_pdeo+0x13c/0x43c
proc_reg_release+0xec/0x108
__fput+0x2f8/0x784
____fput+0x1c/0x28
task_work_run+0xc0/0x1b0
do_notify_resume+0xb44/0x1278
work_pending+0x8/0x10
The buggy address belongs to the object at ffff809681b89e00
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffff809681b89e00, ffff809681b89e80)
The buggy address belongs to the page:
page:ffff7fe025a06e00 count:1 mapcount:0 mapping:01ff80082000fb00
index:0xffff809681b8fe04
flags: 0x17ffffffc000200(slab)
raw: 017ffffffc000200 ffff7fe025a06d08 ffff7fe022ef7b88 01ff80082000fb00
raw: ffff809681b8fe04 ffff809681b80000 00000001000000e0 0000000000000000
page dumped because: kasan: bad access detected
page allocated via order 0, migratetype Unmovable, gfp_mask
0x2420c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE)
prep_new_page+0x4e0/0x5e0
get_page_from_freelist+0x4ce8/0x50d4
__alloc_pages_nodemask+0x738/0x38b8
cache_grow_begin+0xd8/0xa24
____cache_alloc_node+0x14c/0x268
__kmalloc+0x1c8/0x3fc
ftrace_free_mem+0x408/0x1284
ftrace_free_init_mem+0x20/0x28
kernel_init+0x24/0x548
ret_from_fork+0x10/0x18
Memory state around the buggy address:
ffff809681b89c00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
ffff809681b89d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
>ffff809681b89e00: 99 99 99 99 99 99 99 99 fe fe fe fe fe fe fe fe
^
ffff809681b89f00: 43 43 43 43 43 fe fe fe fe fe fe fe fe fe fe fe
ffff809681b8a000: 6d fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
Link: http://lkml.kernel.org/r/20190403022858.97584-1-cai@lca.pw
Fixes: 51dedad06b ("kasan, slab: make freelist stored without tags")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a driver unloads without unloading TTM we don't correctly
clear the global structures leading to errors on re-init.
Next step should probably be to remove the global structures and
kobjs all together, but this is tricky since we need to maintain
backward compatibility.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
CC: stable@vger.kernel.org # 5.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoid kretprobe recursion loop bg by setting a dummy
kprobes to current_kprobe per-CPU variable.
This bug has been introduced with the asm-coded trampoline
code, since previously it used another kprobe for hooking
the function return placeholder (which only has a nop) and
trampoline handler was called from that kprobe.
This revives the old lost kprobe again.
With this fix, we don't see deadlock anymore.
And you can see that all inner-called kretprobe are skipped.
event_1 235 0
event_2 19375 19612
The 1st column is recorded count and the 2nd is missed count.
Above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed)
(some difference are here because the counter is racy)
Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c9becf58d9 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The syzkaller USB fuzzer identified a failure mode in which dummy-hcd
would never give back an unlinked URB. This causes usb_kill_urb() to
hang, leading to WARNINGs and unkillable threads.
In dummy-hcd, all URBs are given back by the dummy_timer() routine as
it scans through the list of pending URBS. Failure to give back URBs
can be caused by failure to start or early exit from the scanning
loop. The code currently has two such pathways: One is triggered when
an unsupported bus transfer speed is encountered, and the other by
exhausting the simulated bandwidth for USB transfers during a frame.
This patch removes those two paths, thereby allowing all unlinked URBs
to be given back in a timely manner. It adds a check for the bus
speed when the gadget first starts running, so that dummy_timer() will
never thereafter encounter an unsupported speed. And it prevents the
loop from exiting as soon as the total bandwidth has been used up (the
scanning loop continues, giving back unlinked URBs as they are found,
but not transferring any more data).
Thanks to Andrey Konovalov for manually running the syzkaller fuzzer
to help track down the source of the bug.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+d919b0f29d7b5a4994b9@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
err_spi is only called within SERIAL_SC16IS7XX_SPI
while err_i2c is called inside SERIAL_SC16IS7XX_I2C.
So we need to put err_spi and err_i2c into each #ifdef
accordingly.
This change fixes ("sc16is7xx: move label 'err_spi'
to correct section").
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of relying on the now removed NULL argument to
pci_alloc_consistent, switch to the generic DMA API, and store the struct
device so that we can pass it.
Fixes: 4167b2ad51 ("PCI: Remove NULL device handling from PCI DMA API")
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch clears FC_RP_STARTED flag during logoff, because of this
re-login(flogi) didn't happen to the switch.
This reverts commit 1550ec458e.
Fixes: 1550ec458e ("scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO")
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Hannes Reinecke <hare@#suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Unlike atomic_add(), refcount_add() does not deal well
with a negative argument. TLS fallback code reallocates
the skb and is very likely to shrink the truesize, leading to:
[ 189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
Call Trace:
refcount_add_checked+0x6/0x40
tls_enc_skb+0xb93/0x13e0 [tls]
Once wmem_allocated count saturates the application can no longer
send data on the socket. This is similar to Eric's fixes for GSO,
TCP:
commit 7ec318feee ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
and UDP:
commit 575b65bc5b ("udp: avoid refcount_t saturation in __udp_gso_segment()").
Unlike the GSO case, for TLS fallback it's likely that the skb has
shrunk, so the "likely" annotation is the other way around (likely
branch being "sub").
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently introduced a change to support devm clk lookups. That change
introduced a code-path that used clk_find() without holding the
'clocks_mutex'. Unfortunately, clk_find() iterates over the 'clocks'
list and so we need to prevent the list from being modified at the same
time. Do this by holding the mutex and checking to make sure it's held
while iterating the list.
Note, we don't really care if the lookup is freed after we find it with
clk_find() because we're just doing a pointer comparison, but if we did
care we would need to keep holding the mutex while we dereference the
clk_lookup pointer.
Fixes: 3eee6c7d11 ("clkdev: add managed clkdev lookup registration")
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Acked-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Since there are more IOT2040 variants with identical hardware but
different asset tags, the asset tag matching should be adjusted to
support them.
For the board name "SIMATIC IOT2000", currently there are 2 types of
hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique
asset tag. Match on it first. If we then match on the board name only,
we will catch all IOT2040 variants. In the future there will be no other
devices with the "SIMATIC IOT2000" DMI board name but different
hardware.
Signed-off-by: Su Bao Cheng <baocheng.su@siemens.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Few small fixes
Patch #1, from Petr, adjusts mlxsw to provide the same QoS behavior for
both Spectrum-1 and Spectrum-2. The fix is required due to a difference
in the behavior of Spectrum-2 compared to Spectrum-1. The problem and
solution are described in the detail in the changelog.
Patch #2 increases the time period in which the driver waits for the
firmware to signal it has finished its initialization. The issue will be
fixed in future firmware versions and the timeout will be decreased.
Patch #3, from Amit, fixes a display problem where the autoneg status in
ethtool is not updated in case the netdev is not running.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If link is down and autoneg is set to on/off, the status in ethtool does
not change.
The reason is when the link is down the function returns with zero
before changing autoneg value.
Move the checking of link state (up/down) to be performed after setting
autoneg value, in order to be sure that autoneg will change in any case.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver initialization the driver sends a reset to the device and
waits for the firmware to signal that it is ready to continue.
Commit d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
increased the timeout to 13 seconds due to longer PHY calibration in
Spectrum-2 compared to Spectrum-1.
Recently it became apparent that this timeout is too short and therefore
this patch increases it again to a safer limit that will be reduced in
the future.
Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Fixes: d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.
On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).
With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.
Fixes: 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 fix from Catalin Marinas:
"Avoid compiler uninitialised warning introduced by recent arm64 futex
fix"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: futex: Restore oldval initialization to work around buggy compilers
Commit 045afc2412 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with
non-zero result value") removed oldval's zero initialization in
arch_futex_atomic_op_inuser because it is not necessary. Unfortunately,
Android's arm64 GCC 4.9.4 [1] does not agree:
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized
in this function [-Wmaybe-uninitialized]
return oldval == cmparg;
^
In file included from ../kernel/futex.c:73:0:
../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here
int oldval, ret, tmp;
^
GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser
returns right away, avoiding the uninitialized use that it claims.
Restoring the zero initialization works around this issue.
[1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/
Cc: stable@vger.kernel.org
Fixes: 045afc2412 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. In
the event that the adjustments are too aggressive, i.e. the timer fires
earlier than the guest expects, KVM busy waits immediately prior to
entering the guest.
Currently, KVM manually converts the delay from nanoseconds to clock
cycles. But, the conversion is done in the guest's time domain, while
the delay occurs in the host's time domain. This is perfectly ok when
the guest and host are using the same TSC ratio, but if the guest is
using a different ratio then the delay may not be accurate and could
wait too little or too long.
When the guest is not using the host's ratio, convert the delay from
guest clock cycles to host nanoseconds and use ndelay() instead of
__delay() to provide more accurate timing. Because converting to
nanoseconds is relatively expensive, e.g. requires division and more
multiplication ops, continue using __delay() directly when guest and
host TSCs are running at the same ratio.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The introduction of adaptive tuning of lapic timer advancement did not
allow for the scenario where userspace would want to disable adaptive
tuning but still employ timer advancement, e.g. for testing purposes or
to handle a use case where adaptive tuning is unable to settle on a
suitable time. This is epecially pertinent now that KVM places a hard
threshold on the maximum advancment time.
Rework the timer semantics to accept signed values, with a value of '-1'
being interpreted as "use adaptive tuning with KVM's internal default",
and any other value being used as an explicit advancement time, e.g. a
time of '0' effectively disables advancement.
Note, this does not completely restore the original behavior of
lapic_timer_advance_ns. Prior to tracking the advancement per vCPU,
which is necessary to support autotuning, userspace could adjust
lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the
module params are snapshotted at vCPU creation, i.e. applying a new
advancement effectively requires restarting a VM.
Dynamically updating a running vCPU is possible, e.g. a helper could be
added to retrieve the desired delay, choosing between the global module
param and the per-VCPU value depending on whether or not auto-tuning is
(globally) enabled, but introduces a great deal of complexity. The
wrapper itself is not complex, but understanding and documenting the
effects of dynamically toggling auto-tuning and/or adjusting the timer
advancement is nigh impossible since the behavior would be dependent on
KVM's implementation as well as compiler optimizations. In other words,
providing stable behavior would require extremely careful consideration
now and in the future.
Given that the expected use of a manually-tuned timer advancement is to
"tune once, run many", use the vastly simpler approach of recognizing
changes to the module params only when creating a new vCPU.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Automatically adjusting the globally-shared timer advancement could
corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
the advancement value. That could be partially fixed by using a local
variable for the arithmetic, but it would still be susceptible to a
race when setting timer_advance_adjust_done.
And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
correct calibration for a given vCPU may not apply to all vCPUs.
Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
effectively violated when finding a stable advancement takes an extended
amount of timer.
Opportunistically change the definition of lapic_timer_advance_ns to
a u32 so that it matches the style of struct kvm_timer. Explicitly
pass the param to kvm_create_lapic() so that it doesn't have to be
exposed to lapic.c, thus reducing the probability of unintentionally
using the global value instead of the per-vCPU value.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. Now
that the timer advancement is automatically tuned during runtime, it's
effectively unbounded by default, e.g. if KVM is running as L1 the
advancement can measure in hundreds of milliseconds.
Disable timer advancement if adaptive tuning yields an advancement of
more than 5000ns, as large advancements can break reasonable assumptions
of the guest, e.g. that a timer configured to fire after 1ms won't
arrive on the next instruction. Although KVM busy waits to mitigate the
case of a timer event arriving too early, complications can arise when
shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test
will fail when its "host" exits on interrupts as KVM may inject the INTR
before the guest executes STI+HLT. Arguably the unit test is "broken"
in the sense that delaying a timer interrupt by 1ms doesn't technically
guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
assumption that KVM should support.
Furthermore, an unbounded advancement also effectively unbounds the time
spent busy waiting, e.g. if the guest programs a timer with a very large
delay.
5000ns is a somewhat arbitrary threshold. When running on bare metal,
which is the intended use case, timer advancement is expected to be in
the general vicinity of 1000ns. 5000ns is high enough that false
positives are unlikely, while not being so high as to negatively affect
the host's performance/stability.
Note, a future patch will enable userspace to disable KVM's adaptive
tuning, which will allow priveleged userspace will to specifying an
advancement value in excess of this arbitrary threshold in order to
satisfy an abnormal use case.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was reported that with some special Multi Processor Group configuration,
e.g:
bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on
for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
is in use.
Tracing kvm_hv_flush_tlb immediately reveals the issue:
kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
We don't flush anything and apparently it's not what Windows expects.
TLFS doesn't say anything about such requests and newer Windows versions
seem to be unaffected. This all feels like a WS2012 bug, which is, however,
easy to workaround in KVM: let's flush everything when we see an empty
flush request, over-flushing doesn't hurt.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If guest sets MSR_IA32_TSCDEADLINE to value such that in host
time-domain it's shorter than lapic_timer_advance_ns, we can
reach a case that we call hrtimer_start() with expiration time set at
the past.
Because lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it
is not allowed to run in softirq and therefore will never expire.
To avoid such a scenario, verify that deadline expiration time is set on
host time-domain further than (now + lapic_timer_advance_ns).
A future patch can also consider adding a min_timer_deadline_ns module parameter,
similar to min_timer_period_us to avoid races that amount of ns it takes
to run logic could still call hrtimer_start() with expiration timer set
at the past.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As stated in the original commit for pidfd_send_signal() we don't allow
to signal processes through O_PATH file descriptors since it is
semantically equivalent to a write on the pidfd.
We already correctly error out right now and return EBADF if an O_PATH
fd is passed. This is because we use file->f_op to detect whether a
pidfd is passed and O_PATH fds have their file->f_op set to empty_fops
in do_dentry_open() and thus fail the test.
Thus, there is no regression. It's just semantically correct to use
fdget() and return an error right from there instead of taking a
reference and returning an error later.
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jann@thejh.net>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull s390 bug fixes from Martin Schwidefsky:
- Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED
- Fix integer overflow in the dasd driver resulting in incorrect number
of blocks for large devices
- Fix a lockdep false positive in the 3270 driver
- Fix a deadlock in the zcrypt driver
- Fix incorrect debug feature entries in the pkey api
- Fix inline assembly constraints fallout with CONFIG_KASAN=y
* tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct some inline assembly constraints
s390/pkey: add one more argument space for debug feature entry
s390/zcrypt: fix possible deadlock situation on ap queue remove
s390/3270: fix lockdep false positive on view->lock
s390/dasd: Fix capacity calculation for large volumes
s390/mem_detect: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD)
Pull AFS fixes from David Howells:
- Stop using the deprecated get_seconds().
- Don't make tracepoint strings const as the section they go in isn't
read-only.
- Differentiate failure due to unmarshalling from other failure cases.
We shouldn't abort with RXGEN_CC/SS_UNMARSHAL if it's not due to
unmarshalling.
- Add a missing unlock_page().
- Fix the interaction between receiving a notification from a server
that it has invalidated all outstanding callback promises and a
client call that we're in the middle of making that will get a new
promise.
* tag 'afs-fixes-20190413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix in-progess ops to ignore server-level callback invalidation
afs: Unlock pages for __pagevec_release()
afs: Differentiate abort due to unmarshalling from other errors
afs: Avoid section confusion in CM_NAME
afs: avoid deprecated get_seconds()
Pull crypto fix from Herbert Xu:
"Fix a bug in the implementation of the x86 accelerated version of
poly1305"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/poly1305 - fix overflow during partial reduction
Pull drm fixes from Dave Airlie:
"Since Easter is looming for me, I'm just pushing whatever is in my
tree, I'll see what else turns up and maybe I'll send another pull
early next week if there is anything.
tegra:
- stream id programming fix
- avoid divide by 0 for bad hdmi audio setup code
ttm:
- Hugepages fix
- refcount imbalance in error path fix
amdgpu:
- GPU VM fixes for Vega/RV
- DC AUX fix for active DP-DVI dongles
- DC fix for multihead regression"
* tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drm:
drm/tegra: hdmi: Setup audio only if configured
drm/amd/display: If one stream full updates, full update all planes
drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR
gpu: host1x: Program stream ID to bypass without SMMU
drm/amd/display: extending AUX SW Timeout
drm/ttm: fix dma_fence refcount imbalance on error path
drm/ttm: fix incrementing the page pointer for huge pages
drm/ttm: fix start page for huge page check in ttm_put_pages()
drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
tick_freeze() introduced by suspend-to-idle in commit 124cf9117c ("PM /
sleep: Make it possible to quiesce timers during suspend-to-idle") uses
timekeeping_suspend() instead of syscore_suspend() during
suspend-to-idle. As a consequence generic sched_clock will keep going
because sched_clock_suspend() and sched_clock_resume() are not invoked
during suspend-to-idle which can result in a generic sched_clock wrap.
On a ARM system with suspend-to-idle enabled, sched_clock is registered
as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which
means the real wrapping duration is 8796093022202ns.
[ 134.551779] suspend-to-idle suspend (timekeeping_suspend())
[ 1204.912239] suspend-to-idle resume (timekeeping_resume())
......
[ 1206.912239] suspend-to-idle suspend (timekeeping_suspend())
[ 5880.502807] suspend-to-idle resume (timekeeping_resume())
......
[ 6000.403724] suspend-to-idle suspend (timekeeping_suspend())
[ 8035.753167] suspend-to-idle resume (timekeeping_resume())
......
[ 8795.786684] (2)[321:charger_thread]......
[ 8795.788387] (2)[321:charger_thread]......
[ 0.057226] (0)[0:swapper/0]......
[ 0.061447] (2)[0:swapper/2]......
sched_clock was not stopped during suspend-to-idle, and sched_clock_poll
hrtimer was not expired because timekeeping_suspend() was invoked during
suspend-to-idle. It makes sched_clock wrap at kernel time 8796s.
To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in
tick_freeze() together with timekeeping_suspend() and timekeeping_resume().
Fixes: 124cf9117c (PM / sleep: Make it possible to quiesce timers during suspend-to-idle)
Signed-off-by: Chang-An Chen <chang-an.chen@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Corey Minyard <cminyard@mvista.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: <kuohong.wang@mediatek.com>
Cc: <freddy.hsin@mediatek.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1553828349-8914-1-git-send-email-chang-an.chen@mediatek.com
Specifically, max_tfd_queue_size should be 0x10000 like in
22560 family and not 0x100 like in 22000 family.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of ff9fb72bc0 ("debugfs: return error
values, not NULL"). If multiple iwlwifi devices are in the system, this
can cause problems when the driver attempts to create the main debugfs
directory again. Later on in the code we fail horribly by trying to
dereference a pointer that is an error value.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Gabriel Ramirez <gabriello.ramirez@gmail.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Intel Linux Wireless <linuxwifi@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: stable <stable@vger.kernel.org> # 5.0
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In ini debug TLVs bit 24 is set. The driver relies on it in the memory
allocation for the debug configuration. This implementation is
problematic in case of a new debug TLV that is not supported yet is added
and uses bit 24. In such a scenario the driver allocate space without
using it which causes errors in the apply point enabling flow.
Solve it by explicitly checking if a given TLV is part of the list of
the supported ini debug TLVs.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Fixes: f14cda6f3b ("iwlwifi: trans: parse and store debug ini TLVs")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If we fail to initialize because rfkill is enabled, then trying
to do debug collection currently just fails. Prevent that in the
high-level code, although we should probably also fix the lower
level code to do things more carefully.
It's not 100% clear that it fixes this commit, as the original
dump code at the time might've been more careful. In any case,
we don't really need to dump anything in this expected scenario.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 7125648074 ("iwlwifi: add fw dump upon RT ucode start failure")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The driver uses msix causes-register to handle both msix and non msix
interrupts when performing sync nmi. On devices that do not support
msix this register is unmapped and accessing it causes a kernel panic.
Solve this by differentiating the two cases and accessing the proper
causes-register in each case.
Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:
(actual system RAM size + padding size) vs (max system RAM size supported)
This calculation is wrong since commit
b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.
Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.
Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KALSR will be weakened significantly.
|____|_______actual RAM_______|_padding_|______the rest_______|
0 64TB ~120TB
Instead, it should be like this:
|____|_______actual RAM_______|_________the rest______________|
0 64TB ~120TB
The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
[ bp: Massage commit message. ]
Fixes: b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
clang points out that the return code from this function is
undefined for one of the error paths:
../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
return result;
^~~~~~
../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
int result;
^
Make it return -ENODEV here, as in the related failure cases.
gcc has a known bug in underreporting some of these warnings
when it has already eliminated the assignment of the return code
based on some earlier optimization step.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc warn this:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable]
Like enh_desc_init_rx_desc, we should use bfsize1
in ndesc_init_rx_desc to calculate 'p->des1'
Fixes: 583e636141 ("net: stmmac: use correct DMA buffer size in the RX descriptor")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.
The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.
A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.
Fixes: 5d6ff1babe ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull smb3 fixes from Steve French:
"Five small SMB3 fixes, all also for stable - an important fix for an
oplock (lease) bug, a handle leak, and three bugs spotted by KASAN"
* tag '5.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: keep FileInfo handle live during oplock break
cifs: fix handle leak in smb2_query_symlink()
cifs: Fix lease buffer length error
cifs: Fix use-after-free in SMB2_read
cifs: Fix use-after-free in SMB2_write
If a request transmission fails due to write space or slot unavailability
errors, but the queued task then gets transmitted before it has time to
process the error in call_transmit_status() or call_bc_transmit_status(),
we need to suppress the transmission error code to prevent it from leaking
out of the RPC layer.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
This is a leftover from when the rings initially were not free flowing,
and hence a test for tail + 1 == head would indicate full. Since we now
let them wrap instead of mask them with the size, we need to check if
they drift more than the ring size from each other.
This fixes a case where we'd overwrite CQ ring entries, if the user
failed to reap completions. Both cases would ultimately result in lost
completions as the application violated the depth it asked for. The only
difference is that before this fix we'd return invalid entries for the
overflowed completions, instead of properly flagging it in the
cq_ring->overflow variable.
Reported-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.
However if there's already same map name, we currently don't bump the
reference and can crash, like:
Program received signal SIGABRT, Aborted.
0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6
#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
#6 map__put (map=0x1224df0) at util/map.c:299
#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
...
Add the map to the list and getting the reference event if we find the
map with same name.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e6285699b ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull IPMI fixes from Corey Minyard:
"Fixes for some bugs cause by recent changes. One crash if you feed bad
data to the module parameters, one BUG that sometimes occurs when a
user closes the connection, and one bug that cause the driver to not
work if the configuration information only comes in from SMBIOS"
* tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
ipmi: Fix failure on SMBIOS specified devices
Pull networking fixes from David Miller:
1) Handle init flow failures properly in iwlwifi driver, from Shahar S
Matityahu.
2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
Fietkau.
3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
5) Avoid checksum complete with XDP in mlx5, also from Saeed.
6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
7) Partial sent TLS record leak fix from Jakub Kicinski.
8) Reject zero size iova range in vhost, from Jason Wang.
9) Allow pending work to complete before clcsock release from Karsten
Graul.
10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
11) A lot of protocols look at the sa_family field of a sockaddr before
validating it's length is large enough, from Tetsuo Handa.
12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
King.
13) Have to recompile IP options in ipv4_link_failure because it can be
invoked from ARP, from Stephen Suryaputra.
14) Doorbell handling fixes in qed from Denis Bolotin.
15) Revert net-sysfs kobject register leak fix, it causes new problems.
From Wang Hai.
16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
tcp: tcp_grow_window() needs to respect tcp_space()
ocelot: Clean up stats update deferred work
ocelot: Don't sleep in atomic context (irqs_disabled())
net: bridge: fix netlink export of vlan_stats_per_port option
qed: fix spelling mistake "faspath" -> "fastpath"
tipc: set sysctl_tipc_rmem and named_timeout right range
tipc: fix link established but not in session
net: Fix missing meta data in skb with vlan packet
net: atm: Fix potential Spectre v1 vulnerabilities
net/core: work around section mismatch warning for ptp_classifier
net: bridge: fix per-port af_packet sockets
bnx2x: fix spelling mistake "dicline" -> "decline"
route: Avoid crash from dereferencing NULL rt->from
MAINTAINERS: normalize Woojung Huh's email address
bonding: fix event handling for stacked bonds
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
qed: Fix the DORQ's attentions handling
qed: Fix missing DORQ attentions
...
The patch a6dbe44275 ("vt: perform safe console erase in the right
order") introduced a bug. The conditional do_update_region() was
replaced by a call to update_region() that does contain the conditional
already, but with unwanted extra side effects such as restoring the cursor
drawing.
In order to reproduce the bug:
- use framebuffer console with the AMDGPU driver
- type "links" to start the console www browser
- press 'q' and space to exit links
Now the cursor will be permanently visible in the center of the
screen. It will stay there until something overwrites it.
The bug goes away if we change update_region() back to the conditional
do_update_region().
[ nico: reworded changelog ]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Cc: stable@vger.kernel.org
Fixes: a6dbe44275 ("vt: perform safe console erase in the right order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When called with vmas_arg==NULL, get_user_pages_longterm() allocates
an array of nr_pages*8 which can easily get greater that the max order,
for example, registering memory for a 256GB guest does this and fails
in __alloc_pages_nodemask().
This adds a loop over chunks of entries to fit the max order limit.
Fixes: 678e174c4c ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
this locks &mem_list_mutex and then locks mm::mmap_sem
several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
this is called with mm::mmap_sem held already and it locks
&mem_list_mutex.
So one can craft a userspace program to do special ioctl and mmap in
2 threads concurrently and cause a deadlock which lockdep warns about
(below).
We did not hit this yet because QEMU constructs the machine in a single
thread.
This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.
This moves locked_vm adjustment from under &mem_list_mutex.
This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.
This is one of the lockdep warnings:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
------------------------------------------------------
qemu-system-ppc/8038 is trying to acquire lock:
000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490
but task is already holding lock:
00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++}:
lock_acquire+0xf8/0x260
down_write+0x44/0xa0
mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
mm_iommu_do_alloc+0x310/0x490
tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x94/0x430 [vfio]
do_vfs_ioctl+0xe4/0x930
ksys_ioctl+0xc4/0x110
sys_ioctl+0x28/0x80
system_call+0x5c/0x70
-> #0 (mem_list_mutex){+.+.}:
__lock_acquire+0x1484/0x1900
lock_acquire+0xf8/0x260
__mutex_lock+0x88/0xa70
mm_iommu_do_alloc+0x70/0x490
vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
vfio_device_fops_mmap+0x44/0x70 [vfio]
mmap_region+0x5d4/0x770
do_mmap+0x42c/0x650
vm_mmap_pgoff+0x124/0x160
ksys_mmap_pgoff+0xdc/0x2f0
sys_mmap+0x40/0x80
system_call+0x5c/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(mem_list_mutex);
lock(&mm->mmap_sem);
lock(mem_list_mutex);
*** DEADLOCK ***
1 lock held by qemu-system-ppc/8038:
#0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
Fixes: c10c21efa4 ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
`ni6501_alloc_usb_buffers()` is called from `ni6501_auto_attach()` to
allocate RX and TX buffers for USB transfers. It allocates
`devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the
allocation of `devpriv->usb_tx_buf` fails, it frees
`devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an
error. Later, `ni6501_detach()` will be called from the core comedi
module code to clean up. `ni6501_detach()` also frees both
`devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but
`devpriv->usb_rx_buf` may have already beed freed, leading to a
double-free error. Fix it bu removing the call to
`kfree(devpriv->usb_rx_buf)` from `ni6501_alloc_usb_buffers()`, relying
on `ni6501_detach()` to free the memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If `ni6501_auto_attach()` returns an error, the core comedi module code
will call `ni6501_detach()` to clean up. If `ni6501_auto_attach()`
successfully allocated the comedi device private data, `ni6501_detach()`
assumes that a `struct mutex mut` contained in the private data has been
initialized and uses it. Unfortunately, there are a couple of places
where `ni6501_auto_attach()` can return an error after allocating the
device private data but before initializing the mutex, so this
assumption is invalid. Fix it by initializing the mutex just after
allocating the private data in `ni6501_auto_attach()` before any other
errors can be retturned. Also move the call to `usb_set_intfdata()`
just to keep the code a bit neater (either position for the call is
fine).
I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=cf4f2b6c24aff0a3edf6>:
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
usb 1-1: string descriptor 0 read error: -71
comedi comedi0: Wrong number of endpoints
ni6501 1-1:0.233: driver 'ni6501' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 585 Comm: kworker/0:3 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe8/0x16e lib/dump_stack.c:113
assign_lock_key kernel/locking/lockdep.c:786 [inline]
register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
__lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
__mutex_lock_common kernel/locking/mutex.c:925 [inline]
__mutex_lock+0xfe/0x12b0 kernel/locking/mutex.c:1072
ni6501_detach+0x5b/0x110 drivers/staging/comedi/drivers/ni_usb6501.c:567
comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
hub_port_connect drivers/usb/core/hub.c:5089 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
port_event drivers/usb/core/hub.c:5350 [inline]
hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
kthread+0x313/0x420 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Reported-by: syzbot+cf4f2b6c24aff0a3edf6@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joel reported weird crashes using skiroot_defconfig, in his case we
jumped into an NX page:
kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002bff4f0
Looking at the disassembly, we had simply branched to that address:
c000000000c001bc 49fff335 bl c000000002bff4f0
But that didn't match the original kernel image:
c000000000c001bc 4bfff335 bl c000000000bff4f0 <kobject_get+0x8>
When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
call radix__change_memory_range() late in boot to change page
protections. We do that both to mark rodata read only and also to mark
init text no-execute. That involves walking the kernel page tables,
and clearing _PAGE_WRITE or _PAGE_EXEC respectively.
With radix we may use hugepages for the linear mapping, so the code in
radix__change_memory_range() uses eg. pmd_huge() to test if it has
found a huge mapping, and if so it stops the page table walk and
changes the PMD permissions.
However if the kernel is built without HUGETLBFS support, pmd_huge()
is just a #define that always returns 0. That causes the code in
radix__change_memory_range() to incorrectly interpret the PMD value as
a pointer to a PTE page rather than as a PTE at the PMD level.
We can see this using `dv` in xmon which also uses pmd_huge():
0:mon> dv c000000000000000
pgd @ 0xc000000001740000
pgdp @ 0xc000000001740000 = 0x80000000ffffb009
pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f <- this is a PTE
ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d <- kernel text
The end result is we treat the value at 0xc000000000000100 as a PTE
and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
at that address.
In Joel's specific case we cleared the sign bit in the offset of the
branch, causing a backward branch to turn into a forward branch which
caused us to branch into a non-executable page. However the exact
nature of the crash depends on kernel version, compiler version, and
other factors.
We need to fix radix__change_memory_range() to not use accessors that
depend on HUGETLBFS, but we also have radix memory hotplug code that
uses pmd_huge() etc that will also need fixing. So for now just
disallow the broken combination of Radix with HUGETLBFS disabled.
The only defconfig we have that is affected is skiroot_defconfig, so
turn on HUGETLBFS there so that it still gets Radix.
Fixes: 566ca99af0 ("powerpc/mm/radix: Add dummy radix_enabled()")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have two Dell laptops which have the codec 10ec0236 and 10ec0256
respectively, the headset mic on them can't work, need to apply the
quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin
configurations in the pin quirk table.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Inline assembly code changed in this patch should really use "Q"
constraint "Memory reference without index register and with short
displacement". The kernel build with kasan instrumentation enabled
might occasionally break otherwise (due to stack instrumentation).
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The audio configuration is only valid if the HDMI codec has been
properly set up. Do not attempt to set up audio before that happens
because it causes a division by zero.
Note that this is only problematic on Tegra20 and Tegra30. Later chips
implement the division instructions which return zero when dividing by
zero and don't throw an exception.
Fixes: db5adf4d6d ("drm/tegra: hdmi: Fix audio to work with any pixel clock rate")
Reported-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
It looks like the new socket options only work correctly
for native execution, but in case of compat mode fall back
to the old behavior as we ignore the 'old_timeval' flag.
Rework so we treat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW the
same way in compat and native 32-bit mode.
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Fixes: a9beb86ae6 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some reason, tcp_grow_window() correctly tests if enough room
is present before attempting to increase tp->rcv_ssthresh,
but does not prevent it to grow past tcp_space()
This is causing hard to debug issues, like failing
the (__tcp_select_window(sk) >= tp->rcv_wnd) test
in __tcp_ack_snd_check(), causing ACK delays and possibly
slow flows.
Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio,
we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000"
after about 60 round trips, when the active side no longer sends
immediate acks.
This bug predates git history.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is preventive cleanup that may save troubles later.
No need to cancel repeateadly queued work if code is properly
refactored.
Don't let the ethtool -s process interfere with the stat workqueue
scheduling.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the introduction of the vlan_stats_per_port option the netlink
export of it has been broken since I made a typo and used the ifla
attribute instead of the bridge option to retrieve its state.
Sysfs export is fine, only netlink export has been affected.
Fixes: 9163a0fc1f ("net: bridge: add support for per-port vlan stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We find that sysctl_tipc_rmem and named_timeout do not have the right minimum
setting. sysctl_tipc_rmem should be larger than zero, like sysctl_tcp_rmem.
And named_timeout as a timeout setting should be not less than zero.
Fixes: cc79dd1ba9 ("tipc: change socket buffer overflow control to respect sk_rcvbuf")
Fixes: a5325ae5b8 ("tipc: add name distributor resiliency queue")
Signed-off-by: Jie Liu <liujie165@huawei.com>
Reported-by: Qiang Ning <ningqiang1@huawei.com>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the link FSM, when a link endpoint got RESET_MSG (- a
traditional one without the stopping bit) from its peer, it moves to
PEER_RESET state and raises a LINK_DOWN event which then resets the
link itself. Its state will become ESTABLISHING after the reset event
and the link will be re-established soon after this endpoint starts to
send ACTIVATE_MSG to the peer.
There is no problem with this mechanism, however the link resetting has
cleared the link 'in_session' flag (along with the other important link
data such as: the link 'mtu') that was correctly set up at the 1st step
(i.e. when this endpoint received the peer RESET_MSG). As a result, the
link will become ESTABLISHED, but the 'in_session' flag is not set, and
all STATE_MSG from its peer will be dropped at the link_validate_msg().
It means the link not synced and will sooner or later face a failure.
Since the link reset action is obviously needed for a new link session
(this is also true in the other situations), the problem here is that
the link is re-established a bit too early when the link endpoints are
not really in-sync yet. The commit forces a resync as already done in
the previous commit 91986ee166 ("tipc: fix link session and
re-establish issues") by simply varying the link 'peer_session' value
at the link_reset().
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
arg is controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/atm/lec.c:715 lec_mcast_attach() warn: potential spectre issue 'dev_lec' [r] (local cap)
Fix this by sanitizing arg before using it to index dev_lec.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The routine ptp_classifier_init() uses an initializer for an
automatic struct type variable which refers to an __initdata
symbol. This is perfectly legal, but may trigger a section
mismatch warning when running the compiler in -fpic mode, due
to the fact that the initializer may be emitted into an anonymous
.data section thats lack the __init annotation. So work around it
by using assignments instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the commit below was introduced it changed two visible things:
- the skb was no longer passed through the protocol handlers with the
original device
- the skb was passed up the stack with skb->dev = bridge
The first change broke af_packet sockets on bridge ports. For example we
use them for hostapd which listens for ETH_P_PAE packets on the ports.
We discussed two possible fixes:
- create a clone and pass it through NF_HOOK(), act on the original skb
based on the result
- somehow signal to the caller from the okfn() that it was called,
meaning the skb is ok to be passed, which this patch is trying to
implement via returning 1 from the bridge link-local okfn()
Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
drop/error would return < 0 thus the okfn() is called only when the
return was 1, so we signal to the caller that it was called by preserving
the return value from nf_hook().
Fixes: 8626c56c82 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it's not possible to use perf on ath79 due to genirq flags
mismatch happening on static virtual IRQ 13 which is used for
performance counters hardware IRQ 5.
On TP-Link Archer C7v5:
CPU0
2: 0 MIPS 2 ath9k
4: 318 MIPS 4 19000000.eth
7: 55034 MIPS 7 timer
8: 1236 MISC 3 ttyS0
12: 0 INTC 1 ehci_hcd:usb1
13: 0 gpio-ath79 2 keys
14: 0 gpio-ath79 5 keys
15: 31 AR724X PCI 1 ath10k_pci
$ perf top
genirq: Flags mismatch irq 13. 00014c83 (mips_perf_pmu) vs. 00002003 (keys)
On TP-Link Archer C7v4:
CPU0
4: 0 MIPS 4 19000000.eth
5: 7135 MIPS 5 1a000000.eth
7: 98379 MIPS 7 timer
8: 30 MISC 3 ttyS0
12: 90028 INTC 0 ath9k
13: 5520 INTC 1 ehci_hcd:usb1
14: 4623 INTC 2 ehci_hcd:usb2
15: 32844 AR724X PCI 1 ath10k_pci
16: 0 gpio-ath79 16 keys
23: 0 gpio-ath79 23 keys
$ perf top
genirq: Flags mismatch irq 13. 00014c80 (mips_perf_pmu) vs. 00000080 (ehci_hcd:usb1)
This problem is happening, because currently statically assigned virtual
IRQ 13 for performance counters is not claimed during the initialization
of MIPS PMU during the bootup, so the IRQ subsystem doesn't know, that
this interrupt isn't available for further use.
So this patch fixes the issue by simply booking hardware IRQ 5 for MIPS PMU.
Tested-by: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Acked-by: John Crispin <john@phrozen.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
The intended behavior of function ipmi_hardcode_init_one() is to default
to kcs interface when no type argument is presented when initializing
ipmi with hard coded addresses.
However, the array of char pointers allocated on the stack by function
ipmi_hardcode_init() was not inited to zeroes, so it contained stack
debris.
Consequently, passing the cruft stored in this array to function
ipmi_hardcode_init_one() caused a crash when it was unable to detect
that the char * being passed was nonsense and tried to access the
address specified by the bogus pointer.
The fix is simply to initialize the si_type array to zeroes, so if
there were no type argument given to at the command line, function
ipmi_hardcode_init_one() could properly default to the kcs interface.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Message-Id: <1554837603-40299-1-git-send-email-tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
An extra memset was put into a place that cleared the interface
type.
Reported-by: Tony Camuso <tcamuso@redhat.com>
Fixes: 3cd83bac48 ("ipmi: Consolidate the adding of platform devices")
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Pull RISC-V fixes from Palmer Dabbelt:
"This contains an assortment of RISC-V-related fixups that we found
after rc4. They're all really unrelated:
- The addition of a 32-bit defconfig, to emphasize testing the 32-bit
port.
- A device tree bindings patch, which is pre-work for some patches
that target 5.2.
- A fix to support booting on systems with more physical memory than
the maximum supported by the kernel"
* tag 'riscv-for-linus-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systems
dt-bindings: clock: sifive: add FU540-C000 PRCI clock constants
RISC-V: Add separate defconfig for 32bit systems
Pull KVM fixes from Paolo Bonzini:
"5.1 keeps its reputation as a big bugfix release for KVM x86.
- Fix for a memory leak introduced during the merge window
- Fixes for nested VMX with ept=0
- Fixes for AMD (APIC virtualization, NMI injection)
- Fixes for Hyper-V under KVM and KVM under Hyper-V
- Fixes for 32-bit SMM and tests for SMM virtualization
- More array_index_nospec peppering"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
KVM: fix spectrev1 gadgets
KVM: x86: fix warning Using plain integer as NULL pointer
selftests: kvm: add a selftest for SMM
selftests: kvm: fix for compilers that do not support -no-pie
selftests: kvm/evmcs_test: complete I/O before migrating guest state
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
KVM: x86: clear SMM flags before loading state while leaving SMM
KVM: x86: Open code kvm_set_hflags
KVM: x86: Load SMRAM in a single shot when leaving SMM
KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
KVM: x86: Raise #GP when guest vCPU do not support PMU
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
KVM: x86: svm: make sure NMI is injected after nmi_singlestep
svm/avic: Fix invalidate logical APIC id entry
Revert "svm: Fix AVIC incomplete IPI emulation"
kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM: nVMX: always use early vmcs check when EPT is disabled
KVM: nVMX: allow tests to use bad virtual-APIC page address
...
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
core:
Mao Han:
- Use hweight64() instead of hweight_long(attr.sample_regs_user) when parsing
samples, this is what the kernel uses and fixes de problem in 32-bit
architectures such as C-SKY that have more than 32 registers that can come
in a sample.
perf stat:
Jiri Olsa:
- Disable DIR_FORMAT feature for 'perf stat record', fixing an assert()
failure.
Intel PT:
Adrian Hunter:
- Fix use of parent_id in calls_view in export-to-sqlite.py.
BPF:
Gustavo A. R. Silva:
- Fix lock/unlock imbalances when processing BPF/BTF info, found by the
coverity tool.
libtraceevent:
Rikard Falkeborn:
- Fix missing equality check for strcmp(), detected by the cppcheck tool.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is a small race window in the card disconnection code that
allows the registration of another card with the very same card id.
This leads to a warning in procfs creation as caught by syzkaller.
The problem is that we delete snd_cards and snd_cards_lock entries at
the very beginning of the disconnection procedure. This makes the
slot available to be assigned for another card object while the
disconnection procedure is being processed. Then it becomes possible
to issue a procfs registration with the existing file name although we
check the conflict beforehand.
The fix is simply to move the snd_cards and snd_cards_lock clearances
at the end of the disconnection procedure. The references to these
entries are merely either from the global proc files like
/proc/asound/cards or from the card registration / disconnection, so
it should be fine to shift at the very end.
Reported-by: syzbot+48df349490c36f9f54ab@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
syzbot reported the following warning:
[ ] WARNING: CPU: 4 PID: 17089 at kernel/sched/deadline.c:255 task_non_contending+0xae0/0x1950
line 255 of deadline.c is:
WARN_ON(hrtimer_active(&dl_se->inactive_timer));
in task_non_contending().
Unfortunately, in some cases (for example, a deadline task
continuosly blocking and waking immediately) it can happen that
a task blocks (and task_non_contending() is called) while the
0-lag timer is still active.
In this case, the safest thing to do is to immediately decrease
the running bandwidth of the task, without trying to re-arm the 0-lag timer.
Signed-off-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chengjian (D) <cj.chengjian@huawei.com>
Link: https://lkml.kernel.org/r/20190325131530.34706-1-luca.abeni@santannapisa.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the oplock break handler, writing pending changes from pages puts
the FileInfo handle. If the refcount reaches zero it closes the handle
and waits for any oplock break handler to return, thus causing a deadlock.
To prevent this situation:
* We add a wait flag to cifsFileInfo_put() to decide whether we should
wait for running/pending oplock break handlers
* We keep an additionnal reference of the SMB FileInfo handle so that
for the rest of the handler putting the handle won't close it.
- The ref is bumped everytime we queue the handler via the
cifs_queue_oplock_break() helper.
- The ref is decremented at the end of the handler
This bug was triggered by xfstest 464.
Also important fix to address the various reports of
oops in smb2_push_mandatory_locks
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
If we enter smb2_query_symlink() for something that is not a symlink
and where the SMB2_open() would succeed we would never end up
closing this handle and would thus leak a handle on the server.
Fix this by immediately calling SMB2_close() on successfull open.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
There is a KASAN slab-out-of-bounds:
BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539
CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0xdd/0x12a
print_address_description+0xa7/0x540
kasan_report+0x1ff/0x550
check_memory_region+0x2f1/0x310
memcpy+0x2f/0x80
_copy_from_iter_full+0x783/0xaa0
tcp_sendmsg_locked+0x1840/0x4140
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x18c/0x490
sock_sendmsg+0xae/0x130
smb_send_kvec+0x29c/0x520
__smb_send_rqst+0x3ef/0xc60
smb_send_rqst+0x25a/0x2e0
compound_send_recv+0x9e8/0x2af0
cifs_send_recv+0x24/0x30
SMB2_open+0x35e/0x1620
open_shroot+0x27b/0x490
smb2_open_op_close+0x4e1/0x590
smb2_query_path_info+0x2ac/0x650
cifs_get_inode_info+0x1058/0x28f0
cifs_root_iget+0x3bb/0xf80
cifs_smb3_do_mount+0xe00/0x14c0
cifs_do_mount+0x15/0x20
mount_fs+0x5e/0x290
vfs_kern_mount+0x88/0x460
do_mount+0x398/0x31e0
ksys_mount+0xc6/0x150
__x64_sys_mount+0xea/0x190
do_syscall_64+0x122/0x590
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It can be reproduced by the following step:
1. samba configured with: server max protocol = SMB2_10
2. mount -o vers=default
When parse the mount version parameter, the 'ops' and 'vals'
was setted to smb30, if negotiate result is smb21, just
update the 'ops' to smb21, but the 'vals' is still smb30.
When add lease context, the iov_base is allocated with smb21
ops, but the iov_len is initiallited with the smb30. Because
the iov_len is longer than iov_base, when send the message,
copy array out of bounds.
we need to keep the 'ops' and 'vals' consistent.
Fixes: 9764c02fcb ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b77 ("smb3: add smb3.1.1 to default dialect list")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
There is a KASAN use-after-free:
BUG: KASAN: use-after-free in SMB2_read+0x1136/0x1190
Read of size 8 at addr ffff8880b4e45e50 by task ln/1009
Should not release the 'req' because it will use in the trace.
Fixes: eccb4422cf ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> 4.18+
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
There is a KASAN use-after-free:
BUG: KASAN: use-after-free in SMB2_write+0x1342/0x1580
Read of size 8 at addr ffff8880b6a8e450 by task ln/4196
Should not release the 'req' because it will use in the trace.
Fixes: eccb4422cf ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> 4.18+
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
There was a missing comparison with 0 when checking if type is "s64" or
"u64". Therefore, the body of the if-statement was entered if "type" was
"u64" or not "s64", which made the first strcmp() redundant since if
type is "u64", it's not "s64".
If type is "s64", the body of the if-statement is not entered but since
the remainder of the function consists of if-statements which will not
be entered if type is "s64", we will just return "val", which is
correct, albeit at the cost of a few more calls to strcmp(), i.e., it
will behave just as if the if-statement was entered.
If type is neither "s64" or "u64", the body of the if-statement will be
entered incorrectly and "val" returned. This means that any type that is
checked after "s64" and "u64" is handled the same way as "s64" and
"u64", i.e., the limiting of "val" to fit in for example "s8" is never
reached.
This was introduced in the kernel tree when the sources were copied from
trace-cmd in commit f7d82350e5 ("tools/events: Add files to create
libtraceevent.a"), and in the trace-cmd repo in 1cdbae6035cei
("Implement typecasting in parser") when the function was introduced,
i.e., it has always behaved the wrong way.
Detected by cppcheck.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Fixes: f7d82350e5 ("tools/events: Add files to create libtraceevent.a")
Link: http://lkml.kernel.org/r/20190409091529.2686-1-rikard.falkeborn@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull clockevent/clocksource fixes from Daniel Lezcano:
- Fix TIMER_OF missing option dependency for npcm (Arnd Bergmann)
- Remove a pointless macro call for arm_arch_timer (Yangtao Li)
- Fix wrong compatible string for oxnas (Neil Armstrong)
- Fix compilation warning by removing a dead function on omap (Nathan Chancellor)
The ALSA proc helper manages the child nodes in a linked list, but its
addition and deletion is done without any lock. This leads to a
corruption if they are operated concurrently. Usually this isn't a
problem because the proc entries are added sequentially in the driver
probe procedure itself. But the card registrations are done often
asynchronously, and the crash could be actually reproduced with
syzkaller.
This patch papers over it by protecting the link addition and deletion
with the parent's mutex. There is "access" mutex that is used for the
file access, and this can be reused for this purpose as well.
Reported-by: syzbot+48df349490c36f9f54ab@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In __apic_accept_irq() interface trig_mode is int and actually on some code
paths it is set above u8:
kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
(1 << 15) & e->msi.data
kvm_apic_local_deliver sets it to reg & (1 << 15).
Fix the immediate issue by making 'tm' into u16. We may also want to adjust
__apic_accept_irq() interface and use proper sizes for vector, level,
trig_mode but this is not urgent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Changed passing argument as "0 to NULL" which resolves below sparse warning
arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a simple test for SMM, based on VMX. The test implements its own
sync between the guest and the host as using our ucall library seems to
be too cumbersome: SMI handler is happening in real-address mode.
This patch also fixes KVM_SET_NESTED_STATE to happen after
KVM_SET_VCPU_EVENTS, in fact it places it last. This is because
KVM needs to know whether the processor is in SMM or not.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-no-pie was added to GCC at the same time as their configuration option
--enable-default-pie. Compilers that were built before do not have
-no-pie, but they also do not need it. Detect the option at build
time.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Starting state migration after an IO exit without first completing IO
may result in test failures. We already have two tests that need this
(this patch in fact fixes evmcs_test, similar to what was fixed for
state_test in commit 0f73bbc851, "KVM: selftests: complete IO before
migrating guest state", 2019-03-13) and a third is coming. So, move the
code to vcpu_save_state, and while at it do not access register state
until after I/O is complete.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hid long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allow it to continue running.
Fixes: 660a5d517a ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
state area, i.e. don't save/restore EFER across SMM transitions. KVM
somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
guest doesn't support long mode. But during RSM, KVM unconditionally
clears EFER so that it can get back to pure 32-bit mode in order to
start loading CRs with their actual non-SMM values.
Clear EFER only when it will be written when loading the non-SMM state
so as to preserve bits that can theoretically be set on 32-bit vCPUs,
e.g. KVM always emulates EFER_SCE.
And because CR4.PAE is cleared only to play nice with EFER, wrap that
code in the long mode check as well. Note, this may result in a
compiler warning about cr4 being consumed uninitialized. Re-read CR4
even though it's technically unnecessary, as doing so allows for more
readable code and RSM emulation is not a performance critical path.
Fixes: 660a5d517a ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1. Stop dancing around the issue of HF_SMM_MASK being set when
loading SMSTATE into architectural state, e.g. by toggling it for
problematic flows, and simply clear HF_SMM_MASK prior to loading
architectural state (from SMRAM save state area).
Reported-by: Jon Doron <arilou@gmail.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123cb ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
save state map, i.e. kvm_smm_changed() needs to be called after state
has been loaded and so cannot be done automatically when setting
hflags from RSM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1. Rather than dance around the issue of HF_SMM_MASK being set
when loading SMSTATE into architectural state, ideally RSM emulation
itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
architectural state.
Ostensibly, the only motivation for having HF_SMM_MASK set throughout
the loading of state from the SMRAM save state area is so that the
memory accesses from GET_SMSTATE() are tagged with role.smm. Load
all of the SMRAM save state area from guest memory at the beginning of
RSM emulation, and load state from the buffer instead of reading guest
memory one-by-one.
This paves the way for clearing HF_SMM_MASK prior to loading state,
and also aligns RSM with the enter_smm() behavior, which fills a
buffer and writes SMRAM save state in a single go.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Issue was discovered when running kvm-unit-tests on KVM running as L1 on
top of Hyper-V.
When vmx_instruction_intercept unit-test attempts to run RDPMC to test
RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.
The reason vmx_instruction_intercept unit-test attempts to run RDPMC
even though Hyper-V doesn't support PMU is because L1 expose to L2
support for RDPMC-exiting. Which is reasonable to assume that is
supported only in case CPU supports PMU to being with.
Above issue can easily be simulated by modifying
vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
"-cpu host,+vmx,-pmu" and run unit-test.
To handle issue, change KVM to expose RDPMC-exiting only when guest
supports PMU.
Reported-by: Saar Amar <saaramar@microsoft.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before this change, reading a VMware pseduo PMC will succeed even when
PMU is not supported by guest. This can easily be seen by running
kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
guest xcr0 could leak into host when MCE happens in guest mode. Because
do_machine_check() could schedule out at a few places.
For example:
kvm_load_guest_xcr0
...
kvm_x86_ops->run(vcpu) {
vmx_vcpu_run
vmx_complete_atomic_exit
kvm_machine_check
do_machine_check
do_memory_failure
memory_failure
lock_page
In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
out, host cpu has guest xcr0 loaded (0xff).
In __switch_to {
switch_fpu_finish
copy_kernel_to_fpregs
XRSTORS
If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
and tries to reinitialize fpu by restoring init fpu state. Same story as
last #GP, except we get DOUBLE FAULT this time.
Cc: stable@vger.kernel.org
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
shows that we're sometimes able to deliver a few but never all.
When we're trying to inject an NMI we may fail to do so immediately for
various reasons, however, we still need to inject it so enable_nmi_window()
arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
for pending NMIs before entering the guest and unless there's a different
event to process, the NMI will never get delivered.
Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
pending NMIs are checked and possibly injected.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Only clear the valid bit when invalidate logical APIC id entry.
The current logic clear the valid bit, but also set the rest of
the bits (including reserved bits) to 1.
Fixes: 98d90582be ('svm: Fix AVIC DFR and LDR handling')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit bb218fbcfa.
As Oren Twaig pointed out the old discussion:
https://patchwork.kernel.org/patch/8292231/
that the change coud potentially cause an extra IPI to be sent to
the destination vcpu because the AVIC hardware already set the IRR bit
before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
Since writting to ICR and ICR2 will also set the IRR. If something triggers
the destination vcpu to get scheduled before the emulation finishes, then
this could result in an additional IPI.
Also, the issue mentioned in the commit bb218fbcfa was misdiagnosed.
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Oren Twaig <oren@scalemp.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32 bit unsigned integers. This can result in overflow
if a VM has more guest pages that can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The remaining failures of vmx.flat when EPT is disabled are caused by
incorrectly reflecting VMfails to the L1 hypervisor. What happens is
that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
from the vmcs01.
For simplicity let's just always use hardware VMCS checks when EPT is
disabled. This way, nested_vmx_restore_host_state is not reached at
all (or at least shouldn't be reached).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
err_spi is used when SERIAL_SC16IS7XX_SPI is enabled, so make
the label only available under SERIAL_SC16IS7XX_SPI option.
Otherwise, the below warning appears.
drivers/tty/serial/sc16is7xx.c:1523:1: warning: label ‘err_spi’ defined but not used [-Wunused-label]
err_spi:
^~~~~~~
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Fixes: ac0cdb3d99 ("sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are several issues with the formula used for calculating the
deviation from the intended rate:
1. While min_err and last_stop are signed, srr and baud are unsigned.
Hence the signed values are promoted to unsigned, which will lead
to a bogus value of deviation if min_err is negative,
2. Srr is the register field value, which is one less than the actual
sampling rate factor,
3. The divisions do not use rounding.
Fix this by casting unsigned variables to int, adding one to srr, and
using a single DIV_ROUND_CLOSEST().
Fixes: 63ba1e00f1 ("serial: sh-sci: Support for HSCIF RX sampling point adjustment")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 008258d995 ("clocksource/drivers/timer-ti-dm: Make
omap_dm_timer_set_load_start() static") made omap_dm_time_set_load_start
static because its prototype was not defined in a header. Unfortunately,
this causes a build warning on multi_v7_defconfig because this function
is not used anywhere in this translation unit:
drivers/clocksource/timer-ti-dm.c:589:12: error: unused function
'omap_dm_timer_set_load_start' [-Werror,-Wunused-function]
In fact, omap_dm_timer_set_load_start hasn't been used anywhere since
commit f190be7f39 ("staging: tidspbridge: remove driver") and the
prototype was removed in commit 592ea6bd1f ("clocksource: timer-ti-dm:
Make unexported functions static"), which is probably where this should
have happened.
Fixes: 592ea6bd1f ("clocksource: timer-ti-dm: Make unexported functions static")
Fixes: 008258d995 ("clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
`vmk80xx_alloc_usb_buffers()` is called from `vmk80xx_auto_attach()` to
allocate RX and TX buffers for USB transfers. It allocates
`devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the
allocation of `devpriv->usb_tx_buf` fails, it frees
`devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an
error. Later, `vmk80xx_detach()` will be called from the core comedi
module code to clean up. `vmk80xx_detach()` also frees both
`devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but
`devpriv->usb_rx_buf` may have already been freed, leading to a
double-free error. Fix it by removing the call to
`kfree(devpriv->usb_rx_buf)` from `vmk80xx_alloc_usb_buffers()`, relying
on `vmk80xx_detach()` to free the memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If `vmk80xx_auto_attach()` returns an error, the core comedi module code
will call `vmk80xx_detach()` to clean up. If `vmk80xx_auto_attach()`
successfully allocated the comedi device private data,
`vmk80xx_detach()` assumes that a `struct semaphore limit_sem` contained
in the private data has been initialized and uses it. Unfortunately,
there are a couple of places where `vmk80xx_auto_attach()` can return an
error after allocating the device private data but before initializing
the semaphore, so this assumption is invalid. Fix it by initializing
the semaphore just after allocating the private data in
`vmk80xx_auto_attach()` before any other errors can be returned.
I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=54c2f58f15fe6876b6ad>:
usb 1-1: config 0 has no interface number 0
usb 1-1: New USB device found, idVendor=10cf, idProduct=8068, bcdDevice=e6.8d
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe8/0x16e lib/dump_stack.c:113
assign_lock_key kernel/locking/lockdep.c:786 [inline]
register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
__lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152
down+0x12/0x80 kernel/locking/semaphore.c:58
vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829
comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
hub_port_connect drivers/usb/core/hub.c:5089 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
port_event drivers/usb/core/hub.c:5350 [inline]
hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
kthread+0x313/0x420 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Reported-by: syzbot+54c2f58f15fe6876b6ad@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chanwoo writes:
Update extcon for v5.1-rc4
Detailed description for this pull request:
1. Fix the build issue of extcon-ptn5150.c driver by editing
the module dependency in Kconfig.
* tag 'extcon-fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon:
extcon: ptn5150: fix COMPILE_TEST dependencies
Some drivers (such as the vub300 MMC driver) expect usb_string() to
return a properly NUL-terminated string, even when an error occurs.
(In fact, vub300's probe routine doesn't bother to check the return
code from usb_string().) When the driver goes on to use an
unterminated string, it leads to kernel errors such as
stack-out-of-bounds, as found by the syzkaller USB fuzzer.
An out-of-range string index argument is not at all unlikely, given
that some devices don't provide string descriptors and therefore list
0 as the value for their string indexes. This patch makes
usb_string() return a properly terminated empty string along with the
-EINVAL error code when an out-of-range index is encountered.
And since a USB string index is a single-byte value, indexes >= 256
are just as invalid as values of 0 or below.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: syzbot+b75b85111c10b8d680f1@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current implementation was not properly handling frwr memory
registrations. This was uncovered by commit 27f26cec761das ("xprtrdma:
Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is
used for NFS over RDMA, started failing as it was the first ULP to modify
the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR
when attempting to perform RDMA Writes to the client.
The fix is to properly capture the true iova, offset, and length in the
call to ib_map_mr_sg, and then update the iova when processing the
IB_WR_REG_MEM on the send queue.
Fixes: a41081aa59 ("IB/rdmavt: Add support for ib_map_mr_sg")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As mentioned in the comment, there are some special cases where we can simply
clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
Handle them so that we can remove some XFAILs from vmx.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xose Vazquez Perez reported boot warnings when NX is disabled on the kernel command line.
__early_set_fixmap() triggers this warning:
attempted to set unsupported pgprot: 8000000000000163
bits: 8000000000000000
supported: 7fffffffffffffff
WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgtable.h:537
__early_set_fixmap+0xa2/0xff
because it uses __default_kernel_pte_mask to mask out unsupported bits.
Use __supported_pte_mask instead.
Disabling NX on the command line also triggers the NX warning in the page
table mapping check:
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:262 note_page+0x2ae/0x650
....
Make the warning depend on NX set in __supported_pte_mask.
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Tested-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1904151037530.1729@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit introduced a bug in one of our error paths:
819319fc93 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()")
it missed to handle the return value of kprobe_optready() as
error-value. In reality, the kprobe_optready() returns a bool
result, so "true" case must be passed instead of 0.
This causes some errors on kprobe boot-time selftests on ARM:
[ ] Beginning kprobe tests...
[ ] Probe ARM code
[ ] kprobe
[ ] kretprobe
[ ] ARM instruction simulation
[ ] Check decoding tables
[ ] Run test cases
[ ] FAIL: test_case_handler not run
[ ] FAIL: Test andge r10, r11, r14, asr r7
[ ] FAIL: Scenario 11
...
[ ] FAIL: Scenario 7
[ ] Total instruction simulation tests=1631, pass=1433 fail=198
[ ] kprobe tests failed
This can happen if an optimized probe is unregistered and next
kprobe is registered on same address until the previous probe
is not reclaimed.
If this happens, a hidden aggregated probe may be kept in memory,
and no new kprobe can probe same address. Also, in that case
register_kprobe() will return "1" instead of minus error value,
which can mislead caller logic.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # v5.0+
Fixes: 819319fc93 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()")
Link: http://lkml.kernel.org/r/155530808559.32517.539898325433642204.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If lockdep_register_key() and lockdep_unregister_key() are called with
debug_locks == false then the following warning is reported:
WARNING: CPU: 2 PID: 15145 at kernel/locking/lockdep.c:4920 lockdep_unregister_key+0x1ad/0x240
That warning is reported because lockdep_unregister_key() ignores the
value of 'debug_locks' and because the behavior of lockdep_register_key()
depends on whether or not 'debug_locks' is set. Fix this inconsistency
by making lockdep_unregister_key() take 'debug_locks' again into
account.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: shenghui <shhuiw@foxmail.com>
Fixes: 90c1cba2b3 ("locking/lockdep: Zap lock classes even with lock debugging disabled")
Link: http://lkml.kernel.org/r/20190415170538.23491-1-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When SCSI blk-mq is enabled, there is a bug in handling errors in
scsi_queue_rq. Specifically, the bug is not setting result field of
scsi_request correctly when the dispatch of the command has been
failed. Since the upper layer code including the sg_io ioctl expects to
receive any error status from result field of scsi_request, the error is
silently ignored and this could cause data corruptions for some
applications.
Fixes: d285203cf6 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull libnvdimm fixes from Dan Williams:
"I debated holding this back for the v5.2 merge window due to the size
of the "zero-key" changes, but affected users would benefit from
having the fixes sooner. It did not make sense to change the zero-key
semantic in isolation for the "secure-erase" command, but instead
include it for all security commands.
The short background on the need for these changes is that some NVDIMM
platforms enable security with a default zero-key rather than let the
OS specify the initial key. This makes the security enabling that
landed in v5.0 unusable for some users.
Summary:
- Compatibility fix for nvdimm-security implementations with a
default zero-key.
- Miscellaneous small fixes for out-of-bound accesses, cleanup after
initialization failures, and missing debug messages"
* tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Retain security state after overwrite
libnvdimm/pmem: fix a possible OOB access when read and write pmem
libnvdimm/security, acpi/nfit: unify zero-key for all security commands
libnvdimm/security: provide fix for secure-erase to use zero-key
libnvdimm/btt: Fix a kmemdup failure check
libnvdimm/namespace: Fix a potential NULL pointer dereference
acpi/nfit: Always dump _DSM output payload
Pull fsdax fix from Dan Williams:
"A single filesystem-dax fix. It has been lingering in -next for a long
while and there are no other fsdax fixes on the horizon:
- Avoid a crash scenario with architectures like powerpc that require
'pgtable_deposit' for the zero page"
* tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
fs/dax: Deposit pagetable even when installing zero page
MAINTAINERS contains a lower-case and upper-case variant of
Woojung Huh' s email address.
Only keep the lower-case variant in MAINTAINERS.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.
This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.
Reproducer:
ip link add bondL type bond
ip link add bondU type bond
ip link set bondL master bondU
ip link del bondL
No "Fixes:" tag, this code is older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for 5.1
Second set of fixes for 5.1.
iwlwifi
* add some new PCI IDs (plus a struct name change they depend on)
* fix crypto with new devices, namely 22560 and above
* fix for a potential deadlock in the TX path
* a fix for offloaded rate-control
* support new PCI HW IDs which use a new FW
mt76
* fix lock initialisation and a possible deadlock
* aggregation fixes
rt2x00
* fix sequence numbering during retransmits
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4c21b8fd8f (MIPS: seccomp: Handle indirect system calls (o32))
added indirect syscall detection for O32 processes running on MIPS64,
but it did not work correctly for big endian kernel/processes. The
reason is that the syscall number is loaded from ARG1 using the lw
instruction while this is a 64-bit value, so zero is loaded instead of
the syscall number.
Fix the code by using the ld instruction instead. When running a 32-bit
processes on a 64 bit CPU, the values are properly sign-extended, so it
ensures the value passed to syscall_trace_enter is correct.
Recent systemd versions with seccomp enabled whitelist the getpid
syscall for their internal processes (e.g. systemd-journald), but call
it through syscall(SYS_getpid). This fix therefore allows O32 big endian
systems with a 64-bit kernel to run recent systemd versions.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Cc: <stable@vger.kernel.org> # v3.15+
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
If we have multiple threads, one doing io_uring_enter() while the other
is doing io_uring_register(), we can run into a deadlock between the
two. io_uring_register() must wait for existing users of the io_uring
instance to exit. But it does so while holding the io_uring mutex.
Callers of io_uring_enter() may need this mutex to make progress (and
eventually exit). If we wait for users to exit in io_uring_register(),
we can't do so with the io_uring mutex held without potentially risking
a deadlock.
Drop the io_uring mutex while waiting for existing callers to exit. This
is safe and guaranteed to make forward progress, since we already killed
the percpu ref before doing so. Hence later callers of io_uring_enter()
will be rejected.
Reported-by: syzbot+16dc03452dee970a0c3e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We are no longer calling bxt_ddi_phy_calc_lane_lat_optim_mask() when
intel{hdmi,dp}_compute_config() succeeds, and instead only call it
when those fail. This is fallout from the bool->int
.compute_config() conversion which failed to invert the return
value check before calling bxt_ddi_phy_calc_lane_lat_optim_mask().
Let's just replace it with an early bailout so that it's harder
to miss.
This restores the correct latency optim setting calculation
(which could fix some real failures), and avoids the
MISSING_CASE() from bxt_ddi_phy_calc_lane_lat_optim_mask()
after intel{hdmi,dp}_compute_config() has failed.
Cc: Lyude Paul <lyude@redhat.com>
Fixes: 204474a6b8 ("drm/i915: Pass down rc in intel_encoder->compute_config()")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109373
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411164925.28491-1-ville.syrjala@linux.intel.com
Reviewed-by: Lyude Paul <lyude@redhat.com>
(cherry picked from commit 7a412b8f60)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the io_uring and pidfd_send_signal system calls to all architectures.
These system calls are designed to handle both native and compat tasks,
so all entries are the same across architectures, only arm-compat and
the generic tale still use an old format.
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390)
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
spdxcheck.py complains:
drivers/power/supply/goldfish_battery.c: 1:28 Invalid License ID: GPL
which is correct because GPL is not a valid identifier. Of course this
could have been caught by checkpatch.pl _before_ submitting or merging the
patch.
WARNING: 'SPDX-License-Identifier: GPL' is not supported in LICENSES/...
#19: FILE: drivers/power/supply/goldfish_battery.c:1:
+// SPDX-License-Identifier: GPL
Which is absolutely hillarious as the commit introducing this wreckage says
in the changelog:
There was a checkpatch complain:
"Missing or malformed SPDX-License-Identifier tag".
Oh well. Replacing a checkpatch warning by a different checkpatch warning
is a really useful exercise.
Use the proper GPL-2.0 identifier which is what the boiler plate in the
file had originally.
Fixes: e75e3a125b ("drivers: power: supply: goldfish_battery: Put an SPDX tag")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The debug feature entries have been used with up to 5 arguents
(including the pointer to the format string) but there was only
space reserved for 4 arguemnts. So now the registration does
reserve space for 5 times a long value.
This fixes a sometime appearing weired value as the last
value of an debug feature entry like this:
... pkey_sec2protkey zcrypt_send_cprb (cardnr=10 domain=12)
failed with errno -2143346254
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Christian Rund <Christian.Rund@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sven Auhagen reported that a 2nd ping request will fail if 'fully-random'
mode is used.
Reason is that if no proto information is given, min/max are both 0,
so we set the icmp id to 0 instead of chosing a random value between
0 and 65535.
Update test case as well to catch this, without fix this yields:
[..]
ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2)
ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2)
... becaus 2nd ping clashes with existing 'id 0' icmp conntrack and gets
dropped.
Fixes: 203f2e7820 ("netfilter: nat: remove l4proto->unique_tuple")
Reported-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
I believe that "hook->num" can be up to UINT_MAX. Shifting more than
31 bits would is undefined in C but in practice it would lead to shift
wrapping. That would lead to an array overflow in nf_tables_addchain():
ops->hook = hook.type->hooks[ops->hooknum];
Fixes: fe19c04ca1 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
else, we leak the addresses to userspace via ctnetlink events
and dumps.
Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no
immediate re-use of the same ID in case the conntrack entry is freed and
reallocated again immediately.
Fixes: 3583240249 ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
Fixes: 7f85f91472 ("[NETFILTER]: nf_conntrack: kill unique ID")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[Why]
On some compositors, with two monitors attached, VT terminal
switch can cause a graphical issue by the following means:
There are two streams, one for each monitor. Each stream has one
plane
current state:
M1:S1->P1
M2:S2->P2
The user calls for a terminal switch and a commit is made to
change both planes to linear swizzle mode. In atomic check,
a new dc_state is constructed with new planes on each stream
new state:
M1:S1->P3
M2:S2->P4
In commit tail, each stream is committed, one at a time. The first
stream (S1) updates properly, triggerring a full update and replacing
the state
current state:
M1:S1->P3
M2:S2->P4
The update for S2 comes in, but dc detects that there is no difference
between the stream and plane in the new and current states, and so
triggers a fast update. The fast update does not program swizzle,
so the second monitor is corrupted
[How]
Add a flag to dc_plane_state that forces full updates
When a stream undergoes a full update, set this flag on all changed
planes, then clear it on the current stream
Subsequent streams will get full updates as a result
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet Lakha@amd.com>
Acked-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge page ref overflow branch.
Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).
Admittedly it's not exactly easy. To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers. Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).
Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication. So let's just do that.
* branch page-refs:
fs: prevent page refcount overflow in pipe_buf_get
mm: prevent get_user_pages() from overflowing page refcount
mm: add 'try_get_page()' helper function
mm: make page ref count overflow check tighter and more explicit
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-04-09
This series provides some fixes to mlx5 driver.
I've cc'ed some of the checksum fixes to Eric Dumazet and i would like to get
his feedback before you pull.
For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')
For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')
For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub forgot to either use nlmsg_len() or nlmsg_msg_size(),
allowing KMSAN to detect a possible uninit-value in rtnl_stats_get
BUG: KMSAN: uninit-value in rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
CPU: 0 PID: 10428 Comm: syz-executor034 Not tainted 5.1.0-rc2+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:619
__msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
rtnetlink_rcv_msg+0x115b/0x1550 net/core/rtnetlink.c:5192
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2485
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5210
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1925
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
___sys_sendmsg+0xdb3/0x1220 net/socket.c:2137
__sys_sendmsg net/socket.c:2175 [inline]
__do_sys_sendmsg net/socket.c:2184 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2182
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2182
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Fixes: 51bc860d4a ("rtnetlink: stats: validate attributes in get as well as dumps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mikhail reported a lockdep splat related to the AMD specific ssb_state
lock:
CPU0 CPU1
lock(&st->lock);
local_irq_disable();
lock(&(&sighand->siglock)->rlock);
lock(&st->lock);
<Interrupt>
lock(&(&sighand->siglock)->rlock);
*** DEADLOCK ***
The connection between sighand->siglock and st->lock comes through seccomp,
which takes st->lock while holding sighand->siglock.
Make sure interrupts are disabled when __speculation_ctrl_update() is
invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to
catch future offenders.
Fixes: 1f50ddb4f4 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linutronix.de
Denis Bolotin says:
====================
qed: Fix the Doorbell Overflow Recovery mechanism
This patch series fixes and improves the doorbell recovery mechanism.
The main goals of this series are to fix missing attentions from the
doorbells block (DORQ) or not handling them properly, and execute the
recovery from periodic handler instead of the attention handler.
Please consider applying the series to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the overflow handling from the hardware interrupt status analysis.
The interrupt status is a single register and is common for all PFs. The
first PF reading the register is not necessarily the one who overflowed.
All PFs must check their overflow status on every attention.
In this change we clear the sticky indication in the attention handler to
allow doorbells to be processed again as soon as possible, but running
the doorbell recovery is scheduled for the periodic handler to reduce the
time spent in the attention handler.
Checking the need for DORQ flush was changed to "db_bar_no_edpm" because
qed_edpm_enabled()'s result could change dynamically and might have
prevented a needed flush.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the DORQ (doorbell block) is overflowed, all PFs get attentions at the
same time. If one PF finished handling the attention before another PF even
started, the second PF might miss the DORQ's attention bit and not handle
the attention at all.
If the DORQ attention is missed and the issue is not resolved, another
attention will not be sent, therefore each attention is treated as a
potential DORQ attention.
As a result, the attention callback is called more frequently so the debug
print was moved to reduce its quantity.
The number of periodic doorbell recovery handler schedules was reduced
because it was the previous way to mitigating the missed attention issue.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the condition which verifies that doorbell address is inside the
doorbell bar by checking that the end of the address is within range
as well.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is
never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery)
is not needed anymore because by running the periodic handler we make sure
we check the overflow status later instead.
This patch is needed because in the next patches, the only doorbell
recovery type being used is DB_REC_REAL_DEAL, and the fixes are much
cleaner without this enum.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount. All
callers converted to handle a failure.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe". It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).
Also like 'get_page()', you can't use this function unless you already
had a reference to the page. The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.
The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.
NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.
That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).
Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When cache allocation is supported and the user creates a new resctrl
resource group, the allocations of the new resource group are
initialized to all regions that it can possibly use. At this time these
regions are all that are shareable by other resource groups as well as
regions that are not currently used. The new resource group's mode is
also initialized to reflect this initialization and set to "shareable".
The new resource group's mode is currently repeatedly initialized within
the loop that configures the hardware with the resource group's default
allocations.
Move the initialization of the resource group's mode outside the
hardware configuration loop. The resource group's mode is now
initialized only once as the final step to reflect that its configured
allocations are "shareable".
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1554839629-5448-1-git-send-email-xiaochen.shen@intel.com
Since the fget/fput handling was reworked in commit 09bb839434, we
never call io_file_put() with state == NULL (and hence file != NULL)
anymore. Remove that case.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous commit moved the shallow depth and BFQ depth map calculations
to be done at init time, moving it outside of the hotter IO path. This
potentially causes hangs if the users changes the depth of the scheduler
map, by writing to the 'nr_requests' sysfs file for that device.
Add a blk-mq-sched hook that allows blk-mq to inform the scheduler if
the depth changes, so that the scheduler can update its internal state.
Tested-by: Kai Krakow <kai@kaishome.de>
Reported-by: Paolo Valente <paolo.valente@linaro.org>
Fixes: f0635b8a41 ("bfq: calculate shallow depths at init time")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently call cpu_possible() even if we don't use the CPU. Move the
test under the SQ_AFF branch, which is the only place where we'll use
the value. Do the cpu_possible() test AFTER we've limited it to a max
of NR_CPUS. This avoids triggering the following warning:
WARNING: CPU: 1 PID: 7600 at include/linux/cpumask.h:121 cpu_max_bits_warn
if CONFIG_DEBUG_PER_CPU_MAPS is enabled.
While in there, also move the SQ thread idle period assignment inside
SETUP_SQPOLL, as we don't use it otherwise either.
Reported-by: syzbot+cd714a07c6de2bc34293@syzkaller.appspotmail.com
Fixes: 6c271ce2f1 ("io_uring: add submission polling")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block fixes from Jens Axboe:
"Set of fixes that should go into this round. This pull is larger than
I'd like at this time, but there's really no specific reason for that.
Some are fixes for issues that went into this merge window, others are
not. Anyway, this contains:
- Hardware queue limiting for virtio-blk/scsi (Dongli)
- Multi-page bvec fixes for lightnvm pblk
- Multi-bio dio error fix (Jason)
- Remove the cache hint from the io_uring tool side, since we didn't
move forward with that (me)
- Make io_uring SETUP_SQPOLL root restricted (me)
- Fix leak of page in error handling for pc requests (Jérôme)
- Fix BFQ regression introduced in this merge window (Paolo)
- Fix break logic for bio segment iteration (Ming)
- Fix NVMe cancel request error handling (Ming)
- NVMe pull request with two fixes (Christoph):
- fix the initial CSN for nvme-fc (James)
- handle log page offsets properly in the target (Keith)"
* tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
block: fix the return errno for direct IO
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
block: do not leak memory in bio_copy_user_iov()
lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
nvme: cancel request synchronously
blk-mq: introduce blk_mq_complete_request_sync()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
virtio-blk: limit number of hw queues by nr_cpu_ids
block, bfq: fix use after free in bfq_bfqq_expire
io_uring: restrict IORING_SETUP_SQPOLL to root
tools/io_uring: remove IOCQE_FLAG_CACHEHIT
block: don't use for-inside-for in bio_for_each_segment_all
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fix:
- Fix a deadlock in close() due to incorrect draining of RDMA queues
Bugfixes:
- Revert "SUNRPC: Micro-optimise when the task is known not to be
sleeping" as it is causing stack overflows
- Fix a regression where NFSv4 getacl and fs_locations stopped
working
- Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
- Fix xfstests failures due to incorrect copy_file_range() return
values"
* tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
NFSv4.1 fix incorrect return value in copy_file_range
xprtrdma: Fix helper that drains the transport
NFS: Fix handling of reply page vector
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
Pull SCSI fix from James Bottomley:
"One obvious fix for a ciostor data corruption on error bug"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
Pull clk fixes from Stephen Boyd:
"Here's more than a handful of clk driver fixes for changes that came
in during the merge window:
- Fix the AT91 sama5d2 programmable clk prescaler formula
- A bunch of Amlogic meson clk driver fixes for the VPU clks
- A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
clks as critical only when really needed
- Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
implementation
- Use the right structure to test for a frequency table in i.MX's
PLL_1416x driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx: Fix PLL_1416X not rounding rates
clk: mediatek: fix clk-gate flag setting
platform/x86: pmc_atom: Drop __initconst on dmi table
clk: x86: Add system specific quirk to mark clocks as critical
clk: meson: vid-pll-div: remove warning and return 0 on invalid config
clk: meson: pll: fix rounding and setting a rate that matches precisely
clk: meson-g12a: fix VPU clock parents
clk: meson: g12a: fix VPU clock muxes mask
clk: meson-gxbb: round the vdec dividers to closest
clk: at91: fix programmable clock for sama5d2
Pull PCI fixes from Bjorn Helgaas:
- Add a DMA alias quirk for another Marvell SATA device (Andre
Przywara)
- Fix a pciehp regression that broke safe removal of devices (Sergey
Miroshnichenko)
* tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: pciehp: Ignore Link State Changes after powering off a slot
PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Pull powerpc fixes from Michael Ellerman:
"A minor build fix for 64-bit FLATMEM configs.
A fix for a boot failure on 32-bit powermacs.
My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
64-bit kernels, ie. compat mode, which is only used on big endian.
The rewrite of the SLB code we merged in 4.20 missed the fact that the
0x380 exception is also used with the Radix MMU to report out of range
accesses. This could lead to an oops if userspace tried to read from
addresses outside the user or kernel range.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
Piggin"
* tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
powerpc/64s/radix: Fix radix segment exception handling
powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
powerpc/32: Fix early boot failure with RTAS built-in
Pull arm64 fixes from Will Deacon:
"The main thing is a fix to our FUTEX_WAKE_OP implementation which was
unbelievably broken, but did actually work for the one scenario that
GLIBC used to use.
Summary:
- Fix stack unwinding so we ignore user stacks
- Fix ftrace module PLT trampoline initialisation checks
- Fix terminally broken implementation of FUTEX_WAKE_OP atomics"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
arm64: backtrace: Don't bother trying to unwind the userspace stack
arm64/ftrace: fix inadvertent BUG() in trampoline check
We can receive ICMP errors from client or from
tunneling real server. While the former can be
scheduled to real server, the latter should
not be scheduled, they are decapsulated only when
existing connection is found.
Fixes: 6044eeffaf ("ipvs: attempt to schedule icmp packets")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Luca Moro says:
------
The issue lies in the filtering of ICMP and ICMPv6 errors that include an
inner IP datagram.
For these packets, icmp_error_message() extract the ICMP error and inner
layer to search of a known state.
If a state is found the packet is tagged as related (IP_CT_RELATED).
The problem is that there is no correlation check between the inner and
outer layer of the packet.
So one can encapsulate an error with an inner layer matching a known state,
while its outer layer is directed to a filtered host.
In this case the whole packet will be tagged as related.
This has various implications from a rule bypass (if a rule to related
trafic is allow), to a known state oracle.
Unfortunately, we could not find a real statement in a RFC on how this case
should be filtered.
The closest we found is RFC5927 (Section 4.3) but it is not very clear.
A possible fix would be to check that the inner IP source is the same than
the outer destination.
We believed this kind of attack was not documented yet, so we started to
write a blog post about it.
You can find it attached to this mail (sorry for the extract quality).
It contains more technical details, PoC and discussion about the identified
behavior.
We discovered later that
https://www.gont.com.ar/papers/filtering-of-icmp-error-messages.pdf
described a similar attack concept in 2004 but without the stateful
filtering in mind.
-----
This implements above suggested fix:
In icmp(v6) error handler, take outer destination address, then pass
that into the common function that does the "related" association.
After obtaining the nf_conn of the matching inner-headers connection,
check that the destination address of the opposite direction tuple
is the same as the outer address and only set RELATED if thats the case.
Reported-by: Luca Moro <luca.moro@synacktiv.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When an icmp error such as pkttoobig is received, conntrack checks
if the "inner" header (header of packet that did not fit link mtu)
is matches an existing connection, and, if so, sets that packet as
being related to the conntrack entry it found.
It was recently reported that this "related" setting also works
if the inner header is from another, different connection (i.e.,
artificial/forged icmp error).
Add a test, followup patch will add additional "inner dst matches
outer dst in reverse direction" check before setting related state.
Link: https://www.synacktiv.com/posts/systems/icmp-reachable.html
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The recent commit 98081ca62c ("ALSA: hda - Record the current power
state before suspend/resume calls") made the HD-audio driver to store
the PM state in power_state field. This forgot, however, the
initialization at power up. Although the codec drivers usually don't
need to refer to this field in the normal operation, let's initialize
it properly for consistency.
Fixes: 98081ca62c ("ALSA: hda - Record the current power state before suspend/resume calls")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server. This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.
If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken. At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.
afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.
However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.
The problem comes when the server sends us a CB.InitCallBackState op. The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
after last use and all the volumes have been automatically unmounted and
the server has forgotten about the client[*], this *will* likely cause an
increment.
[*] There are other circumstances too, such as the server restarting or
needing to make space in its callback table.
Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.
So what happens is:
(1) A mount for a new volume is attempted, a inode is created for the root
vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
immediately, as we don't have a nominated server to talk to yet - and
we may iterate through a few to find one.
(2) Before the operation happens, afs_fetch_status(), say, notes in the
cursor (fc.cb_break) the break counter sum from the vnode, volume and
server counters, but the server->cb_s_break is currently 0.
(3) We send FS.FetchStatus to the server. The server sends us back
CB.InitCallBackState. We increment server->cb_s_break.
(4) Our FS.FetchStatus completes. The reply includes a callback record.
(5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
the callback promise was broken by checking the break counter sum from
step (2) against the current sum.
This fails because of step (3), so we don't set the callback record
and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.
This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.
(6) afs_validate() it triggered on the vnode, probably called from
d_revalidate() checking the parent directory.
(7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
update vnode->cb_s_break and assumes the vnode to be invalid.
(8) afs_validate() needs to calls afs_fetch_status(). Go back to step (2)
and repeat, every time the vnode is validated.
This primarily affects volume root dir vnodes. Everything subsequent to
those inherit an already incremented cb_s_break upon mounting.
The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].
[*] There is an extremely unlikely scenario where a reply from before the
CB.InitCallBackState could get its delivery deferred till after - at
which point we think we have a promise when we don't. This, however,
requires unlucky mass packet loss to one call.
AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
It's also further insulated from the problem on an initial mount by
querying the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.
Fix this by
(1) Remove AFS_SERVER_FL_NEW.
(2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
calculation.
(3) In afs_cb_is_broken(), don't include cb_s_break in the check.
Signed-off-by: David Howells <dhowells@redhat.com>
__pagevec_release() complains loudly if any page in the vector is still
locked. The pages need to be locked for generic_error_remove_page(), but
that function doesn't actually unlock them.
Unlock the pages afterwards.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Differentiate an abort due to an unmarshalling error from an abort due to
other errors, such as ENETUNREACH. It doesn't make sense to set abort code
RXGEN_*_UNMARSHAL in such a case, so use RX_USER_ABORT instead.
Signed-off-by: David Howells <dhowells@redhat.com>
get_seconds() has a limited range on 32-bit architectures and is
deprecated because of that. While AFS uses the same limits for
its inode timestamps on the wire protocol, let's just use the
simpler current_time() as we do for other file systems.
This will still zero out the 'tv_nsec' field of the timestamps
internally.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Pull x86 fixes from Ingo Molnar:
"Fix typos in user-visible resctrl parameters, and also fix assembly
constraint bugs that might result in miscompilation"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use stricter assembly constraints in bitops
x86/resctrl: Fix typos in the mba_sc mount option
Pull timer fix from Ingo Molnar:
"Fix the alarm_timer_remaining() return value"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Return correct remaining time
Pull scheduler fix from Ingo Molnar:
"Fix a NULL pointer dereference crash in certain environments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Pull perf fixes from Ingo Molnar:
"Six kernel side fixes: three related to NMI handling on AMD systems, a
race fix, a kexec initialization fix and a PEBS sampling fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix perf_event_disable_inatomic() race
x86/perf/amd: Remove need to check "running" bit in NMI handler
x86/perf/amd: Resolve NMI latency issues for active PMCs
x86/perf/amd: Resolve race condition when disabling PMC
perf/x86/intel: Initialize TFA MSR
perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
Pull locking fix from Ingo Molnar:
"Fixes a crash when accessing /proc/lockdep"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Zap lock classes even with lock debugging disabled
Pull core fixes from Ingo Molnar:
"Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
bug"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add rewind_stack_do_exit() to the noreturn list
linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.
Refer to the commit 3da1ed7ac3 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6) for a similar issue.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Fixes
Here is a collection of fixes for rxrpc:
(1) rxrpc_error_report() needs to call sock_error() to clear the error
code from the UDP transport socket, lest it be unexpectedly revisited
on the next kernel_sendmsg() call. This has been causing all sorts of
weird effects in AFS as the effects have typically been felt by the
wrong RxRPC call.
(2) Allow a kernel user of AF_RXRPC to easily detect if an rxrpc call has
completed.
(3) Allow errors incurred by attempting to transmit data through the UDP
socket to get back up the stack to AFS.
(4) Make AFS use (2) to abort the synchronous-mode call waiting loop if
the rxrpc-level call completed.
(5) Add a missing tracepoint case for tracing abort reception.
(6) Fix detection and handling of out-of-order ACKs.
====================
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:
1. The allocation of serial numbers cannot be assumed to imply the order
by which acks are populated and transmitted. In some rxrpc
implementations, delayed acks and ping acks are transmitted
asynchronously to the receipt of data packets and so may be transmitted
out of order. As a result, they can race with idle acks.
2. Serial numbers are allocated by the rxrpc connection and not the call
and as such may wrap independently if multiple channels are in use.
In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).
Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.
Fixes: 298bc15b20 ("rxrpc: Only take the rwind and mtu values from latest ACK")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trace received calls that are aborted due to a connection abort, typically
because of authentication failure. Without this, connection aborts don't
show up in the trace log.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.
Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.
There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.
This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().
This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.
Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.
Addresses-Coverity: ("Write to pointer after free")
Fixes: 035744975a ("qede: Add support for PTP resource locking.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.
Addresses-Coverity: ("Use after free")
Fixes: 528f727279 ("vxge: code cleanup and reorganization")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code which initializes the "clk_init_data.ops" checks pll->rate_table
before that field is ever assigned to so it always picks
"clk_pll1416x_min_ops".
This breaks dynamic rate rounding for features such as cpufreq.
Fix by checking pll_clk->rate_table instead, here pll_clk refers to
the constant initialization data coming from per-soc clk driver.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Fixes: 8646d4dcc7 ("clk: imx: Add PLLs driver for imx8mm soc")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
In most cases, kmalloc() will not be available early in boot when
pci_setup() is called. Thus, the kstrdup() call that was added to fix the
__initdata bug with the disable_acs_redir parameter usually returns NULL,
so the parameter is discarded and has no effect.
To fix this, store the string that's in initdata until an initcall function
can allocate the memory appropriately. This way we don't need any
additional static memory.
Fixes: d2fd6e8191 ("PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Second batch of iwlwifi fixes intended for v5.1
* fix for a potential deadlock in the TX path;
* a fix for offloaded rate-control;
* support new PCI HW IDs which use a new FW;
Currently rt2x00 devices retransmit the management frames with
incremented sequence number if hardware is assigning the sequence.
This is HW bug fixed already for non-QOS data frames, but it should
be fixed for management frames except beacon.
Without fix retransmitted frames have wrong SN:
AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1648, FN=0, Flags=........C Frame is not being retransmitted 1648 1
AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1649, FN=0, Flags=....R...C Frame is being retransmitted 1649 1
AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1650, FN=0, Flags=....R...C Frame is being retransmitted 1650 1
With the fix SN stays correctly the same:
88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=........C
88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C
88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C
Cc: stable@vger.kernel.org
Signed-off-by: Vijayakumar Durai <vijayakumar.durai1@vivint.com>
[sgruszka: simplify code, change comments and changelog]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Now that the sequence number allocation is fixed, we can finally send a BAR
at powersave wakeup time to refresh the receiver side reorder window
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
If the MT_TXD3_SN_VALID flag is not set in the tx descriptor, the hardware
assigns the sequence number. However, the rest of the code assumes that the
sequence number specified in the 802.11 header gets transmitted.
This was causing issues with the aggregation setup, which worked for the
initial one (where the sequence numbers were still close), but not for
further teardown/re-establishing of sessions.
Additionally, the overwrite of the TID sequence number in WTBL2 was resetting
the hardware assigned sequence numbers, causing them to drift further apart.
Fix this by using the software assigned sequence numbers
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
KMSAN will complain if valid address length passed to udpv6_pre_connect()
is shorter than sizeof("struct sockaddr"->sa_family) bytes.
(This patch is bogus if it is guaranteed that udpv6_pre_connect() is
always called after checking "struct sockaddr"->sa_family. In that case,
we want a comment why we don't need to check valid address length here.)
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bpf_bind() is
shorter than sizeof("struct sockaddr"->sa_family) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_llc) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_sco) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_rxrpc) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_nl) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof("struct sockaddr_mISDN"->family) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow->tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull dma-mapping fixes from Christoph Hellwig:
"Fix a sparc64 sun4v_pci regression introduced in this merged window,
and a dma-debug stracktrace regression from the big refactor last
merge window"
* tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: only skip one stackframe entry
sparc64/pci_sun4v: fix ATU checks for large DMA masks
Pull IOMMU fix from Joerg Roedel:
"Fix an AMD IOMMU issue where the driver didn't correctly setup the
exclusion range in the hardware registers, resulting in exclusion
ranges being one page too big.
This can cause data corruption of the address of that last page is
used by DMA operations"
* tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Set exclusion range correctly
Pull clang-format update from Miguel Ojeda:
"The usual roughly-per-release .clang-format macro list update"
* tag 'clang-format-for-linus-v5.1-rc5' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
Pull MMC host fixes from Ulf Hansson:
- alcor: Stabilize data write requests
- sdhci-omap: Fix command error path during tuning
* tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
mmc: alcor: don't write data before command has completed
Pull sound fixes from Takashi Iwai:
"Well, this one became unpleasantly larger than previous pull requests,
but it's a kind of usual pattern: now it contains a collection of ASoC
fixes, and nothing to worry too much.
The fixes for ASoC core (DAPM, DPCM, topology) are all small and just
covering corner cases. The rest changes are driver-specific, many of
which are for x86 platforms and new drivers like STM32, in addition to
the usual fixups for HD-audio"
* tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (66 commits)
ASoC: wcd9335: Fix missing regmap requirement
ALSA: hda: Fix racy display power access
ASoC: pcm: fix error handling when try_module_get() fails.
ASoC: stm32: sai: fix master clock management
ASoC: Intel: kbl: fix wrong number of channels
ALSA: hda - Add two more machines to the power_save_blacklist
ASoC: pcm: update module refcount if module_get_upon_open is set
ASoC: core: conditionally increase module refcount on component open
ASoC: stm32: fix sai driver name initialisation
ASoC: topology: Use the correct dobj to free enum control values and texts
ALSA: seq: Fix OOB-reads from strlcpy
ASoC: intel: skylake: add remove() callback for component driver
ASoC: cs35l35: Disable regulators on driver removal
ALSA: xen-front: Do not use stream buffer size before it is set
ASoC: rockchip: pdm: change dma burst to 8
ASoC: rockchip: pdm: fix regmap_ops hang issue
ASoC: simple-card: don't select DPCM via simple-audio-card
ASoC: audio-graph-card: don't select DPCM via audio-graph-card
ASoC: tlv320aic32x4: Change author's name
ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
...
Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA issue introduced during the 4.20 development cycle and
causing some systems to crash because of leftover operation region
data still maintained after the operation region in question has gone
away (Erik Schmauss)"
* tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: remove address node from global list after method termination
Pull drm fixes from Dave Airlie:
"Fixes across the driver spectrum this week, the mediatek fbdev support
might be a bit late for this round, but I looked over it and it's not
very large and seems like a useful feature for them.
Otherwise the main thing is a regression fix for i915 5.0 bug that
caused black screens on a bunch of Dell XPS 15s I think, I know at
least Fedora is waiting for this to land, and the udl fix is also for
a regression since 5.0 where unplugging the device would end badly.
core:
- make atomic hooks optional
i915:
- Revert a 5.0 regression where some eDP panels stopped working
- DSI related fixes for platforms up to IceLake
- GVT (regression fix, warning fix, use-after free fix)
amdgpu:
- Cursor fixes
- missing PCI ID fix for KFD
- XGMI fix
- shadow buffer handling after reset fix
udl:
- fix unplugging device crashes.
mediatek:
- stabilise MT2701 HDMI support
- fbdev support
tegra:
- fix for build regression in rc1.
sun4i:
- Allwinner A6 max freq improvements
- null ptr deref fix
dw-hdmi:
- SCDC configuration improvements
omap:
- CEC clock management policy fix"
* tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm: (32 commits)
gpu: host1x: Fix compile error when IOMMU API is not available
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
drm/i915/dp: revert back to max link rate and lane count on eDP
drm/i915/icl: Fix port disable sequence for mipi-dsi
drm/i915/icl: Ungate ddi clocks before IO enable
drm/mediatek: no change parent rate in round_rate() for MT2701 hdmi phy
drm/mediatek: using new factor for tvdpll for MT2701 hdmi phy
drm/mediatek: remove flag CLK_SET_RATE_PARENT for MT2701 hdmi phy
drm/mediatek: make implementation of recalc_rate() for MT2701 hdmi phy
drm/mediatek: fix the rate and divder of hdmi phy for MT2701
drm/mediatek: fix possible object reference leak
drm/i915: Get power refs in encoder->get_power_domains()
drm/i915: Fix pipe_bpp readout for BXT/GLK DSI
drm/amd/display: Fix negative cursor pos programming (v2)
drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
drm/udl: add a release method and delay modeset teardown
drm/i915/gvt: Prevent use-after-free in ppgtt_free_all_spt()
drm/i915/gvt: Annotate iomem usage
drm/sun4i: DW HDMI: Lower max. supported rate for H6
Revert "Documentation/gpu/meson: Remove link to meson_canvas.c"
...
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
Cc: <stable@kernel.org>
Fixes: 6170a97460 ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon@arm.com>
The exlcusion range limit register needs to contain the
base-address of the last page that is part of the range, as
bits 0-11 of this register are treated as 0xfff by the
hardware for comparisons.
So correctly set the exclusion range in the hardware to the
last page which is _in_ the range.
Fixes: b2026aa2dc ('x86, AMD IOMMU: add functions for programming IOMMU MMIO space')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Re-run the shell fragment that generated the original list now that
there are two dozens of new entries after v5.1's merge window.
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Thomas-Mich Richter reported he triggered a WARN()ing from event_function_local()
on his s390. The problem boils down to:
CPU-A CPU-B
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1
irq_work_queue();
sched-out
event_sched_out()
@pending_disable = 0
sched-in
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1;
irq_work_queue(); // FAILS
irq_work_run()
perf_pending_event()
if (@pending_disable)
perf_event_disable_local(); // WHOOPS
The problem exists in generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from it's PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding a irq_work_sync() to event_sched_in() would work for all hardare
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After commit e21db6f69a ("tcp: track total bytes delivered with ECN CE marks")
core TCP stack does a very good job tracking ECN signals.
The "sender's best estimate of CE information" Yuchung mentioned in his
patch is indeed the best we can do.
DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
and use the existing best estimate.
This solves some problems, since current DCTCP logic does not deal with losses
and/or GRO or ack aggregation very well.
This also removes a dubious use of inet_csk(sk)->icsk_ack.rcv_mss
(this should have been tp->mss_cache), and a 64 bit divide.
Finally, we can see that the DCTCP logic, calling dctcp_update_alpha() for
every ACK could be done differently, calling it only once per RTT.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Abdul Kabbani <akabbani@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Revert back to max link rate and lane count on eDP.
- DSI related fixes for all platforms including Ice Lake.
- GVT Fixes including one vGPU display plane size regression fix,
one for preventing use-after-free in ppgtt shadow free function,
and another warning fix for iomem access annotation.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411235832.GA6476@intel.com
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Fixes: 542ff7bf18 ("block: new direct I/O implementation")
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull btrfs fixes from David Sterba:
- fix parsing of compression algorithm when set as a inode property,
this could end up with eg. 'zst' or 'zli' in the value
- don't allow trim on a filesystem with unreplayed log, this could
cause data loss if there are pending updates to the block groups that
would not be subject to trim after replay
* tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prop: fix vanished compression property after failed set
btrfs: prop: fix zstd compression parameter validation
Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:
$ ip ro ls match
Command line is not complete. Try option "help"
Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.
Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.
Fixes: d69faad765 ("selftests: fib_tests: Add prefix route tests with metric")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some SOC like i.MX6SX clock have some limits:
- ahb clock should be disabled before ipg.
- ahb and ipg clocks are required for MAC MII bus.
So, move the ahb clock to runtime management together with
ipg clock.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this commit ded24019b6b6f(clocksource: arm_arch_timer: clean up
printk usage), the previous macro is redundant, so delete it.
And move the new macro to the previous position.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
When this is disabled, we get a link failure:
drivers/clocksource/timer-npcm7xx.o: In function `npcm7xx_timer_init':
timer-npcm7xx.c:(.init.text+0xf): undefined reference to `timer_of_init'
Fixes: 1c00289ecd ("clocksource/drivers/npcm: Add NPCM7xx timer driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This reverts commit 009a82f643.
The ability to optimise here relies on compiler being able to optimise
away tail calls to avoid stack overflows. Unfortunately, we are seeing
reports of problems, so let's just revert.
Reported-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
According to the NFSv4.2 spec if the input and output file is the
same file, operation should fail with EINVAL. However, linux
copy_file_range() system call has no such restrictions. Therefore,
in such case let's return EOPNOTSUPP and allow VFS to fallback
to doing do_splice_direct(). Also when copy_file_range is called
on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
COPY functionality), we also need to return EOPNOTSUPP and
fallback to a regular copy.
Fixes xfstest generic/075, generic/091, generic/112, generic/263
for all NFSv4.x versions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We want to drain only the RQ first. Otherwise the transport can
deadlock on ->close if there are outstanding Send completions.
Fixes: 6d2d0ee27c ("xprtrdma: Replace rpcrdma_receive_wq ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-rc.
These two need the extra padding to be added directly to the reply
length.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 02ef04e432 ("NFS: Account for XDR pad of buf->pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c
Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
br_multicast_start_querier() walks over the port list but it can be
called from a timer with only multicast_lock held which doesn't protect
the port list, so use RCU to walk over it.
Fixes: c83b8fab06 ("bridge: Restart queries when last querier expires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce says:
====================
Fix thunderx MTU with XDP
The thunderx driver can't use XDP with all MTU values.
This patches sets the right MTU values, and add a check to avoid setting
a wrong value which will not function.
v3: Fix a copy-paste from two functions, tested on proper hardware:
2: enP2p1s0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 1c:1b:0d:0d:52:a4 brd ff:ff:ff:ff:ff:ff
[ 787.019730] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1800.
RTNETLINK answers: Operation not supported
[ 800.574568] nicvf 0002:01:00.1 enP2p1s0v0: Link is Up 10000 Mbps Full duplex
[ 807.248321] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1500.
RTNETLINK answers: Invalid argument
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The thunderx driver forbids to load an eBPF program if the MTU is too high,
but this can be circumvented by loading the eBPF, then raising the MTU.
Fix this by limiting the MTU if an eBPF program is already loaded.
Fixes: 05c773f52b ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The thunderx driver splits frames bigger than 1530 bytes to multiple
pages, making impossible to run an eBPF program on it.
This leads to a maximum MTU of 1508 if QinQ is in use.
The thunderx driver forbids to load an eBPF program if the MTU is higher
than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
protocols which need some more headroom.
Fixes: 05c773f52b ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
net/smc: fixes 2019-04-11
here are some fixes in different areas of the smc code for the net
tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit <26d92e951fe0>
("net/smc: move unhash as early as possible in smc_release()")
fixes one occurrence in the smc code, but the same pattern exists
in other places. This patch covers the remaining occurrences and
makes sure, the unhash operation is done before the smc->clcsock is
released. This avoids a potential use-after-free in smc_diag_dump().
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FLUSH command is used to empty the pnet table. No return code is
expected from the command. Commit a9d8b0b1e3d6 added namespace support
for the pnet table and changed the FLUSH command processing to call
smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
returns -ENOENT when no entry was deleted, which is now the return code
of the FLUSH command. As a result the FLUSH command will return an error
when the pnet table is already empty.
Restore the expected behavior and let FLUSH always return 0.
Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
that are delivered when out-of-band data arrives on socket fd.
If an SMC socket program makes use of such an fcntl() call, it fails
in case of fallback to TCP-mode. In case of fallback the traffic is
processed with the internal TCP socket. Propagating field "file" from the
SMC socket to the internal TCP socket fixes the issue.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case alloc_ordered_workqueue fails, the fix returns NULL
to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the clcsock is already released using sock_release() and a pending
smc_listen_work accesses the clcsock than that will fail. Solve this
by canceling and waiting for the work to complete first. Because the
work holds the sock_lock it must make sure that the lock is not hold
before the new helper smc_clcsock_release() is invoked. And before the
smc_listen_work starts working check if the parent listen socket is
still valid, otherwise stop the work early.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With skip set to 1, I get a traceback like this:
[ 106.867637] DMA-API: Mapped at:
[ 106.870784] afu_dma_map_region+0x2cd/0x4f0 [dfl_afu]
[ 106.875839] afu_ioctl+0x258/0x380 [dfl_afu]
[ 106.880108] do_vfs_ioctl+0xa9/0x720
[ 106.883688] ksys_ioctl+0x60/0x90
[ 106.887007] __x64_sys_ioctl+0x16/0x20
With the previous value of 2, afu_dma_map_region was being omitted. I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc. In any case,
it's best to err on the side of skipping less.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
It's used by probe and that isn't an init function. Drop this so that we
don't get a section mismatch.
Reported-by: kbuild test robot <lkp@intel.com>
Cc: David Müller <dave.mueller@gmx.ch>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes: 7c2e071300 ("clk: x86: Add system specific quirk to mark clocks as critical")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
If SMMU support is not available, fall back to programming the bypass
stream ID (0x7f).
Fixes: de5469c21f ("gpu: host1x: Program the channel stream ID")
Suggested-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
[treding@nvidia.com: rebase this on top of a later build fix]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Pull NVMe fixes from Christoph:
"Two nvme fixes for 5.1 - fixing the initial CSN for nvme-fc, and handle
log page offsets properly in the target."
* 'nvme-5.1' of git://git.infradead.org/nvme:
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request to 4k, we've been breaking discovery log responses when more
than 3 subsystems exist.
Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch fixes a long-standing bug that initialized the FC-NVME
cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
with CSN=1. By the time the spec reached approval, the language had
changed to state a connection should start with CSN=0. This patch
corrects the initialization value for FC-NVME connections.
Additionally, in reviewing the transport, the CSN value is assigned to
the new IU early in the start routine. It's possible that a later dma
map request may fail, causing the command to never be sent to the
controller. Change the location of the assignment so that it is
immediately prior to calling the lldd. Add a comment block to explain
the impacts if the lldd were to additionally fail sending the command.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Why]
AUX takes longer to reply when using active DP-DVI dongle on some asics
resulting in up to 2000+ us edid read (timeout).
[How]
1. Adjust AUX poll to match spec
2. Extend the SW timeout. This does not affect normal
operation since we exit the loop as soon as AUX acks.
Signed-off-by: Martin Leung <martin.leung@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Joshua Aberback <Joshua.Aberback@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
the ttm_bo_add_move_fence takes a reference to the struct dma_fence, but
failed to release it on the error path, leading to a memory leak.
add dma_fence_put before return when error occur.
Signed-off-by: Lin Yi <teroincn@163.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When we increment the counter we need to increment the pointer as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: e16858a7e6e7 drm/ttm: fix start page for huge page check in ttm_put_pages()
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When ttm_put_pages() tries to figure out whether it's dealing with
transparent hugepages, it just reads past the bounds of the pages array
without a check.
v2: simplify the test if enough pages are left in the array (Christian).
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 5c42c64f7d ("drm/ttm: fix the fix for huge compound pages")
Cc: stable@vger.kernel.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When setting sync EIC as IRQ_TYPE_EDGE_BOTH type, we missed to set the
SPRD_EIC_SYNC_INTMODE register to 0, which means detecting edge signals.
Thus this patch fixes the issue.
Fixes: 25518e024e ("gpio: Add Spreadtrum EIC driver support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
ASoC: Fixes for v5.1
A few core fixes along with the driver specific ones, mainly fixing
small issues that only affect x86 platforms for various reasons (their
unusual machine enumeration mechanisms mainly, plus a fix for error
handling in topology).
There's some of the driver fixes that look larger than they are, like
the hdmi-codec changes which resulted in an indentation change, and most
of the other large changes are for new drivers like the STM32 changes.
commit 5b0d62108b ("mmc: sdhci-omap: Add platform specific reset
callback") skips data resets during tuning operation. Because of this,
a data error or data finish interrupt might still arrive after a command
error has been handled and the mrq ended. This ends up with a "mmc0: Got
data interrupt 0x00000002 even though no data operation was in progress"
error message.
Fix this by adding a platform specific callback for sdhci_irq. Mark the
mrq as a failure but wait for a data interrupt instead of calling
finish_mrq().
Fixes: 5b0d62108b ("mmc: sdhci-omap: Add platform specific reset
callback")
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In case the IOMMU API is not available compiling host1x fails with
the following error:
In file included from drivers/gpu/host1x/hw/host1x06.c:27:
drivers/gpu/host1x/hw/channel_hw.c: In function ‘host1x_channel_set_streamid’:
drivers/gpu/host1x/hw/channel_hw.c:118:30: error: implicit declaration of function
‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration]
struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
^~~~~~~~~~~~~~~~~~~~
iommu_fwspec_free
Fixes: de5469c21f ("gpu: host1x: Program the channel stream ID")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
gue tunnels run iptunnel_pull_offloads on received skbs. This can
determine a possible use-after-free accessing guehdr pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)
Fixes: a09a4c8dd1 ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When binding multiple services with specific type 1Ki, 2Ki..,
this leads to some entries in the name table of publications
missing when listed out via 'tipc name show'.
The problem is at identify zero last_type conditional provided
via netlink. The first is initial 'type' when starting name table
dummping. The second is continuously with zero type (node state
service type). Then, lookup function failure to finding node state
service type in next iteration.
To solve this, adding more conditional to marked as dirty type and
lookup correct service type for the next iteration instead of select
the first service as initial 'type' zero.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike '&&' operator, the '&' does not have short-circuit
evaluation semantics. IOW both sides of the operator always
get evaluated. Fix the wrong operator in
tls_is_sk_tx_device_offloaded(), which would lead to
out-of-bounds access for for non-full sockets.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.
As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.
Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.
It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
events on failover slaves. Userspace component interacting with slaves
is expected to be changed to operate on failover master interface
instead, as the failover slave is dynamic in nature which may come and
go at any point. The goal is to make the role of failover slaves less
relevant, and userspace components should only deal with failover master
in the long run.
Fixes: 30c8bd5aa8 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fb is tiled and fb->height isn't the multiple of tile's height,
the format fb->size = fb->stride * fb->height, will get a smaller size
than the actual size. As the memory height of tiled fb should be multiple
of tile's height.
Fixes: 7f1a93b1f1 ("drm/i915/gvt: Correct the calculation of plane size")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
After adding a team interface to bridge, the team interface will enter
promisc mode. Then if we add a new slave to team0, the slave will keep
promisc off. Fix it by setting slave to promisc on if team master is
already in promisc mode, also do the same for allmulti.
v2: add promisc and allmulti checking when delete ports
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n.
Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx()
wouldn't be compiled either in this case.
Fixes: 35b71a34ad ("net/tls: don't leak partially sent record in device mode")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 648e921888 ("clk: x86: Stop marking clocks as
CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are
unconditionally gated off. Unfortunately this will break systems where these
clocks are used for external purposes beyond the kernel's knowledge. Fix it
by implementing a system specific quirk to mark the necessary pmc_plt_clks as
critical.
Fixes: 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: David Müller <dave.mueller@gmx.ch>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
During a safe hot remove, the OS powers off the slot, which may cause a
Data Link Layer State Changed event. The slot has already been set to
OFF_STATE, so that event results in re-enabling the device, making it
impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.
It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.
Fixes the problem that Micah reported below: an NVMe drive power button may
not actually turn off the drive.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237
Reported-by: Micah Parrish <micah.parrish@hpe.com>
Tested-by: Micah Parrish <micah.parrish@hpe.com>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
[bhelgaas: changelog, add bugzilla URL]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.19+
Jakub Kicinski says:
====================
net: tls: fix memory leaks and freeing skbs
This series fixes two memory issues and a stack overflow.
First two patches are fairly simple leaks. Third patch
partially reverts an optimization made to the strparser
which causes creation of skb->frag_list->skb->frag_list...
chains of 100s of skbs, leading to recursive kfree_skb()
filling up the kernel stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts the first part of commit 4e485d06bb ("strparser: Call
skb_unclone conditionally"). To build a message with multiple
fragments we need our own root of frag_list. We can't simply
use the frag_list of orig_skb, because it will lead to linking
all orig_skbs together creating very long frag chains, and causing
stack overflow on kfree_skb() (which is called recursively on
the frag_lists).
BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
RIP: 0010:free_one_page+0x2b/0x490
Call Trace:
__free_pages_ok+0x143/0x2c0
skb_release_data+0x8e/0x140
? skb_release_data+0xad/0x140
kfree_skb+0x32/0xb0
[...]
skb_release_data+0xad/0x140
? skb_release_data+0xad/0x140
kfree_skb+0x32/0xb0
skb_release_data+0xad/0x140
? skb_release_data+0xad/0x140
kfree_skb+0x32/0xb0
skb_release_data+0xad/0x140
? skb_release_data+0xad/0x140
kfree_skb+0x32/0xb0
skb_release_data+0xad/0x140
? skb_release_data+0xad/0x140
kfree_skb+0x32/0xb0
skb_release_data+0xad/0x140
__kfree_skb+0xe/0x20
tcp_disconnect+0xd6/0x4d0
tcp_close+0xf4/0x430
? tcp_check_oom+0xf0/0xf0
tls_sk_proto_close+0xe4/0x1e0 [tls]
inet_release+0x36/0x60
__sock_release+0x37/0xa0
sock_close+0x11/0x20
__fput+0xa2/0x1d0
task_work_run+0x89/0xb0
exit_to_usermode_loop+0x9a/0xa0
do_syscall_64+0xc0/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Let's leave the second unclone conditional, as I'm not entirely
sure what is its purpose :)
Fixes: 4e485d06bb ("strparser: Call skb_unclone conditionally")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David reports that tls triggers warnings related to
sk->sk_forward_alloc not being zero at destruction time:
WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170
When sender fills up the write buffer and dies from
SIGPIPE. This is due to the device implementation
not cleaning up the partially_sent_record.
This is because commit a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
moved the partial record cleanup to the SW-only path.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f66de3ee2c ("net/tls: Split conf to rx + tx") made
freeing of IV and record sequence number conditional to SW
path only, but commit e8f6979981 ("net/tls: Add generic NIC
offload infrastructure") also allocates that state for the
device offload configuration. Remember to free it.
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we allow drivers to always need to set larger than required
DMA masks we need to be a little more careful in the sun4v PCI iommu
driver to chose when to select the ATU support - a larger DMA mask
can be set even when the platform does not support ATU, so we always
have to check if it is avaiable before using it. Add a little helper
for that and use it in all the places where we make ATU usage decisions
based on the DMA mask.
Fixes: 24132a419c ("sparc64/pci_sun4v: allow large DMA masks")
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Meelis Roos <mroos@linux.ee>
Acked-by: David S. Miller <davem@davemloft.net>
Pull rdma fixes from Jason Gunthorpe:
"Several driver bug fixes posted in the last several weeks
- Several bug fixes for the hfi1 driver 'TID RDMA' functionality
merged into 5.1. Since TID RDMA is on by default these all seem to
be regressions.
- Wrong software permission checks on memory in mlx5
- Memory leak in vmw_pvrdma during driver remove
- Several bug fixes for hns driver features merged into 5.1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/hfi1: Do not flush send queue in the TID RDMA second leg
RDMA/hns: Bugfix for SCC hem free
RDMA/hns: Fix bug that caused srq creation to fail
RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove
IB/mlx5: Reset access mask when looping inside page fault handler
IB/hfi1: Fix the allocation of RSM table
IB/hfi1: Eliminate opcode tests on mr deref
IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state
IB/hfi1: Failed to drain send queue when QP is put into error state
Thomas Falcon says:
====================
ibmvnic: Fix netdev features settings on reset
In its current state, a driver reset clobbers any feature settings
a user may have toggled and will disable GRO as it is not explicitly
enabled in the driver. This patch set enables GRO and tries to retain
user settings after a reset. If the underlying carrier changes, however,
the driver will disable features unsupported by the new carrier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While determining offload capabilities of backing hardware during
a device reset, the driver is clobbering current feature settings.
Update hw_features on reset instead of features unless a feature
is enabled that is no longer supported on the current backing device.
Also enable features that were not supported prior to the reset but
were previously enabled or requested by the user.
This can occur if the reset is the result of a carrier change, such
as a device failover or partition migration.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a patchwork instance behind bcm-kernel-feedback-list that is
helpful to track submissions, add this list for the MIPS BMIPS entry.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Ido Schimmel says:
====================
mlxsw: Various fixes
This patchset contains various small fixes for mlxsw.
Patch #1 fixes a warning generated by switchdev core when the driver
fails to insert an MDB entry in the commit phase.
Patches #2-#4 fix a warning in check_flush_dependency() that can be
triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush
a non-WQ_MEM_RECLAIM workqueue.
It seems that the semantics of the WQ_MEM_RECLAIM flag are not very
clear [1] and that various patches have been sent to remove it from
various workqueues throughout the kernel [2][3][4] in order to silence
the warning.
These patches do the same for the workqueues created by mlxsw that
probably should not have been created with this flag in the first place.
Patch #5 fixes a regression where an IP address cannot be assigned to a
VRF upper due to erroneous MAC validation check. Patch #6 adds a test
case.
Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible
with Spectrum-1. The problem and fix are described in detail in the
commit message.
Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly.
[1] https://patchwork.kernel.org/patch/10791315/
[2] Commit ce162bfbc0 ("mac80211_hwsim: don't use WQ_MEM_RECLAIM")
[3] Commit 39baf10310 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
[4] Commit 75215e5bb2 ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-1, when a multicast packet is admitted to the shared buffer
it increases the quotas of all the ports and {port, TC} to which it is
forwarded to.
The above means that multicast packets are accounted multiple times in
the shared buffer and can therefore cause the associated shared buffer
pool to fill up very quickly.
To work around this issue, commit e83c045e53 ("mlxsw:
spectrum_buffers: Configure MC pool") added a dedicated multicast pool
in which multicast packets are accounted.
The issue is not present in Spectrum-2, but in order to be backward
compatible with Spectrum-1, its default behavior is to allow a multicast
packet to increase multiple egress quotas instead of one.
Until the new (non-backward compatible) mode is supported, configure a
dedicated multicast pool as in Spectrum-1.
Fixes: fe099bf682 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that it is possible to set an IP address on a VRF and that it is
not vetoed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 74bc993974 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
addresses") enabled the driver to veto router interface (RIF) MAC
addresses that it cannot support.
This check should only be performed for interfaces for which the driver
actually configures a RIF. A VRF upper is not one of them, so ignore it.
Without this patch it is not possible to set an IP address on the VRF
device and use it as a loopback.
Fixes: 74bc993974 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The workqueue is used to periodically update the networking stack about
activity / statistics of various objects such as neighbours and TC
actions.
It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag.
Fixes: 3d5479e920 ("mlxsw: core: Remove deprecated create_workqueue")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EMAD workqueue is used to handle retransmission of EMAD packets that
contain configuration data for the device's firmware.
Given the workers need to allocate these packets and that the code is
not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM
flag.
Fixes: d965465b60 ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The introduction of multipage bio vectors broke pblk's partial read
logic due to it not being prepared for multipage bio vectors.
Use bio vector iterators instead of direct bio vector indexing.
Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Reported-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Updated description.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.
This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.
Fixes: 70dcb2e3dc ("IB/hfi1: Add the TID second leg send packet builder")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
symlink body shouldn't be freed without an RCU delay. Switch apparmorfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
symlink body shouldn't be freed without an RCU delay. Switch securityfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull virtio fixes from Michael Tsirkin:
"Several fixes, add more reviewers to the list"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: Honour 'may_reduce_num' in vring_create_virtqueue
MAiNTAINERS: add Paolo, Stefan for virtio blk/scsi
virtio_pci: fix a NULL pointer reference in vp_del_vqs
The Maximum Physical Memory 2GiB option for 64bit systems is currently
broken because kernel hangs at boot-time when this option is enabled
and the underlying system has more than 2GiB memory.
This issue can be easily reproduced on SiFive Unleashed board where
we have 8GiB of memory.
This patch fixes above issue by removing unusable memory region in
setup_bootmem().
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
nvme_cancel_request() is used in error handler, and it is always
reliable to cancel request synchronously, and avoids possible race
in which request may be completed after real hw queue is destroyed.
One issue is reported by our customer on NVMe RDMA, in which freed ib
queue pair may be used in nvme_rdma_complete_rq().
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In NVMe's error handler, follows the typical steps of tearing down
hardware for recovering controller:
1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues
However, there may be race between #3 and #4, because blk_mq_complete_request()
may run q->mq_ops->complete(rq) remotelly and asynchronously, and
->complete(rq) may be run after #4.
This patch introduces blk_mq_complete_request_sync() for fixing the
above race.
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With commit 01396a374c ("s390/zcrypt: revisit ap device remove
procedure") the ap queue remove is now a two stage process. However,
a del_timer_sync() call may trigger the timer function which may
try to lock the very same spinlock as is held by the function
just initiating the del_timer_sync() call. This could end up in
a deadlock situation. Very unlikely but possible as you need to
remove an ap queue at the exact sime time when a timeout of a
request occurs.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Fixes: commit 01396a374c ("s390/zcrypt: revisit ap device remove procedure")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.
Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:
WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330
Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-scsi, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.
In addition, specifically for pci scenario, when the 'num_queues' specified
by qemu is more than maxcpus, virtio-scsi would not be able to allocate
more than maxcpus vectors in order to have a vector for each queue. As a
result, it falls back into MSI-X with one vector for config and one shared
for queues.
Considering above reasons, this patch limits the number of hw queues used
by virtio-scsi by nr_cpu_ids.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.
In addition, specifically for pci scenario, when the 'num-queues' specified
by qemu is more than maxcpus, virtio-blk would not be able to allocate more
than maxcpus vectors in order to have a vector for each queue. As a result,
it falls back into MSI-X with one vector for config and one shared for
queues.
Considering above reasons, this patch limits the number of hw queues used
by virtio-blk by nr_cpu_ids.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function bfq_bfqq_expire() invokes the function
__bfq_bfqq_expire(), and the latter may free the in-service bfq-queue.
If this happens, then no other instruction of bfq_bfqq_expire() must
be executed, or a use-after-free will occur.
Basing on the assumption that __bfq_bfqq_expire() invokes
bfq_put_queue() on the in-service bfq-queue exactly once, the queue is
assumed to be freed if its refcounter is equal to one right before
invoking __bfq_bfqq_expire().
But, since commit 9dee8b3b05 ("block, bfq: fix queue removal from
weights tree") this assumption is false. __bfq_bfqq_expire() may also
invoke bfq_weights_tree_remove() and, since commit 9dee8b3b05
("block, bfq: fix queue removal from weights tree"), also
the latter function may invoke bfq_put_queue(). So __bfq_bfqq_expire()
may invoke bfq_put_queue() twice, and this is the actual case where
the in-service queue may happen to be freed.
To address this issue, this commit moves the check on the refcounter
of the queue right around the last bfq_put_queue() that may be invoked
on the queue.
Fixes: 9dee8b3b05 ("block, bfq: fix queue removal from weights tree")
Reported-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
snd_hdac_display_power() doesn't handle the concurrent calls carefully
enough, and it may lead to the doubly get_power or put_power calls,
when a runtime PM and an async work get called in racy way.
This patch addresses it by reusing the bus->lock mutex that has been
used for protecting the link state change in ext bus code, so that it
can protect against racy display state changes. The initialization of
bus->lock was moved from snd_hdac_ext_bus_init() to
snd_hdac_bus_init() as well accordingly.
Testcase: igt/i915_pm_rpm/module-reload #glk-dsi
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The following commit:
a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use")
changed the behavior of lockdep_free_key_range() from
unconditionally zapping lock classes into only zapping lock classes if
debug_lock == true. Not zapping lock classes if debug_lock == false leaves
dangling pointers in several lockdep datastructures, e.g. lock_class::name
in the all_lock_classes list.
The shell command "cat /proc/lockdep" causes the kernel to iterate the
all_lock_classes list. Hence the "unable to handle kernel paging request" cash
that Shenghui encountered by running cat /proc/lockdep.
Since the new behavior can cause cat /proc/lockdep to crash, restore the
pre-v5.1 behavior.
This patch avoids that cat /proc/lockdep triggers the following crash
with debug_lock == false:
BUG: unable to handle kernel paging request at fffffbfff40ca448
RIP: 0010:__asan_load1+0x28/0x50
Call Trace:
string+0xac/0x180
vsnprintf+0x23e/0x820
seq_vprintf+0x82/0xc0
seq_printf+0x92/0xb0
print_name+0x34/0xb0
l_show+0x184/0x200
seq_read+0x59e/0x6c0
proc_reg_read+0x11f/0x170
__vfs_read+0x4d/0x90
vfs_read+0xc5/0x1f0
ksys_read+0xab/0x130
__x64_sys_read+0x43/0x50
do_syscall_64+0x71/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: shenghui <shhuiw@foxmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use") # v5.1-rc1.
Link: https://lkml.kernel.org/r/20190403233552.124673-1-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Handle error before returning when try_module_get() fails
to prevent inconsistent mutex lock/unlock.
Fixes: 52034add7 (ASoC: pcm: update module refcount if
module_get_upon_open is set)
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Before commit c5459b829b ("LSM: Plumb visibility into optional "enabled"
state"), /sys/module/apparmor/parameters/enabled would show "Y" or "N"
since it was using the "bool" handler. After being changed to "int",
this switched to "1" or "0", breaking the userspace AppArmor detection
of dbus-broker. This restores the Y/N output while keeping the LSM
infrastructure happy.
Before:
$ cat /sys/module/apparmor/parameters/enabled
1
After:
$ cat /sys/module/apparmor/parameters/enabled
Y
Reported-by: David Rheinsberg <david.rheinsberg@gmail.com>
Reviewed-by: David Rheinsberg <david.rheinsberg@gmail.com>
Link: https://lkml.kernel.org/r/CADyDSO6k8vYb1eryT4g6+EHrLCvb68GAbHVWuULkYjcZcYNhhw@mail.gmail.com
Fixes: c5459b829b ("LSM: Plumb visibility into optional "enabled" state")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
When master clock is used, master clock rate is set exclusively.
Parent clocks of master clock cannot be changed after a call to
clk_set_rate_exclusive(). So the parent clock of SAI kernel clock
must be set before.
Ensure also that exclusive rate operations are balanced
in STM32 SAI driver.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Spurious interrupt support was added to perf in the following commit, almost
a decade ago:
63e6be6d98 ("perf, x86: Catch spurious interrupts after disabling counters")
The two previous patches (resolving the race condition when disabling a
PMC and NMI latency mitigation) allow for the removal of this older
spurious interrupt support.
Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
is cleared before disabling the PMC, which sets up a race condition. This
race condition was mitigated by introducing the running bitmap. That race
condition can be eliminated by first disabling the PMC, waiting for PMC
reset on overflow and then clearing the bit for the PMC in the active_mask
bitmap. The NMI handler will not re-enable a disabled counter.
If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
mitigation support will guard against any unhandled NMI messages.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The validation of random PID should be done by checking the
boardinfo->pid instead of info.pid which is empty.
Doing the change the info struture declaration is no longer necessary.
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3a379bbcea ("i3c: Add core I3C infrastructure")
Signed-off-by: Vitor Soares <vitor.soares@synopsys.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
The recent commit 8bc0868998 ("powerpc/mm: Only define
MAX_PHYSMEM_BITS in SPARSEMEM configurations") removed our definition
of MAX_PHYSMEM_BITS when SPARSEMEM is disabled.
This inadvertently broke some 64-bit FLATMEM using configs with eg:
arch/powerpc/include/asm/book3s/64/mmu-hash.h:584:6: error: "MAX_PHYSMEM_BITS" is not defined, evaluates to 0
#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
^~~~~~~~~~~~~~~~
Fix it by making sure we define MAX_PHYSMEM_BITS for all 64-bit
configs regardless of SPARSEMEM.
Fixes: 8bc0868998 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Badly-designed systems might have (for example) active-high wake pins
that default to high (e.g., because of external pull ups) until they
have an active firmware which starts driving it low. This can cause an
interrupt storm in the time between request_irq() and disable_irq().
We don't support shared interrupts here, so let's just pre-configure the
interrupt to avoid auto-enabling it.
Fixes: fd913ef7ce ("Bluetooth: btusb: Add out-of-band wakeup support")
Fixes: 5364a0b4f4 ("arm64: dts: rockchip: move QCA6174A wakeup pin into its USB node")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull MIPS fixes from Paul Burton:
"A few minor MIPS fixes:
- Provide struct pt_regs * from get_irq_regs() to kgdb_nmicallback()
when handling an IPI triggered by kgdb_roundup_cpus(), matching the
behavior of other architectures & resolving kgdb issues for SMP
systems.
- Defer a pointer dereference until after a NULL check in the
irq_shutdown callback for SGI IP27 HUB interrupts.
- A defconfig update for the MSCC Ocelot to enable some necessary
drivers"
* tag 'mips_fixes_5.1_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: generic: Add switchdev, pinctrl and fit to ocelot_defconfig
MIPS: SGI-IP27: Fix use of unchecked pointer in shutdown_bridge_irq
MIPS: KGDB: fix kgdb support for SMP platforms.
Pull misc fixes from Al Viro:
"A few regression fixes from this cycle"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: use kmem_cache_free() instead of kfree()
iov_iter: Fix build error without CONFIG_CRYPTO
aio: Fix an error code in __io_submit_one()
If called fast enough so samples do not increment, we can get
division by zero in kernel:
__div0
cpcap_battery_cc_raw_div
cpcap_battery_get_property
power_supply_get_property.part.1
power_supply_get_property
power_supply_show_property
power_supply_uevent
Fixes: 874b2adbed ("power: supply: cpcap-battery: Add a battery driver")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Although XOR hash function can perform very well on some special use
cases, to align with all drivers, mlx5 driver should use Toeplitz hash
by default.
Toeplitz is more stable for the general use case and it is more standard
and reliable.
On top of that, since XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only a
repeated 8 bits pattern. When used for udp tunneling RSS source port
manipulation it results in fixed source port, which will cause bad RSS
spread.
Fixes: 2be6967cdb ("net/mlx5e: Support ETH_RSS_HASH_XOR")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This reverts commit b820e6fb09.
Prior the commit we are reverting, checksum unnecessary was only set when
both the L3 OK and L4 OK bits are set on the CQE. This caused packets
of IP protocols such as SCTP which are not dealt by the current HW L4
parser (hence the L4 OK bit is not set, but the L4 header type none bit
is set) to go through the checksum none code, where currently we wrongly
report checksum unnecessary for them, a regression. Fix this by a revert.
Note that on our usual track we report checksum complete, so the revert
isn't expected to have any notable performance impact. Also, when we are
not on the checksum complete track, the L4 protocols for which we report
checksum none are not high performance ones, we will still report
checksum unnecessary for UDP/TCP.
Fixes: b820e6fb09 ("net/mlx5e: Enable reporting checksum unnecessary also for L3 packets")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Avi Urman <aviu@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
TC encap offload is supported only for the physical uplink
representor. Fail for non uplink representor.
Fixes: 3e621b19b0 ("net/mlx5e: Support TC encapsulation offloads with upper devices")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In the two places is_last_ethertype_ip is being called, the caller will
be looking inside the ip header, to be safe, add ip{4,6} header sanity
check. And return true only on valid ip headers, i.e: the whole header
is contained in the linear part of the skb.
Note: Such situation is very rare and hard to reproduce, since mlx5e
allocates a large enough headroom to contain the largest header one can
imagine.
Fixes: fe1dc06999 ("net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When an ethernet frame with ip payload is padded, the padding octets are
not covered by the hardware checksum.
Prior to the cited commit, skb checksum was forced to be CHECKSUM_NONE
when padding is detected. After it, the kernel will try to trim the
padding bytes and subtract their checksum from skb->csum.
In this patch we fixup skb->csum for any ip packet with tail padding of
any size, if any padding found.
FCS case is just one special case of this general purpose patch, hence,
it is removed.
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
XDP programs might change packets data contents which will make the
reported skb checksum (checksum complete) invalid.
When XDP programs are loaded/unloaded set/clear rx RQs
MLX5E_RQ_STATE_NO_CSUM_COMPLETE flag.
Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When requested to recover from error, the tx reporter might open new
channels and close the existing ones. Use safe channels switch flow in
order to guarantee opened channels at the end of the recover flow.
For this purpose, define mlx5e_safe_reopen_channels function and use it
within those flows.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Skip recover operation if interface is in down state as TX objects are
not open. This fixes a bug were the recover flow re-opened TX objects
which were not opened before, leading to a possible memory leak at
driver unload.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Flow is kfreed on mlx5_fpga_tls_del_flow but kept in the idr data
structure, this is risky and can cause use-after-free, since the
idr_remove is delayed until tls_send_teardown_cmd completion.
Instead of delaying idr_remove, in this patch we do it on
mlx5_fpga_tls_del_flow, before actually kfree(flow).
Added synchronize_rcu before kfree(flow)
Fixes: ab412e1dd7 ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To avoid use-after-free, hold the rcu read lock until we are done copying
flow data into the command buffer.
Fixes: ab412e1dd7 ("net/mlx5: Accel, add TLS rx offload routines")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Johannes Berg says:
====================
Various fixes:
* iTXQ fixes from Felix
* tracing fix - increase message length
* fix SW_CRYPTO_CONTROL enforcement
* WMM rule handling for regdomain intersection
* max_interfaces in hwsim - reported by syzbot
* clear private data in some more commands
* a clang compiler warning fix
I added a patch with two new (unused) macros for
rate-limited printing to simplify getting the users
into the tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds rv32_defconfig for 32bit systems. The only
difference between rv32_defconfig and defconfig is that
rv32_defconfig has CONFIG_ARCH_RV32I=y.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Restore SW_CRYPTO_CONTROL operation on AP_VLAN interfaces for unicast
keys, the original override was intended to be done for group keys as
those are treated specially by mac80211 and would always have been
rejected.
Now the situation is that AP_VLAN support must be enabled by the driver
if it can support it (meaning it can support software crypto GTK TX).
Thus, also simplify the code - if we get here with AP_VLAN and non-
pairwise key, software crypto must be used (driver doesn't know about
the interface) and can be used (driver must've advertised AP_VLAN if
it also uses SW_CRYPTO_CONTROL).
Fixes: db3bdcb9c3 ("mac80211: allow AP_VLAN operation on crypto controlled devices")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is the third step to make MT2701 HDMI stable.
We should not change the rate of parent for hdmi phy when
doing round_rate for this clock. The parent clock of hdmi
phy must be the same as it. We change it when doing set_rate
only.
Signed-off-by: Wangyan Wang <wangyan.wang@mediatek.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
This is the second step to make MT2701 HDMI stable.
The factor depends on the divider of DPI in MT2701, therefore,
we should fix this factor to the right and new one.
Test: search ok
Signed-off-by: Wangyan Wang <wangyan.wang@mediatek.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
This is the first step to make MT2701 hdmi stable.
The parent rate of hdmi phy had set by DPI driver.
We should not set or change the parent rate of MT2701 hdmi phy,
as a result we should remove the flags of "CLK_SET_RATE_PARENT"
from the clock of MT2701 hdmi phy.
Signed-off-by: Wangyan Wang <wangyan.wang@mediatek.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
Recalculate the rate of this clock, by querying hardware to
make implementation of recalc_rate() to match the definition.
Signed-off-by: Wangyan Wang <wangyan.wang@mediatek.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
Due to a clerical error,there is one zero less for 12800000.
Fix it for 128000000
Fixes: 0fc721b296 ("drm/mediatek: add hdmi driver for MT2701 and MT7623")
Signed-off-by: Wangyan Wang <wangyan.wang@mediatek.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
ACPICA commit b233720031a480abd438f2e9c643080929d144c3
ASL operation_regions declare a range of addresses that it uses. In a
perfect world, the range of addresses should be used exclusively by
the AML interpreter. The OS can use this information to decide which
drivers to load so that the AML interpreter and device drivers use
different regions of memory.
During table load, the address information is added to a global
address range list. Each node in this list contains an address range
as well as a namespace node of the operation_region. This list is
deleted at ACPI shutdown.
Unfortunately, ASL operation_regions can be declared inside of control
methods. Although this is not recommended, modern firmware contains
such code. New module level code changes unintentionally removed the
functionality of adding and removing nodes to the global address
range list.
A few months ago, support for adding addresses has been re-
implemented. However, the removal of the address range list was
missed and resulted in some systems to crash due to the address list
containing bogus namespace nodes from operation_regions declared in
control methods. In order to fix the crash, this change removes
dynamic operation_regions after control method termination.
Link: https://github.com/acpica/acpica/commit/b2337200
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202475
Fixes: 4abb951b73 ("ACPICA: AML interpreter: add region addresses in global list during initialization")
Reported-by: Michael J Gruber <mjg@fedoraproject.org>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Cc: 4.20+ <stable@vger.kernel.org> # 4.20+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Push getting the reference for the encoders' power domains into the
encoder get_power_domains() hook instead of doing this from the caller.
This way the encoder can store away the corresponding wakerefs.
This fixes the DSI encoder disabling, which didn't release these
power references it acquired during HW state readout.
Note that longtime ownership for the corresponding wakerefs can be thus
acquired / released in two ways. Nevertheless there is always only one
owner for them:
After HW readout (booting/system resume):
- encoder->get_power_domains() acquires
- encoder->disable*() releases
After a modeset (calling intel_atomic_commit()):
- encoder->enable*() acquires
- encoder->disable*() releases
* can be any of the encoder enable/disable hooks.
v2:
- Check that the DSI io_wakerefs are unset both during encoder HW
readout and enabling. (Chris)
Fixes: 0e6e0be4c9 ("drm/i915: Markup paired operations on display power domains")
Cc: Vandita Kulkarni <vandita.kulkarni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190407124655.31536-1-imre.deak@intel.com
(cherry picked from commit 3a52fb7e79)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Pull networking fixes from David Miller:
1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.
2) There have been many weird regressions in r8169 since we turned ASPM
support on, some are still not understood nor completely resolved.
Let's turn this back off for now. From Heiner Kallweit.
3) Signess fixes for ethtool speed value handling, from Michael
Zhivich.
4) Handle timestamps properly in macb driver, from Paul Thomas.
5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
and we're holding a stale protocol header pointer". From Lorenzo
Bianconi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bnxt_en: Reset device on RX buffer errors.
bnxt_en: Improve RX consumer index validity check.
net: macb driver, check for SKBTX_HW_TSTAMP
qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
net: ip_gre: fix possible use-after-free in erspan_rcv
r8169: disable ASPM again
MAINTAINERS: ieee802154: update documentation file pattern
net: vrf: Fix ping failed when vrf mtu is set to 0
selftests: add a tc matchall test case
nfc: nci: Potential off by one in ->pipes[] array
NFC: nci: Add some bounds checking in nci_hci_cmd_received()
Pull TPM fixes from James Morris:
"From Jarkko: These are critical fixes for v5.1. Contains also couple
of new selftests for v5.1 features (partial reads in /dev/tpm0)"
* 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selftests/tpm2: Open tpm dev in unbuffered mode
selftests/tpm2: Extend tests to cover partial reads
KEYS: trusted: fix -Wvarags warning
tpm: Fix the type of the return value in calc_tpm2_event_size()
KEYS: trusted: allow trusted.ko to initialize w/o a TPM
tpm: fix an invalid condition in tpm_common_poll
tpm: turn on TPM on suspend for TPM 1.x
Pull xtensa fixes from Max Filippov:
- fix syscall number passed to trace_sys_exit
- fix syscall number initialization in start_thread
- fix level interpretation in the return_address
- fix format string warning in init_pmd
* tag 'xtensa-20190408' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix format string warning in init_pmd
xtensa: fix return_address
xtensa: fix initialization of pt_regs::syscall in start_thread
xtensa: use actual syscall number in do_syscall_trace_leave
If scsi cmd sglist is not suitable for DDP then csiostor driver uses
preallocated buffers for DDP, because of this data copy is required from
DDP buffer to scsi cmd sglist before calling ->scsi_done().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Michael Chan says:
====================
bnxt_en: 2 bug fixes.
The first patch prevents possible driver crash if we get a bad RX index
from the hardware. The second patch resets the device when the hardware
reports buffer error to recover from the error.
Please queue these for -stable also. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the RX completion indicates RX buffers errors, the RX ring will be
disabled by firmware and no packets will be received on that ring from
that point on. Recover by resetting the device.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is logic to check that the RX/TPA consumer index is the expected
index to work around a hardware problem. However, the potentially bad
consumer index is first used to index into an array to reference an entry.
This can potentially crash if the bad consumer index is beyond legal
range. Improve the logic to use the consumer index for dereferencing
after the validity check and log an error message.
Fixes: fa7e28127a ("bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure SKBTX_HW_TSTAMP (i.e. SOF_TIMESTAMPING_TX_HARDWARE) has been
enabled for this skb. It does fix the issue where normal socks that
aren't expecting a timestamp will not wake up on select, but when a
user does want a SOF_TIMESTAMPING_TX_HARDWARE it does work.
Signed-off-by: Paul Thomas <pthomas8589@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Zhivich says:
====================
ethtool: fix use of SPEED_UNKNOWN constant
This patch series addresses 2 related issues:
1. ethtool_validate_speed() triggers a "signed-unsigned comparison"
warning due to type difference of SPEED_UNKNOWN constant (int)
and argument to ethtool_validate_speed (__u32).
2. some drivers use u16 storage for SPEED_UNKNOWN constant,
resulting in value truncation and thus failure to test against
SPEED_UNKNOWN correctly.
This revised series addresses several feedback comments:
- split up the patch in to series
- do not unnecessarily change drivers that use "int" storage
for speed values
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic driver uses u16 to store SPEED_UKNOWN ethtool constant,
which is defined as -1, resulting in value truncation and
thus incorrect test results against SPEED_UNKNOWN.
For example, the following test will print "False":
u16 speed = SPEED_UNKNOWN;
if (speed == SPEED_UNKNOWN)
printf("True");
else
printf("False");
Change storage of speed to use u32 to avoid this issue.
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3 driver uses u16 to store SPEED_UKNOWN ethtool constant,
which is defined as -1, resulting in value truncation and
thus incorrect test results against SPEED_UNKNOWN.
For example, the following test will print "False":
u16 speed = SPEED_UNKNOWN;
if (speed == SPEED_UNKNOWN)
printf("True");
else
printf("False");
Change storage of speed to use u32 to avoid this issue.
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building C++ userspace code that includes ethtool.h
with "-Werror -Wall", g++ complains about signed-unsigned comparison in
ethtool_validate_speed() due to definition of SPEED_UNKNOWN as -1.
Explicitly cast SPEED_UNKNOWN to __u32 to match type of
ethtool_validate_speed() argument.
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi says:
====================
fix possible use-after-free in erspan_v{4,6}
Similar to what I did in commit bb9bd814eb ("ipv6: sit: reset ip
header pointer in ipip6_rcv"), fix possible use-after-free in
erspan_rcv and ip6erspan_rcv extracting tunnel metadata since the
packet can be 'uncloned' running __iptunnel_pull_header
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
erspan header. This can determine a possible use-after-free accessing
pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned'
running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
been sent though a veth device). Fix it resetting pkt_md pointer after
__iptunnel_pull_header
Fixes: 1d7e2ed22f ("net: erspan: refactor existing erspan code")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erspan tunnels run __iptunnel_pull_header on received skbs to remove
gre and erspan headers. This can determine a possible use-after-free
accessing pkt_md pointer in erspan_rcv since the packet will be 'uncloned'
running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
been sent though a veth device). Fix it resetting pkt_md pointer after
__iptunnel_pull_header
Fixes: 1d7e2ed22f ("net: erspan: refactor existing erspan code")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three new tests added:
1. Send get random cmd, read header in 1st read, read the rest in second
read - expect success
2. Send get random cmd, read only part of the response, send another
get random command, read the response - expect success
3. Send get random cmd followed by another get random cmd, without
reading the first response - expect the second cmd to fail with -EBUSY
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Fixes the warning reported by Clang:
security/keys/trusted.c:146:17: warning: passing an object that
undergoes default
argument promotion to 'va_start' has undefined behavior [-Wvarargs]
va_start(argp, h3);
^
security/keys/trusted.c:126:37: note: parameter of type 'unsigned
char' is declared here
unsigned char *h2, unsigned char h3, ...)
^
Specifically, it seems that both the C90 (4.8.1.1) and C11 (7.16.1.4)
standards explicitly call this out as undefined behavior:
The parameter parmN is the identifier of the rightmost parameter in
the variable parameter list in the function definition (the one just
before the ...). If the parameter parmN is declared with ... or with a
type that is not compatible with the type that results after
application of the default argument promotions, the behavior is
undefined.
Link: https://github.com/ClangBuiltLinux/linux/issues/41
Link: https://www.eskimo.com/~scs/cclass/int/sx11c.html
Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Denis Kenzior <denkenz@gmail.com>
Suggested-by: James Bottomley <jejb@linux.vnet.ibm.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
The poll condition should only check response_length,
because reads should only be issued if there is data to read.
The response_read flag only prevents double writes.
The problem was that the write set the response_read to false,
enqued a tpm job, and returned. Then application called poll
which checked the response_read flag and returned EPOLLIN.
Then the application called read, but got nothing.
After all that the async_work kicked in.
Added also mutex_lock around the poll check to prevent
other possible race conditions.
Fixes: 9488585b21 ("tpm: add support for partial reads")
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Tested-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
tpm_chip_start/stop() should be also called for TPM 1.x devices on
suspend. Add that functionality back. Do not lock the chip because
it is unnecessary as there are no multiple threads using it when
doing the suspend.
Fixes: a3fbfae82b ("tpm: take TPM chip power gating out of tpm_transmit()")
Reported-by: Paul Zimmerman <pauldzim@gmail.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
There's a significant number of reports that re-enabling ASPM causes
different issues, ranging from decreased performance to system not
booting at all. This affects only a minority of users, but the number
of affected users is big enough that we better switch off ASPM again.
This will hurt notebook users who are not affected by the issues, they
may see decreased battery runtime w/o ASPM. With the PCI core folks is
being discussed to add generic sysfs attributes to control ASPM.
Once this is in place brave enough users can re-enable ASPM on their
system.
Fixes: a99790bf5c ("r8169: Reinstate ASPM Support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vring_create_virtqueue() allows the caller to specify via the
may_reduce_num parameter whether the vring code is allowed to
allocate a smaller ring than specified.
However, the split ring allocation code tries to allocate a
smaller ring on allocation failure regardless of what the
caller specified. This may cause trouble for e.g. virtio-pci
in legacy mode, which does not support ring resizing. (The
packed ring code does not resize in any case.)
Let's fix this by bailing out immediately in the split ring code
if the requested size cannot be allocated and may_reduce_num has
not been specified.
While at it, fix a typo in the usage instructions.
Fixes: 2a2d1382fe ("virtio: Add improved queue allocation API")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
When moving the documentation for the ieee802154 subsystem from
plain text to rst the file pattern in the MAINTAINERS file got wrong.
Updating it here to fix scripts using this file.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling dump_backtrace() with a pt_regs argument corresponding to
userspace doesn't make any sense and our unwinder will simply print
"Call trace:" before unwinding the stack looking for user frames.
Rather than go through this song and dance, just return early if we're
passed a user register state.
Cc: <stable@vger.kernel.org>
Fixes: 1149aad10b ("arm64: Add dump_backtrace() in show_regs")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This options spawns a kernel side thread that will poll for submissions
(and completions, if IORING_SETUP_IOPOLL is set). As this allows a user
to potentially use more cycles outside of the normal hierarchy,
restrict the use of this feature to root.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This ended up not being included in the mainline version of io_uring,
so drop it from the test app as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If there are multiple callbacks queued, waiting for the callback
slot when the callback gets shut down, then they all currently
end up acting as if they hold the slot, and call
nfsd4_cb_sequence_done() resulting in interesting side-effects.
In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
causes a loop back to nfsd4_cb_prepare() without first freeing the
slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
a second time.
This patch therefore adds a boolean to track whether or not the
callback did pick up the slot, so that it can do the right thing
in these 2 cases.
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The method of hem free for SCC context is different from qp context.
In the current version, if free SCC hem during the execution of qp free,
there may be smmu error as below:
arm-smmu-v3 arm-smmu-v3.1.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.1.auto: 0x00007d0000000010
arm-smmu-v3 arm-smmu-v3.1.auto: 0x000012000000017c
arm-smmu-v3 arm-smmu-v3.1.auto: 0x00000000000009e0
arm-smmu-v3 arm-smmu-v3.1.auto: 0x0000000000000000
As SCC context is still used by hardware after qp free, we can solve this
problem by removing SCC hem free from hns_roce_qp_free.
Fixes: 6a157f7d1b ("RDMA/hns: Add SCC context allocation support for hip08")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Overwrite retains the security state after completion of operation. Fix
nfit_test to reflect this so that the kernel can test the behavior it is
more likely to see in practice.
Fixes: 926f74802c ("tools/testing/nvdimm: Add overwrite support for nfit_test")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Due to the incorrect use of the seg and obj information, the position of
the mtt is calculated incorrectly, and the free space of the page is not
enough to store the entire mtt, resulting in access to the next page. This
patch fixes this problem.
Unable to handle kernel paging request at virtual address ffff00006e3cd000
...
Call trace:
hns_roce_write_mtt+0x154/0x2f0 [hns_roce]
hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce]
hns_roce_create_srq+0x74c/0x808 [hns_roce]
ib_create_srq+0x28/0xc8
Fixes: 0203b14c4f ("RDMA/hns: Unify the calculation for hem index in hip08")
Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak.
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ftrace trampoline code (which deals with modules loaded out of
BL range of the core kernel) uses plt_entries_equal() to check whether
the per-module trampoline equals a zero buffer, to decide whether the
trampoline has already been initialized.
This triggers a BUG() in the opcode manipulation code, since we end
up checking the ADRP offset of a 0x0 opcode, which is not an ADRP
instruction.
So instead, add a helper to check whether a PLT is initialized, and
call that from the frace code.
Cc: <stable@vger.kernel.org> # v5.0
Fixes: bdb85cd1d2 ("arm64/module: switch to ADRP/ADD sequences for PLT entries")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In case of 'L1_CACHE_SHIFT != 6' we define dummy assembly macroses
PREALLOC_INSTR and PREFETCHW_INSTR without arguments. However
we pass arguments to them in code which cause build errors.
Fix that.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: <stable@vger.kernel.org> [5.0]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[Why]
If the cursor pos passed from DM is less than the plane_state->dst_rect
top left corner then the unsigned cursor pos wraps around to a large
positive number since cursor pos is a u32.
There was an attempt to guard against this in hubp1_cursor_set_position
by checking the src_x_offset and src_y_offset and offseting the
cursor hotspot within hubp1_cursor_set_position.
However, the cursor position itself is still being programmed
incorrectly as a large value.
This manifests itself visually as the cursor disappearing or containing
strange artifacts near the middle of the screen on raven.
[How]
Don't subtract the destination rect top left corner from the pos but
add it to the hotspot instead. This happens before the pos gets
passed into hubp1_cursor_set_position.
This achieves the same result but avoids the subtraction wrap around.
With this fix the original cursor programming logic can be used again.
v2: add hunk that got dropped accidently when this patch was originally
committed. (Alex)
Fixes: 0921c41e19 ("drm/amd/display: Fix negative cursor pos programming")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Murton Liu <Murton.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit 6dc4f100c1 ("block: allow bio_for_each_segment_all() to
iterate over multi-page bvec") changes bio_for_each_segment_all()
to use for-inside-for.
This way breaks all bio_for_each_segment_all() call with error out
branch via 'break', since now 'break' can only break from the inner
loop.
Fixes this issue by implementing bio_for_each_segment_all() via
single 'for' loop, and now the logic is very similar with normal
bvec iterator.
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Omar Sandoval <osandov@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-and-Tested-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Fixes: 6dc4f100c1 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jason doesn't really have the time to review blk/scsi
patches. Paolo and Setfan agreed to help out.
Thanks guys!
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
broke the radix-mode segment exception handler. In radix mode, this is
exception is not an SLB miss, rather it signals that the EA is outside
the range translated by any page table.
The commit lost the radix feature alternate code patch, which can
cause faults to some EAs to kernel BUG at arch/powerpc/mm/slb.c:639!
The original radix code would send faults to slb_miss_large_addr,
which would end up faulting due to slb_addr_limit being 0. This patch
sends radix directly to do_bad_slb_fault, which is a bit clearer.
Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently there is no way for the driver to signal to mac80211 that it should
schedule a TXQ even if there are no packets on the mac80211 part of that queue.
This is problematic if the driver has an internal retry queue to deal with
software A-MPDU retry.
This patch changes the behavior of ieee80211_schedule_txq to always schedule
the queue, as its only user (ath9k) seems to expect such behavior already:
it calls this function on tx status and on powersave wakeup whenever its
internal retry queue is not empty.
Also add an extra argument to ieee80211_return_txq to get the same behavior.
This fixes an issue on ath9k where tx queues with packets to retry (and no
new packets in mac80211) would not get serviced.
Fixes: 89cea7493a ("ath9k: Switch to mac80211 TXQ scheduling and airtime APIs")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we just set this to 2048, and have multiple limits you
can select from, the total number might run over and cause
a warning in cfg80211. This doesn't make sense, so we just
calculate the total max_interfaces now.
Reported-by: syzbot+8f91bd563bbff230d0ee@syzkaller.appspotmail.com
Fixes: 99e3a44bac ("mac80211_hwsim: allow setting iftype support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are two problems here:
1. Not all clk_data->hws[] need to be initialized, depending on various
configured quirks. This leads to NULL ptr deref in
clk_hw_unregister_gate() in sun8i_tcon_top_unbind()
2. If there is error when registering the clk_data->hws[],
err_unregister_gates error path will try to unregister
IS_ERR()=true (invalid) pointer.
For problem (1) I have this stack trace:
Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000008
Call trace:
clk_hw_unregister+0x8/0x18
clk_hw_unregister_gate+0x14/0x28
sun8i_tcon_top_unbind+0x2c/0x60
component_unbind.isra.4+0x2c/0x50
component_bind_all+0x1d4/0x230
sun4i_drv_bind+0xc4/0x1a0
try_to_bring_up_master+0x164/0x1c0
__component_add+0xa0/0x168
component_add+0x10/0x18
sun8i_dw_hdmi_probe+0x18/0x20
platform_drv_probe+0x3c/0x70
really_probe+0xcc/0x278
driver_probe_device+0x34/0xa8
Problem (2) was identified by head scratching.
Signed-off-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405233048.3823-1-megous@megous.com
Setting the module_get_upon_open field for component driver
prevents the module refcount from being incremented during
component probe(). This could lead to the module being
allowed to be unloaded when a pcm stream is open. So,
if this field is set, the module's refcount should be
incremented during pcm open to prevent module removal
when the component is in use. And, the refcount should
be decremented upon pcm close.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Recently, for Intel platforms the "ignore_module_refcount" field
was introduced for the component driver. In order to avoid a
deadlock preventing the PCI modules from being removed
even when the card was idle, the refcounts were not incremented
for the device driver module during component probe.
However, this change introduced a nasty side effect:
the device driver module can be unloaded while a pcm stream is open.
This patch proposes to change the field to be renamed as
"module_get_upon_open". When this field is set, the module
refcount should be incremented on pcm open amd decremented
upon pcm close. This will enable modules to be removed
when no PCM playback/capture happens and prevent removal
when the component is actually in use.
Also, align with the skylake component driver with the new name.
Fixes: b450b878('ASoC: core: don't increase component module refcount
unconditionally'
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The x86_64 implementation of Poly1305 produces the wrong result on some
inputs because poly1305_4block_avx2() incorrectly assumes that when
partially reducing the accumulator, the bits carried from limb 'd4' to
limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic
which processes only one block at a time. However, it's not true for
the AVX2 implementation, which processes 4 blocks at a time and
therefore can produce intermediate limbs about 4x larger.
Fix it by making the relevant calculations use 64-bit arithmetic rather
than 32-bit. Note that most of the carries already used 64-bit
arithmetic, but the d4 -> h0 carry was different for some reason.
To be safe I also made the same change to the corresponding SSE2 code,
though that only operates on 1 or 2 blocks at a time. I don't think
it's really needed for poly1305_block_sse2(), but it doesn't hurt
because it's already x86_64 code. It *might* be needed for
poly1305_2block_sse2(), but overflows aren't easy to reproduce there.
This bug was originally detected by my patches that improve testmgr to
fuzz algorithms against their generic implementation. But also add a
test vector which reproduces it directly (in the AVX2 case).
Fixes: b1ccc8f4b6 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64")
Fixes: c70f4abef0 ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64")
Cc: <stable@vger.kernel.org> # v4.3+
Cc: Martin Willi <martin@strongswan.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the sai driver structure overwriting which results in
a cpu dai name equal NULL.
Fixes: 3e086ed ("ASoC: stm32: add SAI driver")
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The control values and texts of the enum kcontrol associated
with a widget need to be freed when the widget is removed.
However, both struct snd_soc_dapm_widget and struct soc_enum
contain a dobj member, which resulted in a confusion.
The existing code generates a null pointer dereference by
attempting to free the values and texts from the dobj which
belongs to the widget instead of the dobj belonging to the
enum kcontrol.
The suggested fix is to use the correct dobj member (se->dobj)
of the enum kcontrol.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull crypto fix from Herbert Xu:
"This fixes a bug in the implementation of xcbc and cmac in caam"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: caam - fix copy of next buffer for xcbc and cmac
When the mtu of a vrf device is set to 0, it would cause ping
failed. So I think we should limit vrf mtu in a reasonable range
to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it
will works for both ipv4 and ipv6. And if dev->max_mtu still be 0
can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
Here is the reproduce step:
1.Config vrf interface and set mtu to 0:
3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
master vrf1 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
2.Ping peer:
3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
master vrf1 state UP group default qlen 1000
link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/16 scope global enp4s0
valid_lft forever preferred_lft forever
connect: Network is unreachable
3.Set mtu to default value, ping works:
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
Fixes: ad49bc6361 ("net: vrf: remove MTU limits for vrf device")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 510ded33e0 ("slab: implement slab_root_caches list")
changes the name of the list node within "struct kmem_cache" from "list"
to "root_caches_node", but leaks_show() still use the "list" which
causes a crash when reading /proc/slab_allocators.
You need to have CONFIG_SLAB=y and CONFIG_MEMCG=y to see the problem,
because without MEMCG all slab caches are root caches, and the "list"
node happens to be the right one.
Fixes: 510ded33e0 ("slab: implement slab_root_caches list")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ppgtt_free_all_spt() iterates the radixtree as it is deleting it,
forgoing all protection against the leaves being freed in the process
(leaving the iter pointing into the void).
A minimal fix seems to be to use the available post_shadow_list to
decompose the tree into a list prior to destroying the radixtree.
Alerted by the sparse warnings:
drivers/gpu/drm/i915/gvt/gtt.c:757:9: warning: incorrect type in assignment (different address spaces)
drivers/gpu/drm/i915/gvt/gtt.c:757:9: expected void **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: got void [noderef] <asn:4> **
drivers/gpu/drm/i915/gvt/gtt.c:757:9: warning: incorrect type in assignment (different address spaces)
drivers/gpu/drm/i915/gvt/gtt.c:757:9: expected void **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: got void [noderef] <asn:4> **
drivers/gpu/drm/i915/gvt/gtt.c:758:45: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/i915/gvt/gtt.c:758:45: expected void [noderef] <asn:4> **slot
drivers/gpu/drm/i915/gvt/gtt.c:758:45: got void **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/i915/gvt/gtt.c:757:9: expected void [noderef] <asn:4> **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: got void **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: warning: incorrect type in assignment (different address spaces)
drivers/gpu/drm/i915/gvt/gtt.c:757:9: expected void **slot
drivers/gpu/drm/i915/gvt/gtt.c:757:9: got void [noderef] <asn:4> **
This would also have been loudly warning if run through CI for the
invalid RCU dereferences.
Fixes: b6c126a393 ("drm/i915/gvt: Manage shadow pages with radix tree")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fix the sparse warning for blithely using iomem with normal memcpy:
drivers/gpu/drm/i915/gvt/kvmgt.c:916:21: warning: incorrect type in assignment (different address spaces)
drivers/gpu/drm/i915/gvt/kvmgt.c:916:21: expected void *aperture_va
drivers/gpu/drm/i915/gvt/kvmgt.c:916:21: got void [noderef] <asn:2> *
drivers/gpu/drm/i915/gvt/kvmgt.c:927:26: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/i915/gvt/kvmgt.c:927:26: expected void [noderef] <asn:2> *vaddr
drivers/gpu/drm/i915/gvt/kvmgt.c:927:26: got void *aperture_va
Fixes: d480b28a41 ("drm/i915/gvt: Fix aperture read/write emulation when enable x-no-mmap=on")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This is a follow up of the commit 0db6f8befc ("net/sched: fix ->get
helper of the matchall cls").
To test it:
$ cd tools/testing/selftests/tc-testing
$ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py
$ ./tdc.py -n -e 2638
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM SoC fixes from Olof Johansson:
"A collection of fixes from the last few weeks. Most of them are
smaller tweaks and fixes to DT and hardware descriptions for boards.
Some of the more significant ones are:
- eMMC and RGMII stability tweaks for rk3288
- DDC fixes for Rock PI 4
- Audio fixes for two TI am335x eval boards
- D_CAN clock fix for am335x
- Compilation fixes for clang
- !HOTPLUG_CPU compilation fix for one of the new platforms this
release (milbeaut)
- A revert of a gpio fix for nomadik that instead was fixed in the
gpio subsystem
- Whitespace fix for the DT JSON schema (no tabs allowed)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
ARM: milbeaut: fix build with !CONFIG_HOTPLUG_CPU
ARM: iop: don't use using 64-bit DMA masks
ARM: orion: don't use using 64-bit DMA masks
Revert "ARM: dts: nomadik: Fix polarity of SPI CS"
dt-bindings: cpu: Fix JSON schema
arm/mach-at91/pm : fix possible object reference leak
ARM: dts: at91: Fix typo in ISC_D0 on PC9
ARM: dts: Fix dcan clkctrl clock for am3
reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets
ARM: dts: rockchip: Remove #address/#size-cells from rk3288-veyron gpio-keys
ARM: dts: rockchip: Remove #address/#size-cells from rk3288 mipi_dsi
ARM: dts: rockchip: Fix gpu opp node names for rk3288
ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
ARM: dts: am335x-evm: Correct the regulators for the audio codec
ARM: OMAP2+: add missing of_node_put after of_device_is_available
ARM: OMAP1: ams-delta: Fix broken GPIO ID allocation
arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
...
Pull block fixes from Jens Axboe:
- Fixups for the pf/pcd queue handling (YueHaibing)
- Revert of the three direct issue changes as they have been proven to
cause an issue with dm-mpath (Bart)
- Plug rq_count reset fix (Dongli)
- io_uring double free in fileset registration error handling (me)
- Make null_blk handle bad numa node passed in (John)
- BFQ ifdef fix (Konstantin)
- Flush queue leak fix (Shenghui)
- Plug trace fix (Yufen)
* tag 'for-linus-20190407' of git://git.kernel.dk/linux-block:
xsysace: Fix error handling in ace_setup
null_blk: prevent crash from bad home_node value
block: Revert v5.0 blk_mq_request_issue_directly() changes
paride/pcd: Fix potential NULL pointer dereference and mem leak
blk-mq: do not reset plug->rq_count before the list is sorted
paride/pf: Fix potential NULL pointer dereference
io_uring: fix double free in case of fileset regitration failure
blk-mq: add trace block plug and unplug for multiple queues
block: use blk_free_flush_queue() to free hctx->fq in blk_mq_init_hctx
block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y
When HOTPLUG_CPU is disabled, some fields in the smp operations
are not available or needed:
arch/arm/mach-milbeaut/platsmp.c:90:3: error: field designator 'cpu_die' does not refer to any field in type
'struct smp_operations'
.cpu_die = m10v_cpu_die,
^
arch/arm/mach-milbeaut/platsmp.c:91:3: error: field designator 'cpu_kill' does not refer to any field in type
'struct smp_operations'
.cpu_kill = m10v_cpu_kill,
^
Hide them in an #ifdef like the other platforms do.
Fixes: 9fb29c734f ("ARM: milbeaut: Add basic support for Milbeaut m10v SoC")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
clang warns about statically defined DMA masks from the DMA_BIT_MASK
macro with length 64:
arch/arm/mach-iop13xx/setup.c:303:35: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(64);
^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
The ones in iop shouldn't really be 64 bit masks, so changing them
to what the driver can support avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
clang warns about statically defined DMA masks from the DMA_BIT_MASK
macro with length 64:
arch/arm/plat-orion/common.c:625:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
.coherent_dma_mask = DMA_BIT_MASK(64),
^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
The ones in orion shouldn't really be 64 bit masks, so changing them
to what the driver can support avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
This reverts commit fa9463564e.
Per Linus Walleij:
Dear ARM SoC maintainers,
can you please revert this patch. It was the wrong solution to the
wrong problem, and I must have acted in stress. Andrey fixed the
real bug in a proper way in these commits:
commit e5545c94e4
"gpio: of: Check propname before applying "cs-gpios" quirks"
commit 7ce40277bf
"gpio: of: Check for "spi-cs-high" in child instead of parent node"
Signed-off-by: Olof Johansson <olof@lixom.net>
Fixes for omaps for v5.1-rc cycle
Few small fixes for omap variants:
- Fix ams-delta gpio IDs
- Add missing of_node_put for omapdss platform init code
- Fix unconfigured audio regulators for two am335x boards
- Fix use of wrong offset for am335x d_can clocks
* tag 'omap-for-v5.1/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Fix dcan clkctrl clock for am3
ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
ARM: dts: am335x-evm: Correct the regulators for the audio codec
ARM: OMAP2+: add missing of_node_put after of_device_is_available
ARM: OMAP1: ams-delta: Fix broken GPIO ID allocation
Signed-off-by: Olof Johansson <olof@lixom.net>
AT91 fixes for 5.1
- fix a typo in sama5d2 pinmuxing which concerns the ISC data 0 signal
- fix a kobject reference leak
* tag 'at91-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
arm/mach-at91/pm : fix possible object reference leak
ARM: dts: at91: Fix typo in ISC_D0 on PC9
Signed-off-by: Olof Johansson <olof@lixom.net>
Fixes for dtc warnings, fixes for ethernet transfers on rk3328,
sd-card related fixes on both rk3328 ans rk3288-tinker and a
regulator fix on rock64 and making ddc actually work on the
Rock PI 4 due to missing the ddc bus.
* tag 'v5.1-rockchip-dtfixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: Remove #address/#size-cells from rk3288-veyron gpio-keys
ARM: dts: rockchip: Remove #address/#size-cells from rk3288 mipi_dsi
ARM: dts: rockchip: Fix gpu opp node names for rk3288
arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
ARM: dts: rockchip: Fix SD card detection on rk3288-tinker
arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
ARM: dts: rockchip: fix rk3288 cpu opp node reference
arm64: dts: rockchip: add DDC bus on Rock Pi 4
arm64: dts: rockchip: fix rk3328-roc-cc gmac2io tx/rx_delay
Signed-off-by: Olof Johansson <olof@lixom.net>
arm64: dts: stratix10: fix emac loading warning
- Add missing "altr,sysmgr-syscon" property to all gmac nodes
* tag 'stratix10_fix_for_v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
Signed-off-by: Olof Johansson <olof@lixom.net>
Reset controller fixes for v5.1
This tag adds missing USB PHY reset lines to the Meson G12A reset
controller header and fixes the Meson Audio ARB driver to prevent
module unloading while it is in use.
* tag 'reset-fixes-for-v5.1' of git://git.pengutronix.de/pza/linux:
reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets
Signed-off-by: Olof Johansson <olof@lixom.net>
Commit fd73403a48 ("dt-bindings: arm: Add SMP enable-method for
Milbeaut") added support for a new cpu enable-method, but did so using
tabulations to ident. This is however invalid in the syntax, and resulted
in a failure when trying to use that schemas for validation.
Use spaces instead of tabs to indent to fix this.
Fixes: fd73403a48 ("dt-bindings: arm: Add SMP enable-method for Milbeaut")
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Sugaya Taichi <sugaya.taichi@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Commit b5b4453e79 ("powerpc/vdso64: Fix CLOCK_MONOTONIC
inconsistencies across Y2038") changed the type of wtom_clock_sec
to s64 on PPC64. Therefore, VDSO32 needs to read it with a 4 bytes
shift in order to retrieve the lower part of it.
Fixes: b5b4453e79 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull xen fixes from Juergen Gross:
"One minor fix and a small cleanup for the xen privcmd driver"
* tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Prevent buffer overflow in privcmd ioctl
xen: use struct_size() helper in kzalloc()
Pull MTD fix from Richard Weinberger:
"A single fix for a possible infinite loop in the cfi_cmdset_0002
driver"
* tag 'mtd/fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer
Pull SCSI fixes from James Bottomley:
"Five small fixes. Four in three drivers: qedi, lpfc and storvsc. The
final one is labelled core, but merely adds a dh rdac entry for Lenovo
systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Fix missing wakeups on abort threads
scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
scsi: storvsc: Fix calculation of sub-channel count
scsi: core: add new RDAC LENOVO/DE_Series device
scsi: qedi: remove declaration of nvm_image from stack
This is similar to commit e285d5bfb7 ("NFC: Fix the number of pipes")
where we changed NFC_HCI_MAX_PIPES from 127 to 128.
As the comment next to the define explains, the pipe identifier is 7
bits long. The highest possible pipe is 127, but the number of possible
pipes is 128. As the code is now, then there is potential for an
out of bounds array access:
net/nfc/nci/hci.c:297 nci_hci_cmd_received() warn: array off by one?
'ndev->hci_dev->pipes[pipe]' '0-127 == 127'
Fixes: 11f54f2286 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is similar to commit 674d9de02a ("NFC: Fix possible memory
corruption when handling SHDLC I-Frame commands").
I'm not totally sure, but I think that commit description may have
overstated the danger. I was under the impression that this data came
from the firmware? If you can't trust your networking firmware, then
you're already in trouble.
Anyway, these days we add bounds checking where ever we can and we call
it kernel hardening. Better safe than sorry.
Fixes: 11f54f2286 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull i2c fix from Wolfram Sang:
"A simple but wanted driver bugfix"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: don't leak the i2c adapter on error
Pull parisc fixes from Helge Deller:
"A 32-bit boot regression fix introduced in the merge window, a QEMU
detection fix and two fixes by Sven regarding ptrace & kprobes"
* 'parisc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Detect QEMU earlier in boot process
parisc: also set iaoq_b in instruction_pointer_set()
parisc: regs_return_value() should return gpr28
Revert: parisc: Use F_EXTEND() macro in iosapic code
While adding LASI support to QEMU, I noticed that the QEMU detection in
the kernel happens much too late. For example, when a LASI chip is found
by the kernel, it registers the LASI LED driver as well. But when we
run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so
we need to access the running_on_QEMU flag earlier than before.
This patch now makes the QEMU detection the fist task of the Linux
kernel by moving it to where the kernel enters the C-coding.
Fixes: 310d82784f ("parisc: qemu idle sleep support")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.14+
When setting the instruction pointer on PA-RISC we also need
to set the back of the instruction queue to the new offset, otherwise
we will execute on instruction from the new location, and jumping
back to the old location stored in iaoq_b.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 75ebedf1d2 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature")
Cc: stable@vger.kernel.org # 4.19+
While working on kretprobes for PA-RISC I was wondering while the
kprobes sanity test always fails on kretprobes. This is caused by
returning gpr20 instead of gpr28.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
Revert parts of commit 97d7e2e3fd ("parisc: Use F_EXTEND() macro in
iosapic code"). It breaks booting the 32-bit kernel on some machines.
Reported-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Sven Schnelle <svens@stackframe.org>
Fixes: 97d7e2e3fd ("parisc: Use F_EXTEND() macro in iosapic code")
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.
This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d0 ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.
The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.
However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655 -
is doing so irregardless of whether a file is seekable or not.
See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/https://lwn.net/Articles/180387https://lwn.net/Articles/180396
for historic context.
The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:
kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...
Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.
Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f7 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:
See
https://github.com/libfuse/osspdhttps://lwn.net/Articles/308445https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:
https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.
2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.
3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.
4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
steam_open if that bit is present in filesystem open reply.
It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).
This will allow to patch OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
in their open handler and this way avoid the deadlock on all kernel
versions. This should work because fs/fuse/ ignores unknown open
flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel
that is not aware of FOPEN_STREAM will be < v3.14 where just
FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
write deadlock.
This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.
Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.
The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At module load, if the selected home_node value is greater than
the available numa nodes, the system will crash in
__alloc_pages_nodemask() due to a bad paging request. Prevent this
user error crash by detecting the bad value, logging an error, and
setting g_home_node back to the default of NUMA_NO_NODE.
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull RTC fixes from Alexandre Belloni:
- Various alarm fixes for da9063, cros-ec and sh
- sd3078 manufacturer name fix as this was introduced this cycle
* tag 'rtc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: da9063: set uie_unsupported when relevant
rtc: sd3078: fix manufacturer name
rtc: sh: Fix invalid alarm warning for non-enabled alarm
rtc: cros-ec: Fail suspend/resume if wake IRQ can't be configured
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.
This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.
However there is a (trivial) theoretical case where this code leads to
miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes
such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction
of the kernel and the trivial testcase looks normal enough to
be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Y Knight <jyknight@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kernel/sysctl.c: fix out-of-bounds access when setting file-max
mm/util.c: fix strndup_user() comment
sh: fix multiple function definition build errors
MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM
MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM
mm: writeback: use exact memcg dirty counts
psi: clarify the units used in pressure files
mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
hugetlbfs: fix memory leak for resv_map
mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()
lib/lzo: fix bugs for very short or empty input
include/linux/bitrev.h: fix constant bitrev
kmemleak: powerpc: skip scanning holes in the .bss section
lib/string.c: implement a basic bcmp
Commit 32a5ad9c22 ("sysctl: handle overflow for file-max") hooked up
min/max values for the file-max sysctl parameter via the .extra1 and
.extra2 fields in the corresponding struct ctl_table entry.
Unfortunately, the minimum value points at the global 'zero' variable,
which is an int. This results in a KASAN splat when accessed as a long
by proc_doulongvec_minmax on 64-bit architectures:
| BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0
| Read of size 8 at addr ffff2000133d1c20 by task systemd/1
|
| CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x228
| show_stack+0x14/0x20
| dump_stack+0xe8/0x124
| print_address_description+0x60/0x258
| kasan_report+0x140/0x1a0
| __asan_report_load8_noabort+0x18/0x20
| __do_proc_doulongvec_minmax+0x5d8/0x6a0
| proc_doulongvec_minmax+0x4c/0x78
| proc_sys_call_handler.isra.19+0x144/0x1d8
| proc_sys_write+0x34/0x58
| __vfs_write+0x54/0xe8
| vfs_write+0x124/0x3c0
| ksys_write+0xbc/0x168
| __arm64_sys_write+0x68/0x98
| el0_svc_common+0x100/0x258
| el0_svc_handler+0x48/0xc0
| el0_svc+0x8/0xc
|
| The buggy address belongs to the variable:
| zero+0x0/0x40
|
| Memory state around the buggy address:
| ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa
| ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa
| >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
| ^
| ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00
| ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix the splat by introducing a unsigned long 'zero_ul' and using that
instead.
Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com
Fixes: 32a5ad9c22 ("sysctl: handle overflow for file-max")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christian Brauner <christian@brauner.io>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many of the sh CPU-types have their own plat_irq_setup() and
arch_init_clk_ops() functions, so these same (empty) functions in
arch/sh/boards/of-generic.c are not needed and cause build errors.
If there is some case where these empty functions are needed, they can
be retained by marking them as "__weak" while at the same time making
builds that do not need them succeed.
Fixes these build errors:
arch/sh/boards/of-generic.o: In function `plat_irq_setup':
(.init.text+0x134): multiple definition of `plat_irq_setup'
arch/sh/kernel/cpu/sh2/setup-sh7619.o:(.init.text+0x30): first defined here
arch/sh/boards/of-generic.o: In function `arch_init_clk_ops':
(.init.text+0x118): multiple definition of `arch_init_clk_ops'
arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.init.text+0x0): first defined here
Link: http://lkml.kernel.org/r/9ee4e0c5-f100-86a2-bd4d-1d3287ceab31@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit a983b5ebee ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic. Stat readers only check the atomic. Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.
Assuming 100 cpus:
4k x86 page_size: 13 MiB error per memcg
64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the
errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages. If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.
It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids
oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c
/*
* Dirty pages from all but one cpu.
* Clean pages from the non dirtying cpu.
* This is to stress per cpu counter imbalance.
* On a 100 cpu machine:
* - per memcg per cpu dirty count is 32 pages for each of 99 cpus
* - per memcg atomic is -99*32 pages
* - thus the complete dirty limit: sum of all counters 0
* - balance_dirty_pages() only sees atomic count -99*32 pages, which
* it max()s to 0.
* - So a workload can dirty -99*32 pages before balance_dirty_pages()
* cares.
*/
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>
static char *buf;
static int bufSize;
static void set_affinity(int cpu)
{
cpu_set_t affinity;
CPU_ZERO(&affinity);
CPU_SET(cpu, &affinity);
if (sched_setaffinity(0, sizeof(affinity), &affinity))
err(1, "sched_setaffinity");
}
static void dirty_on(int output_fd, int cpu)
{
int i, wrote;
set_affinity(cpu);
for (i = 0; i < 32; i++) {
for (wrote = 0; wrote < bufSize; ) {
int ret = write(output_fd, buf+wrote, bufSize-wrote);
if (ret == -1)
err(1, "write");
wrote += ret;
}
}
}
int main(int argc, char **argv)
{
int cpu, flush_cpu = 1, output_fd;
const char *output;
if (argc != 2)
errx(1, "usage: output_file");
output = argv[1];
bufSize = getpagesize();
buf = malloc(getpagesize());
if (buf == NULL)
errx(1, "malloc failed");
output_fd = open(output, O_CREAT|O_RDWR);
if (output_fd == -1)
err(1, "open(%s)", output);
for (cpu = 0; cpu < get_nprocs(); cpu++) {
if (cpu != flush_cpu)
dirty_on(output_fd, cpu);
}
set_affinity(flush_cpu);
if (fsync(output_fd))
err(1, "fsync(%s)", output);
if (close(output_fd))
err(1, "close(%s)", output);
free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters. This avoids the aforementioned oom
kills.
This does not affect the overhead of memory.stat, which still reads the
single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter. And the percpu_counter
spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org> [4.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With some architectures like ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.
Use pmdp_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PMD entries.
This is similar to commit cae85cb8ad ("mm/memory.c: fix modifying of
page protection by insert_pfn()")
We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't
support pud pfn entries now.
Without this patch we also see the below message in kernel log "BUG:
non-zero pgtables_bytes on freeing mm:"
Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When mknod is used to create a block special file in hugetlbfs, it will
allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc().
inode->i_mapping->private_data will point the newly allocated resv_map.
However, when the device special file is opened bd_acquire() will set
inode->i_mapping to bd_inode->i_mapping. Thus the pointer to the
allocated resv_map is lost and the structure is leaked.
Programs to reproduce:
mount -t hugetlbfs nodev hugetlbfs
mknod hugetlbfs/dev b 0 0
exec 30<> hugetlbfs/dev
umount hugetlbfs/
resv_map structures are only needed for inodes which can have associated
page allocations. To fix the leak, only allocate resv_map for those
inodes which could possibly be associated with page allocations.
Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Yufen Yu <yuyufen@huawei.com>
Suggested-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For very short input data (0 - 1 bytes), lzo-rle was not behaving
correctly. Fix this behaviour and update documentation accordingly.
For zero-length input, lzo v0 outputs an end-of-stream marker only,
which was misinterpreted by lzo-rle as a bitstream version number.
Ensure bitstream versions > 0 require a minimum stream length of 5.
Also fixes a bug in handling the tail for very short inputs when a
bitstream version is present.
Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
clang points out with hundreds of warnings that the bitrev macros have a
problem with constant input:
drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization
[-Werror,-Wuninitialized]
u8 crc = bitrev8(data->val_status & 0x0F);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8'
__constant_bitrev8(__x) : \
~~~~~~~~~~~~~~~~~~~^~~~
include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8'
u8 __x = x; \
~~~ ^
Both the bitrev and the __constant_bitrev macros use an internal
variable named __x, which goes horribly wrong when passing one to the
other.
The obvious fix is to rename one of the variables, so this adds an extra
'_'.
It seems we got away with this because
- there are only a few drivers using bitrev macros
- usually there are no constant arguments to those
- when they are constant, they tend to be either 0 or (unsigned)-1
(drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and
give the correct result by pure chance.
In fact, the only driver that I could find that gets different results
with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver
for fairly rare hardware (adding the maintainer to Cc for testing).
Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de
Fixes: 556d2f055b ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Cc: Yalin Wang <yalin.wang@sonymobile.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 2d4f567103 ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] into the .bss section and then free the rest of unused spaces
back to the page allocator.
kernel_init
kvm_guest_init
kvm_free_tmp
free_reserved_area
free_unref_page
free_unref_page_prepare
With DEBUG_PAGEALLOC=y, it will unmap those pages from kernel. As the
result, kmemleak scan will trigger a panic when it scans the .bss
section with unmapped pages.
This patch creates dedicated kmemleak objects for the .data, .bss and
potentially .data..ro_after_init sections to allow partial freeing via
the kmemleak_free_part() in the powerpc kvm_free_tmp() function.
Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Qian Cai <cai@lca.pw>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull device mapper fixes from Mike Snitzer:
- Two queue_limits stacking fixes: disable discards if underlying
driver does. And propagate BDI_CAP_STABLE_WRITES to fix sporadic
checksum errors.
- Fix that reverts a DM core limit that wasn't needed given that
dm-crypt was already updated to impose an equivalent limit.
- Fix dm-init to properly establish 'const' for __initconst array.
- Fix deadlock in DM integrity target that occurs when overlapping IO
is being issued to it. And two smaller fixes to the DM integrity
target.
* tag 'for-5.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix deadlock with overlapping I/O
dm: disable DISCARD if the underlying storage no longer supports it
dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
dm: revert 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
dm init: fix const confusion for dm_allowed_targets array
dm integrity: make dm_integrity_init and dm_integrity_exit static
dm integrity: change memcmp to strncmp in dm_integrity_ctr
Pull VFIO fixes from Alex Williamson:
- Fix clang printk format errors (Louis Taylor)
- Declare structure static to fix sparse warning (Wang Hai)
- Limit user DMA mappings per container (CVE-2019-3882) (Alex
Williamson)
* tag 'vfio-v5.1-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: Limit DMA mappings per container
vfio/spapr_tce: Make symbol 'tce_iommu_driver_ops' static
vfio/pci: use correct format characters
After this commit
f875a79 nfsd: allow nfsv3 readdir request to be larger.
nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
directory been read is large enough, we can use multiple pages
in rq_respages. Update buffer count and page pointers like we do
in readdirplus to make this happen.
Now listing a directory within 3000 files will panic because we
are counting in a wrong way and would write on random page.
Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A recent commit added a call to cache_fresh_locked()
when an expired item was found.
The call sets the CACHE_VALID flag, so it is important
that the item actually is valid.
There are two ways it could be valid:
1/ If ->update has been called to fill in relevant content
2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.
An expired item that is waiting for an update will be neither.
Setting CACHE_VALID will mean that a subsequent call to cache_put()
will be likely to dereference uninitialised pointers.
So we must make sure the item is valid, and we already have code to do
that in try_to_negate_entry(). This takes the hash lock and so cannot
be used directly, so take out the two lines that we need and use them.
Now cache_fresh_locked() is certain to be called only on
a valid item.
Cc: stable@kernel.org # 2.6.35
Fixes: 4ecd55ea07 ("sunrpc: fix cache_head leak due to queued request")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull kvm fixes from Paolo Bonzini:
"x86 fixes for overflows and other nastiness"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: nVMX: fix x2APIC VTPR read intercept
KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
kvm: svm: fix potential get_num_contig_pages overflow
Pull arm64 fix from Catalin Marinas:
"Fix unwind_frame() in the context of pseudo NMI"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix wrong check of on_sdei_stack in nmi context
Pull syscall-get-arguments cleanup and fixes from Steven Rostedt:
"Andy Lutomirski approached me to tell me that the
syscall_get_arguments() implementation in x86 was horrible and gcc
certainly gets it wrong.
He said that since the tracepoints only pass in 0 and 6 for i and n
repectively, it should be optimized for that case. Inspecting the
kernel, I discovered that all users pass in 0 for i and only one file
passing in something other than 6 for the number of arguments. That
code happens to be my own code used for the special syscall tracing.
That can easily be converted to just using 0 and 6 as well, and only
copying what is needed. Which is probably the faster path anyway for
that case.
Along the way, a couple of real fixes came from this as the
syscall_get_arguments() function was incorrect for csky and riscv.
x86 has been optimized to for the new interface that removes the
variable number of arguments, but the other architectures could still
use some loving and take more advantage of the simpler interface"
* tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
syscalls: Remove start and number from syscall_set_arguments() args
syscalls: Remove start and number from syscall_get_arguments() args
csky: Fix syscall_get_arguments() and syscall_set_arguments()
riscv: Fix syscall_get_arguments() and syscall_set_arguments()
tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
ptrace: Remove maxargs from task_current_syscall()
dm-integrity will deadlock if overlapping I/O is issued to it, the bug
was introduced by commit 724376a04d ("dm integrity: implement fair
range locks"). Users rarely use overlapping I/O so this bug went
undetected until now.
Fix this bug by correcting, likely cut-n-paste, typos in
ranges_overlap() and also remove a flawed ranges_overlap() check in
remove_range_unlocked(). This condition could leave unprocessed bios
hanging on wait_list forever.
Cc: stable@vger.kernel.org # v4.19+
Fixes: 724376a04d ("dm integrity: implement fair range locks")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is a Marvell 88SE9170 PCIe SATA controller I found on a board here.
Some quick testing with the ARM SMMU enabled reveals that it suffers from
the same requester ID mixup problems as the other Marvell chips listed
already.
Add the PCI vendor/device ID to the list of chips which need the
workaround.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
virtualization" is 0, a RDMSR of 808H should return the VTPR from the
virtual APIC page.
However, for nested, KVM currently fails to disable the read intercept
for this MSR. This means that a RDMSR exit takes precedence over
"virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
instead of sourcing the value from L2's virtual APIC page.
This patch fixes the issue by disabling the read intercept, in VMCS02,
for the VTPR when "APIC-register virtualization" is 0.
The issue described above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
mode w/ nested".
Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Fixes: c992384bde ("KVM: vmx: speed up MSR bitmap merge")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
result, we discovered the potential for a buggy or malicious L1 to get
access to L0's x2APIC MSRs, via an L2, as follows.
1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
variable, in nested_vmx_prepare_msr_bitmap() to become true.
2. L1 disables "virtualize x2APIC mode" in VMCS12.
3. L1 enables "APIC-register virtualization" in VMCS12.
Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
intercepts with VMCS12's "virtualize x2APIC mode" control.
The scenario outlined above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Add leak scenario to
virt_x2apic_mode_test".
Note, it looks like this issue may have been introduced inadvertently
during a merge---see 15303ba5d1.
Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This ensures that the address and length provided to DBG_DECRYPT and
DBG_ENCRYPT do not cause an overflow.
At the same time, pass the actual number of pages pinned in memory to
sev_unpin_memory() as a cleanup.
Reported-by: Cfir Cohen <cfir@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull mm/compaction fixes from Mel Gorman:
"The merge window for 5.1 introduced a number of compaction-related
patches. with intermittent reports of corruption and functional
issues. The bugs are due to sloopy checking of zone boundaries and a
corner case where invalid indexes are used to access the free lists.
Reports are not common but at least two users and 0-day have tripped
over them. There is a chance that one of the syzbot reports are
related but it has not been confirmed properly.
The normal submission path is with Andrew but there have been some
delays and I consider them urgent enough that they should be picked up
before RC4 to avoid duplicate reports.
All of these have been successfully tested on older RC windows. This
will make this branch look like a rebase but in fact, they've simply
been lifted again from Andrew's tree and placed on a fresh branch.
I've no reason to believe that this has invalidated the testing given
the lack of change in compaction and the nature of the fixes"
* tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
mm/compaction.c: abort search if isolation fails
mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
The n_r3964 line discipline driver was written in a different time, when
SMP machines were rare, and users were trusted to do the right thing.
Since then, the world has moved on but not this code, it has stayed
rooted in the past with its lovely hand-crafted list structures and
loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am
now marking the driver as BROKEN so that hopefully someone who has this
hardware will show up out of the woodwork (I know you are out there!)
and will help with debugging a raft of changes that I had laying around
for the code, but was too afraid to commit as odds are they would break
things.
Many thanks to Jann and Linus for pointing out the initial problems in
this codebase, as well as many reviews of my attempts to fix the issues.
It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
has the IRQCHIP_SKIP_SET_WAKE flag set an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
walk the chain of parents and set irq wake on any chips that don't have the
flag set either. If the intent is to call the .irq_set_wake() callback of
the parent irqchip, then we expect irqchip implementations to omit the
IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake
fails on any GPIO interrupts after applying work in progress wakeup irq
patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controllers parent is the QCOM PDC irqchip which in turn has ARM
GIC as parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it
unconditionally calls irq_chip_set_wake_parent() causing this function to
return a failure because the parent irqchip (PDC) doesn't have the
.irq_set_wake() callback set. Returning 0 instead makes everything work and
irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a92 ("genirq: Add irqchip_set_wake_parent")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-gpio@vger.kernel.org
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that
have been queued. If that happens when blk_mq_try_issue_directly() is called
by the dm-mpath driver then dm-mpath will try to resubmit a request that is
already queued and a kernel crash follows. Since it is nontrivial to fix
blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly()
changes that went into kernel v5.0.
This patch reverts the following commits:
* d6a51a97c0 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0.
* 5b7a6f128a ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0.
* 7f556a44e6 ("blk-mq: refactor the code of issue request directly") # v5.0.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James Smart <james.smart@broadcom.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Fixes: 7f556a44e6 ("blk-mq: refactor the code of issue request directly") # v5.0.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When ioctl calls are made with non-null-terminated userspace strings,
strlcpy causes an OOB-read from within strlen. Fix by changing to use
strscpy instead.
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The "call" variable comes from the user in privcmd_ioctl_hypercall().
It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
elements. We need to put an upper bound on it to prevent an out of
bounds access.
Cc: stable@vger.kernel.org
Fixes: 1246ae0bb9 ("xen: add variable hypercall caller")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
struct privcmd_buf_vma_private has a zero-sized array at the end
(pages), use the new struct_size() helper to determine the proper
allocation size and avoid potential type mistakes.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull drm fixes from Dave Airlie:
"Pretty quiet week, just some amdgpu and i915 fixes.
i915:
- deadlock fix
- gvt fixes
amdgpu:
- PCIE dpm feature fix
- Powerplay fixes"
* tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
drm/i915/gvt: Correct the calculation of plane size
drm/amdgpu: remove unnecessary rlc reset function on gfx9
drm/i915: Always backoff after a drm_modeset_lock() deadlock
drm/i915/gvt: do not let pin count of shadow mm go negative
drm/i915/gvt: do not deliver a workload if its creation fails
drm/amd/display: VBIOS can't be light up HDMI when restart system
drm/amd/powerplay: fix possible hang with 3+ 4K monitors
drm/amd/powerplay: correct data type to avoid overflow
drm/amd/powerplay: add ECC feature bit
drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
Pull networking fixes from David Miller:
1) Several hash table refcount fixes in batman-adv, from Sven
Eckelmann.
2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
5) ila rhashtable corruption fix from Herbert Xu.
6) Don't allow upper-devices to be added to vrf devices, from Sabrina
Dubroca.
7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
from Alexander Lobakin.
9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
10) Fix false connection termination in ktls, from Jakub Kicinski.
11) Work around some ASPM issues with r8169 by disabling rx interrupt
coalescing on certain chips. From Heiner Kallweit.
12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
Abeni.
13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
14) Various BPF flow dissector fixes from Stanislav Fomichev.
15) Divide by zero in act_sample, from Davide Caratti.
16) Fix bridging multicast regression introduced by rhashtable
conversion, from Nikolay Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
ibmvnic: Fix completion structure initialization
ipv6: sit: reset ip header pointer in ipip6_rcv
net: bridge: always clear mcast matching struct on reports and leaves
libcxgb: fix incorrect ppmax calculation
vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
sch_cake: Make sure we can write the IP header before changing DSCP bits
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
tcp: Ensure DCTCP reacts to losses
net/sched: act_sample: fix divide by zero in the traffic path
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
net: hns: Fix sparse: some warnings in HNS drivers
net: hns: Fix WARNING when remove HNS driver with SMMU enabled
net: hns: fix ICMP6 neighbor solicitation messages discard problem
net: hns: Fix probabilistic memory overwrite when HNS driver initialized
net: hns: Use NAPI_POLL_WEIGHT for hns driver
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
flow_dissector: rst'ify documentation
ipv6: Fix dangling pointer when ipv6 fragment
net-gro: Fix GRO flush when receiving a GSO packet.
flow_dissector: document BPF flow dissector environment
...
There is a hardware bug in some POWER9 processors where a treclaim in
fake suspend mode can cause an inconsistency in the XER[SO] bit across
the threads of a core, the workaround being to force the core into SMT4
when doing the treclaim.
The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a
thread is in fake suspend or real suspend. The important difference here
being that thread reconfiguration is blocked in real suspend but not
fake suspend mode.
When we exit a guest which was in fake suspend mode, we force the core
into SMT4 while we do the treclaim in kvmppc_save_tm_hv().
However on the new exit path introduced with the function
kvmhv_run_single_vcpu() we restore the host PSSCR before calling
kvmppc_save_tm_hv() which means that if we were in fake suspend mode we
put the thread into real suspend mode when we clear the
PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration
and the thread which is trying to get the core into SMT4 before it can
do the treclaim spins forever since it itself is blocking thread
reconfiguration. The result is that that core is essentially lost.
This results in a trace such as:
[ 93.512904] CPU: 7 PID: 13352 Comm: qemu-system-ppc Not tainted 5.0.0 #4
[ 93.512905] NIP: c000000000098a04 LR: c0000000000cc59c CTR: 0000000000000000
[ 93.512908] REGS: c000003fffd2bd70 TRAP: 0100 Not tainted (5.0.0)
[ 93.512908] MSR: 9000000302883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[SE]> CR: 22222444 XER: 00000000
[ 93.512914] CFAR: c000000000098a5c IRQMASK: 3
[ 93.512915] PACATMSCRATCH: 0000000000000001
[ 93.512916] GPR00: 0000000000000001 c000003f6cc1b830 c000000001033100 0000000000000004
[ 93.512928] GPR04: 0000000000000004 0000000000000002 0000000000000004 0000000000000007
[ 93.512930] GPR08: 0000000000000000 0000000000000004 0000000000000000 0000000000000004
[ 93.512932] GPR12: c000203fff7fc000 c000003fffff9500 0000000000000000 0000000000000000
[ 93.512935] GPR16: 2000000000300375 000000000000059f 0000000000000000 0000000000000000
[ 93.512951] GPR20: 0000000000000000 0000000000080053 004000000256f41f c000003f6aa88ef0
[ 93.512953] GPR24: c000003f6aa89100 0000000000000010 0000000000000000 0000000000000000
[ 93.512956] GPR28: c000003f9e9a0800 0000000000000000 0000000000000001 c000203fff7fc000
[ 93.512959] NIP [c000000000098a04] pnv_power9_force_smt4_catch+0x1b4/0x2c0
[ 93.512960] LR [c0000000000cc59c] kvmppc_save_tm_hv+0x40/0x88
[ 93.512960] Call Trace:
[ 93.512961] [c000003f6cc1b830] [0000000000080053] 0x80053 (unreliable)
[ 93.512965] [c000003f6cc1b8a0] [c00800001e9cb030] kvmhv_p9_guest_entry+0x508/0x6b0 [kvm_hv]
[ 93.512967] [c000003f6cc1b940] [c00800001e9cba44] kvmhv_run_single_vcpu+0x2dc/0xb90 [kvm_hv]
[ 93.512968] [c000003f6cc1ba10] [c00800001e9cc948] kvmppc_vcpu_run_hv+0x650/0xb90 [kvm_hv]
[ 93.512969] [c000003f6cc1bae0] [c00800001e8f620c] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 93.512971] [c000003f6cc1bb00] [c00800001e8f2d4c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[ 93.512972] [c000003f6cc1bb90] [c00800001e8e3918] kvm_vcpu_ioctl+0x460/0x7d0 [kvm]
[ 93.512974] [c000003f6cc1bd00] [c0000000003ae2c0] do_vfs_ioctl+0xe0/0x8e0
[ 93.512975] [c000003f6cc1bdb0] [c0000000003aeb24] ksys_ioctl+0x64/0xe0
[ 93.512978] [c000003f6cc1be00] [c0000000003aebc8] sys_ioctl+0x28/0x80
[ 93.512981] [c000003f6cc1be20] [c00000000000b3a4] system_call+0x5c/0x70
[ 93.512983] Instruction dump:
[ 93.512986] 419dffbc e98c0000 2e8b0000 38000001 60000000 60000000 60000000 40950068
[ 93.512993] 392bffff 39400000 79290020 39290001 <7d2903a6> 60000000 60000000 7d235214
To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call
kvmppc_save_tm_hv() which will mean the core can get into SMT4 and
perform the treclaim. Note kvmppc_save_tm_hv() clears the
PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that.
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Topology is not unloaded in the core during unregister_component()
anymore. So, add the remove() callback that will unload the
topology.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The chips main power supplies VA and VP are enabled during probe but
then never disabled, this will cause warnings from the regulator
framework on driver removal. Fix this by adding a remove callback and
disabling the supplies, whilst doing so follow best practice and put the
chip back into reset as well.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Use %lu instead of %zu to fix the following warning introduced with
recent memblock refactoring:
xtensa/mm/mmu.c:36:9: warning: format '%zu' expects argument of type
'size_t', but argument 3 has type 'long unsigned int
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The PTN5150 dependencies look like they were meant to do the
right thing, but they actually should not allow building without
I2C for compile testing, as that results in a Kconfig warning
and subsequent build failure:
WARNING: unmet direct dependencies detected for REGMAP_I2C
Depends on [m]: I2C [=m]
Selected by [y]:
- EXTCON_PTN5150 [=y] && EXTCON [=y] && (I2C [=m] && GPIOLIB [=y] || COMPILE_TEST [=y])
Selected by [m]:
- EEPROM_AT24 [=m] && I2C [=m] && SYSFS [=y]
- KEYBOARD_CAP11XX [=m] && !UML && INPUT [=y] && INPUT_KEYBOARD [=y] && OF [=y] && I2C [=m]
- INPUT_DRV260X_HAPTICS [=m] && !UML && INPUT_MISC [=y] && INPUT [=y] && I2C [=m] && (GPIOLIB [=y] || COMPILE_TEST [=y])
- ... [many others]
Add parentheses around the expression so we can compile-test
without GPIOLIB but not without I2C.
Fixes: 4ed754de2d ("extcon: Add support for ptn5150 extcon driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Pull RISC-V fixes from Palmer Dabbelt:
"I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them
in the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
- A fix for the rv32 port that corrects the 64-bit user accesor's
fixup label address.
- A fix for a regression introduced during the merge window that
broke medlow configurations at run time. This patch also includes a
fix that disables ftrace for the same set of functions, which was
found by inspection at the same time.
- A modification of the memory map to avoid overlapping the FIXMAP
and VMALLOC regions on systems with small memory maps.
- A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that testing like I said I would"
* tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
RISC-V: Always compile mm/init.c with cmodel=medany and notrace
riscv: fix accessing 8-byte variable from RV32
We need to be careful and always zero the whole br_ip struct when it is
used for matching since the rhashtable change. This patch fixes all the
places which didn't properly clear it which in turn might've caused
mismatches.
Thanks for the great bug report with reproducing steps and bisection.
Steps to reproduce (from the bug report):
ip link add br0 type bridge mcast_querier 1
ip link set br0 up
ip link add v2 type veth peer name v3
ip link set v2 master br0
ip link set v2 up
ip link set v3 up
ip addr add 3.0.0.2/24 dev v3
ip netns add test
ip link add v1 type veth peer name v1 netns test
ip link set v1 master br0
ip link set v1 up
ip -n test link set v1 up
ip -n test addr add 3.0.0.1/24 dev v1
# Multicast receiver
ip netns exec test socat
UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
# Multicast sender
echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
Reported-by: liam.mcbirnie@boeing.com
Fixes: 19e3a9c90c ("net: bridge: convert multicast to generic rhashtable")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power management fixes from Rafael Wysocki:
"These fix up the intel_pstate driver after recent changes to prevent
it from printing pointless messages and update the turbostat utility
(mostly fixes and new hardware support).
Specifics:
- Make intel_pstate only load on Intel processors and prevent it from
printing pointless failure messages (Borislav Petkov).
- Update the turbostat utility:
* Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
* Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
Walton).
* Support for Intel Icelake and for systems with more than one die
per package (Len Brown).
* Cleanups (Len Brown)"
* tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/intel_pstate: Load only on Intel hardware
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Pull ACPI fix from Rafael Wysocki:
"Prevent stale GPE events from triggering spurious system wakeups from
suspend-to-idle (Furquan Shaikh)"
* tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Clear status of GPEs before enabling them
Pull mfd fixes from Lee Jones:
- Fix failed reads due to enabled IRQs when suspended; twl-core
- Fix driver registration when using DT; sprd-sc27xx-spi
- Fix `make allyesconfig` on x86_64; SUN6I_PRCM
* tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
mfd: sc27xx: Use SoC compatible string for PMIC devices
mfd: twl-core: Disable IRQ while suspended
BITS_TO_LONGS() uses DIV_ROUND_UP() because of
this ppmax value can be greater than available
per cpu page pods.
This patch removes BITS_TO_LONGS() to fix this
issue.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Way back in 3c9c36bced the
ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
driver and used by bnx2fc without including the generic software fcoe
module.
But, FCoE is generally used over an 802.1q VLAN, and the implementation
of ndo_fcoe_get_wwn in the 8021q module was not similarly changed. The
result is that if CONFIG_FCOE is disabled, then bnz2fc cannot make a
call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
bnx2x interface. The bnx2fc driver then falls back to a potentially
different mapping of Ethernet MAC to Fibre Channel WWN, creating an
incompatibility with the fabric and target configurations when compared
to the WWNs used by pre-boot firmware and differently-configured
kernels.
So make the conditional inclusion of FCoE code in 8021q match the
conditional inclusion in netdevice.h
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().
Fixes: fa0ca2aee3 ("deal with get_reqs_available() in aio_get_req() itself")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In function do_write_buffer(), in the for loop, there is a case
chip_ready() returns 1 while chip_good() returns 0, so it never
break the loop.
To fix this, chip_good() is enough and it should timeout if it stay
bad for a while.
Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Yi Huaijie <yihuaijie@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Tokunori Ikegami <ikegami_to@yahoo.co.jp>
Signed-off-by: Richard Weinberger <richard@nod.at>
Daniel Borkmann says:
====================
pull-request: bpf 2019-04-04
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Batch of fixes to the existing BPF flow dissector API to support
calling BPF programs from the eth_get_headlen context (support for
latter is planned to be added in bpf-next), from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-tools:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Storage devices which report supporting discard commands like
WRITE_SAME_16 with unmap, but reject discard commands sent to the
storage device. This is a clear storage firmware bug but it doesn't
change the fact that should a program cause discards to be sent to a
multipath device layered on this buggy storage, all paths can end up
failed at the same time from the discards, causing possible I/O loss.
The first discard to a path will fail with Illegal Request, Invalid
field in cdb, e.g.:
kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
device disables its support for discard on this path, and because of the
BLK_STS_TARGET error multipath fails the discard without failing any
path or retrying down a different path. But subsequent discards can
cause path failures. Any discards sent to the path which already failed
a discard ends up failing with EIO from blk_cloned_rq_check_limits with
an "over max size limit" error since the discard limit was set to 0 by
the sd driver for the path. As the error is EIO, this now fails the
path and multipath tries to send the discard down the next path. This
cycle continues as discards are sent until all paths fail.
Fix this by training DM core to disable DISCARD if the underlying
storage already did so.
Also, fix branching in dm_done() and clone_endio() to reflect the
mutually exclussive nature of the IO operations in question.
Cc: stable@vger.kernel.org
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
return_address returns the address that is one level higher in the call
stack than requested in its argument, because level 0 corresponds to its
caller's return address. Use requested level as the number of stack
frames to skip.
This fixes the address reported by might_sleep and friends.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Toke Høiland-Jørgensen says:
====================
sched: A few small fixes for sch_cake
Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
diffserv fields. This series fixes those two issues, and should probably go to
in 4.19 as well. However, the previous refactoring patch means they don't apply
as-is; I can send a follow-up directly to stable if that's OK with you?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is not actually any guarantee that the IP headers are valid before we
access the DSCP bits of the packets. Fix this using the same approach taken
in sch_dsmark.
Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't be using skb->protocol directly as that will miss cases with
hardware-accelerated VLAN tags. Use the helper instead to get the right
protocol number.
Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.
This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.
The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.
| Link | RTT | Drop
TCP CC | speed | base+AQM | probability
==================|=========|==========|============
CUBIC | 40Mbps | 7+20ms | 0.21%
RENO | | | 0.19%
DCTCP-CLAMP-ALPHA | | | 25.80%
DCTCP-HALVE-CWND | | | 0.22%
------------------|---------|----------|------------
CUBIC | 100Mbps | 7+20ms | 0.03%
RENO | | | 0.02%
DCTCP-CLAMP-ALPHA | | | 23.30%
DCTCP-HALVE-CWND | | | 0.04%
------------------|---------|----------|------------
CUBIC | 800Mbps | 1+1ms | 0.04%
RENO | | | 0.05%
DCTCP-CLAMP-ALPHA | | | 18.70%
DCTCP-HALVE-CWND | | | 0.06%
We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.
Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu says:
====================
net: hns: bugfixes for HNS Driver
This patchset fix some bugs that were found in the test of
various scenarios, or identify by KASAN/sparse.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some sparse warnings in the HNS drivers:
warning: incorrect type in assignment (different address spaces)
expected void [noderef] <asn:2> *io_base
got void *vaddr
warning: cast removes address space '<asn:2>' of expression
[...]
Add __iomem and change all the u8 __iomem to void __iomem to
fix these kind of warnings.
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:2> *base
got unsigned char [usertype] *base_addr
warning: cast to restricted __le16
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype] tbl_tcam_data_high
got restricted __le32 [usertype]
warning: cast to restricted __le32
[...]
These variables used u32/u16 as their type, and finally as a
parameter of writel(), writel() will do the cpu_to_le32 coversion
so remove the little endian covert code to fix these kind of warnings.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ICMP6 neighbor solicitation messages will be discard by the Hip06
chips, because of not setting forwarding pool. Enable promisc mode
has the same problem.
This patch fix the wrong forwarding table configs for the multicast
vague matching when enable promisc mode, and add forwarding pool
for the forwarding table.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the HNS driver loaded, always have an error print:
"netif_napi_add() called with weight 256"
This is because the kernel checks the NAPI polling weights
requested by drivers and it prints an error message if a driver
requests a weight bigger than 64.
So use NAPI_POLL_WEIGHT to fix it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is trying to fix the issue due to:
[27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]
After hnae_queue_xmit() in hns_nic_net_xmit_hw(), can be
interrupted by interruptions, and than call hns_nic_tx_poll_one()
to handle the new packets, and free the skb. So, when turn back to
hns_nic_net_xmit_hw(), calling skb->len will cause use-after-free.
This patch update tx ring statistics in hns_nic_tx_poll_one() to
fix the bug.
Signed-off-by: Liubin Shu <shuliubin@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The compression property resets to NULL, instead of the old value if we
fail to set the new compression parameter.
$ btrfs prop get /btrfs compression
compression=lzo
$ btrfs prop set /btrfs compression zli
ERROR: failed to set compression for /btrfs: Invalid argument
$ btrfs prop get /btrfs compression
This is because the compression property ->validate() is successful for
'zli' as the strncmp() used the length passed from the userspace.
Fix it by using the expected string length in strncmp().
Fixes: 63541927c8 ("Btrfs: add support for inode properties")
Fixes: 5c1aab1dd5 ("btrfs: Add zstd support")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We let pass zstd compression parameter even if it is not fully valid.
For example:
$ btrfs prop set /btrfs compression zst
$ btrfs prop get /btrfs compression
compression=zst
zlib and lzo are fine.
Fix it by checking the correct prefix length.
Fixes: 5c1aab1dd5 ("btrfs: Add zstd support")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Why]
the member sdr_white_level of struct dc_cursor_attributes was not
initialized, then the random value result that
dcn10_set_cursor_sdr_white_level() set error hw_scale value 0x20D9(normal
value is 0x3c00), this cause the black cursor issue.
[how]
just initilize the obj of struct dc_cursor_attributes to zero to avoid
the random value.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Tianci Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_bo_restore_shadow would assign zero to r if succeeded.
r would remain zero if there is only one node in shadow_list.
current code would always return failure when r <= 0.
restart the timeout for each wait was a rather problematic bug as well.
The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When doing unwind_frame() in the context of pseudo nmi (need enable
CONFIG_ARM64_PSEUDO_NMI), reaching the bottom of the stack (fp == 0,
pc != 0), function on_sdei_stack() will return true while the sdei acpi
table is not inited in fact. This will cause a "NULL pointer dereference"
oops when going on.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We would never be able to sort the list if we first reset plug->rq_count
which is used in conditional check later.
Fixes: ce5b009cff ("block: improve logic around when to sort a plug list")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix
formatting. Also, link it from the Documentation/networking/index.rst.
Tested with 'make htmldocs' to make sure it looks reasonable.
Fixes: ae82899bbe ("flow_dissector: document BPF flow dissector environment")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The only users that calls syscall_get_arguments() with a variable and not a
hard coded '6' is ftrace_syscall_enter(). syscall_get_arguments() can be
optimized by removing a variable input, and always grabbing 6 arguments
regardless of what the system call actually uses.
Change ftrace_syscall_enter() to pass the 6 args into a local stack array
and copy the necessary arguments into the trace event as needed.
This is needed to remove two parameters from syscall_get_arguments().
Link: http://lkml.kernel.org/r/20161107213233.627583542@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
task_current_syscall() has a single user that passes in 6 for maxargs, which
is the maximum arguments that can be used to get system calls from
syscall_get_arguments(). Instead of passing in a number of arguments to
grab, just get 6 arguments. The args argument even specifies that it's an
array of 6 items.
This will also allow changing syscall_get_arguments() to not get a variable
number of arguments, but always grab 6.
Linus also suggested not passing in a bunch of arguments to
task_current_syscall() but to instead pass in a pointer to a structure, and
just fill the structure. struct seccomp_data has almost all the parameters
that is needed except for the stack pointer (sp). As seccomp_data is part of
uapi, and I'm afraid to change it, a new structure was created
"syscall_info", which includes seccomp_data and adds the "sp" field.
Link: http://lkml.kernel.org/r/20161107213233.466776454@goodmis.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Mikhail Gavrilo reported the following bug being triggered in a Fedora
kernel based on 5.1-rc1 but it is relevant to a vanilla kernel.
kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
kernel: ------------[ cut here ]------------
kernel: kernel BUG at include/linux/mm.h:1021!
kernel: invalid opcode: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G C 5.1.0-0.rc1.git1.3.fc31.x86_64 #1
kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018
kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 <0f> 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01
kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246
kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20
kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20
kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000
kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000
kernel: FS: 0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0
kernel: Call Trace:
kernel: __reset_isolation_suitable+0x62/0x120
kernel: reset_isolation_suitable+0x3b/0x40
kernel: kswapd+0x147/0x540
kernel: ? finish_wait+0x90/0x90
kernel: kthread+0x108/0x140
kernel: ? balance_pgdat+0x560/0x560
kernel: ? kthread_park+0x90/0x90
kernel: ret_from_fork+0x27/0x50
He bisected it down to e332f741a8 ("mm, compaction: be selective about
what pageblocks to clear skip hints"). The problem is that the patch in
question was sloppy with respect to the handling of zone boundaries. In
some instances, it was possible for PFNs outside of a zone to be examined
and if those were not properly initialised or poisoned then it would
trigger the VM_BUG_ON. This patch corrects the zone boundary issues when
resetting pageblock skip hints and Mikhail reported that the bug did not
trigger after 30 hours of testing.
Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net
Fixes: e332f741a8 ("mm, compaction: be selective about what pageblocks to clear skip hints")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
of_find_device_by_node() takes a reference to the struct device
when it finds a match via get_device. When returning error we should
call put_device.
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
This is because set_fmt ops maybe called when PD is off,
and in such case, regmap_ops will lead system hang.
enale PD before doing regmap_ops.
Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit da215354eb ("ASoC: simple-card: merge simple-scu-card")
merged simple-scu-audio-card which can handle DPCM into
simple-audio-card.
By this patch, the judgement to select "normal sound card" or
"DPCM sound card" is based on its CPU/Codec DAI count.
But, because of it, existing "simple-audio-card" user who is
assuming "normal sound card" might select DPCM unintentionally.
To solve this issue, this patch allows "simple-audio-card" user
can select "normal sound card", and "simple-scu-audio-card" user
can select both "normal sound card" and "DPCM sound card".
This keeps compatibility collectry.
Fixes: da215354eb ("ASoC: simple-card: merge simple-scu-card")
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit ae3cb57909 ("ASoC: audio-graph-card: merge
audio-graph-scu-card") merged audio-graph-scu-card which can
handle DPCM into audio-graph-card.
By this patch, the judgement to select "normal sound card" or
"DPCM sound card" is based on its OF-graph endpoint connection.
But, because of it, existing "audio-graph-card" user who is
assuming "normal sound card" might select DPCM unintentionally.
To solve this issue, this patch allows "audio-graph-card" user
can select "normal sound card", and "audio-graph-scu-card" user
can select both "normal sound card" and "DPCM sound card".
This keeps compatibility collectry.
Fixes: ae3cb57909 ("ASoC: audio-graph-card: merge audio-graph-scu-card")
Reported-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add an explanation why zero width check is needed when generating factor
mask using GENMASK() macro.
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Pull smb3 fixes from Steve French:
"Four smb3 fixes for stable:
- fix an open path where we had an unitialized structure
- fix for snapshot (previous version) enumeration
- allow reconnect timeout on handles to be configurable to better
handle network or server crash
- correctly handle lack of file_all_info structure"
* tag '5.1-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: a smb2_validate_and_copy_iov failure does not mean the handle is invalid.
SMB3: Allow persistent handle timeout to be configurable on mount
smb3: Fix enumerating snapshots to Azure
cifs: fix kref underflow in close_shroot()
New pt_regs should indicate that there's no syscall, not that there's
syscall #0. While at it wrap macro body in do/while and parenthesize
macro arguments.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Syscall may alter pt_regs structure passed to it, resulting in a
mismatch between syscall entry end syscall exit entries in the ftrace.
Temporary restore syscall field of the pt_regs for the duration of
do_syscall_trace_leave.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The author of these files has changed her name. Update
instances in the code of her dead name to current legal
name.
Signed-off-by: Annaliese McDermond <nh6z@nh6z.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
At the beginning of ip6_fragment func, the prevhdr pointer is
obtained in the ip6_find_1stfragopt func.
However, all the pointers pointing into skb header may change
when calling skb_checksum_help func with
skb->ip_summed = CHECKSUM_PARTIAL condition.
The prevhdr pointe will be dangling if it is not reloaded after
calling __skb_linearize func in skb_checksum_help func.
Here, I add a variable, nexthdr_offset, to evaluate the offset,
which does not changes even after calling __skb_linearize func.
Fixes: 405c92f7a5 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we may merge incorrectly a received GSO packet
or a packet with frag_list into a packet sitting in the
gro_hash list. skb_segment() may crash case because
the assumptions on the skb layout are not met.
The correct behaviour would be to flush the packet in the
gro_hash list and send the received GSO packet directly
afterwards. Commit d61d072e87 ("net-gro: avoid reorders")
sets NAPI_GRO_CB(skb)->flush in this case, but this is not
checked before merging. This patch makes sure to check this
flag and to not merge in that case.
Fixes: d61d072e87 ("net-gro: avoid reorders")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Abort thread wakeups, on some wqe types, are not happening. The thread
wakeup logic is dependent upon the LPFC_DRIVER_ABORTED flag. However, on
these wqes, the completion handler running prior to the io completion
routine ends up clearing the flag.
Rework the wakeup logic to look at a non-null waitq element which must be
set if the abort thread is waiting. This is reverting the change in the
indicated patch.
Fixes: c2017260ee ("scsi: lpfc: Rework locking on SCSI io completion")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reduce the default VMbus channel ring buffer size for storvsc SCSI devices
from 1 Mbyte to 128 Kbytes. Measurements show that ring buffer sizes above
128 Kbytes do not increase performance even at very high IOPS rates, so
don't waste the memory. Also remove the dependence on PAGE_SIZE, since the
ring buffer size should not change on architectures where PAGE_SIZE is not
4 Kbytes.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the number of sub-channels offered by Hyper-V is >= the number of CPUs
in the VM, calculate the correct number of sub-channels. The current code
produces one too many.
This scenario arises only when the number of CPUs is artificially
restricted (for example, with maxcpus=<n> on the kernel boot line), because
Hyper-V normally offers a sub-channel count < number of CPUs. While the
current code doesn't break, the extra sub-channel is unbalanced across the
CPUs (for example, a total of 5 channels on a VM with 4 CPUs).
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If CONFIG_CRYPTO is not set or set to m,
gcc building warn this:
lib/iov_iter.o: In function `hash_and_copy_to_iter':
iov_iter.c:(.text+0x9129): undefined reference to `crypto_stats_get'
iov_iter.c:(.text+0x9152): undefined reference to `crypto_stats_ahash_update'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d05f443554 ("iov_iter: introduce hash_and_copy_to_iter helper")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
drivers/gpu/drm/i915/gvt/display.c:457: warning: Function parameter or member 'connected' not described in 'intel_vgpu_emulate_hotplug'
drivers/gpu/drm/i915/gvt/display.c:457: warning: Excess function parameter 'conncted' description in 'intel_vgpu_emulate_hotplug'
Fixes: 1ca20f33df ("drm/i915/gvt: add hotplug emulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
stride isn't in unit of pixel, it is bytes, so calculation of
plane size doesn't need to multiple bpp.
Fixes: e546e281d3 ("drm/i915/gvt: Dmabuf support for GVT-g")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
SNVS IRQ is requested before necessary driver data initialized,
if there is a pending IRQ during driver probe phase, kernel
NULL pointer panic will occur in IRQ handler. To avoid such
scenario, just initialize necessary driver data before enabling
IRQ. This patch is inspired by NXP's internal kernel tree.
Fixes: d3dc6e2322 ("input: keyboard: imx: add snvs power key driver")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.
To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).
This fixes CVE-2019-3882.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixes the following sparse warning:
drivers/vfio/vfio_iommu_spapr_tce.c:1401:36: warning:
symbol 'tce_iommu_driver_ops' was not declared. Should it be static?
Fixes: 5ffd229c02 ("powerpc/vfio: Implement IOMMU driver for VFIO")
Signed-off-by: Wang Hai <wanghai26@huawei.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When compiling with -Wformat, clang emits the following warnings:
drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for unsigned ints.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This accidentally returns the wrong variable. The "req->ki_eventfd"
pointer is NULL so this return success.
Fixes: 7316b49c2a ("aio: move sanity checks and request allocation to io_submit_one()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull HID fixes from Jiri Kosina:
- build dependency fix for hid-asus from Arnd Bergmann
- addition of omitted mapping of _ASSISTANT key from Dmitry Torokhov
- race condition fix in hid-debug inftastructure from He, Bo
- fixed support for devices with big maximum report size from Kai-Heng
Feng
- deadlock fix in hid-steam from Rodrigo Rivas Costa
- quite a few device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: input: add mapping for Assistant key
HID: i2c-hid: Disable runtime PM on Synaptics touchpad
HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
HID: logitech: Handle 0 scroll events for the m560
HID: debug: fix race condition with between rdesc_show() and device removal
HID: logitech: check the return value of create_singlethread_workqueue
HID: Increase maximum report size allowed by hid_field_extract()
HID: steam: fix deadlock with input devices.
HID: uclogic: remove redudant duplicated null check on ver_ptr
HID: quirks: Drop misused kernel-doc annotation
HID: hid-asus: select CONFIG_POWER_SUPPLY
HID: quirks: use correct format chars in dbg_hid
Will Deacon reported the following KASAN complaint:
[ 149.890370] ==================================================================
[ 149.891266] BUG: KASAN: double-free or invalid-free in io_sqe_files_unregister+0xa8/0x140
[ 149.892218]
[ 149.892411] CPU: 113 PID: 3974 Comm: io_uring_regist Tainted: G B 5.1.0-rc3-00012-g40b114779944 #3
[ 149.893623] Hardware name: linux,dummy-virt (DT)
[ 149.894169] Call trace:
[ 149.894539] dump_backtrace+0x0/0x228
[ 149.895172] show_stack+0x14/0x20
[ 149.895747] dump_stack+0xe8/0x124
[ 149.896335] print_address_description+0x60/0x258
[ 149.897148] kasan_report_invalid_free+0x78/0xb8
[ 149.897936] __kasan_slab_free+0x1fc/0x228
[ 149.898641] kasan_slab_free+0x10/0x18
[ 149.899283] kfree+0x70/0x1f8
[ 149.899798] io_sqe_files_unregister+0xa8/0x140
[ 149.900574] io_ring_ctx_wait_and_kill+0x190/0x3c0
[ 149.901402] io_uring_release+0x2c/0x48
[ 149.902068] __fput+0x18c/0x510
[ 149.902612] ____fput+0xc/0x18
[ 149.903146] task_work_run+0xf0/0x148
[ 149.903778] do_notify_resume+0x554/0x748
[ 149.904467] work_pending+0x8/0x10
[ 149.905060]
[ 149.905331] Allocated by task 3974:
[ 149.905934] __kasan_kmalloc.isra.0.part.1+0x48/0xf8
[ 149.906786] __kasan_kmalloc.isra.0+0xb8/0xd8
[ 149.907531] kasan_kmalloc+0xc/0x18
[ 149.908134] __kmalloc+0x168/0x248
[ 149.908724] __arm64_sys_io_uring_register+0x2b8/0x15a8
[ 149.909622] el0_svc_common+0x100/0x258
[ 149.910281] el0_svc_handler+0x48/0xc0
[ 149.910928] el0_svc+0x8/0xc
[ 149.911425]
[ 149.911696] Freed by task 3974:
[ 149.912242] __kasan_slab_free+0x114/0x228
[ 149.912955] kasan_slab_free+0x10/0x18
[ 149.913602] kfree+0x70/0x1f8
[ 149.914118] __arm64_sys_io_uring_register+0xc2c/0x15a8
[ 149.915009] el0_svc_common+0x100/0x258
[ 149.915670] el0_svc_handler+0x48/0xc0
[ 149.916317] el0_svc+0x8/0xc
[ 149.916817]
[ 149.917101] The buggy address belongs to the object at ffff8004ce07ed00
[ 149.917101] which belongs to the cache kmalloc-128 of size 128
[ 149.919197] The buggy address is located 0 bytes inside of
[ 149.919197] 128-byte region [ffff8004ce07ed00, ffff8004ce07ed80)
[ 149.921142] The buggy address belongs to the page:
[ 149.921953] page:ffff7e0013381f00 count:1 mapcount:0 mapping:ffff800503417c00 index:0x0 compound_mapcount: 0
[ 149.923595] flags: 0x1ffff00000010200(slab|head)
[ 149.924388] raw: 1ffff00000010200 dead000000000100 dead000000000200 ffff800503417c00
[ 149.925706] raw: 0000000000000000 0000000080400040 00000001ffffffff 0000000000000000
[ 149.927011] page dumped because: kasan: bad access detected
[ 149.927956]
[ 149.928224] Memory state around the buggy address:
[ 149.929054] ffff8004ce07ec00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 149.930274] ffff8004ce07ec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.931494] >ffff8004ce07ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 149.932712] ^
[ 149.933281] ffff8004ce07ed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.934508] ffff8004ce07ee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.935725] ==================================================================
which is due to a failure in registrering a fileset. This frees the
ctx->user_files pointer, but doesn't clear it. When the io_uring
instance is later freed through the normal channels, we free this
pointer again. At this point it's invalid.
Ensure we clear the pointer when we free it for the error case.
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stanislav Fomichev says:
====================
This patch series fixes the existing BPF flow dissector API to
support calling BPF progs from the eth_get_headlen context (the
support itself will be added in bpf-next tree).
The summary of the changes:
* fix VLAN handling in bpf_flow.c, we don't need to peek back and look
at skb->vlan_present; add selftests
* pass and use flow_keys->n_proto instead of skb->protocol
* fix clamping of flow_keys->nhoff for packets with nhoff > 0
* prohibit access to most of the __sk_buff fields from BPF flow
dissector progs; only data/data_end/flow_keys are allowed (all input
is now passed via flow_keys)
* finally, document BPF flow dissector program environment
====================
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Petar Penkov <peterpenkov96@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Short doc on what BPF flow dissector should expect in the input
__sk_buff and flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use whitelist instead of a blacklist and allow only a small set of
fields that might be relevant in the context of flow dissector:
* data
* data_end
* flow_keys
This is required for the eth_get_headlen case where we have only a
chunk of data to dissect (i.e. trying to read the other skb fields
doesn't make sense).
Note, that it is a breaking API change! However, we've provided
flow_keys->n_proto as a substitute for skb->protocol; and there is
no need to manually handle skb->vlan_present. So even if we
break somebody, the migration is trivial. Unfortunately, we can't
support eth_get_headlen use-case without those breaking changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Don't allow BPF program to set flow_keys->nhoff to less than initial
value. We currently don't read the value afterwards in anything but
the tests, but it's still a good practice to return consistent
values to the test programs.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is a preparation for the next commit that would prohibit access to
the most fields of __sk_buff from the BPF programs.
Instead of requiring BPF flow dissector programs to look into skb,
pass all input data in the flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.
Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.
Add simple test cases with VLAN tagged frames:
* single vlan for ipv4
* double vlan for ipv6
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
According to HUTRR89 usage 0x1cb from the consumer page was assigned to
allow launching desktop-aware assistant application, so let's add the
mapping.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This adds a SND_PCI_QUIRK(...) line for the Tuxedo XC 1509.
The Tuxedo XC 1509 and the System76 oryp5 are the same barebone
notebooks manufactured by Clevo. To name the fixups both use after the
actual underlying hardware, this patch also changes System76_orpy5
to clevo_pb51ed in 2 enum symbols and one function name,
matching the other pci_quirk entries which are also named after the
device ODM.
Fixes: 7f665b1c32 ("ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5")
Signed-off-by: Richard Sailer <rs@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
It will be lose Mic JD state when Chrome OS boot and headset was plugged.
Just Implement of reset combo jack JD verb for ACT_PRE_PROBE state.
Intel test result was also failed.
It test passed until changed the initial state to ACT_INIT.
Mic JD will show every time.
This patch also changed the model name as 'alc-chrome-book' for
application of Chrome OS.
Fixes: 10f5b1b85e ("ALSA: hda/realtek - Fixed Headset Mic JD not stable")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On AMD processors, the detection of an overflowed PMC counter in the NMI
handler relies on the current value of the PMC. So, for example, to check
for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
overflowed) or 0 (overflowed).
When the perf NMI handler executes it does not know in advance which PMC
counters have overflowed. As such, the NMI handler will process all active
PMC counters that have overflowed. NMI latency in newer AMD processors can
result in multiple overflowed PMC counters being processed in one NMI and
then a subsequent NMI, that does not appear to be a back-to-back NMI, not
finding any PMC counters that have overflowed. This may appear to be an
unhandled NMI resulting in either a panic or a series of messages,
depending on how the kernel was configured.
To mitigate this issue, add an AMD handle_irq callback function,
amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
function and upon return perform some additional processing that will
indicate if the NMI has been handled or would have been handled had an
earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a
minimum value of the number of active PMCs or 2 will be set whenever a
PMC is active. This is used to indicate the possible number of NMIs that
can still occur. The value of 2 is used for when an NMI does not arrive
at the LAPIC in time to be collapsed into an already pending NMI. Each
time the function is called without having handled an overflowed counter,
the per-CPU value is checked. If the value is non-zero, it is decremented
and the NMI indicates that it handled the NMI. If the value is zero, then
the NMI indicates that it did not handle the NMI.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On AMD processors, the detection of an overflowed counter in the NMI
handler relies on the current value of the counter. So, for example, to
check for overflow on a 48 bit counter, bit 47 is checked to see if it
is 1 (not overflowed) or 0 (overflowed).
There is currently a race condition present when disabling and then
updating the PMC. Increased NMI latency in newer AMD processors makes this
race condition more pronounced. If the counter value has overflowed, it is
possible to update the PMC value before the NMI handler can run. The
updated PMC value is not an overflowed value, so when the perf NMI handler
does run, it will not find an overflowed counter. This may appear as an
unknown NMI resulting in either a panic or a series of messages, depending
on how the kernel is configured.
To eliminate this race condition, the PMC value must be checked after
disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
will wait for the NMI handler to reset any active and overflowed counter
after calling x86_pmu_disable_all().
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a new configuration with a new firmware name for quz devices.
And, since these devices have the same PCI device and subsystem IDs,
we need to add some code to switch from a normal qu firmware to the
quz firmware.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
With offloaded rate control, if the station parameters (rates, NSS,
bandwidth) change (sta_rc_update method), call iwl_mvm_rs_rate_init()
to propagate those change to the firmware.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
iwl_mvm_tx_mpdu() may run from iwl_mvm_add_new_dqa_stream_wk(), where
soft-IRQs aren't disabled. In this case, it may hold the station lock
and be interrupted by a soft-IRQ that also wants to acquire said lock,
leading to a deadlock.
Fix it by disabling soft-IRQs in iwl_mvm_add_new_dqa_stream_wk().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
When an event is programmed with attr.wakeup_events=N (N>0), it means
the caller is interested in getting a user level notification after
N samples have been recorded in the kernel sampling buffer.
With precise events on Intel processors, the kernel uses PEBS.
The kernel tries minimize sampling overhead by verifying
if the event configuration is compatible with multi-entry PEBS mode.
If so, the kernel is notified only when the buffer has reached its threshold.
Other PEBS operates in single-entry mode, the kenrel is notified for each
PEBS sample.
The problem is that the current implementation look at frequency
mode and event sample_type but ignores the wakeup_events field. Thus,
it may not be possible to receive a notification after each precise event.
This patch fixes this problem by disabling multi-entry PEBS if wakeup_events
is non-zero.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sometimes one of the nkmp factors is unused. This means that one of the
factors shift and width values are set to 0. Current nkmp clock code
generates a mask for each factor with GENMASK(width + shift - 1, shift).
For unused factor this translates to GENMASK(-1, 0). This code is
further expanded by C preprocessor to final version:
(((~0UL) - (1UL << (0)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (-1))))
or a bit simplified:
(~0UL & (~0UL >> BITS_PER_LONG))
It turns out that result of the second part (~0UL >> BITS_PER_LONG) is
actually undefined by C standard, which clearly specifies:
"If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined."
Additionally, compiling kernel with aarch64-linux-gnu-gcc 8.3.0 gave
different results whether literals or variables with same values as
literals were used. GENMASK with literals -1 and 0 gives zero and with
variables gives 0xFFFFFFFFFFFFFFF (~0UL). Because nkmp driver uses
GENMASK with variables as parameter, expression calculates mask as ~0UL
instead of 0. This has further consequences that LSB in register is
always set to 1 (1 is neutral value for a factor and shift is 0).
For example, H6 pll-de clock is set to 600 MHz by sun4i-drm driver, but
due to this bug ends up being 300 MHz. Additionally, 300 MHz seems to be
too low because following warning can be found in dmesg:
[ 1.752763] WARNING: CPU: 2 PID: 41 at drivers/clk/sunxi-ng/ccu_common.c:41 ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.763378] Modules linked in:
[ 1.766441] CPU: 2 PID: 41 Comm: kworker/2:1 Not tainted 5.1.0-rc2-next-20190401 #138
[ 1.774269] Hardware name: Pine H64 (DT)
[ 1.778200] Workqueue: events deferred_probe_work_func
[ 1.783341] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 1.788135] pc : ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.793623] lr : ccu_helper_wait_for_lock.part.0+0x48/0x90
[ 1.799107] sp : ffff000010f93840
[ 1.802422] x29: ffff000010f93840 x28: 0000000000000000
[ 1.807735] x27: ffff800073ce9d80 x26: ffff000010afd1b8
[ 1.813049] x25: ffffffffffffffff x24: 00000000ffffffff
[ 1.818362] x23: 0000000000000001 x22: ffff000010abd5c8
[ 1.823675] x21: 0000000010000000 x20: 00000000685f367e
[ 1.828987] x19: 0000000000001801 x18: 0000000000000001
[ 1.834300] x17: 0000000000000001 x16: 0000000000000000
[ 1.839613] x15: 0000000000000000 x14: ffff000010789858
[ 1.844926] x13: 0000000000000000 x12: 0000000000000001
[ 1.850239] x11: 0000000000000000 x10: 0000000000000970
[ 1.855551] x9 : ffff000010f936c0 x8 : ffff800074cec0d0
[ 1.860864] x7 : 0000800067117000 x6 : 0000000115c30b41
[ 1.866177] x5 : 00ffffffffffffff x4 : 002c959300bfe500
[ 1.871490] x3 : 0000000000000018 x2 : 0000000029aaaaab
[ 1.876802] x1 : 00000000000002e6 x0 : 00000000686072bc
[ 1.882114] Call trace:
[ 1.884565] ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.889705] ccu_helper_wait_for_lock+0x10/0x20
[ 1.894236] ccu_nkmp_set_rate+0x244/0x2a8
[ 1.898334] clk_change_rate+0x144/0x290
[ 1.902258] clk_core_set_rate_nolock+0x180/0x1b8
[ 1.906963] clk_set_rate+0x34/0xa0
[ 1.910455] sun8i_mixer_bind+0x484/0x558
[ 1.914466] component_bind_all+0x10c/0x230
[ 1.918651] sun4i_drv_bind+0xc4/0x1a0
[ 1.922401] try_to_bring_up_master+0x164/0x1c0
[ 1.926932] __component_add+0xa0/0x168
[ 1.930769] component_add+0x10/0x18
[ 1.934346] sun8i_dw_hdmi_probe+0x18/0x20
[ 1.938443] platform_drv_probe+0x50/0xa0
[ 1.942455] really_probe+0xcc/0x280
[ 1.946032] driver_probe_device+0x54/0xe8
[ 1.950130] __device_attach_driver+0x80/0xb8
[ 1.954488] bus_for_each_drv+0x78/0xc8
[ 1.958326] __device_attach+0xd4/0x130
[ 1.962163] device_initial_probe+0x10/0x18
[ 1.966348] bus_probe_device+0x90/0x98
[ 1.970185] deferred_probe_work_func+0x6c/0xa0
[ 1.974720] process_one_work+0x1e0/0x320
[ 1.978732] worker_thread+0x228/0x428
[ 1.982484] kthread+0x120/0x128
[ 1.985714] ret_from_fork+0x10/0x18
[ 1.989290] ---[ end trace 9babd42e1ca4b84f ]---
This commit solves the issue by first checking value of the factor
width. If it is equal to 0 (unused factor), mask is set to 0, otherwise
GENMASK() macro is used as before.
Fixes: d897ef56fa ("clk: sunxi-ng: Mask nkmp factors when setting register")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
A NULL pointer dereference bug was reported on a distribution kernel but
the same issue should be present on mainline kernel. It occured on s390
but should not be arch-specific. A partial oops looks like:
Unable to handle kernel pointer dereference in virtual kernel address space
...
Call Trace:
...
try_to_wake_up+0xfc/0x450
vhost_poll_wakeup+0x3a/0x50 [vhost]
__wake_up_common+0xbc/0x178
__wake_up_common_lock+0x9e/0x160
__wake_up_sync_key+0x4e/0x60
sock_def_readable+0x5e/0x98
The bug hits any time between 1 hour to 3 days. The dereference occurs
in update_cfs_rq_h_load when accumulating h_load. The problem is that
cfq_rq->h_load_next is not protected by any locking and can be updated
by parallel calls to task_h_load. Depending on the compiler, code may be
generated that re-reads cfq_rq->h_load_next after the check for NULL and
then oops when reading se->avg.load_avg. The dissassembly showed that it
was possible to reread h_load_next after the check for NULL.
While this does not appear to be an issue for later compilers, it's still
an accident if the correct code is generated. Full locking in this path
would have high overhead so this patch uses READ_ONCE to read h_load_next
only once and check for NULL before dereferencing. It was confirmed that
there were no further oops after 10 days of testing.
As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any
potential problems with store tearing.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Fixes: 685207963b ("sched: Move h_load calculation to task_h_load()")
Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since this driver only has a dependency on ARCH_SUNXI just because it
doesn't make any sense to run it on something else, we can definitely
enable it through COMPILE_TEST as well to get some build coverage.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Pull pidfd fix from Christian Brauner:
"This should be an uncontroversial fix for pidfd_send_signal() by Jann
to better align it's behavior with other signal sending functions:
In one of the early versions of the patchset it was suggested to not
unconditionally error out when a signal with SI_USER is sent to a
non-current task (cf. [1]).
Instead, pidfd_send_signal() currently silently changes this to a
regular kill signal. While this is technically fine, the semantics are
weird since the kernel just silently converts a user's request behind
their back and also no other signal sending function allows to do
this. It gets more hairy when we introduce sending signals to a
specific thread soon.
So let's align pidfd_send_signal() with all the other signal sending
functions and error out when SI_USER signals are sent to a non-current
task"
* tag 'pidfd-fixes-v5.1-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
signal: don't silently convert SI_USER signals to non-current pidfd
Users have been seeing sound stability issues with max98090 codecs since:
commit 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
At first that commit broke sound for Chromebook Swanky and Clapper models,
the problem was that the machine-driver has been controlling the wrong
clock on those models since support for them was added. This was hidden by
clk-pmc-atom.c keeping the actual clk on unconditionally.
With the machine-driver controlling the proper clock, sound works again
but we are seeing bug reports describing it as: low volume,
"sounds like played at 10x speed" and instable.
When these issues are hit the following message is seen in dmesg:
"max98090 i2c-193C9890:00: PLL unlocked".
Attempts have been made to fix this by inserting a delay between enabling
the clk and enabling and checking the pll, but this has not helped.
It seems that at least on boards which use pmc_plt_clk_0 as clock,
if we ever disable the clk, the pll looses its lock and after that we get
various issues.
This commit fixes this by enabling the clock once at probe time on
these boards. In essence this restores the old behavior of clk-pmc-atom.c
always keeping the clk on on these boards.
Fixes: 648e921888 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Reported-by: Mogens Jensen <mogens-jensen@protonmail.com>
Reported-by: Dean Wallace <duffydack73@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull hwmon fixes from Guenter Roeck:
"Couple of minor hwmon fixes"
* tag 'hwmon-for-v5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
dt-bindings: hwmon: (adc128d818) Specify ti,mode property size
hwmon: (ntc_thermistor) Fix temperature type reporting
hwmon: (occ) Fix power sensor indexing
hwmon: (w83773g) Select REGMAP_I2C to fix build error
Trigger stop can be called in situations where trigger start failed
and as such it can't be assumed the buffer is already attached to
the compressed stream or a NULL pointer may be dereferenced.
Fixes: 639e5eb3c7 ("ASoC: wm_adsp: Correct handling of compressed streams that restart")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When disabling LPIs (for example on reset) at the redistributor
level, it is expected that LPIs that was pending in the CPU
interface are eventually retired.
Currently, this is not what is happening, and these LPIs will
stay in the ap_list, eventually being acknowledged by the vcpu
(which didn't quite expect this behaviour).
The fix is thus to retire these LPIs from the list of pending
interrupts as we disable LPIs.
Reported-by: Heyi Guo <guoheyi@huawei.com>
Tested-by: Heyi Guo <guoheyi@huawei.com>
Fixes: 0e4e82f154 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The rlc reset function is not necessary during gfx9 initialization/resume phase.
And this function would even cause rlc fw loading failed on some gfx9 ASIC.
Remove this function safely with verification well on Vega/Raven platform.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-04-01
This series contains two fixes for XDP in the i40e driver.
Björn provides both fixes, first moving a function out of the header and
into the main.c file. Second fixes a regression introduced in an
earlier patch that removed umem from the VSI. This caused an issue
because the setup code would try to enable AF_XDP zero copy
unconditionally, as long as there was a umem placed in the netdev
receive structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The device type for ip6 tunnels is set to
ARPHRD_TUNNEL6. However, the ip4ip6_err function
is expecting the device type of the tunnel to be
ARPHRD_TUNNEL. Since the device types do not
match, the function exits and the ICMP error
packet is not sent to the originating host. Note
that the device type for IPv4 tunnels is set to
ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of
ARPHRD_TUNNEL6 instead. Now the tunnel device
type matches and the ICMP error packet is sent
to the originating host.
Signed-off-by: Sheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Bump top level address-cells/size-cells nodes to 2 (to ensure all
down stream addresses are 64-bits, unless explicitly specified
otherwise (in "soc" bus with all peripherals)
2. "memory" also specified with address/size 2
3. Add a commented reference for PAE40 region beyond 4GB physical
address space
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
HSDK currently panics when built for HIGHMEM/ARC_HAS_PAE40 because ioc
is enabled with default which doesn't work for the 2 non contiguous
memory nodes. So get PAE working by disabling ioc instead.
Tested with !PAE40 by forcing @ioc_enable=0 and running the glibc
testsuite over ssh
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This patch uses the device description to clearly identity a device
attached to the bus. It is needed as the currently useed mdevX
notation is not sufficiant in case more than one network
interface controller is being used at the same time.
Cc: stable@vger.kernel.org
Signed-off-by: Christian Gromm <christian.gromm@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There be should check return value from dma_set_mask to throw some info
if fail to set dma mask.
Detected by CoverityScan, CID# 1443983: Error handling issues (CHECKED_RETURN)
Fixes: f6f9279f2b ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oded writes:
The following bug fix is included in this tag:
- Fix the low credit limit for DMA channel #0. Without this fix, the
channel is unusable by the user.
* tag 'misc-habanalabs-fixes-2019-04-01' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: remove low credit limit of DMA #0
For now, we just trace plug for single queue device or drivers
provide .commit_rqs, and have not trace plug for multiple queues
device. But, unplug events will be recorded when call
blk_mq_flush_plug_list(). Then, trace events will be asymmetrical,
just have unplug and without plug.
This patch add trace plug and unplug for multiple queues device in
blk_mq_make_request(). After that, we can accurately trace plug and
unplug for multiple queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The uapi header asound.h defines types based on struct timespec. We need
to #include <time.h> to get access to the definition of this struct.
Previously, we encountered the following error message when building
applications with a clang/bionic toolchain:
kernel-headers/sound/asound.h:350:19: error: field has incomplete type 'struct timespec'
struct timespec trigger_tstamp;
^
The absence of the time.h #include statement does not cause build errors
with glibc, because its version of stdlib.h indirectly includes time.h.
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The Acer TravelMate B114-21 laptop cannot detect and record sound from
headset MIC. This patch adds the ALC233_FIXUP_ACER_HEADSET_MIC HDA verb
quirk chained with ALC233_FIXUP_ASUS_MIC_NO_PRESENCE pin quirk to fix
this issue.
[ fixed the missing brace and reordered the entry -- tiwai ]
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Jonathan writes:
First set of IIO fixes for the 5.1 cycle.
Mostly the usual mix, but the bme680 SPI fix is much larger than
I would normally like. It never worked, but conversely we have
code there that would make people expect it to do so. Chances
of side effects are very low.
* core
- Fix an uninitialised bitaks that could potentially result in random
channels being enabled on startup.
* ad7192
- Fix a wrong channel address for ad7193.
* ade7854
- Fix a typo that results in returning peak voltage instead of peak current.
* at91
- Fix a potential hang due to a race on interrupt setting.
* bmg160
- Fix scale factor of temperature
* bme680
- Fix scale factor of temperature
- Fix SPI read interface. This is a bit of a large patch as it seems
that it never worked. It's major for this driver but is unlikely to
have any negative side effects.
* kxcjk1013
- restore sensor range setting after resume.
* mcp4725
- make sure to store powerdown bits when storing to the eeprom.
* mpu3050
- Mask the chip ID correctly as we have chips that set the bother bits of
this register.
* sgp30
- Fix a missing Kconfig block that means the driver doesn't actually ever
get built.
* tag 'iio-fixes-for-5.1a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: core: fix a possible circular locking dependency
iio: ad_sigma_delta: select channel when reading register
iio: pms7003: select IIO_TRIGGERED_BUFFER
iio: cros_ec: Fix the maths for gyro scale calculation
iio: adc: xilinx: prevent touching unclocked h/w on remove
iio: adc: xilinx: fix potential use-after-free on probe
iio: adc: xilinx: fix potential use-after-free on remove
iio: dac: mcp4725: add missing powerdown bits in store eeprom
io: accel: kxcjk1013: restore the range after resume.
iio:chemical:bme680: Fix SPI read interface
iio:chemical:bme680: Fix, report temperature in millidegrees
iio: chemical: fix missing Kconfig block for sgp30
iio: adc: at91: disable adc channel interrupt in timeout case
iio: gyro: mpu3050: fix chip ID reading
iio: Fix scan mask selection
staging: iio: ad7192: Fix ad7193 channel address
iio/gyro/bmg160: Use millidegrees for temperature scale
Staging: iio: meter: fixed typo
We currently don't reload pointers pointing into skb header
after doing pskb_may_pull() in _decode_session4(). So in case
pskb_may_pull() changed the pointers, we read from random
memory. Fix this by putting all the needed infos on the
stack, so that we don't need to access the header pointers
after doing pskb_may_pull().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Currently, buffers, schedulers, src's, encoders, decoders
and effect type dapm widgets remain always on as their
power_check method is not set. Setting this callback allows these
widgets in the audio path to be powered managed properly.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
If for any reason, the backend does not have the requested substream
(like capture on a playback only backend), the BE will be skipped in
dpcm_be_dai_startup().
However, dpcm_apply_symmetry() does not skip those BE and will
dereference the be_substream (NULL) pointer anyway.
Like in dpcm_be_dai_startup(), just skip those BE.
Fixes: 906c7d690c ("ASoC: dpcm: Apply symmetry for DPCM")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
We should use SoC compatible string in stead of wildcard string for
PMIC child devices.
Fixes: 0419a75b18 (arm64: dts: sprd: Remove wildcard compatible string)
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Since commit 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend")
on gta04 we have handle_twl4030_pih() called in situations where pm_runtime_get()
in i2c-omap.c returns -EACCES.
[ 86.474365] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[ 86.485473] printk: Suspending console(s) (use no_console_suspend to debug)
[ 86.555572] Disabling non-boot CPUs ...
[ 86.555664] Successfully put all powerdomains to target state
[ 86.563720] twl: Read failed (mod 1, reg 0x01 count 1)
[ 86.563751] twl4030: I2C error -13 reading PIH ISR
[ 86.563812] twl: Read failed (mod 1, reg 0x01 count 1)
[ 86.563812] twl4030: I2C error -13 reading PIH ISR
[ 86.563873] twl: Read failed (mod 1, reg 0x01 count 1)
[ 86.563903] twl4030: I2C error -13 reading PIH ISR
This happens when we wakeup via something behing twl4030 (powerbutton or rtc
alarm). This goes on for minutes until the system is finally resumed.
Disable the irq on suspend and enable it on resume to avoid
having i2c access problems when the irq registers are checked.
Fixes: 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
We don't want to overwrite "ret", it already holds the correct error
code. The "regmap" variable might be a valid pointer as this point.
Fixes: 8f83f26891 ("drm/mediatek: Add HDMI support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
If dccp_feat_push_change fails, we forget free the mem
which is alloced by kmemdup in dccp_feat_clone_sp_val.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e8ef967a54 ("dccp: Registration routines for changing feature values")
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:174 [inline]
sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
...
Uninit was stored to memory at:
sctp_transport_init net/sctp/transport.c:61 [inline]
sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
...
Bytes 8-15 of 16 are uninitialized
It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
struct sockaddr_in) wasn't initialized, but directly copied to user memory
in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
sctp_v6_addr_to_user() does.
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: flower: fix matching and pushing vlan CFI bit
This patch clears up some confusion around the meaning of bit 12
for FW messages related to VLAN and flower offload.
Pieter says:
It fixes issues with matching, pushing and popping vlan tags.
We replace the vlan CFI bit with a vlan present bit that
indicates the presence of a vlan tag. We also no longer set
the CFI when pushing vlan tags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace vlan CFI bit with a vlan present bit that indicates the
presence of a vlan tag. Previously the driver incorrectly assumed
that an vlan id of 0 is not matchable, therefore we indicate vlan
presence with a vlan present bit.
Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
Oops: 0002 [#1] SMP KASAN PTI
CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
...
CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
Call Trace:
kcm_create+0x600/0xbf0 [kcm]
__sock_create+0x324/0x750 net/socket.c:1272
...
This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
net: sched: fix stats accounting for child NOLOCK qdiscs
Currently, stats accounting for NOLOCK qdisc enslaved to classful (lock)
qdiscs is buggy. Per CPU values are ignored in most places, as a result,
stats dump in the above scenario always report 0 length backlog and parent
backlog len is not updated correctly on NOLOCK qdisc removal.
The first patch address stats dumping, and the second one child qdisc removal.
I'm targeting the net tree as this is a bugfix, but it could be moved to
net-next due to the relatively large diffstat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The same code to flush qdisc tree and purge the qdisc queue
is duplicated in many places and in most cases it does not
respect NOLOCK qdisc: the global backlog len is used and the
per CPU values are ignored.
This change addresses the above, factoring-out the relevant
code and using the helpers introduced by the previous patch
to fetch the correct backlog len.
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.
Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.
This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.
DRR needs also to access the child qdisc queue length, so it
needs custom handling.
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver is Intel-only so loading on anything which is not Intel is
pointless. Prevent it from doing so.
While at it, correct the "not supported" print statement to say CPU
"model" which is what that test does.
Fixes: 076b862c7e (cpufreq: intel_pstate: Add reasons for failure and debug messages)
Suggested-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It returned always NULL, thus it was never possible to get the filter.
Example:
$ ip link add foo type dummy
$ ip link add bar type dummy
$ tc qdisc add dev foo clsact
$ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
matchall action mirred ingress mirror dev bar
Before the patch:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
Error: Specified filter handle not found.
We have an error talking to the kernel
After:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
not_in_hw
action order 1: mirred (Ingress Mirror to device bar) pipe
index 1 ref 1 bind 1
CC: Yotam Gigi <yotamg@mellanox.com>
CC: Jiri Pirko <jiri@mellanox.com>
Fixes: fd62d9f5c5 ("net/sched: matchall: Fix configuration race")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current sys_pidfd_send_signal() silently turns signals with explicit
SI_USER context that are sent to non-current tasks into signals with
kernel-generated siginfo.
This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case.
If a user actually wants to send a signal with kernel-provided siginfo,
they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing
this case is unnecessary.
Instead of silently replacing the siginfo, just bail out with an error;
this is consistent with other interfaces and avoids special-casing behavior
based on security checks.
Fixes: 3eb39f4793 ("signal: add pidfd_send_signal() syscall")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Some devices don't use blk_integrity but still want stable pages
because they do their own checksumming. Examples include rbd and iSCSI
when data digests are negotiated. Stacking DM (and thus LVM) on top of
these devices results in sporadic checksum errors.
Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The limit was already incorporated to dm-crypt with commit 4e870e948f
("dm crypt: fix error with too large bios"), so we don't need to apply
it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
wrong anyway because the variable ti->max_io_len it is supposed to be in
the units of 512-byte sectors not in bytes.
Reduction of the limit to 1048576 sectors could even cause data
corruption in rare cases - suppose that we have a dm-striped device with
stripe size 768MiB. The target will call dm_set_target_max_io_len with
the value 1572864. The buggy code would reduce it to 1048576. Now, the
dm-core will errorneously split the bios on 1048576-sector boundary
insetad of 1572864-sector boundary and pass these stripe-crossing bios
to the striped target.
Cc: stable@vger.kernel.org # v4.16+
Fixes: 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A non const pointer to const cannot be marked initconst.
Mark the array actually const.
Fixes: 6bbc923dfc dm: add support to directly boot to a mapped device
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix sparse warnings:
drivers/md/dm-integrity.c:3619:12: warning:
symbol 'dm_integrity_init' was not declared. Should it be static?
drivers/md/dm-integrity.c:3638:6: warning:
symbol 'dm_integrity_exit' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the string opt_string is small, the function memcmp can access bytes
that are beyond the terminating nul character. In theory, it could cause
segfault, if opt_string were located just below some unmapped memory.
Change from memcmp to strncmp so that we don't read bytes beyond the end
of the string.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reconnecting after server or network failure can be improved
(to maintain availability and protect data integrity) by allowing
the client to choose the default persistent (or resilient)
handle timeout in some use cases. Today we default to 0 which lets
the server pick the default timeout (usually 120 seconds) but this
can be problematic for some workloads. Add the new mount parameter
to cifs.ko for SMB3 mounts "handletimeout" which enables the user
to override the default handle timeout for persistent (mount
option "persistenthandles") or resilient handles (mount option
"resilienthandles"). Maximum allowed is 16 minutes (960000 ms).
Units for the timeout are expressed in milliseconds. See
section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol
specification for more information.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Some servers (see MS-SMB2 protocol specification
section 3.3.5.15.1) expect that the FSCTL enumerate snapshots
is done twice, with the first query having EXACTLY the minimum
size response buffer requested (16 bytes) which refreshes
the snapshot list (otherwise that and subsequent queries get
an empty list returned). So had to add code to set
the maximum response size differently for the first snapshot
query (which gets the size needed for the second query which
contains the actual list of snapshots).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.19+
Fix a bug where we used to not initialize the cached fid structure at all
in open_shroot() if the open was successful but we did not get a lease.
This would leave the structure uninitialized and later when we close the handle
we would in close_shroot() try to kref_put() an uninitialized refcount.
Fix this by always initializing this structure if the open was successful
but only do the extra get() if we got a lease.
This extra get() is only used to hold the structure until we get a lease
break from the server at which point we will kref_put() it during lease
processing.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
In commit f3fef2b6e1 ("i40e: Remove umem from VSI") a regression was
introduced; When the VSI was reset, the setup code would try to enable
AF_XDP ZC unconditionally (as long as there was a umem placed in the
netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
XDP is enabled, the corresponding qid in the bitmap is set and the
umem is non-NULL.
Fixes: f3fef2b6e1 ("i40e: Remove umem from VSI")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.
v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tcp_sk_init() failed in inet_ctl_sock_create(),
'net->ipv4.tcp_congestion_control' will be left
uninitialized, but tcp_sk_exit() hasn't check for
that.
This patch add checking on 'net->ipv4.tcp_congestion_control'
in tcp_sk_exit() to prevent NULL-ptr dereference.
Fixes: 6670e15244 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull aio race fixes and cleanups from Al Viro.
The aio code had more issues with error handling and races with the aio
completing at just the right (wrong) time along with freeing the file
descriptor when another thread closes the file.
Just a couple of these commits are the actual fixes: the others are
cleanups to either make the fixes simpler, or to make the code legible
and understandable enough that we hope there's no more fundamental races
hiding.
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: move sanity checks and request allocation to io_submit_one()
deal with get_reqs_available() in aio_get_req() itself
aio: move dropping ->ki_eventfd into iocb_destroy()
make aio_read()/aio_write() return int
Fix aio_poll() races
aio: store event at final iocb_put()
aio: keep io_event in aio_kiocb
aio: fold lookup_kiocb() into its sole caller
pin iocb through aio.
Pull symlink fixes from Al Viro:
"The ceph fix is already in mainline, Daniel's bpf fix is in bpf tree
(1da6c4d914 "bpf: fix use after free in bpf_evict_inode"), the rest
is in here"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
debugfs: fix use-after-free on symlink traversal
ubifs: fix use-after-free on symlink traversal
jffs2: fix use-after-free on symlink traversal
We have a new Dell laptop which has the synaptics I2C touchpad
(06cb:7e7e) on it. After booting up the Linux, the touchpad doesn't
work, there is no interrupt when touching the touchpad, after
disable the runtime PM, everything works well.
I also tried the quirk of I2C_HID_QUIRK_DELAY_AFTER_SLEEP, it is
better after applied this quirk, there are interrupts but data it
reports is invalid.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Commit 0df977eafc ("powerpc/6xx: Don't use SPRN_SPRG2 for storing
stack pointer while in RTAS") changes the code to use a field in
thread struct to store the stack pointer while in RTAS instead of
using SPRN_SPRG2. It therefore converts all places which were
manipulating SPRN_SPRG2 to use that field. During early startup, the
zeroing of SPRN_SPRG2 has been replaced by a zeroing of that field in
thread struct. But at least in start_here, that's done wrongly because
it used the physical address of the fields while MMU is on at that
time.
So the virtual address of the field should be used instead, but in
the meantime, thread struct has already been zeroed and initialised
so we can just drop this initialisation.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Fixes: 0df977eafc ("powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The common pins were mistakenly not added to the DAPM graph.
Adding these pins will allow valid graphs to be created.
Signed-off-by: Annaliese McDermond <nh6z@nh6z.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
symlink body shouldn't be freed without an RCU delay. Switch debugfs to
->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback. Similar to solution for bpf, only here it's even
more obvious that ->evict_inode() can be dropped.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Xin Long says:
====================
tipc: a batch of uninit-value fixes for netlink_compat
These issues were all reported by syzbot, and exist since very beginning.
See the details on each patch.
====================
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot found a crash:
BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
Call Trace:
tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
__tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215
tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280
tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline]
tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed on commit 974cb0e3e7 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
header function. It means even when the check added in that fix fails, it
won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
triggered again.
So this patch is to add the process for the err returned from cmd header
function in tipc_nl_compat_dumpit().
Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar issue as fixed by Patch "tipc: check bearer name with right
length in tipc_nl_compat_bearer_enable" was also found by syzbot in
tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
offsetof(struct tipc_link_config, name)'.
Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot reported the following crash:
BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961
memchr+0xce/0x110 lib/string.c:961
string_is_valid net/tipc/netlink_compat.c:176 [inline]
tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401
__tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline]
tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354
tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline]
tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
also includes priority and disc_domain length.
This patch is to fix it by checking it with a right length:
'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen says:
====================
net: stmmac: fix handling of oversized frames
I accidentally had MTU size mismatch (9000 vs. 1500) in my network,
and I noticed I could kill a system using stmmac & 1500 MTU simply
by pinging it with "ping -s 2000 ...".
While testing a fix I encountered also some other issues that need fixing.
I have tested these only with enhanced descriptors, so the normal
descriptor changes need a careful review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is log is harmful as it can trigger multiple times per packet. Delete
it.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets without the last descriptor set should be dropped early. If we
receive a frame larger than the DMA buffer, the HW will continue using the
next descriptor. Driver mistakes these as individual frames, and sometimes
a truncated frame (without the LD set) may look like a valid packet.
This fixes a strange issue where the system replies to 4098-byte ping
although the MTU/DMA buffer size is set to 4096, and yet at the same
time it's logging an oversized packet.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we have error bits set, the discard_frame status will get overwritten
by checksum bit checks, which might set the status back to good one.
Fix by checking the COE status only if the frame is good.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if we drop a packet, we exit from NAPI loop before the budget
is consumed. In some situations this will make the RX processing stall
e.g. when flood pinging the system with oversized packets, as the
errorneous packets are not dropped efficiently.
If we drop a packet, we should just continue to the next one as long as
the budget allows.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We always program the maximum DMA buffer size into the receive descriptor,
although the allocated size may be less. E.g. with the default MTU size
we allocate only 1536 bytes. If somebody sends us a bigger frame, then
memory may get corrupted.
Fix by using exact buffer sizes.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull KVM fixes from Paolo Bonzini:
"A collection of x86 and ARM bugfixes, and some improvements to
documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported
by some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
KVM: doc: Document the life cycle of a VM and its resources
KVM: selftests: complete IO before migrating guest state
KVM: selftests: disable stack protector for all KVM tests
KVM: selftests: explicitly disable PIE for tests
KVM: selftests: assert on exit reason in CR4/cpuid sync test
KVM: x86: update %rip after emulating IO
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
kvm: don't redefine flags as something else
kvm: mmu: Used range based flushing in slot_handle_level_range
KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
KVM: Reject device ioctls from processes other than the VM's creator
KVM: doc: Fix incorrect word ordering regarding supported use of APIs
KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
...
Pull x86 fixes from Thomas Gleixner:
"A pile of x86 updates:
- Prevent exceeding he valid physical address space in the /dev/mem
limit checks.
- Move all header content inside the header guard to prevent compile
failures.
- Fix the bogus __percpu annotation in this_cpu_has() which makes
sparse very noisy.
- Disable switch jump tables completely when retpolines are enabled.
- Prevent leaking the trampoline address"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/realmode: Make set_real_mode_mem() static inline
x86/cpufeature: Fix __percpu annotation in this_cpu_has()
x86/mm: Don't exceed the valid physical address space
x86/retpolines: Disable switch jump tables when retpolines are enabled
x86/realmode: Don't leak the trampoline kernel address
x86/boot: Fix incorrect ifdeffery scope
x86/resctrl: Remove unused variable
Pull perf tooling fixes from Thomas Gleixner:
"Core libraries:
- Fix max perf_event_attr.precise_ip detection.
- Fix parser error for uncore event alias
- Fixup ordering of kernel maps after obtaining the main kernel map
address.
Intel PT:
- Fix TSC slip where A TSC packet can slip past MTC packets so that
the timestamp appears to go backwards.
- Fixes for exported-sql-viewer GUI conversion to python3.
ARM coresight:
- Fix the build by adding a missing case value for enumeration value
introduced in newer library, that now is the required one.
tool headers:
- Syncronize kernel headers with the kernel, getting new io_uring and
pidfd_send_signal syscalls so that 'perf trace' can handle them"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf pmu: Fix parser error for uncore event alias
perf scripts python: exported-sql-viewer.py: Fix python3 support
perf scripts python: exported-sql-viewer.py: Fix never-ending loop
perf machine: Update kernel map address and re-order properly
tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources
tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd
tools headers uapi: Update drm/i915_drm.h
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition
tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
perf evsel: Fix max perf_event_attr.precise_ip detection
perf intel-pt: Fix TSC slip
perf cs-etm: Add missing case value
Pull CPU hotplug fixes from Thomas Gleixner:
"Two SMT/hotplug related fixes:
- Prevent crash when HOTPLUG_CPU is disabled and the CPU bringup
aborts. This is triggered with the 'nosmt' command line option, but
can happen by any abort condition. As the real unplug code is not
compiled in, prevent the fail by keeping the CPU in zombie state.
- Enforce HOTPLUG_CPU for SMP on x86 to avoid the above situation
completely. With 'nosmt' being a popular option it's required to
unplug the half brought up sibling CPUs (due to the MCE wreckage)
completely"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Pull locking fixlet from Thomas Gleixner:
"Trivial update to the maintainers file"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Remove deleted file from futex file pattern
Pull core fixes from Thomas Gleixner:
"A small set of core updates:
- Make the watchdog respect the selected CPU mask again. That was
broken by the rework of the watchdog thread management and caused
inconsistent state and NMI watchdog being unstoppable.
- Ensure that the objtool build can find the libelf location.
- Remove dead kcore stub code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog: Respect watchdog cpumask on CPU hotplug
objtool: Query pkg-config for libelf location
proc/kcore: Remove unused kclist_add_remap()
Pull powerpc fixes from Michael Ellerman:
"Three non-regression fixes.
- Our optimised memcmp could read past the end of one of the buffers
and potentially trigger a page fault leading to an oops.
- Some of our code to read energy management data on PowerVM had an
endian bug leading to bogus results.
- When reporting a machine check exception we incorrectly reported
TLB multihits as D-Cache multhits due to a missing entry in the
array of causes.
Thanks to: Chandan Rajendra, Gautham R. Shenoy, Mahesh Salgaonkar,
Segher Boessenkool, Vaidyanathan Srinivasan"
* tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/mce: Fix misleading print for TLB mutlihit
powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes
powerpc/64: Fix memcmp reading past the end of src/dest
This fixes a possible circular locking dependency detected warning seen
with:
- CONFIG_PROVE_LOCKING=y
- consumer/provider IIO devices (ex: "voltage-divider" consumer of "adc")
When using the IIO consumer interface, e.g. iio_channel_get(), the consumer
device will likely call iio_read_channel_raw() or similar that rely on
'info_exist_lock' mutex.
typically:
...
mutex_lock(&chan->indio_dev->info_exist_lock);
if (chan->indio_dev->info == NULL) {
ret = -ENODEV;
goto err_unlock;
}
ret = do_some_ops()
err_unlock:
mutex_unlock(&chan->indio_dev->info_exist_lock);
return ret;
...
Same mutex is also hold in iio_device_unregister().
The following deadlock warning happens when:
- the consumer device has called an API like iio_read_channel_raw()
at least once.
- the consumer driver is unregistered, removed (unbind from sysfs)
======================================================
WARNING: possible circular locking dependency detected
4.19.24 #577 Not tainted
------------------------------------------------------
sh/372 is trying to acquire lock:
(kn->count#30){++++}, at: kernfs_remove_by_name_ns+0x3c/0x84
but task is already holding lock:
(&dev->info_exist_lock){+.+.}, at: iio_device_unregister+0x18/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&dev->info_exist_lock){+.+.}:
__mutex_lock+0x70/0xa3c
mutex_lock_nested+0x1c/0x24
iio_read_channel_raw+0x1c/0x60
iio_read_channel_info+0xa8/0xb0
dev_attr_show+0x1c/0x48
sysfs_kf_seq_show+0x84/0xec
seq_read+0x154/0x528
__vfs_read+0x2c/0x15c
vfs_read+0x8c/0x110
ksys_read+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbedefb60
-> #0 (kn->count#30){++++}:
lock_acquire+0xd8/0x268
__kernfs_remove+0x288/0x374
kernfs_remove_by_name_ns+0x3c/0x84
remove_files+0x34/0x78
sysfs_remove_group+0x40/0x9c
sysfs_remove_groups+0x24/0x34
device_remove_attrs+0x38/0x64
device_del+0x11c/0x360
cdev_device_del+0x14/0x2c
iio_device_unregister+0x24/0x60
release_nodes+0x1bc/0x200
device_release_driver_internal+0x1a0/0x230
unbind_store+0x80/0x130
kernfs_fop_write+0x100/0x1e4
__vfs_write+0x2c/0x160
vfs_write+0xa4/0x17c
ksys_write+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbe906840
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->info_exist_lock);
lock(kn->count#30);
lock(&dev->info_exist_lock);
lock(kn->count#30);
*** DEADLOCK ***
...
cdev_device_del() can be called without holding the lock. It should be safe
as info_exist_lock prevents kernelspace consumers to use the exported
routines during/after provider removal. cdev_device_del() is for userspace.
Help to reproduce:
See example: Documentation/devicetree/bindings/iio/afe/voltage-divider.txt
sysv {
compatible = "voltage-divider";
io-channels = <&adc 0>;
output-ohms = <22>;
full-ohms = <222>;
};
First, go to iio:deviceX for the "voltage-divider", do one read:
$ cd /sys/bus/iio/devices/iio:deviceX
$ cat in_voltage0_raw
Then, unbind the consumer driver. It triggers above deadlock warning.
$ cd /sys/bus/platform/drivers/iio-rescale/
$ echo sysv > unbind
Note I don't actually expect stable will pick this up all the
way back into IIO being in staging, but if's probably valid that
far back.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Fixes: ac917a8111 ("staging:iio:core set the iio_dev.info pointer to null on unregister")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull LED fixes from Jacek Anaszewski:
- fix refcnt leak on interface rename
- use memcpy in device_name_store() to avoid including garbage from a
previous, longer value in the device_name
- fix a potential NULL pointer dereference in case of_match_device()
cannot find a match
* tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: trigger: netdev: use memcpy in device_name_store
leds: pca9532: fix a potential NULL pointer dereference
leds: trigger: netdev: fix refcnt leak on interface rename
Pull GPIO fixes from Linus Walleij:
"As you can see [in the git history] I was away on leave and Bartosz
kindly stepped in and collected a slew of fixes, I pulled them into my
tree in two sets and merged some two more fixes (fixing my own caused
bugs) on top.
Summary:
- Revert the extended use of gpio_set_config() and think about how we
can do this properly.
- Fix up the SPI CS GPIO handling so it now works properly on the SPI
bus children, as intended.
- Error paths and driver fixes"
* tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mockup: use simple_read_from_buffer() in debugfs read callback
gpio: of: Fix of_gpiochip_add() error path
gpio: of: Check for "spi-cs-high" in child instead of parent node
gpio: of: Check propname before applying "cs-gpios" quirks
gpio: mockup: fix debugfs read
Revert "gpio: use new gpio_set_config() helper in more places"
gpio: aspeed: fix a potential NULL pointer dereference
gpio: amd-fch: Fix bogus SPDX identifier
gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
gpio: exar: add a check for the return value of ida_simple_get fails
If userspace doesn't end the input with a newline (which can easily
happen if the write happens from a C program that does write(fd,
iface, strlen(iface))), we may end up including garbage from a
previous, longer value in the device_name. For example
# cat device_name
# printf 'eth12' > device_name
# cat device_name
eth12
# printf 'eth3' > device_name
# cat device_name
eth32
I highly doubt anybody is relying on this behaviour, so switch to
simply copying the bytes (we've already checked that size is <
IFNAMSIZ) and unconditionally zero-terminate it; of course, we also
still have to strip a trailing newline.
This is also preparation for future patches.
Fixes: 06f502f57d ("leds: trigger: Introduce a NETDEV trigger")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
It was reported that re-introducing ASPM, in combination with RX
interrupt coalescing, results in significantly increased packet
latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes
the issue. Therefore change the driver's default to disable RX
interrupt coalescing. Users still have the option to enable RX
coalescing via ethtool.
[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496
Fixes: a99790bf5c ("r8169: Reinstate ASPM Support")
Reported-by: Mike Crowe <mac@mcrowe.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of_match_device cannot find a match, return -EINVAL to avoid
NULL pointer dereference.
Fixes: fa4191a609 ("leds: pca9532: Add device tree support")
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.1-rc3, and one driver
removal.
The biggest thing here is the removal of the mt7621-eth driver as a
"real" network driver was merged in 5.1-rc1 for this hardware, so this
old driver can now be removed.
Other than that, there are just a number of small fixes, all resolving
reported issues and some potential corner cases for error handling
paths.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vt6655: Remove vif check from vnt_interrupt
staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()
staging: octeon-ethernet: fix incorrect PHY mode
staging: vc04_services: Fix an error code in vchiq_probe()
staging: erofs: fix error handling when failed to read compresssed data
staging: vt6655: Fix interrupt race condition on device start up.
staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc
staging: rtl8712: uninitialized memory in read_bbreg_hdl()
staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference
staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc
staging, mt7621-pci: fix build without pci support
staging: speakup_soft: Fix alternate speech with other synths
staging: axis-fifo: add CONFIG_OF dependency
staging: olpc_dcon_xo_1: add missing 'const' qualifier
staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest
staging: erofs: fix to handle error path of erofs_vmap()
staging: mt7621-dts: update ethernet settings.
staging: remove mt7621-eth
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.1-rc3.
Nothing major here, just a number of potential problems fixes for
error handling paths, as well as some other minor bugfixes for
reported issues with 5.1-rc1.
All of these have been in linux-next with no reported issues"
* tag 'tty-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix NULL pointer issue when tty_port ops is not set
Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
dt-bindings: serial: Add compatible for Mediatek MT8183
tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
tty/serial: atmel: Add is_half_duplex helper
serial: sh-sci: Fix setting SCSCR_TIE while transferring data
serial: ar933x_uart: Fix build failure with disabled console
tty: serial: qcom_geni_serial: Initialize baud in qcom_geni_console_setup
sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
tty: mxs-auart: fix a potential NULL pointer dereference
tty: atmel_serial: fix a potential NULL pointer dereference
serial: max310x: Fix to avoid potential NULL pointer dereference
serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.1-rc3.
Nothing major at all here, just a small collection of fixes for
reported issues, and potential problems with error handling paths.
Also a few new device ids, as normal.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
USB: serial: option: add Olicard 600
USB: serial: cp210x: add new device id
usb: u132-hcd: fix resource leak
usb: cdc-acm: fix race during wakeup blocking TX traffic
usb: mtu3: fix EXTCON dependency
usb: usb251xb: fix to avoid potential NULL pointer dereference
usb: core: Try generic PHY_MODE_USB_HOST if usb_phy_roothub_set_mode fails
phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs
xhci: Don't let USB3 ports stuck in polling state prevent suspend
usb: xhci: dbc: Don't free all memory with spinlock held
xhci: Fix port resume done detection for SS ports with LPM enabled
USB: serial: mos7720: fix mos_parport refcount imbalance on error path
USB: gadget: f_hid: fix deadlock in f_hidg_write()
usb: gadget: net2272: Fix net2272_dequeue()
usb: gadget: net2280: Fix net2280_dequeue()
usb: gadget: net2280: Fix overrun of OUT messages
usb: dwc3: pci: add support for Comet Lake PCH ID
usb: usb251xb: Remove unnecessary comparison of unsigned integer with >= 0
usb: common: Consider only available nodes for dr_mode
usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
...
Pull ACPI fix from Rafael Wysocki:
"This corrects a previous attempt to make Linux use its own set of ACPI
debug flags different from the upstream ACPICA's default (Erik
Schmauss)"
* tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: use different default debug value than ACPICA
Pull power management fixes from Rafael Wysocki:
"These fix CPU base frequency reporting in the intel_pstate driver and
a use-after-free in the scpi-cpufreq driver.
Specifics:
- Fix the ACPI CPPC library to actually follow the specification when
decoding the guaranteed performance register information and make
the intel_pstate driver to fall back to the nominal frequency when
reporting the base frequency if the guaranteed performance register
information is not there (Srinivas Pandruvada).
- Fix use-after-free in the exit callback of the scpi-cpufreq left
after an update during the 5.0 development cycle (Vincent Stehlé)"
* tag 'pm-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: scpi: Fix use after free
cpufreq: intel_pstate: Also use CPPC nominal_perf for base_frequency
ACPI / CPPC: Fix guaranteed performance handling
Pull security layer fixes from James Morris:
"Yama and LSM config fixes"
* 'fixes-v5.1-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: Revive CONFIG_DEFAULT_SECURITY_* for "make oldconfig"
Yama: mark local symbols as static
With zero-key defined, we can remove previous detection of key id 0 or null
key in order to deal with a zero-key situation. Syncing all security
commands to use the zero-key. Helper functions are introduced to return the
data that points to the actual key payload or the zero_key. This helps
uniformly handle the key material even with zero_key.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add a zero key in order to standardize hardware that want a key of 0's to
be passed. Some platforms defaults to a zero-key with security enabled
rather than allow the OS to enable the security. The zero key would allow
us to manage those platform as well. This also adds a fix to secure erase
so it can use the zero key to do crypto erase. Some other security commands
already use zero keys. This introduces a standard zero-key to allow
unification of semantics cross nvdimm security commands.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recently the generic timer test of kvm-unit-tests failed to complete
(stalled) when a physical timer is being used. This issue is caused
by incorrect update of CNTP_CVAL when CNTP_TVAL is being accessed,
introduced by 'Commit 84135d3d18 ("KVM: arm/arm64: consolidate arch
timer trap handlers")'. According to Arm ARM, the read/write behavior
of accesses to the TVAL registers is expected to be:
* READ: TimerValue = (CompareValue – (Counter - Offset)
* WRITE: CompareValue = ((Counter - Offset) + Sign(TimerValue)
This patch fixes the TVAL read/write code path according to the
specification.
Fixes: 84135d3d18 ("KVM: arm/arm64: consolidate arch timer trap handlers")
Signed-off-by: Wei Huang <wei@redhat.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
First batch of iwlwifi fixes intended for v5.1
* add some new PCI IDs (plus a struct name change they depend on);
* fix crypto with new devices, namely 22560 and above;
* a bunch of fixes (and a dependency) for the new debugging infra;
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-29
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Bug fix in BTF deduplication that was mishandling an equivalence
comparison, from Andrii.
2) libbpf Makefile fixes to properly link against libelf for the shared
object and to actually export AF_XDP's xsk.h header, from Björn.
3) Fix use after free in bpf inode eviction, from Daniel.
4) Fix a bug in skb creation out of cpumap redirect, from Jesper.
5) Remove an unnecessary and triggerable WARN_ONCE() in max number
of call stack frames checking in verifier, from Paul.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull turbostat utility updates for 5.1 from Len Brown:
"Misc fixes and updates."
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Merge misc fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
fs: fs_parser: fix printk format warning
checkpatch: add %pt as a valid vsprintf extension
mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
drivers/block/zram/zram_drv.c: fix idle/writeback string compare
mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
mm/memory_hotplug.c: fix notification in offline error path
ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
fs/proc/kcore.c: make kcore_modules static
include/linux/list.h: fix list_is_first() kernel-doc
mm/debug.c: fix __dump_page when mapping->host is not set
mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
include/linux/hugetlb.h: convert to use vm_fault_t
iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
mm: add support for kmem caches in DMA32 zone
ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
mm/hotplug: fix offline undo_isolate_page_range()
fs/open.c: allow opening only regular files during execve()
mailmap: add Changbin Du
mm/debug.c: add a cast to u64 for atomic64_read()
...
Pull arm64 fix from Catalin Marinas:
"Use memblock_alloc() instead of memblock_alloc_low() in
request_standard_resources(), the latter being limited to the low 4G
memory range on arm64"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: replace memblock_alloc_low with memblock_alloc
Pull more fixes for meson clocks from Neil Armstrong:
- clk-pll: fix rate rounding fixing meson8b boot failure
- vid-pll-div: fix recal_rate warning and return when invalid setting
* tag 'meson-clk-fixes-for-5.1-v2' of https://github.com/BayLibre/clk-meson:
clk: meson: vid-pll-div: remove warning and return 0 on invalid config
clk: meson: pll: fix rounding and setting a rate that matches precisely
Pull IOMMU fixes from Joerg Roedel:
- Fix a bug in the AMD IOMMU driver not handling exclusion ranges
correctly. In fact the driver did not reserve these ranges for IOVA
allocations, so that dma-handles could be allocated in an exclusion
range, leading to data corruption. Exclusion ranges have not been
used by any firmware up to now, so this issue remained undiscovered
for quite some time.
- Fix wrong warning messages that the IOMMU core code prints when it
tries to allocate the default domain for an iommu group and the
driver does not support any of the default domain types (like Intel
VT-d).
* tag 'iommu-fixes-v5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Reserve exclusion range in iova-domain
iommu: Don't print warning when IOMMU driver only supports unmanaged domains
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-03-29
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.11
('net/mlx5: Decrease default mr cache size')
For -stable v4.12
('net/mlx5e: Add a lock on tir list')
For -stable v4.13
('net/mlx5e: Fix error handling when refreshing TIRs')
For -stable v4.18
('net/mlx5e: Update xon formula')
For -stable v4.19
('net: mlx5: Add a missing check on idr_find, free buf')
('net/mlx5e: Update xoff formula')
net-next merge Note:
When merged with net-next the following simple conflict will appear,
drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
++<<<<<<< HEAD (net)
+ * max_mtu: netdev's max_mtu
++=======
+ * @mtu: device's MTU
++>>>>>>> net-next
To resolve: just replace the line in net-next
* @mtu: device's MTU
to
* @max_mtu: netdev's max_mtu
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull driver core fix from Greg KH:
"Here is a single driver core patch for 5.1-rc3.
After 5.1-rc1, all of the users of BUS_ATTR() are finally removed, so
we can now drop this macro from include/linux/device.h so that no more
new users will be created.
This patch has been in linux-next for a while, with no reported
issues"
* tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove BUS_ATTR()
Pull char/misc driver fixes from Greg KH:
"Here are some binder, habanalabs, and vboxguest driver fixes for
5.1-rc3.
The Binder fixes resolve some reported issues found by testing, first
by the selinux developers, and then earlier today by syzbot.
The habanalabs fixes are all minor, resolving a number of tiny things.
The vboxguest patches are a bit larger. They resolve the fact that
virtual box decided to change their api in their latest release in a
way that broke the existing kernel code, despite saying that they were
never going to do that. So this is a bit of a "new feature", but is
good to get merged so that 5.1 will work with the latest release. The
changes are not large and of course virtual box "swears" they will not
break this again, but no one is holding their breath here.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x
binder: fix race between munmap() and direct reclaim
binder: fix BUG_ON found by selinux-testsuite
habanalabs: cast to expected type
habanalabs: prevent host crash during suspend/resume
habanalabs: perform accounting for active CS
habanalabs: fix mapping with page size bigger than 4KB
habanalabs: complete user context cleanup before hard reset
habanalabs: fix bug when mapping very large memory area
habanalabs: fix MMU number of pages calculation
Pull SCSI fixes from James Bottomley:
"Thirteen fixes, seven of which are for IBM fibre channel and three
additional for fairly serious bugs in drivers (qla2xxx, mpt3sas,
aacraid).
Of the three core fixes, the most significant is probably the missed
run queue causing an indefinite hang. The others are fixing a
potential use after free on device close and silencing an incorrect
warning"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvfc: Clean up transport events
scsi: ibmvfc: Byte swap status and error codes when logging
scsi: ibmvfc: Add failed PRLI to cmd_status lookup array
scsi: ibmvfc: Remove "failed" from logged errors
scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
scsi: sd: Quiesce warning if device does not report optimal I/O size
scsi: sd: Fix a race between closing an sd device and sd I/O
scsi: core: Run queue when state is set to running after being blocked
scsi: qla4xxx: fix a potential NULL pointer dereference
scsi: aacraid: Insure we don't access PCIe space during AER/EEH
scsi: mpt3sas: Fix kernel panic during expander reset
Pull i2c fixes from Wolfram Sang:
"A new ID for the i801 driver and some Documentation fixes to make it
easier for people to find the bindings (which is also a basis for
further improvements in that area)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: wmt: make bindings file name match the driver
i2c: sun6i-p2wi: make bindings file name match the driver
i2c: stu300: make bindings file name match the driver
i2c: mt65xx: make bindings file name match the driver
i2c: iop3xx: make bindings file name match the driver
i2c: i801: Add support for Intel Comet Lake
Pull sound fixes from Takashi Iwai:
"The important fixes at this time are a couple fixes in ALSA core: a
fix for PCM is about the OOB access in PCM OSS plugins that has been
for long time, but hasn't hit so often until now just because we
allocated a large buffer via vmalloc(), and surfaced more often after
switching to kvmalloc(). Another fix is for a long-standing PCM
problem wrt racy PM resume.
Others are trivial nospec coverage and usual HD-audio quirks"
* tag 'sound-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix speakers on Acer Predator Helios 500 Ryzen laptops
ALSA: pcm: Don't suspend stream in unrecoverable PCM state
ALSA: hda/ca0132 - Simplify alt firmware loading code
ALSA: pcm: Fix possible OOB access in PCM oss plugins
ALSA: hda/realtek: Enable headset MIC of ASUS X430UN and X512DK with ALC256
ALSA: hda/realtek: Enable headset mic of ASUS P5440FF with ALC256
ALSA: hda/realtek: Enable ASUS X441MB and X705FD headset MIC with ALC256
ALSA: hda/realtek - Add support for Acer Aspire E5-523G/ES1-432 headset mic
ALSA: hda/realtek: Enable headset MIC of Acer Aspire Z24-890 with ALC286
ALSA: seq: oss: Fix Spectre v1 vulnerability
ALSA: rawmidi: Fix potential Spectre v1 vulnerability
Pull Kbuild fixes from Masahiro Yamada:
- Remove harmful -Oz option of Clang
- Get back the original behavior (no recursion for in-tree build) for
GNU Make 4.x
- Some minor fixes for coccinelle patches
- Do not overwrite .gitignore in the output directory in case it is
version-controlled
- Fix missed record-mcount bug for dynamic ftrace
- Fix endianness bug in modversions for relative CRC
- Cater to '^H' key code in Kconfig ncurses programs
* tag 'kbuild-fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig/[mn]conf: handle backspace (^H) key
kbuild: modversions: Fix relative CRC byte order interpretation
scripts: coccinelle: Fix description of badty.cocci
kbuild: strip whitespace in cmd_record_mcount findstring
kbuild: do not overwrite .gitignore in output directory
kbuild: skip parsing pre sub-make code for recursion
coccinelle: put_device: reduce false positives
kbuild: skip sub-make for in-tree build with GNU Make 4.x
Revert "kbuild: use -Oz instead of -Os when using clang"
Pull block fixes from Jens Axboe:
"Small set of fixes that should go into this series. This contains:
- compat signal mask fix for io_uring (Arnd)
- EAGAIN corner case for direct vs buffered writes for io_uring
(Roman)
- NVMe pull request from Christoph with various little fixes
- sbitmap ws_active fix, which caused a perf regression for shared
tags (me)
- sbitmap bit ordering fix (Ming)
- libata on-stack DMA fix (Raymond)"
* tag 'for-linus-20190329' of git://git.kernel.dk/linux-block:
nvmet: fix error flow during ns enable
nvmet: fix building bvec from sg list
nvme-multipath: relax ANA state check
nvme-tcp: fix an endianess miss-annotation
libata: fix using DMA buffers on stack
io_uring: offload write to async worker in case of -EAGAIN
sbitmap: order READ/WRITE freed instance and setting clear bit
blk-mq: fix sbitmap ws_active for shared tags
io_uring: fix big-endian compat signal mask handling
blk-mq: update comment for blk_mq_hctx_has_pending()
blk-mq: use blk_mq_put_driver_tag() to put tag
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid choking on multipage bvecs in the messenger and a
small use-after-free fix"
* tag 'ceph-for-5.1-rc3' of git://github.com/ceph/ceph-client:
ceph: fix use-after-free on symlink traversal
libceph: fix breakage caused by multipage bvecs
Pull xfs fixes from Darrick Wong:
"Here are a few fixes for some corruption bugs and uninitialized
variable problems. The few patches here have gone through a few days
worth of fstest runs with no new problems observed.
Changes since last update:
- Fix a bunch of static checker complaints about uninitialized
variables and insufficient range checks.
- Avoid a crash when incore extent map data are corrupt.
- Disallow FITRIM when we haven't recovered the log and know the
metadata are stale.
- Fix a data corruption when doing unaligned overlapping dio writes"
* tag 'xfs-5.1-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: serialize unaligned dio writes against all other dio writes
xfs: prohibit fstrim in norecovery mode
xfs: always init bma in xfs_bmapi_write
xfs: fix btree scrub checking with regards to root-in-inode
xfs: dabtree scrub needs to range-check level
xfs: don't trip over uninitialized buffer on extent read of corrupted inode
Commit 70b62c2566 ("LoadPin: Initialize as ordered LSM") removed
CONFIG_DEFAULT_SECURITY_{SELINUX,SMACK,TOMOYO,APPARMOR,DAC} from
security/Kconfig and changed CONFIG_LSM to provide a fixed ordering as a
default value. That commit expected that existing users (upgrading from
Linux 5.0 and earlier) will edit CONFIG_LSM value in accordance with
their CONFIG_DEFAULT_SECURITY_* choice in their old kernel configs. But
since users might forget to edit CONFIG_LSM value, this patch revives
the choice (only for providing the default value for CONFIG_LSM) in order
to make sure that CONFIG_LSM reflects CONFIG_DEFAULT_SECURITY_* from their
old kernel configs.
Note that since TOMOYO can be fully stacked against the other legacy
major LSMs, when it is selected, it explicitly disables the other LSMs
to avoid them also initializing since TOMOYO does not expect this
currently.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 70b62c2566 ("LoadPin: Initialize as ordered LSM")
Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Change t4fw_version.h to update latest firmware version
number to 1.23.3.0.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NULL or ZERO_SIZE_PTR will be returned for zero sized memory
request, and derefencing them will lead to a segfault
so it is unnecessory to call vzalloc for zero sized memory
request and not call functions which maybe derefence the
NULL allocated memory
this also fixes a possible memory leak if phy_ethtool_get_stats
returns error, memory should be freed before exit
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Wang Li <wangli39@baidu.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only decrypt_internal() performs zero copy on rx, all paths
which don't hit decrypt_internal() must set zc to false,
otherwise tls_sw_recvmsg() may return 0 causing the application
to believe that that connection got closed.
Currently this happens with device offload when new record
is first read from.
Fixes: d069b780e3 ("tls: Fix tls_device receive")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After queue stopped, the wakeup mechanism may wake it up again
when ring buffer usage is lower than a threshold. This may cause
send path panic on NULL pointer when we stopped all tx queues in
netvsc_detach and start removing the netvsc device.
This patch fix it by adding a tx_disable flag to prevent unwanted
queue wakeup.
Fixes: 7b2ee50c0c ("hv_netvsc: common detach logic")
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bond expects ethernet hwaddr for its slave, but it can be longer than 6
bytes - infiniband interface for example.
# cat /sys/devices/<skipped>/net/ib0/address
80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1
# cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr
80:00:02:08:fe:80
So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf/urgent fixes from Arnaldo:
Core libraries:
Jiri Olsa:
- Fix max perf_event_attr.precise_ip detection.
Kan Liang:
- Fix parser error for uncore event alias
Wei Lin:
- Fixup ordering of kernel maps after obtaining the main kernel map address.
Intel PT:
Adrian Hunter:
- Fix TSC slip where A TSC packet can slip past MTC packets so that the
timestamp appears to go backwards.
- Fixes for exported-sql-viewer GUI conversion to python3.
ARM coresight:
Solomon Tan:
- Fix the build by adding a missing case value for enumeration value introduced
in newer library, that now is the required one.
tool headers:
Arnaldo Carvalho de Melo:
- Syncronize kernel headers with the kernel, getting new io_uring and
pidfd_send_signal syscalls so that 'perf trace' can handle them.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The driver allocates an encap context based on the tunnel properties,
and reuse that context for all flows using the same tunnel properties.
Commit df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
introduced another tunnel protocol other than the single VXLAN
previously supported. A flow that uses a tunnel with the same tunnel
properties but with a different tunnel type (GRE vs VXLAN for example)
would mistakenly reuse the previous alocated context, causing the
traffic to be sent with the wrong encapsulation. Fix that by
considering the tunnel type for encap contexts.
Fixes: df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Set xon = xoff - netdev's max_mtu.
netdev's max_mtu will give enough time for the pause frame to
arrive at the sender.
Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Make sure the struct mlx5_flow_destination is zero before
filling in the field.
Fixes: 8da202b249 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Traditionally, the PF (Physical Function) which resides on vport 0 was
the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
which resides on vport 0xfffe, was introduced as the E-Switch manager,
the assumption that the E-switch manager is on vport 0 is incorrect.
Since the eswitch code already uses the actual vport value, all we
need is to always set other_vport=1.
Signed-off-by: Omri Kahalon <omrik@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The esw offloads structures share a union with the legacy mode structs.
Reset the offloads struct to zero in init to protect from null
assumptions made by the legacy mode code.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The capacity of FDB offloading and NIC offloading table are
different, and when allocating the pedit actions, we should
use the correct namespace type.
Fixes: c500c86b0c ("net/mlx5e: support for two independent packet edit actions")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The esw fdb table has a union of legacy and offloads members.
So if we were in a certain esw mode we could set some memebers and not
set null which is fine as on destroy path and don't care.
But then moving from legacy to switchdev a second time, the cleanup flow
of legacy mode checks if a struct member was in use if it's not null so
we need to make sure to reset the code to null when we init legacy mode.
Fixes: 8da202b249 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow configuration of legacy link-modes even when extended link-modes
are supported. This requires reading of legacy advertisement even when
extended link-modes are supported. Since legacy and extended
advertisement are mutually excluded, wait for empty reply from extended
advertisement before reading legacy advertisement.
Fixes: 6a89737241 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ethtool option set_link_ksettings allows setting of legacy link-modes
or extended link-modes. Refine the decision of which type of link-modes
is set.
Fixes: 6a89737241 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refresh tirs is looping over a global list of tirs while netdevs are
adding and removing tirs from that list. That is why a lock is
required.
Fixes: 724b2aa151 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
idr_find() can return a NULL value to 'flow' which is used without a
check. The patch adds a check to avoid potential NULL pointer dereference.
In case of mlx5_fpga_sbu_conn_sendmsg() failure, free buf allocated
using kzalloc.
Fixes: ab412e1dd7 ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For some protocols we are not allowing IP header rewrite offload, since
the HW is not capable to properly adjust the l4 checksum. However, TTL
& HOPLIMIT modification can be done for all IP protocols, because they
are not part of the pseudo header taken into account for checksum.
Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Previously, a false positive would be caught if the TIRs list is
empty, since the err value was initialized to -ENOMEM, and was only
updated if a TIR is refreshed. This is resolved by initializing the
err value to zero.
Fixes: b676f65389 ("net/mlx5e: Refactor refresh TIRs")
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Delete initialization of high order entries in mr cache to decrease initial
memory footprint. When required, the administrator can populate the
entries with memory keys via the /sys interface.
This approach is very helpful to significantly reduce the per HW function
memory footprint in virtualization environments such as SRIOV.
Fixes: 9603b61de1 ("mlx5: Move pci device handling from mlx5_ib to mlx5_core")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reported-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We want to avoid leaking pointer info from xdp_frame (that is placed in
top of frame) like commit 6dfb970d3d ("xdp: avoid leaking info stored in
frame data on page reuse"), and followup commit 97e19cce05 ("bpf:
reserve xdp_frame size in xdp headroom") that reserve this headroom.
These changes also affected how cpumap constructed SKBs, as xdpf->headroom
size changed, the skb data starting point were in-effect shifted with 32
bytes (sizeof xdp_frame). This was still okay, as the cpumap frame_size
calculation also included xdpf->headroom which were reduced by same amount.
A bug was introduced in commit 77ea5f4cbe ("bpf/cpumap: make sure
frame_size for build_skb is aligned if headroom isn't"), where the
xdpf->headroom became part of the SKB_DATA_ALIGN rounding up. This
round-up to find the frame_size is in principle still correct as it does
not exceed the 2048 bytes frame_size (which is max for ixgbe and i40e),
but the 32 bytes offset of pkt_data_start puts this over the 2048 bytes
limit. This cause skb_shared_info to spill into next frame. It is a little
hard to trigger, as the SKB need to use above 15 skb_shinfo->frags[] as
far as I calculate. This does happen in practise for TCP streams when
skb_try_coalesce() kicks in.
KASAN can be used to detect these wrong memory accesses, I've seen:
BUG: KASAN: use-after-free in skb_try_coalesce+0x3cb/0x760
BUG: KASAN: wild-memory-access in skb_release_data+0xe2/0x250
Driver veth also construct a SKB from xdp_frame in this way, but is not
affected, as it doesn't reserve/deduct the room (used by xdp_frame) from
the SKB headroom. Instead is clears the pointers via xdp_scrub_frame(),
and allows SKB to use this area.
The fix in this patch is to do like veth and instead allow SKB to (re)use
the area occupied by xdp_frame, by clearing via xdp_scrub_frame(). (This
does kill the idea of the SKB being able to access (mem) info from this
area, but I guess it was a bad idea anyhow, and it was already killed by
the veth changes.)
Fixes: 77ea5f4cbe ("bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull drm fixes from Dave Airlie:
"Weekly fixes roundup, nothing two serious, some usb device regressions
are fixed, and i915 GVT has a bigger fix but otherwise not really much
happening here.
core:
- fb bpp check regression fix
- release/unplug fix
- use after free fixes
i915:
- fix mmap range checks
- fix gvt ppgtt mm LRU list access races
- fix selftest error pointer check
- fix a macro definition (pre-emptive for potential further backports)
- fix one AML SKU ULX status
amdgpu:
- one variable refresh rate fix
udl:
- fix EDID reading
tegra:
- build/warning fixes
meson:
- cleanup path fixes
- TMDS clock filter fix
rockchip:
- NV12 buffers and scalar fix"
* tag 'drm-fixes-2019-03-29' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/i915/icl: Fix VEBOX mismatch BUG_ON()
drm/i915/selftests: Fix an IS_ERR() vs NULL check
drm/i915: Mark AML 0x87CA as ULX
drm/meson: fix TMDS clock filtering for DMT monitors
drm/meson: Uninstall IRQ handler
drm/meson: Fix invalid pointer in meson_drv_unbind()
drm/udl: Refactor edid retrieving in UDL driver (v2)
drm: Fix drm_release() and device unplug
drm/fb: avoid setting 0 depth.
drm/tegra: vic: Fix implicit function declaration warning
drm/tegra: hub: Fix dereference before check
drm/i915/icl: Fix the TRANS_DDI_FUNC_CTL2 bitfield macro
drm/amd/display: Only allow VRR when vrefresh is within supported range
drm/rockchip: vop: reset scale mode when win is disabled
drm/vkms: fix use-after-free when drm_gem_handle_create() fails
drm/vgem: fix use-after-free when drm_gem_handle_create() fails
drm/i915/gvt: Add mutual lock for ppgtt mm LRU list
drm/i915/gvt: Only assign ppgtt root at dispatch time
drm/i915/gvt: Don't submit request for error workload dispatch
drm/i915/gvt: stop scheduling workload when vgpu is inactive
...
Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
and SIGSEGV that could not be traced back to a userspace code bug. They
had all the magic signs of an I/D cache coherency issue.
Now recently we noticed that the /proc/sys/vm/compact_memory interface
was quite efficient at provoking this class of userspace crashes.
Studying the code in mm/migrate.c there is a distinction made between
migrating a page that is mapped at the instant of migration and one that
is not mapped. Our problem turned out to be the non-mapped pages.
For the non-mapped page the code performs a copy of the page content and
all relevant meta-data of the page without doing the required D-cache
maintenance. This leaves dirty data in the D-cache of the CPU and on
the 1004K cores this data is not visible to the I-cache. A subsequent
page-fault that triggers a mapping of the page will happily serve the
process with potentially stale code.
What about ARM then, this bug should have seen greater exposure? Well
ARM became immune to this flaw back in 2010, see commit c01778001a
("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
My proposed fix moves the D-cache maintenance inside move_to_new_page to
make it common for both cases.
Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com
Fixes: 97ee052461 ("flush cache before installing new page at migraton")
Signed-off-by: Lars Persson <larper@axis.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Makoto report a below KASAN error: zram does out-of-bounds read. Because
strscpy copies from source up to count bytes unconditionally. It could
cause out-of-bounds read on next object in slab.
To prevent it, use strlcpy which checks source's length automatically.
BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
..
Call trace:
strscpy+0x68/0x154
idle_store+0xc4/0x34c
dev_attr_store+0x50/0x6c
sysfs_kf_write+0x98/0xb4
kernfs_fop_write+0x198/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Allocated by task 1314:
__kmalloc+0x280/0x318
kernfs_fop_write+0xac/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Freed by task 2855:
kfree+0x138/0x630
kernfs_put_open_node+0x10c/0x124
kernfs_fop_release+0xd8/0x114
__fput+0x130/0x2a4
____fput+0x1c/0x28
task_work_run+0x16c/0x1c8
do_notify_resume+0x2bc/0x107c
work_pending+0x8/0x10
The buggy address belongs to the object at ffffffc0c3495a00
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
The buggy address belongs to the page:
page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org
Cc: <stable@vger.kernel.org> [5.0]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Makoto Wu <makotowu@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a few system calls (pselect, ppoll, etc) which replace a task
sigmask while they are running in a kernel-space
When a task calls one of these syscalls, the kernel saves a current
sigmask in task->saved_sigmask and sets a syscall sigmask.
On syscall-exit-stop, ptrace traps a task before restoring the
saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and
PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by
saved_sigmask, when the task returns to user-space.
This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask
if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag.
Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com
Fixes: 29000caecb ("ptrace: add ability to get/set signal-blocked mask")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While debugging something, I added a dump_page() into do_swap_page(),
and I got the splat from below. The issue happens when dereferencing
mapping->host in __dump_page():
...
else if (mapping) {
pr_warn("%ps ", mapping->a_ops);
if (mapping->host->i_dentry.first) {
struct dentry *dentry;
dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias);
pr_warn("name:\"%pd\" ", dentry);
}
}
...
Swap address space does not contain an inode information, and so
mapping->host equals NULL.
Although the dump_page() call was added artificially into
do_swap_page(), I am not sure if we can hit this from any other path, so
it looks worth fixing it. We can easily do that by checking
mapping->host first.
Link: http://lkml.kernel.org/r/20190318072931.29094-1-osalvador@suse.de
Fixes: 1c6fb1d89e ("mm: print more information about mapping in __dump_page")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kbuild produces the below warning:
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head: 5453a3df2a
commit 3d3539018d ("mm: create the new vm_fault_t type")
reproduce:
# apt-get install sparse
git checkout 3d3539018d
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>> mm/memory.c:3968:21: sparse: incorrect type in assignment (different
>> base types) @@ expected restricted vm_fault_t [usertype] ret @@
>> got e] ret @@
mm/memory.c:3968:21: expected restricted vm_fault_t [usertype] ret
mm/memory.c:3968:21: got int
This patch converts to return vm_fault_t type for hugetlb_fault() when
CONFIG_HUGETLB_PAGE=n.
Regarding the sparse warning, Luc said:
: This is the expected behaviour. The constant 0 is magic regarding bitwise
: types but ({ ...; 0; }) is not, it is just an ordinary expression of type
: 'int'.
:
: So, IMHO, Souptick's patch is the right thing to do.
Link: http://lkml.kernel.org/r/20190318162604.GA31553@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.
This is a followup to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit ad67f5a654 ("arm64: replace ZONE_DMA
with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
platforms (and maybe others?).
[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
[3] https://patchwork.codeaurora.org/patch/671639/
This patch (of 3):
IOMMUs using ARMv7 short-descriptor format require page tables to be
allocated within the first 4GB of RAM, even on 64-bit systems. On arm64,
this is done by passing GFP_DMA32 flag to memory allocation functions.
For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
1. This patch, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2
page tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable
to reuse freed fragments until the whole page is freed.
This change makes it possible to create a custom cache in DMA32 zone using
kmem_cache_create, then allocate memory using kmem_cache_alloc.
We do not create a DMA32 kmalloc cache array, as there are currently no
users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a
warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).
Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Huaisheng Ye <yehs1@lenovo.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Tomasz Figa <tfiga@google.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that
we always grab cluster locks in order of increasing inode number.
Unfortunately, we forget to swap the inode record buffer head pointers
when we've done this, which leads to incorrect bookkeepping when we're
trying to make the two inodes have the same refcount tree.
This has the effect of causing filesystem shutdowns if you're trying to
reflink data from inode 100 into inode 97, where inode 100 already has a
refcount tree attached and inode 97 doesn't. The reflink code decides
to copy the refcount tree pointer from 100 to 97, but uses inode 97's
inode record to open the tree root (which it doesn't have) and blows up.
This issue causes filesystem shutdowns and metadata corruption!
Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia
Fixes: 29ac8e856c ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and makes migrate type to
be MOVABLE.
However, in __offline_pages(), it also call undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit
2ce13640b3 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just waste CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no returning by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower that the above
commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[cai@lca.pw: v4]
Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
Fixes: 2ce13640b3 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atomic64_read() on ppc64le returns "long int", so fix the same way as
commit d549f545e6 ("drm/virtio: use %llu format string form
atomic64_t") by adding a cast to u64, which makes it work on all arches.
In file included from ./include/linux/printk.h:7,
from ./include/linux/kernel.h:15,
from mm/debug.c:9:
mm/debug.c: In function 'dump_mm':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 19 has type 'long int' [-Wformat=]
#define KERN_SOH "A" /* ASCII Start Of Header */
^~~~~~
./include/linux/kern_levels.h:8:20: note: in expansion of macro
'KERN_SOH'
#define KERN_EMERG KERN_SOH "0" /* system is unusable */
^~~~~~~~
./include/linux/printk.h:297:9: note: in expansion of macro 'KERN_EMERG'
printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
mm/debug.c:133:2: note: in expansion of macro 'pr_emerg'
pr_emerg("mm %px mmap %px seqnum %llu task_size %lu"
^~~~~~~~
mm/debug.c:140:17: note: format string is defined here
"pinned_vm %llx data_vm %lx exec_vm %lx stack_vm %lx"
~~~^
%lx
Link: http://lkml.kernel.org/r/20190310183051.87303-1-cai@lca.pw
Fixes: 70f8a3ca68 ("mm: make mm->pinned_vm an atomic64 counter")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh has reported that PPC triggers the following warning when
excercising DAX code:
IP set_pte_at+0x3c/0x190
LR insert_pfn+0x208/0x280
Call Trace:
insert_pfn+0x68/0x280
dax_iomap_pte_fault.isra.7+0x734/0xa40
__xfs_filemap_fault+0x280/0x2d0
do_wp_page+0x48c/0xa40
__handle_mm_fault+0x8d0/0x1fd0
handle_mm_fault+0x140/0x250
__do_page_fault+0x300/0xd60
handle_page_fault+0x18
Now that is WARN_ON in set_pte_at which is
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
The problem is that on some architectures set_pte_at() cannot cope with
a situation where there is already some (different) valid entry present.
Use ptep_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PTE.
Link: http://lkml.kernel.org/r/20190311084537.16029-1-jack@suse.cz
Fixes: b2770da642 "mm: add vm_insert_mixed_mkwrite()"
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By default, cells in DT are 32-bit in size. The driver reads "ti,mode"
using the function of_property_read_u8() which causes the value to be
read incorrectly in little-endian architectures if the size is not
specified.
Make it explicit in the binding documentation that this prorperty must
be set as a 8-bit value.
Signed-off-by: Carlos Menin <menin@carlosaurelio.net>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Commit 7cc7de93fa ("hwmon: (ntc_thermistor) Convert to new hwmon API")
converted the driver to use the new hwmon API, but introduced a subtle
error: The temperature type is no longer reported as temp1_type, but as
temp2_type.
Fixes: 7cc7de93fa ("hwmon: (ntc_thermistor) Convert to new hwmon API")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
In the case of power sensor version 0xA0, the sensor indexing overlapped
with the "caps" power sensors, resulting in probe failure and kernel
warnings. Fix this by specifying the next index for each power sensor
version.
Fixes: 54076cb3b5 ("hwmon (occ): Add sensor attributes and register ...")
Cc: stable@vger.kernel.org
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
A check for vif is made in vnt_interrupt_work.
There is a small chance of leaving interrupt disabled while vif
is NULL and the work hasn't been scheduled.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
CC: stable@vger.kernel.org # v4.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 419d6efc50, kernel cannot be crashed in the namei
path. However, corrupted nameoff can do harm in the process of
readdir for scenerios without dm-verity as well. Fix it now.
Fixes: 3aa8ec716e ("staging: erofs: add directory operations")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a device has an exclusion range specified in the IVRS
table, this region needs to be reserved in the iova-domain
of that device. This hasn't happened until now and can cause
data corruption on data transfered with these devices.
Treat exclusion ranges as reserved regions in the iommu-core
to fix the problem.
Fixes: be2a022c0d ('x86, AMD IOMMU: add functions to parse IOMMU memory mapping requirements for devices')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Gary R Hook <gary.hook@amd.com>
Johan writes:
USB-serial fixes for 5.1-rc3
Here's a fix for a long-standing refcount issue in the mos7720 parport
implementation, and a set of device id updates.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Olicard 600
USB: serial: cp210x: add new device id
USB: serial: mos7720: fix mos_parport refcount imbalance on error path
USB: serial: option: set driver_info for SIM5218 and compatibles
USB: serial: ftdi_sio: add additional NovaTech products
USB: serial: option: add support for Quectel EM12
The nvm_image is a large struct qedi_nvm_iscsi_image object of over 24K so
don't declare it on the stack just for a sizeof requirement; use sizeof on
struct qedi_nvm_iscsi_image instead.
Fixes: c77a2fa3ff ("scsi: qedi: Add the CRC size within iSCSI NVM image")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Backspace is not working on some terminal emulators which do not send the
key code defined by terminfo. Terminals either send '^H' (8) or '^?' (127).
But currently only '^?' is handled. Let's also handle '^H' for those
terminals.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Felipe writes:
usb: fixes for v5.1-rc2
One deadlock fix on f_hid. NET2280 got a fix on its dequeue
implementation and a fix for overrun of OUT messages.
DWC3 learned about another Intel product: Comment Lake PCH.
NET2272 got a similar fix to NET2280 on its dequeue implementation.
* tag 'fixes-for-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
USB: gadget: f_hid: fix deadlock in f_hidg_write()
usb: gadget: net2272: Fix net2272_dequeue()
usb: gadget: net2280: Fix net2280_dequeue()
usb: gadget: net2280: Fix overrun of OUT messages
usb: dwc3: pci: add support for Comet Lake PCH ID
Holding the lock around the entire duration of tx scheduling can create
some nasty lock contention, especially when processing airtime information
from the tx status or the rx path.
Improve locking by only holding the active_txq_lock for lookups / scheduling
list modifications.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This commit adds NL80211_FLAG_CLEAR_SKB flag to other NL commands
that carry key data to ensure they do not stick around on heap
after the SKB is freed.
Also introduced this flag for NL80211_CMD_VENDOR as there are sub
commands which configure the keys.
Signed-off-by: Sunil Dutt <usdutt@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are several scenarios in which mac80211 can call drv_wake_tx_queue
after ieee80211_restart_hw has been called and has not yet completed.
Driver private structs are considered uninitialized until mac80211 has
uploaded the vifs, stations and keys again, so using private tx queue
data during that time is not safe.
The driver can also not rely on drv_reconfig_complete to figure out when
it is safe to accept drv_wake_tx_queue calls again, because it is only
called after all tx queues are woken again.
To fix this, bail out early in drv_wake_tx_queue if local->in_reconfig
is set.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When building with -Wsometimes-uninitialized, Clang warns:
net/wireless/util.c:1223:11: warning: variable 'result' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
Clang can't evaluate at this point that WARN(1, ...) always returns true
because __ret_warn_on is defined as !!(condition), which isn't
immediately evaluated as 1. Change this branch to else so that it's
clear to Clang that we intend to bail out here.
Link: https://github.com/ClangBuiltLinux/linux/issues/382
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
skb->truesize can change due to memory reallocation or when adding extra
fragments. Adjust fq->memory_usage accordingly
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The support added for regulatory WMM rules did not handle
the case of regulatory domain intersections. Fix it.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Fixes: 230ebaa189 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Looks that 100 chars isn't enough for messages, as we keep getting
warnings popping from different places due to message shortening.
Instead of trying to shorten the prints, just increase the buffer size.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The pointer to the last four bytes of the address is not guaranteed to be
aligned, so we need to use __get_unaligned_cpu32 here
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Once a station enters powersave, its queues should not be returned by
ieee80211_next_txq() anymore. They will be re-scheduled again after the
station has woken up again
Fixes: 1866760096 ("mac80211: Add TXQ scheduling API")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The vid_pll_div is a programmable fractional divider, but vendor gives a
limited of known configuration value and it's corresponding fraction.
Thus when at reset value (0) or unknown value, we cannot determine the
result rate.
The initial behaviour was to print a warning, but the warning triggers
at each boot and when the clock tree is refreshed.
This patch moves the print to debug and returns 0 instead of the
parent rate.
Fixes: 72dbb8c94d ("clk: meson: Add vid_pll divider driver")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lkml.kernel.org/r/20190327151348.27402-1-narmstrong@baylibre.com
The DASD driver incorrectly limits the maximum number of blocks of ECKD
DASD volumes to 32 bit numbers. Volumes with a capacity greater than
2^32-1 blocks are incorrectly recognized as smaller volumes.
This results in the following volume capacity limits depending on the
formatted block size:
BLKSIZE MAX_GB MAX_CYL
512 2047 5843492
1024 4095 8676701
2048 8191 13634816
4096 16383 23860929
The same problem occurs when a volume with more than 17895697 cylinders
is accessed in raw-track-access mode.
Fix this problem by adding an explicit type cast when calculating the
maximum number of blocks.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
IS_ENABLED should generally use CONFIG_ prefaced symbols and
it doesn't appear as if there is a BLK_DEV_INITRD define.
Cc: <stable@vger.kernel.org> # 4.20
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
IS_ENABLED should generally use CONFIG_ prefaced symbols and
it doesn't appear as if there is a CMODEL_MEDLOW define.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
shadow mm's pin count got increased in workload preparation phase, which
is after workload scanning.
it will get decreased in complete_current_workload() anyway after
workload completion.
Sometimes, if a workload meets a scanning error, its shadow mm pin count
will not get increased but will get decreased in the end.
This patch lets shadow mm's pin count not go below 0.
Fixes: 2707e44466 ("drm/i915/gvt: vGPU graphics memory virtualization")
Cc: zhenyuw@linux.intel.com
Cc: stable@vger.kernel.org #4.14+
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
in workload creation routine, if any failure occurs, do not queue this
workload for delivery. if this failure is fatal, enter into failsafe
mode.
Fixes: 6d76303553 ("drm/i915/gvt: Move common vGPU workload creation into scheduler.c")
Cc: stable@vger.kernel.org #4.19+
Cc: zhenyuw@linux.intel.com
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
gpio fixes for v5.1-rc3
- fix for a potential NULL-pointer dereference in the aspeed driver
- revert of the commit using the new gpio_set_config() when setting
debaunce and transitory state config as it caused a regression in
the aspeed driver
- two fixes for gpio-mockup for debugfs problems introduced in the
last merge window
When it is to cleanup net namespace, rds_tcp_exit_net() will call
rds_tcp_kill_sock(), if t_sock is NULL, it will not call
rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
and reference 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
sock->ops->connect, but if connect() is failed, it will call
rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
failed, rds_connect_worker() will try to reconnect all the time, so
rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
connections.
Therefore, the condition !tc->t_sock is not needed if it is going to do
cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
NULL, and there is on other path to cancel cp_conn_w and free
connection. So this patch is to fix this.
rds_tcp_kill_sock():
...
if (net != c_net || !tc->t_sock)
...
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
==================================================================
BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
net/ipv4/af_inet.c:340
Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
Hardware name: linux,dummy-virt (DT)
Workqueue: krdsd rds_connect_worker
Call trace:
dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x120/0x188 lib/dump_stack.c:113
print_address_description+0x68/0x278 mm/kasan/report.c:253
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x21c/0x348 mm/kasan/report.c:409
__asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
__sock_create+0x4f8/0x770 net/socket.c:1276
sock_create_kern+0x50/0x68 net/socket.c:1322
rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2705 [inline]
slab_alloc mm/slub.c:2713 [inline]
kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
kmem_cache_zalloc include/linux/slab.h:697 [inline]
net_alloc net/core/net_namespace.c:384 [inline]
copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
ksys_unshare+0x340/0x628 kernel/fork.c:2577
__do_sys_unshare kernel/fork.c:2645 [inline]
__se_sys_unshare kernel/fork.c:2643 [inline]
__arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
slab_free_hook mm/slub.c:1370 [inline]
slab_free_freelist_hook mm/slub.c:1397 [inline]
slab_free mm/slub.c:2952 [inline]
kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
net_free net/core/net_namespace.c:400 [inline]
net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
net_drop_ns net/core/net_namespace.c:406 [inline]
cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80
which belongs to the cache net_namespace of size 7872
The buggy address is located 1796 bytes inside of
7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
The buggy address belongs to the page:
page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
index:0x0 compound_mapcount: 0
flags: 0xffffe0000008100(slab|head)
raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: fix retcode and disable netpoll on representors
This series avoids a potential crash on nfp representor devices
when netpoll is in use. If transmitting the frame through underlying
vNIC fails we'd return an error code (by passing on error code from
__dev_queue_xmit()) and cause double free in netpoll code.
Fix the error code and disable netpoll on reprs altogether.
IRQ-safety of locking the queues and calling __dev_queue_xmit()
is questionable.
Big thanks to John Hurley for debugging and narrowing down
the trace log after I gave up! :)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP reprs are software device on top of the PF's vNIC.
The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because
the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee IRQ state, let's just
disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the
commit 5de73ee467 ("nfp: general representor implementation")
.ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() may return error codes as well as netdev_tx_t,
and it always consumes the skb. Make sure we always return a
correct netdev_tx_t value.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)
I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull PCI fixes from Bjorn Helgaas:
"PCI fixes:
- Clear level-triggered interrupts for the bandwidth notification
supported added for v5.1 (Alexandru Gagniuc)
- Clear bandwidth notification interrupts before enabling them (Lukas
Wunner)
- Report post-enumeration bandwidth changes only once for
multi-function devices (Lukas Wunner)"
* tag 'pci-v5.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/LINK: Deduplicate bandwidth reports for multi-function devices
PCI/LINK: Clear bandwidth notification interrupt before enabling it
PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-03-26
This series contains updates to igb, ixgbe, i40e and fm10k.
Jake fixes an issue with PTP in i40e where a previous commit resulted
in a regression where the driver would interpret small negative
adjustments as large positive additions, resulting in incorrect
behavior.
Arvind Sankar fixes an issue in igb where a previous commit would cause
a warning in the PCI pm core and resulted in pci_pm_runtime_suspend
would not call pci_save_state or pci_finish_runtime_suspend.
Ivan Vecera fixes MDIO bus registration with ixgbe, where the driver was
ignoring errors returned when registering and would leave the pointer in
a NULL state which triggered a BUG when un-registering.
Stefan Assmann fixes the check for Wake-On-LAN for i40e, which only
supports magic packet.
Yue Haibing fixes a potential NULL pointer de-reference in fm10k by
adding a simple check if the value is NULL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Perf fails to parse uncore event alias, for example:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
event syntax error: 'unc_m_clockticks'
\___ parser error
Current code assumes that the event alias is from one specific PMU.
To find the PMU, perf strcmps the PMU name of event alias with the real
PMU name on the system.
However, the uncore event alias may be from multiple PMUs with common
prefix. The PMU name of uncore event alias is the common prefix.
For example, UNC_M_CLOCKTICKS is clock event for iMC, which include 6
PMUs with the same prefix "uncore_imc" on a skylake server.
The real PMU names on the system for iMC are uncore_imc_0 ...
uncore_imc_5.
The strncmp is used to only check the common prefix for uncore event
alias.
With the patch:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
Performance counter stats for 'system wide':
723,594,722 unc_m_clockticks [uncore_imc_5]
724,001,954 unc_m_clockticks [uncore_imc_3]
724,042,655 unc_m_clockticks [uncore_imc_1]
724,161,001 unc_m_clockticks [uncore_imc_4]
724,293,713 unc_m_clockticks [uncore_imc_2]
724,340,901 unc_m_clockticks [uncore_imc_0]
1.002090060 seconds time elapsed
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: ea1fa48c05 ("perf stat: Handle different PMU names with common prefix")
Link: http://lkml.kernel.org/r/1552672814-156173-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
KVM/ARM fixes for 5.1
- Fix THP handling in the presence of pre-existing PTEs
- Honor request for PTE mappings even when THPs are available
- GICv4 performance improvement
- Take the srcu lock when writing to guest-controlled ITS data structures
- Reset the virtual PMU in preemptible context
- Various cleanups
pyside version 1 fails to handle python3 large integers in some cases,
resulting in Qt getting into a never-ending loop. This affects:
samples Table
samples_view Table
All branches Report
Selected branches Report
Add workarounds for those cases.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Link: http://lkml.kernel.org/r/20190327072826.19168-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit 1fb87b8e95 ("perf machine: Don't search for active kernel
start in __machine__create_kernel_maps"), the __machine__create_kernel_maps()
just create a map what start and end are both zero. Though the address will be
updated later, the order of map in the rbtree may be incorrect.
The commit ee05d21791 ("perf machine: Set main kernel end address properly")
fixed the logic in machine__create_kernel_maps(), but it's still wrong in
function machine__process_kernel_mmap_event().
To reproduce this issue, we need an environment which the module address
is before the kernel text segment. I tested it on an aarch64 machine with
kernel 4.19.25:
[root@localhost hulk]# grep _stext /proc/kallsyms
ffff000008081000 T _stext
[root@localhost hulk]# grep _etext /proc/kallsyms
ffff000009780000 R _etext
[root@localhost hulk]# tail /proc/modules
hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
nvme_core 126976 7 nvme, Live 0xffff0000018b6000
mdio 20480 1 ixgbe, Live 0xffff0000018ab000
hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
hns_mdio 20480 2 - Live 0xffff000001822000
hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
dm_mirror 40960 0 - Live 0xffff000001804000
dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
[root@localhost hulk]#
Before fix:
[root@localhost bin]# perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost bin]# perf buildid-list -i perf.data
4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
[root@localhost bin]# perf buildid-list -i perf.data -H
0000000000000000000000000000000000000000 /proc/kcore
[root@localhost bin]#
After fix:
[root@localhost tools]# ./perf/perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data
28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes introduced in the following csets:
2b188cc1bb ("Add io_uring IO interface")
edafccee56 ("io_uring: add support for pre-mapped user IO buffers")
3eb39f4793 ("signal: add pidfd_send_signal() syscall")
This makes 'perf trace' to become aware of these new syscalls, so that
one can use them like 'perf trace -e ui_uring*,*signal' to do a system
wide strace-like session looking at those syscalls, for instance.
For example:
# perf trace -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (383), 1208866 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------- ------- ------
io_uring_enter 605780 2955.615 0.000 0.005 33.804 1.94%
openat 4 459.446 0.004 114.861 459.435 100.00%
munmap 4 0.073 0.009 0.018 0.042 44.03%
mmap 10 0.054 0.002 0.005 0.026 43.24%
brk 28 0.038 0.001 0.001 0.003 7.51%
io_uring_setup 1 0.030 0.030 0.030 0.030 0.00%
mprotect 4 0.014 0.002 0.004 0.005 14.32%
close 5 0.012 0.001 0.002 0.004 28.87%
fstat 3 0.006 0.001 0.002 0.003 35.83%
read 4 0.004 0.001 0.001 0.002 13.58%
access 1 0.003 0.003 0.003 0.003 0.00%
lseek 3 0.002 0.001 0.001 0.001 9.00%
arch_prctl 2 0.002 0.001 0.001 0.001 0.69%
execve 1 0.000 0.000 0.000 0.000 0.00%
#
# perf trace -e io_uring* -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (390), 1191250 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------ ------ ------
io_uring_enter 597093 2706.060 0.001 0.005 14.761 1.10%
io_uring_setup 1 0.038 0.038 0.038 0.038 0.00%
#
More work needed to make the tools/perf/examples/bpf/augmented_raw_syscalls.c
BPF program to copy the 'struct io_uring_params' arguments to perf's ring
buffer so that 'perf trace' can use the BTF info put in place by pahole's
conversion of the kernel DWARF and then auto-beautify those arguments.
This patch produces the expected change in the generated syscalls table
for x86_64:
--- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before 2019-03-26 13:37:46.679057774 -0300
+++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2019-03-26 13:38:12.755990383 -0300
@@ -334,5 +334,9 @@ static const char *syscalltbl_x86_64[] =
[332] = "statx",
[333] = "io_pgetevents",
[334] = "rseq",
+ [424] = "pidfd_send_signal",
+ [425] = "io_uring_setup",
+ [426] = "io_uring_enter",
+ [427] = "io_uring_register",
};
-#define SYSCALLTBL_x86_64_MAX_ID 334
+#define SYSCALLTBL_x86_64_MAX_ID 427
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-p0ars3otuc52x5iznf21shhw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To deal with the move of some defines from asm-generic/mmap-common.h to
linux/mman.h done in:
746c9398f5 ("arch: move common mmap flags to linux/mman.h")
The generated mmap_flags array stays the same:
$ tools/perf/trace/beauty/mmap_flags.sh
static const char *mmap_flags[] = {
[ilog2(0x40) + 1] = "32BIT",
[ilog2(0x01) + 1] = "SHARED",
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
[ilog2(0x8000) + 1] = "POPULATE",
[ilog2(0x10000) + 1] = "NONBLOCK",
[ilog2(0x20000) + 1] = "STACK",
[ilog2(0x40000) + 1] = "HUGETLB",
[ilog2(0x80000) + 1] = "SYNC",
};
$
And to have the system's sys/mman.h find the definition of MAP_SHARED
and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h
in a way that keeps it the same as the kernel's, need for keeping the
Android's NDK cross build working.
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After a discussion with Andi, move the perf_event_attr.precise_ip
detection for maximum precise config (via :P modifier or for default
cycles event) to perf_evsel__open().
The current detection in perf_event_attr__set_max_precise_ip() is
tricky, because precise_ip config is specific for given event and it
currently checks only hw cycles.
We now check for valid precise_ip value right after failing
sys_perf_event_open() for specific event, before any of the
perf_event_attr fallback code gets executed.
This way we get the proper config in perf_event_attr together with
allowed precise_ip settings.
We can see that code activity with -vv, like:
$ perf record -vv ls
...
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
...
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -95
decreasing precise_ip by one (2)
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
...
precise_ip 2
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 = 4
...
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-dkvxxbeg7lu74155d4jhlmc9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following error was thrown when compiling `tools/perf` using OpenCSD
v0.11.1. This patch fixes said error.
CC util/intel-pt-decoder/intel-pt-log.o
CC util/cs-etm-decoder/cs-etm-decoder.o
util/cs-etm-decoder/cs-etm-decoder.c: In function
‘cs_etm_decoder__buffer_range’:
util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value
‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum]
switch (elem->last_i_type) {
^~~~~~
CC util/intel-pt-decoder/intel-pt-decoder.o
cc1: all warnings being treated as errors
Because `OCSD_INSTR_WFI_WFE` case was added only in v0.11.0, the minimum
required OpenCSD library version for this patch is no longer v0.10.0.
Signed-off-by: Solomon Tan <solomonbobstoner@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190322052255.GA4809@w-OptiPlex-7050
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull NVMe fixes from Christoph:
"A few accumulated small fixes:
- fix an endianess misannotation that sneaked in this merge window in
nvme-tcp (me)
- fix nvme-loop to handle multi-page segments (Ming)
- fix error handling in the nvmet configfs code (Max)
- add a missing requeue point in the multipath code (Martin George)"
* 'nvme-5.1' of git://git.infradead.org/nvme:
nvmet: fix error flow during ns enable
nvmet: fix building bvec from sg list
nvme-multipath: relax ANA state check
nvme-tcp: fix an endianess miss-annotation
In case we fail to enable p2pmem on the current namespace, disable the
backing store device before exiting.
Cc: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are two mistakes for building bvec from sg list for file
backed ns:
- use request data length to compute number of io vector, this way
doesn't consider sg->offset, and the result may be smaller than required
io vectors
- bvec->bv_len isn't capped by sg->length
This patch fixes this issue by building bvec from sg directly, given
the whole IO stack is ready for multi-page bvec.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 3a85a5de29 ("nvme-loop: add a NVMe loopback host driver")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When undergoing state transitions I/O might be requeued, hence
we should always call nvme_mpath_set_live() to schedule requeue_work
whenever the nvme device is live, independent on whether the
old state was live or not.
Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Gargi Srinivas <sring@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme_tcp_end_request just takes the status value and the converts
it to little endian as well as shifting for the phase bit.
Fixes: 43ce38a6d823 ("nvme-tcp: support C2HData with SUCCESS flag")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Whan a filesystem is mounted with the nologreplay mount option, which
requires it to be mounted in RO mode as well, we can not allow discard on
free space inside block groups, because log trees refer to extents that
are not pinned in a block group's free space cache (pinning the extents is
precisely the first phase of replaying a log tree).
So do not allow the fitrim ioctl to do anything when the filesystem is
mounted with the nologreplay option, because later it can be mounted RW
without that option, which causes log replay to happen and result in
either a failure to replay the log trees (leading to a mount failure), a
crash or some silent corruption.
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes: 96da09192c ("btrfs: Introduce new mount option to disable tree log replay")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Calling read() for a single byte read will return 2 currently. Use
simple_read_from_buffer() which correctly handles all sizes.
Fixes: 2a9e27408e ("gpio: mockup: rework debugfs interface")
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
If the call to of_gpiochip_scan_gpios() in of_gpiochip_add() fails, no
error handling is performed. This lead to the need of callers to call
of_gpiochip_remove() on failure, which causes "BAD of_node_put() on ..."
if the failure happened before the call to of_node_get().
Fix this by adding proper error handling.
Note that calling gpiochip_remove_pin_ranges() multiple times causes no
harm: subsequent calls are a no-op.
Fixes: dfbd379ba9 ("gpio: of: Return error if gpio hog configuration failed")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix refcount underflows in bridge loop avoidance code,
by Sven Eckelmann (3 patches)
- Fix warning when CFG80211 isn't enabled, by Anders Roxell
- Fix genl notification for throughput override, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation does not mention how to delete a slot, add the
information.
Reported-by: Nathaniel McCallum <npmccallum@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The series to add memcg accounting to KVM allocations[1] states:
There are many KVM kernel memory allocations which are tied to the
life of the VM process and should be charged to the VM process's
cgroup.
While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process. This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor. In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero. A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.
The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.
Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.
Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.
Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier->release(). However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.
[1]https://patchwork.kernel.org/patch/10806707/
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Documentation/virtual/kvm/api.txt states:
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
KVM_EXIT_EPR the corresponding operations are complete (and guest
state is consistent) only after userspace has re-entered the
kernel with KVM_RUN. The kernel side will first finish incomplete
operations and then check for pending signals. Userspace can
re-enter the guest with an unmasked signal pending to complete
pending operations.
Because guest state may be inconsistent, starting state migration after
an IO exit without first completing IO may result in test failures, e.g.
a proposed change to KVM's handling of %rip in its fast PIO handling[1]
will cause the new VM, i.e. the post-migration VM, to have its %rip set
to the IN instruction that triggered KVM_EXIT_IO, leading to a test
assertion due to a stage mismatch.
For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip
the test if it's not available. The addition of KVM_CAP_IMMEDIATE_EXIT
predates the state selftest by more than a year.
[1] https://patchwork.kernel.org/patch/10848545/
Fixes: fa3899add1 ("kvm: selftests: add basic test for state save and restore")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since 4.8.3, gcc has enabled -fstack-protector by default. This is
problematic for the KVM selftests as they do not configure fs or gs
segments (the stack canary is pulled from fs:0x28). With the default
behavior, gcc will insert a stack canary on any function that creates
buffers of 8 bytes or more. As a result, ucall() will hit a triple
fault shutdown due to reading a bad fs segment when inserting its
stack canary, i.e. every test fails with an unexpected SHUTDOWN.
Fixes: 14c47b7530 ("kvm: selftests: introduce ucall")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM selftests embed the guest "image" as a function in the test itself
and extract the guest code at runtime by manually parsing the elf
headers. The parsing is very simple and doesn't supporting fancy things
like position independent executables. Recent versions of gcc enable
pie by default, which results in triple fault shutdowns in the guest due
to the virtual address in the headers not matching up with the virtual
address retrieved from the function pointer.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
...so that the test doesn't end up in an infinite loop if it fails for
whatever reason, e.g. SHUTDOWN due to gcc inserting stack canary code
into ucall() and attempting to derefence a null segment.
Fixes: ca35906688 ("kvm: selftests: add cr4_cpuid_sync_test")
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a
vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
To avoid corruping %rip after such a reset, commit 0967b7bf1c ("KVM:
Skip pio instruction when it is emulated, not executed") changed the
behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the
instruction prior to exiting to userspace. Full emulation doesn't need
such tricks becase re-emulating the instruction will naturally handle
%rip being changed to point at the reset vector.
Updating %rip prior to executing to userspace has several drawbacks:
- Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
fails it will likely yell about the wrong address.
- Single step exits to userspace for are effectively dropped as
KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
- Behavior of PIO emulation is different depending on whether it
goes down the fast path or the slow path.
Rather than skip the PIO instruction before exiting to userspace,
snapshot the linear %rip and cancel PIO completion if the current
value does not match the snapshot. For a 64-bit vCPU, i.e. the most
common scenario, the snapshot and comparison has negligible overhead
as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
VMREAD in this case.
All other alternatives to snapshotting the linear %rip that don't
rely on an explicit reset announcenment suffer from one corner case
or another. For example, canceling PIO completion on any write to
%rip fails if userspace does a save/restore of %rip, and attempting to
avoid that issue by canceling PIO only if %rip changed then fails if PIO
collides with the reset %rip. Attempting to zero in on the exact reset
vector won't work for APs, which means adding more hooks such as the
vCPU's MP_STATE, and so on and so forth.
Checking for a linear %rip match technically suffers from corner cases,
e.g. userspace could theoretically rewrite the underlying code page and
expect a different instruction to execute, or the guest hardcodes a PIO
reset at 0xfffffff0, but those are far, far outside of what can be
considered normal operation.
Fixes: 432baf60ee ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
Cc: <stable@vger.kernel.org>
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When userspace initializes guest vCPUs it may want to zero all supported
MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/
HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5d8 ("kvm/x86: Update SynIC
timers on guest entry only") we began doing stimer_mark_pending()
unconditionally on every config change.
The issue I'm observing manifests itself as following:
- Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as
pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER;
- kvm_hv_has_stimer_pending() starts returning true;
- kvm_vcpu_has_events() starts returning true;
- kvm_arch_vcpu_runnable() starts returning true;
- when kvm_arch_vcpu_ioctl_run() gets into
(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case:
- kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns
immediately, avoiding normal wait path;
- -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing
userspace to retry.
So instead of normal wait path we get a busy loop on all secondary vCPUs
before they get INIT signal. This seems to be undesirable, especially given
that this happens even when Hyper-V extensions are not used.
Generally, it seems to be pointless to mark an stimer as pending in
stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing
kvm_hv_process_stimers() will do is clear the corresponding bit. We may
just not mark disabled timers as pending instead.
Fixes: f3b138c5d8 ("kvm/x86: Update SynIC timers on guest entry only")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since MSR_IA32_ARCH_CAPABILITIES is emualted unconditionally even if
host doesn't suppot it. We should move it to array emulated_msrs from
arry msrs_to_save, to report to userspace that guest support this msr.
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host
userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
regardless of hardware support under the pretense that KVM fully
emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts
handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
that it's emulated on AMD hosts.
Fixes: 1eaafe91a0 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
Cc: stable@vger.kernel.org
Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function irqfd_wakeup() has flags defined as __poll_t and then it
has additional flags which is used for irqflags.
Redefine the inner flags variable as iflags so it does not shadow the
outer flags.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address
in slot_handle_level_range. When range based flushes are not enabled
kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs.
This changes the behavior of many functions that indirectly use
slot_handle_level_range, iff the range based flushes are enabled. The
only potential problem I see with this is that kvm->tlbs_dirty will be
cleared less often, however the only caller of slot_handle_level_range that
checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which
checks it and does a kvm_flush_remote_tlbs after calling
kvm_unmap_hva_range anyway.
Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
without this patch. The patch introduced no new failures.
Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I do not see any consistency about headers_install of <linux/kvm_para.h>
and <asm/kvm_para.h>.
According to my analysis of Linux 5.1-rc1, there are 3 groups:
[1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
alpha, arm, hexagon, mips, powerpc, s390, sparc, x86
[2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not
arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc,
parisc, sh, unicore32, xtensa
[3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
csky, nds32, riscv
This does not match to the actual KVM support. At least, [2] is
half-baked.
Nor do arch maintainers look like they care about this. For example,
commit 0add53713b ("microblaze: Add missing kvm_para.h to Kbuild")
exported <asm/kvm_para.h> to user-space in order to fix an in-kernel
build error.
We have two ways to make this consistent:
[A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all
architectures, irrespective of the KVM support
[B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h>
to the KVM support
My first attempt was [A] because the code looks cleaner, but Paolo
suggested [B].
So, this commit goes with [B].
For most architectures, <asm/kvm_para.h> was moved to the kernel-space.
I changed include/uapi/linux/Kbuild so that it checks generated
asm/kvm_para.h as well as check-in ones.
After this commit, there will be two groups:
[1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
arm, arm64, mips, powerpc, s390, x86
[2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze,
nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is
non-zero.
* nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages()
never return zero.
Based on these two reasons, we can merge the two *if* clause and use the
return value from kvm_mmu_calculate_mmu_pages() directly. This simplify
the code and also eliminate the possibility for reader to believe
nr_mmu_pages would be zero.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
following check is performed on vmentry of L2 guests:
On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP
field and the IA32_SYSENTER_EIP field must each contain a canonical
address.
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Errata#1096:
On a nested data page fault when CR.SMAP=1 and the guest data read
generates a SMAP violation, GuestInstrBytes field of the VMCB on a
VMEXIT will incorrectly return 0h instead the correct guest
instruction bytes .
Recommend Workaround:
To determine what instruction the guest was executing the hypervisor
will have to decode the instruction at the instruction pointer.
The recommended workaround can not be implemented for the SEV
guest because guest memory is encrypted with the guest specific key,
and instruction decoder will not be able to decode the instruction
bytes. If we hit this errata in the SEV guest then log the message
and request a guest shutdown.
Reported-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM's API requires thats ioctls must be issued from the same process
that created the VM. In other words, userspace can play games with a
VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
creator can do anything useful. Explicitly reject device ioctls that
are issued by a process other than the VM's creator, and update KVM's
API documentation to extend its requirements to device ioctls.
Fixes: 852b6d57dc ("kvm: add device control API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Per Paolo[1], instantiating multiple VMs in a single process is legal;
but this conflicts with KVM's API documentation, which states:
The only supported use is one virtual machine per process, and one
vcpu per thread.
However, an earlier section in the documentation states:
Only run VM ioctls from the same process (address space) that was used
to create the VM.
and:
Only run vcpu ioctls from the same thread that was used to create the
vcpu.
This suggests that the conflicting documentation is simply an incorrect
ordering of of words, i.e. what's really meant is that a virtual machine
can't be shared across multiple processes and a vCPU can't be shared
across multiple threads.
Tweak the blurb on issuing ioctls to use a more assertive tone, and
rewrite the "supported use" sentence to reference said blurb instead of
poorly restating it in different terms.
Opportunistically add missing punctuation.
[1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com
Fixes: 9c1b96e347 ("KVM: Document basic API")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Improve notes on asynchronous ioctl]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The cr4_pae flag is a bit of a misnomer, its purpose is really to track
whether the guest PTE that is being shadowed is a 4-byte entry or an
8-byte entry. Prior to supporting nested EPT, the size of the gpte was
reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes,
but it was mostly harmless since the size of the gpte never mattered.
Now that a spte may be tracking an indirect EPT entry, relying on
CR4.PAE is wrong and ill-named.
For direct shadow pages, force the gpte_size to '1' as they are always
8-byte entries; EPT entries can only be 8-bytes and KVM always uses
8-byte entries for NPT and its identity map (when running with EPT but
not unrestricted guest).
Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a
unique scenario as the size of the entries are not dictated by CR4.PAE,
but neither is the shadow page a direct map. To handle this scenario,
set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
to denote a nested EPT shadow page. Use the information to avoid
incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
Providing a consistent and accurate gpte_size fixes a bug reported by
Vitaly where fast_cr3_switch() always fails when switching from L2 to
L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
Fixes: 7dcd575520 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly zero out quadrant and invalid instead of inheriting them from
the root_mmu. Functionally, this patch is a nop as we (should) never
set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only
allowed if EPT is used for L1, and the root_mmu will never be invalid at
this point.
Explicitly setting flags sets the stage for repurposing the legacy
paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which
point 'smm' would be the only flag to be inherited from root_mmu.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls
generated by ->setattr() ops for the purpose of expanding a file is
incorrect due to older documentation incorrectly describing the way the RPC
'FileLength' parameter is meant to work.
The older documentation says that this is the length the file is meant to
end up at the end of the operation; however, it was never implemented this
way in any of the servers, but rather the file is truncated down to this
before the write operation is effected, and never expanded to it (and,
indeed, it was renamed to 'TruncPos' in 2014).
Fix this by setting the position parameter to the new file length and doing
a zero-lengh write there.
The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file
it then mmaps. This can be tested by giving the following test program a
filename in an AFS directory:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
char *p;
int fd;
if (argc != 2) {
fprintf(stderr,
"Format: test-trunc-mmap <file>\n");
exit(2);
}
fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC);
if (fd < 0) {
perror(argv[1]);
exit(1);
}
if (ftruncate(fd, 0x140008) == -1) {
perror("ftruncate");
exit(1);
}
p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror("mmap");
exit(1);
}
p[0] = 'a';
if (munmap(p, 4096) < 0) {
perror("munmap");
exit(1);
}
if (close(fd) < 0) {
perror("close");
exit(1);
}
exit(0);
}
Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull s390 fixes from Martin Schwidefsky:
"Improvements and bug fixes for 5.1-rc2:
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed with
the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the heap
of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices"
* tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Fix warning from check_processor_id
zcrypt: handle AP Info notification from CHSC SEI command
vfio: ccw: only free cp on final interrupt
s390/vtime: steal time exponential moving average
s390/zcrypt: revisit ap device remove procedure
s390: limit brk randomization to 32MB
The DPDK project is moving forward with its AF_XDP PMD, and during
that process some libbpf issues surfaced [1]: When libbpf was built
as a shared library, libelf was not included in the linking phase.
Since libelf is an internal depedency to libbpf, libelf should be
included. This patch adds '-lelf' to resolve that.
[1] https://patches.dpdk.org/patch/50704/#93571
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Suggested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The xsk.h header file was missing from the install_headers target in
the Makefile. This patch simply adds xsk.h to the set of installed
headers.
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull ARM SoC fixes from Arnd Bergmann:
"A couple of minor fixes only for now
- fix for incorrect DMA channels on Renesas R-Car
- Broadcom bcm2835 error handling fixes
- Kconfig dependency fixes for bcm2835 and davinci
- CPU idle wakeup fix for i.MX6
- MMC regression on Tegra186
- fix incorrect phy settings on one imx board"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: tegra: Disable CQE Support for SDMMC4 on Tegra186
ARM: dts: nomadik: Fix polarity of SPI CS
ARM: davinci: fix build failure with allnoconfig
ARM: imx_v4_v5_defconfig: enable PWM driver
ARM: imx_v6_v7_defconfig: continue compiling the pwm driver
ARM: dts: imx6dl-yapp4: Use correct pseudo PHY address for the switch
ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi
ARM: dts: imx6ull: Use the correct style for SPDX License Identifier
ARM: dts: pfla02: increase phy reset duration
ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
ARM: imx51: fix a leaked reference by adding missing of_node_put
ARM: dts: imx6dl-yapp4: Use rgmii-id phy mode on the cpu port
arm64: bcm2835: Add missing dependency on MFD_CORE.
ARM: dts: bcm283x: Fix hdmi hpd gpio pull
soc: bcm: bcm2835-pm: Fix error paths of initialization.
soc: bcm: bcm2835-pm: Fix PM_IMAGE_PERI power domain support.
arm64: dts: renesas: r8a774c0: Fix SCIF5 DMA channels
arm64: dts: renesas: r8a77990: Fix SCIF5 DMA channels
Fix commit 56067812d5 ("kbuild: modversions: add infrastructure for
emitting relative CRCs") where CRCs are interpreted in host byte order
rather than proper kernel byte order. The bug is conditional on
CONFIG_MODULE_REL_CRCS.
For example, when loading a BE module into a BE kernel compiled with a LE
system, the error "disagrees about version of symbol module_layout" is
produced. A message such as "Found checksum D7FA6856 vs module 5668FAD7"
will be given with debug enabled, which indicates an obvious endian
problem within __kcrctab within the kernel image.
The general solution is to use the macro TO_NATIVE, as is done in
similar cases throughout modpost.c. With this correction it has been
verified that a BE kernel compiled with a LE system accepts BE modules.
This change has also been verified with a LE kernel compiled with a LE
system, in which case TO_NATIVE returns its value unmodified since the
byte orders match. This is by far the common case.
Fixes: 56067812d5 ("kbuild: modversions: add infrastructure for emitting relative CRCs")
Signed-off-by: Fredrik Noring <noring@nocrew.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
CC_FLAGS_FTRACE may contain trailing whitespace that interferes with
findstring.
For example, commit 6977f95e63 ("powerpc: avoid -mno-sched-epilog on
GCC 4.9 and newer") introduced a change such that on my ppc64le box,
CC_FLAGS_FTRACE="-pg -mprofile-kernel ". (Note the trailing space.)
When cmd_record_mcount is now invoked, findstring fails as the ftrace
flags were found at very end of _c_flags, without the trailing space.
_c_flags=" ... -pg -mprofile-kernel"
CC_FLAGS_FTRACE="-pg -mprofile-kernel "
^
findstring is looking for this extra space
Remove the redundant whitespaces from CC_FLAGS_FTRACE in
cmd_record_mcount to avoid this problem.
[masahiro.yamada: This issue only happens in the released versions
of GNU Make. CC_FLAGS_FTRACE will not contain the trailing space if
you use the latest GNU Make, which contains commit b90fabc8d6f3
("* NEWS: Do not insert a space during '+=' if the value is empty.") ]
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> (refactoring)
Fixes: 6977f95e63 ("powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer").
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit 3a51ff3442 ("kbuild: gitignore output directory") seemed to
bother people who version-control output directories.
Andre Przywara says:
"Unfortunately this breaks my setup, because I keep a totally separate
git repository in my build directories to track (various versions of)
.config. So .gitignore there is carefully crafted to ignore most build
artefacts, but not .config, for instance."
Link: https://lkml.org/lkml/2019/3/22/1819
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
When Make recurses to the top Makefile with sub-make-done unset,
the code block surrounded by 'ifneq ($(sub-make-done),1) ... endif'
is parsed multiple times. This happens for in-tree building of
include/config/auto.conf, *-pkg, etc. with GNU Make 4.x.
This is a slight regression by commit 688931a5ad ("kbuild: skip
sub-make for in-tree build with GNU Make 4.x") in terms of performance
since that code block contains one $(shell ...) invocation.
Fix it by exporting the variable irrespective of sub-make being run.
I renamed it because GNU Make cannot properly export variables
containing hyphens. This is probably a bug of GNU Make, and the issue
in Kbuild had already been reported by commit 2bfbe7881e ("kbuild:
Do not use hyphen in exported variable name").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
When CONFIG_VMAP_STACK=y, __pa() returns incorrect physical address for
a stack virtual address. Stack DMA buffers must be avoided.
Signed-off-by: raymond pang <raymondpangxd@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GT VEBOX DISABLE is only 4 bits wide but it was using a 8 bits wide
mask, the remaning reserved bits is set to 0 causing 4 more
nonexistent VEBOX engines being detected as enabled, triggering the
BUG_ON() because of mismatch between vebox_mask and newly added
VEBOX_MASK().
[ 64.081621] [drm:intel_device_info_init_mmio [i915]] vdbox enable: 0005, instances: 0005
[ 64.081763] [drm:intel_device_info_init_mmio [i915]] vebox enable: 00f1, instances: 0001
[ 64.081825] intel_device_info_init_mmio:925 GEM_BUG_ON(vebox_mask != ({ unsigned int first__ = (VECS0); unsigned int count__ = (2); ((&(dev_priv)->__info)->engine_mask & (((~0UL) - (1UL << (first__)) + 1) & (~0UL >> (64 - 1 - (first__ + count__ - 1))))) >> first__; }))
[ 64.082047] ------------[ cut here ]------------
[ 64.082054] kernel BUG at drivers/gpu/drm/i915/intel_device_info.c:925!
BSpec: 20680
Fixes: 26376a7e74 ("drm/i915/icl: Check for fused-off VDBOX and VEBOX instances")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326230223.26336-1-jose.souza@intel.com
(cherry picked from commit 547fcf9b1c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
kernel/futex_compat.c was recently removed, but it's still in the
MAINTAINERS file. Remove it there as well.
Fixes: 04e7712f44 ("y2038: futex: Move compat implementation into futex.c")
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Some comments in virt/kvm/arm/mmu.c are outdated. Update them to
reflect the current state of the code.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
valid_phys_addr_range() is used to sanity check the physical address range
of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
internally.
If memory is populated at the end of the physical address space, then
__pa(high_memory) is outside of the physical address space because:
high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
For the comparison in valid_phys_addr_range() this is not an issue, but if
CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
verifies that the resulting physical address is within the valid physical
address space of the CPU. So in the case that memory is populated at the
end of the physical address space, this is not true and triggers a
VIRTUAL_BUG_ON().
Use __pa(high_memory - 1) to prevent the conversion from going beyond
the end of valid physical addresses.
Fixes: be62a32044 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Craig Bergstrom <craigb@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Sean Young <sean@mess.org>
Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.com
The alcor driver is setting up data transfer and submitting the associated
MMC command at the same time. While this works most of the time, it
occasionally causes problems upon write.
In the working case, after setting up the data transfer and submitting
the MMC command, an interrupt comes in a moment later with CMD_END and
WRITE_BUF_RDY bits set. The data transfer then happens without problem.
However, on occasion, the interrupt that arrives at that point only
has WRITE_BUF_RDY set. The hardware notifies that it's ready to write
data, but the associated MMC command is still running. Regardless, the
driver was proceeding to write data immediately, and that would then cause
another interrupt indicating data CRC error, and the write would fail.
Additionally, the transfer setup function alcor_trigger_data_transfer()
was being called 3 times for each write operation, which was confusing
and may be contributing to this issue.
Solve this by tweaking the driver behaviour to follow the sequence observed
in the original ampe_stor vendor driver:
1. When starting request handling, write 0 to DATA_XFER_CTRL
2. Submit the command
3. Wait for CMD_END interrupt and then trigger data transfer
4. For the PIO case, trigger the next step of the data transfer only
upon the following DATA_END interrupt, which occurs after the block has
been written.
I confirmed that the read path still works (DMA & PIO) and also now
presents more consistency with the operations performed by ampe_stor.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: c5413ad815 ("mmc: add new Alcor Micro Cardreader SD/MMC driver")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Commit ce02ef06fc ("x86, retpolines: Raise limit for generating indirect
calls from switch-case") raised the limit under retpolines to 20 switch
cases where gcc would only then start to emit jump tables, and therefore
effectively disabling the emission of slow indirect calls in this area.
After this has been brought to attention to gcc folks [0], Martin Liska
has then fixed gcc to align with clang by avoiding to generate switch jump
tables entirely under retpolines. This is taking effect in gcc starting
from stable version 8.4.0. Given kernel supports compilation with older
versions of gcc where the fix is not being available or backported anymore,
we need to keep the extra KBUILD_CFLAGS around for some time and generally
set the -fno-jump-tables to align with what more recent gcc is doing
automatically today.
More than 20 switch cases are not expected to be fast-path critical, but
it would still be good to align with gcc behavior for versions < 8.4.0 in
order to have consistency across supported gcc versions. vmlinux size is
slightly growing by 0.27% for older gcc. This flag is only set to work
around affected gcc, no change for clang.
[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
Suggested-by: Martin Liska <mliska@suse.cz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Björn Töpel<bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net
Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.
It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
forever in case that a bringup callback fails. Unfortunately this issue was
not recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.
When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.
The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.
With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.
As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The pre state machine code has a different
failure mode which is more subtle and resulting in a less obvious use after
free crash because the control side frees resources which are still in use
by the undead CPU.
But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.
The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
all architectures as the following architectures have either no hotplug
support at all or not all subarchitectures support it:
alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
Crashing the kernel in such a situation is not an acceptable state
either.
Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:
- the CPU is brought down to the point where the stop_machine takedown
would happen.
- the CPU stays there forever and is idle
- The CPU is cleared in the CPU active mask, but not in the CPU online
mask which is a legit state.
- Interrupts are not forced away from the CPU
- All facilities which only look at online mask would still see it, but
that is the case during normal hotplug/unplug operations as well. It's
just a (way) longer time frame.
This will expose issues, which haven't been exposed before or only seldom,
because now the normally transient state of being non active but online is
a permanent state. In testing this exposed already an issue vs. work queues
where the vmstat code schedules work on the almost dead CPU which ends up
in an unbound workqueue and triggers 'preemtible context' warnings. This is
not a problem of this change, it merily exposes an already existing issue.
Still this is better than crashing fully without a chance to debug it.
This is mainly thought as workaround for those architectures which do not
support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
Fixes: 2e1a3483ce ("cpu/hotplug: Split out the state walk into functions")
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
Commit 18996f2db9 ("ACPICA: Events: Stop unconditionally clearing
ACPI IRQs during suspend/resume") was added to stop clearing event
status bits unconditionally in the system-wide suspend and resume
paths. This was done because of an issue with a laptop lid appaering
to be closed even when it was used to wake up the system from suspend
(see https://bugzilla.kernel.org/show_bug.cgi?id=196249), which
happened because event status bits were cleared unconditionally on
system resume. Though this change fixed the issue in the resume path,
it introduced regressions in a few suspend paths.
First regression was reported and fixed in the S5 entry path by commit
fa85015c0d ("ACPICA: Clear status of all events when entering S5").
Next regression was reported and fixed for all legacy sleep paths by
commit f317c7dc12 ("ACPICA: Clear status of all events when entering
sleep states"). However, there still is a suspend-to-idle regression,
since suspend-to-idle does not follow the legacy sleep paths.
In the suspend-to-idle case, wakeup is enabled as part of device
suspend. If the status bits of wakeup GPEs are set when they are
enabled, it causes a premature system wakeup to occur.
To address that problem, partially revert commit 18996f2db9 to
restore GPE status bits clearing before the GPE is enabled in
acpi_ev_enable_gpe().
Fixes: 18996f2db9 ("ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume")
Signed-off-by: Furquan Shaikh <furquan@google.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Function __hw_perf_event_init() used a CPU variable without
ensuring CPU preemption has been disabled. This caused the
following warning in the kernel log:
[ 7.277085] BUG: using smp_processor_id() in preemptible
[00000000] code: cf-csdiag/1892
[ 7.277111] caller is cf_diag_event_init+0x13a/0x338
[ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted
5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug #1
[ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR)
[ 7.277139] Call Trace:
[ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0)
[ 7.277161] [<0000000000b7a71a>] dump_stack+0x92/0xd0
[ 7.277174] [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100
[ 7.277183] [<00000000001228aa>] cf_diag_event_init+0x13a/0x338
[ 7.277195] [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0
[ 7.277204] [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0
[ 7.277214] [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50
[ 7.277224] [<0000000000b9e8f0>] system_call+0xdc/0x2d8
[ 7.277233] 2 locks held by cf-csdiag/1892:
[ 7.277241] #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.},
at: __s390x_sys_perf_event_open+0xd2e/0xd50
[ 7.277257] #1: 00000000363b11bd (&pmus_srcu){....},
at: perf_event_alloc+0x52e/0xce0
The variable is now accessed in proper context. Use
get_cpu_var()/put_cpu_var() pair to disable
preemption during access.
As the hardware authorization settings apply to all CPUs, it
does not matter which CPU is used to check the authorization setting.
Remove the event->count assignment. It is not needed as function
perf_event_alloc() allocates memory for the event with kzalloc() and
thus count is already set to zero.
Fixes: fe5908bccc ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Lorenz Messtechnik has a device that is controlled by the cp210x driver,
so add the device id to the driver. The device id was provided by
Silicon-Labs for the devices from this vendor.
Reported-by: Uli <t9cpu@web.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
VRF devices don't work with upper devices. Currently, it's possible to
add a VRF device to a bridge or team, and to create macvlan, macsec, or
ipvlan devices on top of a VRF (bond and vlan are prevented respectively
by the lack of an ndo_set_mac_address op and the NETIF_F_VLAN_CHALLENGED
feature flag).
Fix this by setting the IFF_NO_RX_HANDLER flag (introduced in commit
f5426250a6 ("net: introduce IFF_NO_RX_HANDLER")).
Cc: David Ahern <dsahern@gmail.com>
Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a side effect of adding xcbc support, when the next_buffer is not
copied.
The issue occurs, when there is stored from previous state a blocksize
buffer and received, a less than blocksize, from user. In this case, the
nents for req->src is 0, and the next_buffer is not copied.
An example is:
{
.tap = { 17, 15, 8 },
.psize = 40,
.np = 3,
.ksize = 16,
}
Fixes: 12b8567f6f ("crypto: caam - add support for xcbc(aes)")
Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dean Nelson says:
====================
thunderx: fix receive buffer page recycling
In attempting to optimize receive buffer page recycling for XDP, commit
773225388d ("net: thunderx: Optimize page recycling for XDP")
inadvertently introduced two problems for the non-XDP case, that will be
addressed by this patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For the non-XDP case, commit 773225388d ("net: thunderx: Optimize
page recycling for XDP") added code to nicvf_free_rbdr() that, when releasing
the additional receive buffer page reference held for recycling, repeatedly
calls put_page() until the page's _refcount goes to zero. Which results in
the page being freed.
This is not okay if the page's _refcount was greater than 1 (in the non-XDP
case), because nicvf_free_rbdr() should not be subtracting more than what
nicvf_alloc_page() had previously added to the page's _refcount, which was
only 1 (in the non-XDP case).
This can arise if a received packet is still being processed and the receive
buffer (i.e., skb->head) has not yet been freed via skb_free_head() when
nicvf_free_rbdr() is spinning through the aforementioned put_page() loop.
If this should occur, when the received packet finishes processing and
skb_free_head() is called, various problems can ensue. Exactly what, depends on
whether the page has already been reallocated or not, anything from "BUG: Bad
page state ... ", to "Unable to handle kernel NULL pointer dereference ..." or
"Unable to handle kernel paging request...".
So this patch changes nicvf_free_rbdr() to only call put_page() once for pages
held for recycling (in the non-XDP case).
Fixes: 773225388d ("net: thunderx: Optimize page recycling for XDP")
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 773225388d ("net: thunderx: Optimize page recycling for XDP")
added code to nicvf_alloc_page() that inadvertently disables receive buffer
page recycling for the non-XDP case by always NULL'ng the page pointer.
This patch corrects two if-conditionals to allow for the recycling of non-XDP
mode pages by only setting the page pointer to NULL when the page is not ready
for recycling.
Fixes: 773225388d ("net: thunderx: Optimize page recycling for XDP")
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With a recent link mode advertisement code update this helper
providing local pause capability translation used for flow
control link mode negotiation got broken.
For eth drivers using this helper, the issue is apparent only
if either PAUSE or ASYM_PAUSE is being advertised.
Fixes: 3c1bcc8614 ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the rules for configuring search paths in Kbuild have
changed, this will lead some erros when compiling hns3 with the
following command:
make O=DIR M=drivers/net/ethernet/hisilicon/hns3
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:11:10:
fatal error: hnae3.h: No such file or directory
This patch fix it by adding $(srctree)/ prefix to the serach paths.
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAINTAINERS still pointed to phy.txt after moving this file into the rst
format, fix this.
Reported-by: Joe Perches <joe@perches.com>
Fixes: 25fe02d00a ("Documentation: net: phy: switch documentation to rst format")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph reported a stall while peeking datagram with an offset when
busy polling is enabled. __skb_try_recv_datagram() uses as the loop
termination condition 'queue empty'. When peeking, the socket
queue can be not empty, even when no additional packets are received.
Address the issue explicitly checking for receive queue changes,
as currently done by __skb_wait_for_more_packets().
Fixes: 2b5cd0dfa3 ("net: Change return type of sk_busy_loop from bool to void")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches fixes few issues in mv88e6390x_port_set_cmode().
1. When entering the function the old cmode may be 0, in this case
mv88e6390x_serdes_get_lane() returns -ENODEV. As result we bail
out and have no chance to set a new mode. Therefore deal properly
with -ENODEV.
2. Once we have disabled power and irq, let's set the cached cmode to 0.
This reflects the actual status and is cleaner if we bail out with an
error in the following function calls.
3. The cached cmode is used by mv88e6390x_serdes_get_lane(),
mv88e6390_serdes_power_lane() and mv88e6390_serdes_irq_enable().
Currently we set the cached mode to the new one at the very end of
the function only, means until then we use the old one what may be
wrong.
4. When calling mv88e6390_serdes_irq_enable() we use the lane value
belonging to the old cmode. Get the lane belonging to the new cmode
before calling this function.
It's hard to provide a good "Fixes" tag because quite a few smaller
changes have been done to the code in question recently.
Fixes: d235c48b40 ("net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
VBIOS will not post pixel rate > 340MHz.
If driver set pixel rate > 340MHz and do restart bottom, VBIOS can't
post HDMI monitor due to monitor is stay in HDMI2.0 state.
[How]
Program Scrambling_Enable and TMDS_Bit_Clock_Ratio when disable stream.
Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use pcie_bandwidth_available to get real link state
to update pcie table.
v2: fix incorrect initialized return value
v3: expand the fetching method about the link width to all asics.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No change to functionality. Simply make transport event messages a little
clearer, and rework CRQ format enums such that we have separate enums for
INIT messages and XPORT events.
[mkp: typo]
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Status and error codes are returned in big endian from the VIOS. The values
are translated into a human readable format when logged, but the values are
also logged. This patch byte swaps those values so that they are consistent
between BE and LE platforms.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The VIOS uses the SCSI_ERROR class to report PRLI failures. These errors
are indicated with the combination of a IBMVFC_FC_SCSI_ERROR return status
and 0x8000 error code. Add these codes to cmd_status[] with appropriate
human readable error message.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The text of messages logged with ibmvfc_log_error() always contain the term
"failed". In the case of cancelled commands during EH they are reported
back by the VIOS using error codes. This can be confusing to somebody
looking at these log messages as to whether a command was successfully
cancelled. The following real log message for example it is unclear if the
transaction was actaully cancelled.
<6>sd 0:0:1:1: Cancelling outstanding commands.
<3>sd 0:0:1:1: [sde] Command (28) failed: transaction cancelled (2:6) flags: 0 fcp_rsp: 0, resid=0, scsi_status: 0
Remove prefixing of "failed" to all error logged messages. The
ibmvfc_log_error() function translates the returned error/status codes to a
human readable message already.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an incoming ELS of type RSCN contains more than one element, zfcp
suboptimally causes repeated erp trigger NOP trace records for each
previously failed port. These could be ports that went away. It loops over
each RSCN element, and for each of those in an inner loop over all
zfcp_ports.
The trigger to recover failed ports should be just the reception of some
RSCN, no matter how many elements it has. So we can loop over failed ports
separately, and only then loop over each RSCN element to handle the
non-failed ports.
The call chain was:
zfcp_fc_incoming_rscn
for (i = 1; i < no_entries; i++)
_zfcp_fc_incoming_rscn
list_for_each_entry(port, &adapter->port_list, list)
if (masked port->d_id match) zfcp_fc_test_link
if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <===
In order the reduce the "flooding" of the REC trace area in such cases, we
factor out handling the failed ports to be outside of the entries loop:
zfcp_fc_incoming_rscn
if (no_entries > 1) <===
list_for_each_entry(port, &adapter->port_list, list) <===
if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <===
for (i = 1; i < no_entries; i++)
_zfcp_fc_incoming_rscn
list_for_each_entry(port, &adapter->port_list, list)
if (masked port->d_id match) zfcp_fc_test_link
Abbreviated example trace records before this code change:
Tag : fcrscn1
WWPN : 0x500507630310d327
ERP want : 0x02
ERP need : 0x02
Tag : fcrscn1
WWPN : 0x500507630310d327
ERP want : 0x02
ERP need : 0x00 NOP => superfluous trace record
The last trace entry repeats if there are more than 2 RSCN elements.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suppose more than one non-NPIV FCP device is active on the same channel.
Send I/O to storage and have some of the pending I/O run into a SCSI
command timeout, e.g. due to bit errors on the fibre. Now the error
situation stops. However, we saw FCP requests continue to timeout in the
channel. The abort will be successful, but the subsequent TUR fails.
Scsi_eh starts. The LUN reset fails. The target reset fails. The host
reset only did an FCP device recovery. However, for non-NPIV FCP devices,
this does not close and reopen ports on the SAN-side if other non-NPIV FCP
device(s) share the same open ports.
In order to resolve the continuing FCP request timeouts, we need to
explicitly close and reopen ports on the SAN-side.
This was missing since the beginning of zfcp in v2.6.0 history commit
ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.").
Note: The FSF requests for forced port reopen could run into FSF request
timeouts due to other reasons. This would trigger an internal FCP device
recovery. Pending forced port reopen recoveries would get dismissed. So
some ports might not get fully reopened during this host reset handler.
However, subsequent I/O would trigger the above described escalation and
eventually all ports would be forced reopen to resolve any continuing FCP
request timeouts due to earlier bit errors.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #3.0+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An already deleted SCSI device can exist on the Scsi_Host and remain there
because something still holds a reference. A new SCSI device with the same
H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created. When
we try to unblock an rport, we still find the deleted SCSI device and
return early because the zfcp_scsi_dev of that SCSI device is not
ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if
the new proper SCSI device would be in good state.
Therefore, skip deleted SCSI devices when iterating the sdevs of the shost.
[cf. __scsi_device_lookup{_by_target}() or scsi_device_get()]
The following abbreviated trace sequence can indicate such problem:
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000 not ZFCP_STATUS_COMMON_UNBLOCKED
Ready count : n not incremented yet
Running count : 0x00000000
ERP want : 0x01
ERP need : 0xc1 ZFCP_ERP_ACTION_NONE
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x41000000
Ready count : n+1
Running count : 0x00000000
ERP want : 0x01
ERP need : 0x01
...
Area : REC
Level : 4 only with increased trace level
Tag : ertru_l
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000
Request ID : 0x0000000000000000
ERP status : 0x01800000
ERP step : 0x1000
ERP action : 0x01
ERP count : 0x00
NOT followed by a trace record with tag "scpaddy"
for WWPN 0x50050763031bd327.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 6f2ce1c6af ("scsi: zfcp: fix rport unblock race with LUN recovery")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit a83da8a450 ("scsi: sd: Optimal I/O size should be a multiple
of physical block size") split one conditional into several separate
statements in an effort to provide more accurate warning messages when
a device reports a nonsensical value. However, this reorganization
accidentally dropped the precondition of the reported value being
larger than zero. This lead to a warning getting emitted on devices
that do not report an optimal I/O size at all.
Remain silent if a device does not report an optimal I/O size.
Fixes: a83da8a450 ("scsi: sd: Optimal I/O size should be a multiple of physical block size")
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org>
Reported-by: Hussam Al-Tayeb <ht990332@gmx.com>
Tested-by: Hussam Al-Tayeb <ht990332@gmx.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi_end_request() function calls scsi_cmd_to_driver() indirectly and
hence needs the disk->private_data pointer. Avoid that that pointer is
cleared before all affected I/O requests have finished. This patch avoids
that the following crash occurs:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
scsi_mq_uninit_cmd+0x1c/0x30
scsi_end_request+0x7c/0x1b8
scsi_io_completion+0x464/0x668
scsi_finish_command+0xbc/0x160
scsi_eh_flush_done_q+0x10c/0x170
sas_scsi_recover_host+0x84c/0xa98 [libsas]
scsi_error_handler+0x140/0x5b0
kthread+0x100/0x12c
ret_from_fork+0x10/0x18
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use dd to test a SCSI device:
1. echo "blocked" >/sys/block/sda/device/state
2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
3. echo "running" >/sys/block/sda/device/state
dd should finish this work after step 3, but it hangs.
After step2, the call chain is this:
blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq
prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will
transition it to BLK_STS_DEV_RESOURCE which means that driver can
guarantee that IO dispatch will be triggered in future when the
resource is available. Need to follow the rule if we set the device
state to running.
[mkp: tweaked commit description and code comment as suggested by Bart]
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
If page-fault handler spans multiple MRs then the access mask needs to
be reset before each MR handling or otherwise write access will be
granted to mapped pages instead of read-only.
Cc: <stable@vger.kernel.org> # 3.19
Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Reported-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The receive side mapping (RSM) on hfi1 hardware is a special
matching mechanism to direct an incoming packet to a given
hardware receive context. It has 4 instances of matching capabilities
(RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of
256 entries, each of which points to a receive context.
Currently, three instances of RSM have been used:
1. RSM0 by QOS;
2. RSM1 by PSM FECN;
3. RSM2 by VNIC.
Each RSM instance should reserve enough entries in RMT to function
properly. Since both PSM and VNIC could allocate any receive context
between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must
reserve enough RMT entries to cover the entire receive context index
range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only
the user receive contexts allocated for PSM
(dd->num_user_contexts). Consequently, the sizing of
dd->num_user_contexts in set_up_context_variables is incorrect.
Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When an old ack_queue entry is used to store an incoming request, it may
need to clean up the old entry if it is still referencing the
MR. Originally only RDMA READ request needed to reference MR on the
responder side and therefore the opcode was tested when cleaning up the
old entry. The introduction of tid rdma specific operations in the
ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA
READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup.
Remove the opcode specific tests associated with the ack_queue.
Fixes: f48ad614c1 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a QP is put into error state, it may be waiting for send engine
resources. In this case, the QP will be removed from the send engine's
waiting list, but its IOWAIT pending bits are not cleared. This will
normally not have any major impact as the QP is being destroyed. However,
the QP still needs to wind down its operations, such as draining the send
queue by scheduling the send engine. Clearing the pending bits will avoid
any potential complications. In addition, if the QP will eventually hang,
clearing the pending bits can help debugging by presenting a consistent
picture if the user dumps the qp_stats.
This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in
priv->s_iowait.flags in this case.
Fixes: 5da0fc9dbf ("IB/hfi1: Prepare resource waits for dual leg")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:
(1) The QP builds a packet and tries to send through SDMA engine.
However, PIO engine is still busy. Consequently, this packet is put on
the QP's tx list and the QP is put on the PIO waiting list. The field
qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
(2) The QP is put into error state by the user application and
notify_error_qp() is called, which removes the QP from the PIO waiting
list and the packet from the QP's tx list. In addition, qp->s_flags is
cleared of RVT_S_ANY_WAIT_IO bits, which does not include
HFI1_S_WAIT_PIO_DRAIN bit;
(3) The hfi1_schdule_send() function is called to drain the QP's send
queue. Subsequently, hfi1_do_send() is called. Since the flag bit
HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As
a result, hfi1_do_send() bails out without draining any request from
the send queue;
(4) The PIO engine completes the sending and tries to wake up any QP on
its waiting list. But the QP has been removed from the PIO waiting
list and therefore is kept in sleep forever.
The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
Fixes: 2e2ba09e48 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable@vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case kmemdup fails, the fix releases resources and returns to
avoid the NULL pointer dereference.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
VirtualBox 6.0.x has a new feature where the guest kernel driver passes
info about the origin of the request (e.g. userspace or kernelspace) to
the hypervisor.
If we do not pass this information then when running the 6.0.x userspace
guest-additions tools on a 6.0.x host, some requests will get denied
with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and
the mounting of shared folders marked to be auto-mounted.
This commit implements passing the requestor info to the host, fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unlike 'client_ops' which is initialized to 'default_client_ops', the
port operations 'ops' may be left to NULL.
Check the 'ops' value before checking the 'ops->x' value.
Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Echo "" to /sys/module/kgdboc/parameters/kgdboc will fail with "No such
device” error.
This is caused by function "configure_kgdboc" who init err to ENODEV
when the config is empty (legal input) the code go out with ENODEV
returned.
Fixes: 2dd4531686 ("kgdboc: Fix restrict error")
Signed-off-by: Wentao Wang <witallwang@gmail.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In half-duplex operation, RX should be started after TX completes.
If DMA is used, there is a case when the DMA transfer completes but the
TX FIFO is not emptied, so the RX cannot be restarted just yet.
Use a boolean variable to store this state and rearm TX interrupt mask
to be signaled again that the transfer finished. In interrupt transmit
handler this variable is used to start RX. A warning message is generated
if RX is activated before TX fifo is cleared.
Fixes: b389f173aa ("tty/serial: atmel: RS485 half duplex w/DMA: enable
RX after TX is done")
Signed-off-by: Razvan Stefanescu <razvan.stefanescu@microchip.com>
Acked-by: Richard Genoud <richard.genoud@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.
Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When the kernel is compiled with preemption enabled, the URB completion
handler can run in parallel with the work responsible for waking up the
tty layer. If the URB handler sets the EVENT_TTY_WAKEUP bit during the
call to tty_port_tty_wakeup() to signal that there is room for additional
input, it will be cleared at the end of this call. As a result, TX traffic
on the upper layer will be blocked.
This can be seen with a kernel configured with CONFIG_PREEMPT, and a fast
modem connected with PPP running over a USB CDC-ACM port.
Use test_and_clear_bit() instead, which ensures that each wakeup requested
by the URB completion code will trigger a call to tty_port_tty_wakeup().
Fixes: 1aba579f3c cdc-acm: handle read pipe errors
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrii Nakryiko says:
====================
This patch set fixes bug in btf_dedup_is_equiv() check mishandling equivalence
comparison between VOID kind in candidate type graph versus anonymous non-VOID
kind in canonical type graph.
Patch #1 fixes bug, by comparing candidate and canonical kinds for equality,
before proceeding to kind-specific checks.
Patch #2 adds a test case testing this specific scenario.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds specific test exposing bug in btf_dedup_is_equiv() when
comparing candidate VOID type to a non-VOID canonical type. It's
important for canonical type to be anonymous, otherwise name equality
check will do the right thing and will exit early.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
btf_dedup_is_equiv() used to compare btf_type->info fields, before doing
kind-specific equivalence check. This comparsion implicitly verified
that candidate and canonical types are of the same kind. With enum fwd
resolution logic this check couldn't be done generically anymore, as for
enums info contains vlen, which differs between enum fwd and
fully-defined enum, so this check was subsumed by kind-specific
equivalence checks.
This change caused btf_dedup_is_equiv() to let through VOID vs other
types check to reach switch, which was never meant to be handing VOID
kind, as VOID kind is always pre-resolved to itself and is only
equivalent to itself, which is checked early in btf_dedup_is_equiv().
This change adds back BTF kind equality check in place of more generic
btf_type->info check, still defering further kind-specific checks to
a per-kind switch.
Fixes: 9768095ba9 ("btf: resolve enum fwds in btf_dedup")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If we use the "i2c-" prefix for the binding documentation file name,
then it should match the file name of the driver, if possible. It is
possible for this driver, so rename it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If we use the "i2c-" prefix for the binding documentation file name,
then it should match the file name of the driver, if possible. It is
possible for this driver, so rename it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
If we use the "i2c-" prefix for the binding documentation file name,
then it should match the file name of the driver, if possible. It is
possible for this driver, so rename it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If we use the "i2c-" prefix for the binding documentation file name,
then it should match the file name of the driver, if possible. It is
possible for this driver, so rename it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If we use the "i2c-" prefix for the binding documentation file name,
then it should match the file name of the driver, if possible. It is
possible for this driver, so rename it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Similar to commit edfc3722cf ("HID: quirks: Fix keyboard + touchpad on
Toshiba Click Mini not working"), the Lenovo Miix 630 has a combo
keyboard/touchpad device with vid:pid of 04F3:0400, which is shared with
Elan touchpads. The combo on the Miix 630 has an ACPI id of QTEC0001,
which is not claimed by the elan_i2c driver, so key on that similar to
what was done for the Toshiba Click Mini.
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The debugfs read callback must advance ppos or users using read() on
the file descriptor will never get the EOL. This wasn't spotted before
as I was using busybox cat for testing which uses sendfile() internally
and only noticed it now when switched to cat from coreutils.
Fixes: 2a9e27408e ("gpio: mockup: rework debugfs interface")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The Linux RISC-V 32bit kernel is broken after we moved setup_vm() from
kernel/setup.c to mm/init.c because Linux RISC-V 32bit kernel by default
uses cmodel=medlow which results in a non-position-independent setup_vm().
This patch fixes Linux RISC-V 32bit kernel booting by:
1. Forcing cmodel=medany for mm/init.c
2. Moving remaing MM-related stuff va_pa_offset, pfn_base and
empty_zero_page from kernel/setup.c to mm/init.c
Further, the setup_vm() cannot handle GCC instrumentation for FTRACE so
we disable it for mm/init.c by not using "-pg" compiler flag.
Fixes: 6f1e9e946f ("RISC-V: Move setup_vm() to mm/init.c")
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
A memory save operation to 8-byte variable in RV32 is divided into
two sw instructions in the put_user macro. The current fixup returns
execution flow to the second sw instead of the one after it.
This patch fixes this fixup code according to the load access part.
Signed-off-by: Alan Kao<alankao@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
According to HUT 1.12 usage 0xb5 from the generic desktop page is reserved
for switching between external and internal display, so let's add the
mapping.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
According to HUT 1.12 usage 0x232 from the consumer page is reserved for
switching application between full screen and windowed mode, so let's add
the mapping.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
According to HUTRR73 usages 0x79, 0x7a and 0x7c from the consumer page
correspond to Brightness Up/Down/Toggle keys, so let's add the mappings.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
According to HUTRR77 usage 0x29f from the consumer page is reserved for
the Desktop application to present all running user’s application windows.
Linux defines KEY_SCALE to request Compiz Scale (Expose) mode, so let's
add the mapping.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
According to HUTRR37 usage 0x6d from the consumer usage page corresponds
to action that selects the next available supported aspect ratio option
on a device which outputs or displays video. However KEY_ZOOM means
activate "Full Screen" mode, KEY_ASPECT_RATIO should be used instead.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We defined better names for keys to activate full screen mode or
change aspect ratio (while keeping the existing keycodes to avoid
breaking userspace), so let's use them in the document.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
It is hard to say what KEY_SCREEN and KEY_ZOOM mean, but historically DVB
folks have used them to indicate switch to full screen mode. Later, they
converged on using KEY_ZOOM to switch into full screen mode and KEY)SCREEN
to control aspect ratio (see Documentation/media/uapi/rc/rc-tables.rst).
Let's commit to these uses, and define:
- KEY_FULL_SCREEN (and make KEY_ZOOM its alias)
- KEY_ASPECT_RATIO (and make KEY_SCREEN its alias)
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
There are many Lenovo laptops which need elan_i2c support, this patch adds
relevant IDs to the Elan driver so that touchpads are recognized.
Signed-off-by: KT Liao <kt.liao@emc.com.tw>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
we currently use of_read_property() to obtain the pointer to the array
corresponding to the property "ibm,drc-indexes". The elements of this
array are of type __be32, but are accessed without any conversion to
the OS-endianness, which is buggy on a Little Endian OS.
Fix this by using of_property_read_u32_index() accessor function to
safely read the elements of the array.
Fixes: e83636ac33 ("pseries/drc-info: Search DRC properties for CPU indexes")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current check for WoL on i40e is broken. Code comment says only
magic packet is supported, so only check for that.
Fixes: 540a152da7 (i40e/ixgbe/igb: fail on new WoL flag setting WAKE_MAGICSECURE)
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ixgbe ignores errors returned from mdiobus_register() and leaves
adapter->mii_bus non-NULL and MDIO bus state as MDIOBUS_ALLOCATED.
This triggers a BUG from mdiobus_unregister() during ixgbe_remove() call.
Fixes: 8fa10ef012 ("ixgbe: register a mdiobus")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The runtime_suspend device callbacks are not supposed to save
configuration state or change the power state. Commit fb29f76cc5
("igb: Fix an issue that PME is not enabled during runtime suspend")
changed the driver to not save configuration state during runtime
suspend, however the driver callback still put the device into a
low-power state. This causes a warning in the pci pm core and results in
pci_pm_runtime_suspend not calling pci_save_state or pci_finish_runtime_suspend.
Fix this by not changing the power state either, leaving that to pci pm
core, and make the same change for suspend callback as well.
Also move a couple of defines into the appropriate header file instead
of inline in the .c file.
Fixes: fb29f76cc5 ("igb: Fix an issue that PME is not enabled during runtime suspend")
Signed-off-by: Arvind Sankar <niveditas98@gmail.com>
Reviewed-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit 0ac30ce433 ("i40e: fix up 32 bit timespec references",
2017-07-26) claims to be cleaning up references to 32-bit timespecs.
The actual contents of the commit make no sense, as it converts a call
to timespec64_add into timespec64_add_ns. This would seem ok, if (a) the
change was documented in the commit message, and (b) timespec64_add_ns
supported negative numbers.
timespec64_add_ns doesn't work with signed deltas, because the
implementation is based around iter_div_u64_rem. This change resulted in
a regression where i40e_ptp_adjtime would interpret small negative
adjustments as large positive additions, resulting in incorrect
behavior.
This commit doesn't appear to fix anything, is not well explained, and
introduces a bug, so lets just revert it.
Reverts: 0ac30ce433 ("i40e: fix up 32 bit timespec references", 2017-07-26)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
- fix mount/umount race in nlmclnt.
- NFSv4.1 don't free interrupted slot on open
Bugfixes:
- Don't let RPC_SOFTCONN tasks time out if the transport is connected
- Fix a typo in nfs_init_timeout_values()
- Fix layoutstats handling during read failovers
- fix uninitialized variable warning"
* tag 'nfs-for-5.1-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: fix uninitialized variable warning
pNFS/flexfiles: Fix layoutstats handling during read failovers
NFS: Fix a typo in nfs_init_timeout_values()
SUNRPC: Don't let RPC_SOFTCONN tasks time out if the transport is connected
NFSv4.1 don't free interrupted slot on open
NFS: fix mount/umount race in nlmclnt.
NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
Section 2.2.1 BTF_KIND_INT a bullet list was collapsed due to
text reflow in commit 9ab5305dbe ("docs/btf: reflow text to
fill up to 78 characters").
This patch correct the mistake. Also adjust next bullet list,
which is used for comparison, to get rendered the same way.
Fixes: 9ab5305dbe ("docs/btf: reflow text to fill up to 78 characters")
Link: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-int
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Avoid following compiler warning on uninitialized variable
net/sunrpc/xprtsock.c: In function ‘xs_read_stream_request.constprop’:
net/sunrpc/xprtsock.c:525:10: warning: ‘read’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return read;
^~~~
net/sunrpc/xprtsock.c:529:23: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return ret < 0 ? ret : read;
~~~~~~~~~~~~~~^~~~~~
Signed-off-by: Alakesh Haloi <alakesh.haloi@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Paul Chaignon says:
====================
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
The first patch implements the conservative option and rejects such code;
the second adds a test case.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a test case with an excessive number of call stack frames
in dead code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
This patch implements the conservative option and rejects such code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Renaming a netdev-trigger-tracked interface was resulting in an
unbalanced dev_hold().
Example:
> iw phy phy0 interface add foo type __ap
> echo netdev > trigger
> echo foo > device_name
> ip link set foo name bar
> iw dev bar del
[ 237.355366] unregister_netdevice: waiting for bar to become free. Usage count = 1
[ 247.435362] unregister_netdevice: waiting for bar to become free. Usage count = 1
[ 257.545366] unregister_netdevice: waiting for bar to become free. Usage count = 1
Above problem was caused by trigger checking a dev->name which obviously
changes after renaming an interface. It meant missing all further events
including the NETDEV_UNREGISTER which is required for calling dev_put().
This change fixes that by:
1) Comparing device struct *address* for notification-filtering purposes
2) Dropping unneeded NETDEV_CHANGENAME code (no behavior change)
Fixes: 06f502f57d ("leds: trigger: Introduce a NETDEV trigger")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
It might happen that Tx conf acknowledges a frame before it was
subscribed in bql, as subscribing was previously done after the enqueue
operation.
This patch moves the netdev_tx_sent_queue call before the actual frame
enqueue, so that this can never happen.
Fixes: 569dac6a5a ("dpaa2-eth: bql support")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang warns about possible bugs in a dead code branch after
BUG_ON(1) when CONFIG_PROFILE_ALL_BRANCHES is enabled:
drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: error: variable 'buf_size' is used uninitialized whenever 'if'
condition is false [-Werror,-Wsometimes-uninitialized]
BUG_ON(1);
^~~~~~~~~
include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
^~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
# define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/sge.c:482:9: note: uninitialized use occurs here
return buf_size;
^~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: note: remove the 'if' if its condition is always true
BUG_ON(1);
^
include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
^
drivers/net/ethernet/chelsio/cxgb4/sge.c:459:14: note: initialize the variable 'buf_size' to silence this warning
int buf_size;
^
= 0
Use BUG() here to create simpler code that clang understands
correctly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must not use legacy clock defines for dts clckctrl clocks as the offsets
will be wrong.
Fixes: 87fc89ced3 ("ARM: dts: am335x: Move l4 child devices to probe them with ti-sysc")
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
In dumpit, unlike doit, the check for info_get op being defined
is missing. Add it and avoid null pointer dereference in case driver
does not define this op.
Fixes: f9cf22882c ("devlink: add device information API")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the green and amber LEDs on this quad PHY were solid, to
indicate an encoding of the link speed (10/100/1000).
This keeps the LEDs always on just as before, but now they flash on
Rx/Tx activity.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running a syz script, a panic occurred:
[ 156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc]
[ 156.094315] Call Trace:
[ 156.094844] <IRQ>
[ 156.095306] dump_stack+0x7c/0xc0
[ 156.097346] print_address_description+0x65/0x22e
[ 156.100445] kasan_report.cold.3+0x37/0x7a
[ 156.102402] tipc_disc_timeout+0x9c9/0xb20 [tipc]
[ 156.106517] call_timer_fn+0x19a/0x610
[ 156.112749] run_timer_softirq+0xb51/0x1090
It was caused by the netns freed without deleting the discoverer timer,
while later on the netns would be accessed in the timer handler.
The timer should have been deleted by tipc_net_stop() when cleaning up a
netns. However, tipc has been able to enable a bearer and start d->timer
without the local node_addr set since Commit 52dfae5c85 ("tipc: obtain
node identity from interface by default"), which caused the timer not to
be deleted in tipc_net_stop() then.
So fix it in tipc_net_stop() by changing to check local node_id instead
of local node_addr, as Jon suggested.
While at it, remove the calling of tipc_nametbl_withdraw() there, since
tipc_nametbl_stop() will take of the nametbl's freeing after.
Fixes: 52dfae5c85 ("tipc: obtain node identity from interface by default")
Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New device of QNAP based on aqc111u
Add this ID to blacklist of cdc_ether driver as well
Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements accessors for the QCA8337 MDIO access
through the MDIO_MASTER register, which makes it possible to
access the PHYs on slave-bus through the switch. In cases
where the switch ports are already mapped via external
"phy-phandles", the internal mdio-bus is disabled in order to
prevent a duplicated discovery and enumeration of the same
PHYs. Don't use mixed external and internal mdio-bus
configurations, as this is not supported by the hardware.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This belated patch implements Andrew Lunn's request of
"remove the phy_read() and phy_write() functions."
<https://lore.kernel.org/patchwork/comment/902734/>
While seemingly harmless, this causes the switch's user
port PHYs to get registered twice. This is because the
DSA subsystem will create a slave mdio-bus not knowing
that the qca8k_phy_(read|write) accessors operate on
the external mdio-bus. So the same "bus" gets effectively
duplicated.
Cc: stable@vger.kernel.org
Fixes: 6b93fb4648 ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the qca8k's binding to document to the
approach for using the internal mdio-bus of the supported
qca8k switches.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the example, the phy at phy@0 is clashing with
the switch0@0 at the same address. Usually, the switches
are accessible through pseudo PHYs which in case of the
qca8k are located at 0x10 - 0x18.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull btrfs fixes from David Sterba:
- fsync fixes: i_size for truncate vs fsync, dio vs buffered during
snapshotting, remove complicated but incomplete assertion
- removed excessive warnigs, misreported device stats updates
- fix raid56 page mapping for 32bit arch
- fixes reported by static analyzer
* tag 'for-5.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix assertion failure on fsync with NO_HOLES enabled
btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size
btrfs: Fix bound checking in qgroup_trace_new_subtree_blocks
btrfs: raid56: properly unmap parity page in finish_parity_scrub()
btrfs: don't report readahead errors and don't update statistics
Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
btrfs: remove WARN_ON in log_dir_items
Btrfs: fix incorrect file size after shrinking truncate and fsync
Pull tracing fixes from Steven Rostedt:
"Three small fixes:
- A fix to a double free in the histogram code
- Uninitialized variable fix
- Use NULL instead of zero fix and spelling fixes"
* tag 'trace-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix warning using plain integer as NULL & spelling corrections
tracing: initialize variable in create_dyn_event()
tracing: Remove unnecessary var_ref destroy in track_data_destroy()
Pull file locking bugfix from Jeff Layton:
"Just a single fix for a bug that crept into POSIX lock deadlock
detection in v5.0"
* tag 'locks-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: wake any locks blocked on request before deadlock check
On an Acer Predator Helios 500 (Ryzen version), the laptop's speakers
don't work out of the box.
The problem can be worked around with hdajackretask, remapping the
"Black Headphone, Right side" pin (0x21) to the Internal speaker.
This patch adds a quirk to change this mapping by default.
[ corrected ALC299_FIXUP_PREDATOR_SPK definition and adapted for the
latest tree by tiwai ]
Signed-off-by: Bernhard Rosenkraenzer <bero@lindev.ch>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
XFS applies more strict serialization constraints to unaligned
direct writes to accommodate things like direct I/O layer zeroing,
unwritten extent conversion, etc. Unaligned submissions acquire the
exclusive iolock and wait for in-flight dio to complete to ensure
multiple submissions do not race on the same block and cause data
corruption.
This generally works in the case of an aligned dio followed by an
unaligned dio, but the serialization is lost if I/Os occur in the
opposite order. If an unaligned write is submitted first and
immediately followed by an overlapping, aligned write, the latter
submits without the typical unaligned serialization barriers because
there is no indication of an unaligned dio still in-flight. This can
lead to unpredictable results.
To provide proper unaligned dio serialization, require that such
direct writes are always the only dio allowed in-flight at one time
for a particular inode. We already acquire the exclusive iolock and
drain pending dio before submitting the unaligned dio. Wait once
more after the dio submission to hold the iolock across the I/O and
prevent further submissions until the unaligned I/O completes. This
is heavy handed, but consistent with the current pre-submission
serialization for unaligned direct writes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
To enable S24_LE format, sample_type in topology fw has to be set to 1.
But sample_type defined in topology firmware configuration is not
getting reflected in the dsp param. This patch sets sample_type in base
config so that the sample type defined in the topology firmware is reflected
in the dsp params. This issues was uncovered while debugging the S24_LE format
which require the MSB byte in 32 bit word to be skipped. Setting sample_type
in topology firmware to 1 helps to skip MSB byte word.
Signed-off-by: Jenny TC <jenny.tc@intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Oded writes:
The following bug fixes are included in this tag:
- Fix host crash upon resume after suspend
- Fix MMU related bugs which result in user's jobs getting stuck
- Fix race between user context cleanup and hard-reset which results in
host crash
- Fix sparse warning
* tag 'misc-habanalabs-fixes-2019-03-26' of git://people.freedesktop.org/~gabbayo/linux: (265 commits)
habanalabs: cast to expected type
habanalabs: prevent host crash during suspend/resume
habanalabs: perform accounting for active CS
habanalabs: fix mapping with page size bigger than 4KB
habanalabs: complete user context cleanup before hard reset
habanalabs: fix bug when mapping very large memory area
habanalabs: fix MMU number of pages calculation
Linux 5.1-rc2
clocksource/drivers/clps711x: Remove board support
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
genirq: Mark expected switch case fall-through
clocksource/drivers/riscv: Fix clocksource mask
x86/gart: Exclude GART aperture from kcore
cifs: update internal module version number
SMB3: Fix SMB3.1.1 guest mounts to Samba
cifs: Fix slab-out-of-bounds when tracing SMB tcon
cifs: allow guest mounts to work for smb3.11
fix incorrect error code mapping for OBJECTID_NOT_FOUND
...
When EXTCON is a loadable module, mtu3 fails to link as built-in:
drivers/usb/mtu3/mtu3_plat.o: In function `mtu3_probe':
mtu3_plat.c:(.text+0x690): undefined reference to `extcon_get_edev_by_phandle'
Add a Kconfig dependency to force mtu3 also to be a loadable module
if extconn is, but still allow it to be built without extcon.
Fixes: d0ed062a8b ("usb: mtu3: dual-role mode support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_match_device in usb251xb_probe can fail and returns a NULL pointer.
The patch avoids a potential NULL pointer dereference in this scenario.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some PHYs do not support PHY_MODE_USB_HOST_SS, i.e. USB 3.0 or higher.
Fall back and try the more generic PHY_MODE_USB_HOST if it fails.
Fixes: b97a313483 ("usb: core: comply to PHY framework")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While only the first PHY supports mode switching, the remaining PHYs
work in USB host mode. They should support set_mode with mode=USB_HOST
instead of failing. This is especially needed now that the USB core does
set_mode for all USB ports, which was added in commit b97a313483 ("usb:
core: comply to PHY framework").
Make set_mode with mode=USB_HOST a no-op instead of failing for the
non-OTG USB PHYs.
Fixes: 6ba43c2919 ("phy-sun4i-usb: Add support for phy_set_mode")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
esp_output_udp_encap can produce a length that doesn't fit in the 16
bits of a UDP header's length field. In that case, we'll send a
fragmented packet whose length is larger than IP_MAX_MTU (resulting in
"Oversized IP packet" warnings on receive) and with a bogus UDP
length.
To prevent this, add a length check to esp_output_udp_encap and return
-EMSGSIZE on failure.
This seems to be older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In commit 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
There are a few windows during AER/EEH when we can access PCIe I/O mapped
registers. This will harden the access to insure we do not allow PCIe
access during errors
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Sagar Biradar <sagar.biradar@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During expander reset handling, the driver invokes kernel function
scsi_host_find_tag() to obtain outstanding requests associated with the
scsi host managed by the driver. Driver loops from tag value zero to hba
queue depth to obtain the outstanding scmds. But when blk-mq is enabled,
the block layer may return stale entry for one or more requests. This may
lead to kernel panic if the returned value is inaccessible or the memory
pointed by the returned value is reused.
Reference of upstream discussion:
https://patchwork.kernel.org/patch/10734933/
Instead of calling scsi_host_find_tag() API for each and every smid (smid
is tag +1) from one to shost->can_queue, now driver will call this API (to
obtain the outstanding scmd) only for those smid's which are outstanding at
the driver level.
Driver will determine whether this smid is outstanding at driver level by
looking into it's corresponding MPI request frame, if its MPI request frame
is empty, then it means that this smid is free and does not need to call
scsi_host_find_tag() for it. By doing this, driver will invoke
scsi_host_find_tag() for only those tags which are outstanding at the
driver level.
Driver will check whether particular MPI request frame is empty or not by
looking into the "DevHandle" field. If this field is zero then it means
that this MPI request is empty. For active MPI request DevHandle must be
non-zero.
Also driver will memset the MPI request frame once the corresponding scmd
is processed (i.e. just before calling
scmd->done function).
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
syzkaller was able to generate the following UAF in bpf:
BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline]
BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
Read of size 1 at addr ffff8801c4865c47 by task syz-executor2/9423
CPU: 0 PID: 9423 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+
#110
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
lookup_last fs/namei.c:2269 [inline]
path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
filename_lookup+0x26a/0x520 fs/namei.c:2348
user_path_at_empty+0x40/0x50 fs/namei.c:2608
user_path include/linux/namei.h:62 [inline]
do_mount+0x180/0x1ff0 fs/namespace.c:2980
ksys_mount+0x12d/0x140 fs/namespace.c:3258
__do_sys_mount fs/namespace.c:3272 [inline]
__se_sys_mount fs/namespace.c:3269 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3269
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fde6ed96c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569
RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000072bf00 R08: 0000000020000340 R09: 0000000000000000
R10: 0000000000200000 R11: 0000000000000246 R12: 00007fde6ed976d4
R13: 00000000004c2c24 R14: 00000000004d4990 R15: 00000000ffffffff
Allocated by task 9424:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
__do_kmalloc mm/slab.c:3722 [inline]
__kmalloc_track_caller+0x157/0x760 mm/slab.c:3737
kstrdup+0x39/0x70 mm/util.c:49
bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356
vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
do_symlinkat+0x242/0x2d0 fs/namei.c:4154
__do_sys_symlink fs/namei.c:4173 [inline]
__se_sys_symlink fs/namei.c:4171 [inline]
__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 9425:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xcf/0x230 mm/slab.c:3817
bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565
evict+0x4b9/0x980 fs/inode.c:558
iput_final fs/inode.c:1550 [inline]
iput+0x674/0xa90 fs/inode.c:1576
do_unlinkat+0x733/0xa30 fs/namei.c:4069
__do_sys_unlink fs/namei.c:4110 [inline]
__se_sys_unlink fs/namei.c:4108 [inline]
__x64_sys_unlink+0x42/0x50 fs/namei.c:4108
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
In this scenario path lookup under RCU is racing with the final
unlink in case of symlinks. As Linus puts it in his analysis:
[...] We actually RCU-delay the inode freeing itself, but
when we do the final iput(), the "evict()" function is called
synchronously. Now, the simple fix would seem to just RCU-delay
the kfree() of the symlink data in bpf_evict_inode(). Maybe
that's the right thing to do. [...]
Al suggested to piggy-back on the ->destroy_inode() callback in
order to implement RCU deferral there which can then kfree() the
inode->i_link eventually right before putting inode back into
inode cache. By reusing free_inode_nonrcu() from there we can
avoid the need for our own inode cache and just reuse generic
one as we currently do.
And in-fact on top of all this we should just get rid of the
bpf_evict_inode() entirely. This means truncate_inode_pages_final()
and clear_inode() will then simply be called by the fs core via
evict(). Dropping the reference should really only be done when
inode is unhashed and nothing reachable anymore, so it's better
also moved into the final ->destroy_inode() callback.
Fixes: 0f98621bef ("bpf, inode: add support for symlinks and fix mtime/ctime")
Reported-by: syzbot+fb731ca573367b7f6564@syzkaller.appspotmail.com
Reported-by: syzbot+a13e5ead792d6df37818@syzkaller.appspotmail.com
Reported-by: syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/lkml/0000000000006946d2057bbd0eef@google.com/T/
So far we effectively clear the BMCR register. Some PHY's can deal
with this (e.g. because they reset BMCR to a default as part of a
soft-reset) whilst on others this causes issues because e.g. the
autoneg bit is cleared. Marvell is an example, see also thread [0].
So let's be a little bit more gentle and leave all bits we're not
interested in as-is. This change is needed for PHY drivers to
properly deal with the original patch.
[0] https://marc.info/?t=155264050700001&r=1&w=2
Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Tested-by: Phil Reid <preid@electromag.com.au>
Tested-by: liweihang <liweihang@hisilicon.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a multi-function device's bandwidth is already limited when it is
enumerated, a message is logged only for function 0. By contrast, when
downtraining occurs after enumeration, a message is logged for all
functions. That's because the former uses pcie_report_downtraining(),
whereas the latter uses __pcie_print_link_status() (which doesn't filter
functions != 0). I am seeing this happen on a MacBookPro9,1 with a GPU
(function 0) and an integrated HDA controller (function 1).
Avoid this incongruence by calling pcie_report_downtraining() in both
cases.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexandru Gagniuc <alex.gagniuc@dellteam.com>
When booting a MacBookPro9,1, duplicate link downtraining messages are
logged for the devices directly attached to the two CPU-internal Root Ports
of the Core i7 3615QM: Once on device enumeration and once on enablement
of the bandwidth notification interrupt on the Root Ports.
Duplicate messages do not occur with Root Ports on the PCH and Downstream
Ports on the Thunderbolt controller: Only a single message is logged for
these, namely on device enumeration.
The reason for the duplicate messages is a stale interrupt in the Link
Status register of the 3615QM's internal Root Ports. Avoid by clearing the
interrupt before enabling it.
An alternative approach would be to clear the interrupt already on device
enumeration or to report link downtraining only if the speed has changed.
That way, link downtraining occurring between device enumeration and
enablement of the bandwidth notification interrupt could be caught.
However clearing stale interrupts before enabling them is a standard
operating procedure for any driver and keeping the two steps in one place
makes the code easier to follow.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexandru Gagniuc <alex.gagniuc@dellteam.com>
A threaded IRQ with a NULL handler does not work with level-triggered
interrupts. request_threaded_irq() will return an error:
genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 16
pcie_bw_notification: probe of 0000:00:1b.0:pcie010 failed with error -22
For level interrupts we need to silence the interrupt before exiting the
IRQ handler, so just clear the PCI_EXP_LNKSTA_LBMS bit there.
Fixes: e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth notification")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The ACPI specification states that if the "Guaranteed Performance
Register" is not implemented, the OSPM assumes guaranteed performance
to always be equal to nominal performance.
So for invalid or unimplemented guaranteed performance register, use
nominal performance as guaranteed performance.
This change will fall back to nominal_perf when guranteed_perf is
invalid. If nominal_perf is also invalid or not present, fall back
to the existing implementation, which is to read from HWP Capabilities
MSR.
Fixes: 86d333a8cc ("cpufreq: intel_pstate: Add base_frequency attribute")
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.20+ <stable@vger.kernel.org> # 4.20+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As per the ACPI specification, "Guaranteed Performance Register" is
a "Buffer" field and it cannot be "Integer", so treat the "Integer"
type for "Guaranteed Performance Register" field as invalid and
ignore its value in that case.
Also save one cpc_read() call when "Guaranteed Performance Register"
is not present, which means a register defined as:
"Register(SystemMemory, 0, 0, 0, 0)".
Fixes: 29523f0953 ("ACPI / CPPC: Add support for guaranteed performance")
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.20+ <stable@vger.kernel.org> # 4.20+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit 1aec421120.
Steven Rostedt reports that it causes a hang at bootup and bisected it
to this commit.
The troigger is apparently a module alias for "parport_lowlevel" that
points to "parport_pc", which causes a hang with
modprobe -q -- parport_lowlevel
blocking forever with a backtrace like this:
wait_for_completion_killable+0x1c/0x28
call_usermodehelper_exec+0xa7/0x108
__request_module+0x351/0x3d8
get_lowlevel_driver+0x28/0x41 [parport]
__parport_register_driver+0x39/0x1f4 [parport]
daisy_drv_init+0x31/0x4f [parport]
parport_bus_init+0x5d/0x7b [parport]
parport_default_proc_register+0x26/0x1000 [parport]
do_one_initcall+0xc2/0x1e0
do_init_module+0x50/0x1d4
load_module+0x1c2e/0x21b3
sys_init_module+0xef/0x117
Supid says:
"Due to the new device model daisy driver will now try to find the
parallel ports while trying to register its driver so that it can bind
with them. Now, since daisy driver is loaded while parport bus is
initialising the list of parport is still empty and it tries to load
the lowlevel driver, which has an alias set to parport_pc, now causes
a deadlock"
But I don't think the daisy driver should be loaded by the parport
initialization in the first place, so let's revert the whole change.
If the daisy driver can just initialize separately on its own (like a
driver should), instead of hooking into the parport init sequence
directly, this issue probably would go away.
Reported-and-bisected-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A bvec can now consist of multiple physically contiguous pages.
This means that bvec_iter_advance() can move to a different page while
staying in the same bvec (i.e. ->bi_bvec_done != 0).
The messenger works in terms of segments which can now be defined as
the smaller of a bvec and a page. The "more bytes to process in this
segment" condition holds only if bvec_iter_advance() leaves us in the
same bvec _and_ in the same page. On next bvec (possibly in the same
page) and on next page (possibly in the same bvec) we may need to set
->last_piece.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When connecting PHY, we set the mode to PHY_INTERFACE_MODE_GMII which is
not always correct. Specifically on boards where RGMII_RXID is needed
networking now longer works with at803x after commit 6d4cd041f0
("net: phy: at803x: disable delay only for RGMII mode").
Fix by passing the correct mode. Tested on EdgeRouter Lite
(RGMII_RXID, at803x PHY) and D-Link DSR-500N (RGMII, broadcom PHY).
Fixes: 6d4cd041f0 ("net: phy: at803x: disable delay only for RGMII mode")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Complete read error handling paths for all three kinds of
compressed pages:
1) For cache-managed pages, PG_uptodate will be checked since
read_endio will unlock and SetPageUptodate for these pages;
2) For inplaced pages, read_endio cannot SetPageUptodate directly
since it should be used to mark the final decompressed data,
PG_error will be set with page locked for IO error instead;
3) For staging pages, PG_error is used, which is similar to
what we do for inplaced pages.
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It appears on some slower systems that the driver can find its way
out of the workqueue while the interrupt is disabled by continuous polling
by it.
Move MACvIntEnable to vnt_interrupt_work so that it is always enabled
on all routes out of vnt_interrupt_process.
Move MACvIntDisable so that the device doesn't keep polling the system
while the workqueue is being processed.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
CC: stable@vger.kernel.org # v4.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of direct write -EAGAIN will be returned if page cache was
previously populated. To avoid immediate completion of a request
with -EAGAIN error write has to be offloaded to the async worker,
like io_read() does.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We now wrap sbitmap waitqueues in an active counter, so we can avoid
iterating wakeups unless we have waiters there. This works as long as
everyone that's manipulating the waitqueues use the proper helpers. For
the tag wait case for shared tags, however, we add ourselves to the
waitqueue without incrementing/decrementing the ->ws_active count. This
means that wakeups can take a long time to happen.
Fix this by manually doing the inc/dec as needed for the wait queue
handling.
Reported-by: Michael Leun <kbug@newton.leun.net>
Tested-by: Michael Leun <kbug@newton.leun.net>
Cc: stable@vger.kernel.org
Reviewed-by: Omar Sandoval <osandov@fb.com>
Fixes: 5d2ee7122c ("sbitmap: optimize wakeup check")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 906b40b246 ("dmaengine: stm32-mdma: Add a check on
read_u32_array")
As stated by bindings "st,ahb-addr-masks" is optional.
The statement inserted by this commit makes this property
mandatory and prevents MDMA to be probed in case property not present.
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Enabling CQE support on Tegra186 Jetson TX2 has introduced a regression
that is causing accesses to the file-system on the eMMC to fail. Errors
such as the following have been observed ...
mmc2: running CQE recovery
mmc2: mmc_select_hs400 failed, error -110
print_req_error: I/O error, dev mmcblk2, sector 8 flags 80700
mmc2: cqhci: CQE failed to exit halt state
For now disable CQE support for Tegra186 until this issue is resolved.
Fixes: dfd3cb6feb arm64: tegra: Add CQE Support for SDMMC4
Signed-off-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 5.1:
- Correct phy mode setting of imx6dl-yapp4 board to fix a problem
caused by commit 5ecdd77c61 ("net: dsa: qca8k: disable delay
for RGMII mode").
- Add a missing of_node_put call to fix leaked reference detected by
coccinelle in imx51 machine code.
- Fix imx6q cpuidle driver bug which causes that CPU might not wake up
at expected time.
- Increase reset duration of Ethernet phy Micrel KSZ9031RNX to fix
transmission timeouts error seen on imx6qdl-phytec-pfla02 board.
- Correct SPDX License Identifier style for imx6ull-pinfunc-snvs.h.
- Fix 'bus-witdh' typos in imx6qdl-icore-rqs.dtsi.
- Correct pseudo PHY address of switch device for imx6dl-yapp4 board.
- Update PWM driver options in imx defconfig files due to the change
on driver part.
* tag 'imx-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx_v4_v5_defconfig: enable PWM driver
ARM: imx_v6_v7_defconfig: continue compiling the pwm driver
ARM: dts: imx6dl-yapp4: Use correct pseudo PHY address for the switch
ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi
ARM: dts: imx6ull: Use the correct style for SPDX License Identifier
ARM: dts: pfla02: increase phy reset duration
ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
ARM: imx51: fix a leaked reference by adding missing of_node_put
ARM: dts: imx6dl-yapp4: Use rgmii-id phy mode on the cpu port
On big-endian architectures, the signal masks are differnet
between 32-bit and 64-bit tasks, so we have to use a different
function for reading them from user space.
io_cqring_wait() initially got this wrong, and always interprets
this as a native structure. This is ok on x86 and most arm64,
but not on s390, ppc64be, mips64be, sparc64 and parisc.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This pull request contains Broadcom ARM/ARM64-based SoCs fixes for 5.1,
please pull the following:
- Eric provides fixes for the bcm2835-pm driver: added missing depends
on MFD_CORE for the ARM64 definition of ARCH_BCM2835, fixing error
paths on initialization and fixing the PM_IMAGE_PERI power domain
* tag 'arm-soc/for-5.1/soc-fixes' of https://github.com/Broadcom/stblinux:
arm64: bcm2835: Add missing dependency on MFD_CORE.
soc: bcm: bcm2835-pm: Fix error paths of initialization.
soc: bcm: bcm2835-pm: Fix PM_IMAGE_PERI power domain support.
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.1, please pull the following:
- Helen fixes the HDMI hot-pug detect GPIO polarity for the Rasperry Pi
model B revision 2
* tag 'arm-soc/for-5.1/devicetree-fixes' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm283x: Fix hdmi hpd gpio pull
The SPI DT bindings are for historical reasons a pitfall,
the ability to flag a GPIO line as active high/low with
the second cell flags was introduced later so the SPI
subsystem will only accept the bool flag spi-cs-high
to indicate that the line is active high.
It worked by mistake, but the mistake was corrected
in another commit.
The comment in the DTS file was also misleading: this
CS is indeed active high.
Fixes: cffbb02daf ("ARM: dts: nomadik: Augment NHK15 panel setting")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
allnoconfig build with just ARCH_DAVINCI enabled
fails because drivers/clk/davinci/* depends on
REGMAP being enabled.
Fix it by selecting REGMAP_MMIO when building in
DaVinci support.
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Reviewed-by: David Lechner <david@lechnology.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently PCM core sets each opened stream forcibly to SUSPENDED state
via snd_pcm_suspend_all() call, and the user-space is responsible for
re-triggering the resume manually either via snd_pcm_resume() or
prepare call. The scheme works fine usually, but there are corner
cases where the stream can't be resumed by that call: the streams
still in OPEN state before finishing hw_params. When they are
suspended, user-space cannot perform resume or prepare because they
haven't been set up yet. The only possible recovery is to re-open the
device, which isn't nice at all. Similarly, when a stream is in
DISCONNECTED state, it makes no sense to change it to SUSPENDED
state. Ditto for in SETUP state; which you can re-prepare directly.
So, this patch addresses these issues by filtering the PCM streams to
be suspended by checking the PCM state. When a stream is in either
OPEN, SETUP or DISCONNECTED as well as already SUSPENDED, the suspend
action is skipped.
To be noted, this problem was originally reported for the PCM runtime
PM on HD-audio. And, the runtime PM problem itself was already
addressed (although not intended) by the code refactoring commits
3d21ef0b49 ("ALSA: pcm: Suspend streams globally via device type PM
ops") and 17bc4815de ("ALSA: pci: Remove superfluous
snd_pcm_suspend*() calls"). These commits eliminated the
snd_pcm_suspend*() calls from the runtime PM suspend callback code
path, hence the racy OPEN state won't appear while runtime PM.
(FWIW, the race window is between snd_pcm_open_substream() and the
first power up in azx_pcm_open().)
Although the runtime PM issue was already "fixed", the same problem is
still present for the system PM, hence this patch is still needed.
And for stable trees, this patch alone should suffice for fixing the
runtime PM problem, too.
Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Set .owner to prevent module unloading while being used.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Fixes: d903779b58 ("reset: meson: add meson audio arb driver")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The G12A Documentation lacked these 2 reset lines, but they are present and
used for each USB 2 PHYs.
Add them to the dt-bindings for the upcoming USB support.
Fixes: dbfc54534d ("dt-bindings: reset: meson: add g12a bindings")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The xfs fstrim implementation uses the free space btrees to find free
space that can be discarded. If we haven't recovered the log, the bnobt
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Print the warning about the fall-back to IOMMU_DOMAIN_DMA in
iommu_group_get_for_dev() only when such a domain was
actually allocated.
Otherwise the user will get misleading warnings in the
kernel log when the iommu driver used doesn't support
IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY.
Fixes: fccb4e3b8a ('iommu: Allow default domain type to be set on the kernel command line')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Andreas reported that he was seeing the tdbtorture test fail in some
cases with -EDEADLCK when it wasn't before. Some debugging showed that
deadlock detection was sometimes discovering the caller's lock request
itself in a dependency chain.
While we remove the request from the blocked_lock_hash prior to
reattempting to acquire it, any locks that are blocked on that request
will still be present in the hash and will still have their fl_blocker
pointer set to the current request.
This causes posix_locks_deadlock to find a deadlock dependency chain
when it shouldn't, as a lock request cannot block itself.
We are going to end up waking all of those blocked locks anyway when we
go to reinsert the request back into the blocked_lock_hash, so just do
it prior to checking for deadlocks. This ensures that any lock blocked
on the current request will no longer be part of any blocked request
chain.
URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975
Fixes: 5946c4319e ("fs/locks: allow a lock request to block other requests.")
Cc: stable@vger.kernel.org
Reported-by: Andreas Schneider <asn@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Chandan reported that fstests' generic/026 test hit a crash:
BUG: Unable to handle kernel data access at 0xc00000062ac40000
Faulting instruction address: 0xc000000000092240
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1
NIP: c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
REGS: c00000062c0c3430 TRAP: 0300 Not tainted (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44000842 XER: 20000000
CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
NIP memcmp+0x120/0x690
LR xfs_attr3_leaf_lookup_int+0x53c/0x5b0
Call Trace:
xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
xfs_da3_node_lookup_int+0x32c/0x5a0
xfs_attr_node_addname+0x170/0x6b0
xfs_attr_set+0x2ac/0x340
__xfs_set_acl+0xf0/0x230
xfs_set_acl+0xd0/0x160
set_posix_acl+0xc0/0x130
posix_acl_xattr_set+0x68/0x110
__vfs_setxattr+0xa4/0x110
__vfs_setxattr_noperm+0xac/0x240
vfs_setxattr+0x128/0x130
setxattr+0x248/0x600
path_setxattr+0x108/0x120
sys_setxattr+0x28/0x40
system_call+0x5c/0x70
Instruction dump:
7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040
The instruction dump decodes as:
subfic r6,r5,8
rlwinm r6,r6,3,0,28
ldbrx r9,0,r3
ldbrx r10,0,r4 <-
Which shows us doing an 8 byte load from c00000062ac3fff9, which
crosses the page boundary at c00000062ac40000 and faults.
It's not OK for memcmp to read past the end of the source or
destination buffers if that would cross a page boundary, because we
don't know that the next page is mapped.
As pointed out by Segher, we can read past the end of the source or
destination as long as we don't cross a 4K boundary, because that's
our minimum page size on all platforms.
The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
there we know that s1 is 8-byte aligned and we have at least 1 byte to
read, so a single 8-byte load won't read past the end of s1 and cross
a page boundary.
But we have to be more careful with s2. So check if it's within 8
bytes of a 4K boundary and if so go to the byte-by-byte loop.
Fixes: 2d9ee327ad ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
Cc: stable@vger.kernel.org # v4.19+
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
They are pointless. As dtc points out:
Warning (avoid_unnecessary_addr_size):
/gpio-keys:
unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
Let's remove them.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
They are pointless. As dtc points out:
Warning (avoid_unnecessary_addr_size):
/mipi@ff960000:
unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
Let's remove them.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The device tree compiler yells like this:
Warning (unit_address_vs_reg):
/gpu-opp-table/opp@100000000:
node has a unit name, but no reg property
Let's match the cpu opp node names and use a dash.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Make meson_clk_pll_is_better() consider a rate that precisely matches
the requested rate to be better than any previous rate (which was
smaller than the current).
Prior to commit 8eed1db1ad ("clk: meson: pll: update driver for the
g12a") meson_clk_get_pll_settings() returned early (before calling
meson_clk_pll_is_better()) if the rate from the current iteration
matches the requested rate precisely. After this commit
meson_clk_pll_is_better() is called unconditionally. This requires
meson_clk_pll_is_better() to work with the case where "now == rate".
This fixes a hang during boot on Meson8b / Odroid-C1 for me.
Fixes: 8eed1db1ad ("clk: meson: pll: update driver for the g12a")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lkml.kernel.org/r/20190324164327.22590-2-martin.blumenstingl@googlemail.com
This patch is an attempt to limit HDMI 2.0 SCDC setup when :
- the SoC embeds an HDMI 1.4 only controller
- the EDID supports SCDC but not scrambling
- the EDID supports SCDC scrambling but not for low TMDS bit rates,
while only supporting low TMDS bit rates
This to avoid communicating with the SCDC DDC slave uncessary, and
setting the DW-HDMI TMDS Scrambler setup when not supported by the
underlying hardware.
Reported-by: Rob Herring <robh@kernel.org>
Fixes: 264fce6cc2 ("drm/bridge: dw-hdmi: Add SCDC and TMDS Scrambling support")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190315095414.28520-1-narmstrong@baylibre.com
meson_drv_unbind() doesn't unregister the IRQ handler, which can lead to
use-after-free if the IRQ fires after unbind:
[ 64.656876] Unable to handle kernel paging request at virtual address ffff000011706dbc
...
[ 64.662001] pc : meson_irq+0x18/0x30 [meson_drm]
I'm assuming that a similar problem could happen on the error path of
bind(), so uninstall the IRQ handler there as well.
Fixes: bbbe775ec5 ("drm: Add support for Amlogic Meson Graphic Controller")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322152657.13752-2-jean-philippe.brucker@arm.com
meson_drv_bind() registers a meson_drm struct as the device's privdata,
but meson_drv_unbind() tries to retrieve a drm_device. This may cause a
segfault on shutdown:
[ 5194.593429] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000197
...
[ 5194.788850] Call trace:
[ 5194.791349] drm_dev_unregister+0x1c/0x118 [drm]
[ 5194.795848] meson_drv_unbind+0x50/0x78 [meson_drm]
Retrieve the right pointer in meson_drv_unbind().
Fixes: bbbe775ec5 ("drm: Add support for Amlogic Meson Graphic Controller")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322152657.13752-1-jean-philippe.brucker@arm.com
The throughput_override sysfs file is not below the meshif but below a
hardif. The kobj has therefore not a pointer which can be used to find the
batadv_priv data. The pointer stored in the hardif object must be used
instead to find the correct meshif private data.
Fixes: 7e6f461efe ("batman-adv: Trigger genl notification on sysfs config change")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
When CONFIG_CFG80211 isn't enabled the compiler correcly warns about
'sinfo.pertid' may be unused. It can also happen for other error
conditions that it not warn about.
net/batman-adv/bat_v_elp.c: In function ‘batadv_v_elp_get_throughput.isra.0’:
include/net/cfg80211.h:6370:13: warning: ‘sinfo.pertid’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
kfree(sinfo->pertid);
~~~~~^~~~~~~~
Rework so that we only release '&sinfo' if cfg80211_get_station returns
zero.
Fixes: 7d652669b6 ("batman-adv: release station info tidstats")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batadv_hash_remove is a function which searches the hashtable for an
entry using a needle, a hashtable bucket selection function and a compare
function. It will lock the bucket list and delete an entry when the compare
function matches it with the needle. It returns the pointer to the
hlist_node which matches or NULL when no entry matches the needle.
The batadv_tt_global_free is not itself protected in anyway to avoid that
any other function is modifying the hashtable between the search for the
entry and the call to batadv_hash_remove. It can therefore happen that the
entry either doesn't exist anymore or an entry was deleted which is not the
same object as the needle. In such an situation, the reference counter (for
the reference stored in the hashtable) must not be reduced for the needle.
Instead the reference counter of the actually removed entry has to be
reduced.
Otherwise the reference counter will underflow and the object might be
freed before all its references were dropped. The kref helpers reported
this problem as:
refcount_t: underflow; use-after-free.
Fixes: 7683fdc1e8 ("batman-adv: protect the local and the global trans-tables with rcu")
Reported-by: Martin Weinelt <martin@linuxlounge.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batadv_hash_remove is a function which searches the hashtable for an
entry using a needle, a hashtable bucket selection function and a compare
function. It will lock the bucket list and delete an entry when the compare
function matches it with the needle. It returns the pointer to the
hlist_node which matches or NULL when no entry matches the needle.
The batadv_tt_local_remove is not itself protected in anyway to avoid that
any other function is modifying the hashtable between the search for the
entry and the call to batadv_hash_remove. It can therefore happen that the
entry either doesn't exist anymore or an entry was deleted which is not the
same object as the needle. In such an situation, the reference counter (for
the reference stored in the hashtable) must not be reduced for the needle.
Instead the reference counter of the actually removed entry has to be
reduced.
Otherwise the reference counter will underflow and the object might be
freed before all its references were dropped. The kref helpers reported
this problem as:
refcount_t: underflow; use-after-free.
Fixes: ef72706a05 ("batman-adv: protect tt_local_entry from concurrent delete events")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batadv_hash_remove is a function which searches the hashtable for an
entry using a needle, a hashtable bucket selection function and a compare
function. It will lock the bucket list and delete an entry when the compare
function matches it with the needle. It returns the pointer to the
hlist_node which matches or NULL when no entry matches the needle.
The batadv_bla_del_claim is not itself protected in anyway to avoid that
any other function is modifying the hashtable between the search for the
entry and the call to batadv_hash_remove. It can therefore happen that the
entry either doesn't exist anymore or an entry was deleted which is not the
same object as the needle. In such an situation, the reference counter (for
the reference stored in the hashtable) must not be reduced for the needle.
Instead the reference counter of the actually removed entry has to be
reduced.
Otherwise the reference counter will underflow and the object might be
freed before all its references were dropped. The kref helpers reported
this problem as:
refcount_t: underflow; use-after-free.
Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
In case devm_kzalloc, the patch returns ENOMEM to avoid potential
NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
If userspace has open fd(s) when drm_dev_unplug() is run, it will result
in drm_dev_unregister() being called twice. First in drm_dev_unplug() and
then later in drm_release() through the call to drm_put_dev().
Since userspace already holds a ref on drm_device through the drm_minor,
it's not necessary to add extra ref counting based on no open file
handles. Instead just drm_dev_put() unconditionally in drm_dev_unplug().
We now have this:
- Userpace holds a ref on drm_device as long as there's open fd(s)
- The driver holds a ref on drm_device as long as it's bound to the
struct device
When both sides are done with drm_device, it is released.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Sean Paul <sean@poorly.run>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208140103.28919-2-noralf@tronnes.org
Alexei Starovoitov says:
====================
pull-request: bpf 2019-03-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) libbpf verision fix up from Daniel.
2) fix liveness propagation from Jakub.
3) fix verbose print of refcounted regs from Martin.
4) fix for large map allocations from Martynas.
5) fix use after free in sanitize_ptr_alu from Xu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
First one is fixing version in Makefile and shared object and
second one clarifies bump in version. Thanks!
v1 -> v2:
- Fix up soname, thanks Stanislav!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current documentation suggests that we would need to bump the
libbpf version on every change. Lets clarify this a bit more and
reflect what we do today in practice, that is, bumping it once per
development cycle.
Fixes: 76d1b894c5 ("libbpf: Document API and ABI conventions")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Even though libbpf's versioning script for the linker (libbpf.map)
is pointing to 0.0.2, the BPF_EXTRAVERSION in the Makefile has
not been updated along with it and is therefore still on 0.0.1.
While fixing up, I also noticed that the generated shared object
versioning information is missing, typical convention is to have
a linker name (libbpf.so), soname (libbpf.so.0) and real name
(libbpf.so.0.0.2) for library management. This is based upon the
LIBBPF_VERSION as well.
The build will then produce the following bpf libraries:
# ll libbpf*
libbpf.a
libbpf.so -> libbpf.so.0.0.2
libbpf.so.0 -> libbpf.so.0.0.2
libbpf.so.0.0.2
# readelf -d libbpf.so.0.0.2 | grep SONAME
0x000000000000000e (SONAME) Library soname: [libbpf.so.0]
And install them accordingly:
# rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld install
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/btf.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so.0.0.2
LINK /tmp/bld/test_libbpf
INSTALL /tmp/bld/libbpf.a
INSTALL /tmp/bld/libbpf.so.0.0.2
# ll /usr/local/lib64/libbpf.*
/usr/local/lib64/libbpf.a
/usr/local/lib64/libbpf.so -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0 -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0.0.2
Fixes: 1bf4b05810 ("tools: bpftool: add probes for eBPF program types")
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If the downscaling fails and we end up with a best_depth of 0,
then ignore it.
This actually works around a cascade of failure, but it the
simplest fix for now.
The scaling patch broke the udl driver, as the udl driver doesn't
expose planes at all, so gets the two default 32-bit formats, but
the udl driver then ask for 16bpp fbdev, and the scaling code falls
over.
This fixes the udl driver since the scaled depth support was added.
Fixes: f4bd542bca ("drm/fb-helper: Scale back depth to supported maximum")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190315014621.21816-2-airlied@gmail.com
The desired channel has to be selected in order to correctly fill the
buffer with the corresponding data.
The `ad_sd_write_reg()` already does this, but for the
`ad_sd_read_reg_raw()` this was omitted.
Fixes: af3008485e ("iio:adc: Add common code for ADI Sigma Delta devices")
Signed-off-by: Dragos Bogdan <dragos.bogdan@analog.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
For now, blk_mq_hctx_has_pending() checks any of ctx, hctx->dispatch
or io scheduler have pending work. So, update the comment accordingly.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Expect arguments, blk_mq_put_driver_tag_hctx() and blk_mq_put_driver_tag()
is same. We can just use argument 'request' to put tag by blk_mq_put_driver_tag().
Then we can remove the unused blk_mq_put_driver_tag_hctx().
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Because DMA #0 is now used by the user, remove the limitation of credits
from this channel. Without this patch, this channel is pretty much
unusable due to its very low bandwidth configuration.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
clang produces a false-positive warning as it fails to notice
that "lost = true" implies that "ret" is initialized:
net/rxrpc/output.c:402:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (lost)
^~~~
net/rxrpc/output.c:437:6: note: uninitialized use occurs here
if (ret >= 0) {
^~~
net/rxrpc/output.c:402:2: note: remove the 'if' if its condition is always false
if (lost)
^~~~~~~~~
net/rxrpc/output.c:339:9: note: initialize the variable 'ret' to silence this warning
int ret, opt;
^
= 0
Rearrange the code to make that more obvious and avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking the code with clang -Wsometimes-uninitialized we get the
following warning:
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:847:46: note: uninitialized use occurs here
tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr);
net/tipc/node.c:831:2: note: remove the 'if' if its condition is always
true
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:821:31: note: initialize the variable 'maddr' to silence
this warning
struct tipc_media_addr *maddr;
We fix this by initializing 'maddr' to NULL. For the matter of clarity,
we also test if 'xmitq' is non-empty before we use it and 'maddr'
further down in the function. It will never happen that 'xmitq' is non-
empty at the same time as 'maddr' is NULL, so this is a sufficient test.
Fixes: 598411d70f ("tipc: make resetting of links non-atomic")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.
Fix this by initialising the list entry on creation of a new action.
Bug report:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186
Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
Call Trace:
tcf_action_cleanup+0x59/0xca
__tcf_action_put+0x54/0x6b
__tcf_idr_release.cold.33+0x9/0x12
tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
tcf_action_init_1+0x3d0/0x4c0
tcf_action_init+0x9c/0x130
tcf_exts_validate+0xab/0xc0
fl_change+0x1ca/0x982 [cls_flower]
tc_new_tfilter+0x647/0x8d0
? load_balance+0x14b/0x9e0
rtnetlink_rcv_msg+0xe3/0x370
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1d4/0x2b0
? rtnl_calcit.isra.31+0xf0/0xf0
netlink_rcv_skb+0x49/0x110
netlink_unicast+0x16f/0x210
netlink_sendmsg+0x1df/0x390
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x27b/0x2c0
? futex_wake+0x80/0x140
? do_futex+0x2b9/0xac0
? ep_scan_ready_list.constprop.22+0x1f2/0x210
? ep_poll+0x7a/0x430
__sys_sendmsg+0x47/0x80
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 4e232818bd ("net: sched: act_mirred: remove dependency on rtnl lock")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bartek reported that after few cable unplug/replug cycles suddenly
replug isn't detected any longer. His system uses a RTL8106, I wasn't
able to reproduce the issue with RTL8168g. According to his bisect
the referenced commit caused the regression. As Realtek doesn't
release datasheets or errata it's hard to say what's the actual root
cause, but this change was reported to fix the issue.
Fixes: 38caff5a44 ("r8169: handle all interrupt events in the hard irq handler")
Reported-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Suggested-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Tested-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ti/netcp_ethss.c:3661:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
./drivers/net/ethernet/ti/netcp_ethss.c:3665:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to ehea_get_eth_dn returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ibm/ehea/ehea_main.c:3163:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3154, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Douglas Miller <dougmill@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
After 90eff9096c ("net: phy: Allow splitting MDIO bus/device support
from PHYs") the various MDIO bus drivers were no longer parented with
config PHYLIB but with config MDIO_BUS which is not a menuconfig, fix
this by depending on MDIO_DEVICE which is a menuconfig.
This is visually nicer and less confusing for users.
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a read failover, we may end up changing the value of
the pgio_mirror_idx, so make sure that we record the layout
stats before that update.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Specifying a retrans=0 mount parameter to a NFS/TCP mount, is
inadvertently causing the NFS client to rewrite any specified
timeout parameter to the default of 60 seconds.
Fixes: a956beda19 ("NFS: Allow the mount option retrans=0")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the transport is still connected, then we do want to allow
RPC_SOFTCONN tasks to retry. They should time out if and only if
the connection is broken.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
GPIO fixes for v5.1-rc2
- check the return value of a function that can fail in gpio-exar
- fix the SPDX identifier in amd-fch
- fix the direction_input callback in gpio-adnp
In case kmemdup fails, the fix goes to blk_err to avoid NULL
pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The dynamic-debug statements for command payload output only get emitted
when the command is not ND_CMD_CALL. Move the output payload dumping
ahead of the early return path for ND_CMD_CALL.
Fixes: 31eca76ba2 ("...whitelisted dimm command marshaling mechanism")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Correctly map the regulators used by tlv320aic3106.
Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators.
Cc: <Stable@vger.kernel.org> # v4.14+
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Correctly map the regulators used by tlv320aic3106.
Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators.
Cc: <Stable@vger.kernel.org> # v4.14+
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Add an of_node_put when a tested device node is not available.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
identifier f;
local idexpression e;
expression x;
@@
e = f(...);
... when != of_node_put(e)
when != x = e
when != e = x
when any
if (<+...of_device_is_available(e)...+>) {
... when != of_node_put(e)
(
return e;
|
+ of_node_put(e);
return ...;
)
}
// </smpl>
Fixes: e0c827aca0 ("drm/omap: Populate DSS children in omapdss driver")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Tony Lindgren <tony@atomide.com>
In order to request dynamic allocationn of GPIO IDs, a negative number
should be passed as a base GPIO ID via platform data. Unfortuntely,
commit 771e53c4d1 ("ARM: OMAP1: ams-delta: Drop board specific global
GPIO numbers") didn't follow that rule while switching to dynamically
allocated GPIO IDs for Amstrad Delta latches, making their IDs
overlapping with those already assigned to OMAP GPIO devices. Fix it.
Fixes: 771e53c4d1 ("ARM: OMAP1: ams-delta: Drop board specific global GPIO numbers")
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
ca0132 codec driver loads the firmware selectively depending on the
model in addition to the fallback of the default firmware. The code
works good, but a minor problem is that the current code seems
confusing for Clang where it spews a warning about uninitialized
variable.
This patch simplifies the code flow for such a false-positive
warning. After this refactoring, the ca0132_spec.alt_firmware_present
field is no longer used, hence it's eliminated as well.
Reported-and-tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit 2f31a67f01 ("usb: xhci: Prevent bus suspend if a port connect
change or polling state is detected") was intended to prevent ports that
were still link training from being forced to U3 suspend state mid
enumeration.
This solved enumeration issues for devices with slow link training.
Turns out some devices are stuck in the link training/polling state,
and thus that patch will prevent suspend completely for these devices.
This is seen with USB3 card readers in some MacBooks.
Instead of preventing suspend, give some time to complete the link
training. On successful training the port will end up as connected
and enabled.
If port instead is stuck in link training the bus suspend will continue
suspending after 360ms (10 * 36ms) timeout (tPollingLFPSTimeout).
Original patch was sent to stable, this one should go there as well
Fixes: 2f31a67f01 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected")
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The xhci debug capability (DbC) feature did its memory cleanup with
spinlock held. dma_free_coherent() warns if called with interrupts
disabled
move the memory cleanup outside the spinlock
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A suspended SS port in U3 link state will go to U0 when resumed, but
can almost immediately after that enter U1 or U2 link power save
states before host controller driver reads the port status.
Host controller driver only checks for U0 state, and might miss
the finished resume, leaving flags unclear and skip notifying usb
code of the wake.
Add U1 and U2 to the possible link states when checking for finished
port resume.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
spdxcheck.py complains:
include/linux/platform_data/gpio/gpio-amd-fch.h: 1:28 Invalid License ID: GPL+
which is correct because GPL+ is not a valid identifier. Of course this
could have been caught by checkpatch.pl _before_ submitting or merging the
patch.
WARNING: 'SPDX-License-Identifier: GPL+ */' is not supported in LICENSES/...
#271: FILE: include/linux/platform_data/gpio/gpio-amd-fch.h:1:
+/* SPDX-License-Identifier: GPL+ */
Fix it under the assumption that the author meant GPL-2.0+, which makes
sense as the corresponding C file is using that identifier.
Fixes: e09d168f13 ("gpio: AMD G-Series PCH gpio driver")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Current code test wrong value so it does not verify if the written
data is correctly read back. Fix it.
Also make it return -EPERM if read value does not match written bit,
just like it done for adnp_gpio_direction_output().
Fixes: 5e969a401a ("gpio: Add Avionic Design N-bit GPIO expander support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Reviewed-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
ida_simple_get may fail and return a negative error number.
The fix checks its return value; if it fails, go to err_destroy.
Cc: <stable@vger.kernel.org>
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The PCM OSS emulation converts and transfers the data on the fly via
"plugins". The data is converted over the dynamically allocated
buffer for each plugin, and recently syzkaller caught OOB in this
flow.
Although the bisection by syzbot pointed out to the commit
65766ee0bf ("ALSA: oss: Use kvzalloc() for local buffer
allocations"), this is merely a commit to replace vmalloc() with
kvmalloc(), hence it can't be the cause. The further debug action
revealed that this happens in the case where a slave PCM doesn't
support only the stereo channels while the OSS stream is set up for a
mono channel. Below is a brief explanation:
At each OSS parameter change, the driver sets up the PCM hw_params
again in snd_pcm_oss_change_params_lock(). This is also the place
where plugins are created and local buffers are allocated. The
problem is that the plugins are created before the final hw_params is
determined. Namely, two snd_pcm_hw_param_near() calls for setting the
period size and periods may influence on the final result of channels,
rates, etc, too, while the current code has already created plugins
beforehand with the premature values. So, the plugin believes that
channels=1, while the actual I/O is with channels=2, which makes the
driver reading/writing over the allocated buffer size.
The fix is simply to move the plugin allocation code after the final
hw_params call.
Reported-by: syzbot+d4503ae45b65c5bc1194@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When CONFIG_IOMMU_API isn't set the following warnings pops up:
drivers/gpu/drm/tegra/vic.c: In function ‘vic_boot’:
drivers/gpu/drm/tegra/vic.c:110:31: error: implicit declaration of function ‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration]
struct iommu_fwspec *spec = dev_iommu_fwspec_get(vic->dev);
^~~~~~~~~~~~~~~~~~~~
iommu_fwspec_free
drivers/gpu/drm/tegra/vic.c:110:31: warning: initialization of ‘struct iommu_fwspec *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
drivers/gpu/drm/tegra/vic.c:117:19: error: ‘struct iommu_fwspec’ has no member named ‘num_ids’
if (spec && spec->num_ids > 0) {
^~
drivers/gpu/drm/tegra/vic.c:118:16: error: ‘struct iommu_fwspec’ has no member named ‘ids’
value = spec->ids[0] & 0xffff;
^~
Rework so that its inside a '#ifdef CONFIG_IOMMU_API' block.
Fixes: f3779cb190 ("drm/tegra: vic: Support stream ID register programming")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
In case of alive interrupt timeout or any failure in the init flow
the driver generates FW nmi. The driver assumes that the nmi will
generate SW interrupt. This assumption does not hold and leads to faulty
behavior in the recovery flow.
Solve this by using sync nmi, this way, even if the driver does not
receive SW interrupt, it still starts the recovery flow.
Also remove the wait queue from iwl_fwrt_stop_device since the driver is
handling the SW interrupt synchronously.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The driver initiates the size value with the size of the struct and then
adds the size of the data and checks if the size is zero so size can not
be equal to zero.
Solve this by getting the data size, check that it is not equal to zero
and only then add the struct size.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Fixes: 7a14c23dcd ("iwlwifi: dbg: dump data according to the new ini TLVs")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In case the driver fails to dump a memory region, and this is the last
region, then partial region would be extracted.
Solve this by setting the data to zero in case of failure.
This will cause dump to be a list of consecutive successful memory
regions and trailing zeros with no partial memories extracted.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In the old days, we could transmit with HW crypto with an arbitrary
key by filling it into TX_CMD. This was broken first with the advent
of CCMP/GCMP-256 keys which don't fit there.
This was broken *again* with the newer TX_CMD format on 22560+,
where we simply cannot pass key material anymore. However, we forgot
to update all the cases when we get a key from mac80211 and don't
program it into the hardware but still return 0 for HW crypto on TX.
In AP mode with WEP, we tried to fix this by programming the keys
separately for each station later, but this ultimately turns out to
be buggy, for example now it leaks memory when we have more than one
WEP key.
Fix this by simply using only SW crypto for WEP in newer devices by
returning -EOPNOTSUPP instead of trying to program WEP keys later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Commit 7640ead939 ("bpf: verifier: make sure callees don't prune
with caller differences") connected up parentage chains of all
frames of the stack. It didn't, however, ensure propagate_liveness()
propagates all liveness information along those chains.
This means pruning happening in the callee may generate explored
states with incomplete liveness for the chains in lower frames
of the stack.
The included selftest is similar to the prior one from commit
7640ead939 ("bpf: verifier: make sure callees don't prune with
caller differences"), where callee would prune regardless of the
difference in r8 state.
Now we also initialize r9 to 0 or 1 based on a result from get_random().
r9 is never read so the walk with r9 = 0 gets pruned (correctly) after
the walk with r9 = 1 completes.
The selftest is so arranged that the pruning will happen in the
callee. Since callee does not propagate read marks of r8, the
explored state at the pruning point prior to the callee will
now ignore r8.
Propagate liveness on all frames of the stack when pruning.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While there is no mainline board that makes use of the PWM still enable the
driver for it to increase compile test coverage.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
After the pwm-imx driver was split into two drivers and the Kconfig symbol
changed accordingly, use the new name to continue being able to use the
PWM hardware.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The switch is accessible through pseudo PHY which is located at 0x10.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Fixes: 87489ec3a7 ("ARM: dts: imx: Add Y Soft IOTA Draco, Hydra and Ursa boards")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When there is only one byte in a frag, the current calculation
using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET"
will return zero, because HNS3_MAX_BD_SIZE is 65535 and
HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when
a frag's size is one byte.
This patch fixes it by using DIV_ROUND_UP.
Fixes: 3fe13ed95d ("net: hns3: avoid mult + div op in critical data path")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.
This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.
The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.
Fixes: da20420f83 ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti says:
====================
net/sched: validate the control action with all the other parameters
currently, the kernel checks for bad values of the control action in
tcf_action_init_1(), after a successful call to the action's init()
function. When the control action is 'goto chain', this causes two
undesired behaviors:
1. "misconfigured action after replace that causes kernel crash":
if users replace a valid TC action with another one having invalid
control action, all the new configuration data (including the bad
control action) are applied successfully, even if the kernel returned
an error. As a consequence, it's possible to trigger a NULL pointer
dereference in the traffic path of every TC action (1), replacing the
control action with 'goto chain x', when chain <x> doesn't exist.
2. "refcount leak that makes kmemleak complain"
when a valid 'goto chain' action is overwritten with another action,
the kernel forgets to decrease refcounts in the chain.
The above problems can be fixed if we validate the control action in each
action's init() function, the same way as we are already doing for all the
other configuration parameters.
Now that chains can be released after an action is replaced, we need to
care about concurrent access of 'goto_chain' pointer: ensure we access it
through RCU, like we did with most action-specific configuration parameters.
- Patch 1 removes the wrong checks and provides functions that can be
used to properly validate control actions in individual actions
- Patch 2 to 16 fix individual actions, and add TDC selftest code to
verify the correct behavior (2)
- Patch 17 and 18 fix concurrent access issues on 'goto_chain', that can be
observed after the chain refcount leak is fixed.
Changes since v1:
- reword the cover letter
- condense the extack message in case tc_action_check_ctrlact() is called
with invalid parameters.
- add tcf_action_set_ctrlact() to avoid code duplication an make the
RCU-ification of 'goto_chain' easier.
- fix errors in act_ife, act_simple, act_skbedit, and avoid useless 'goto
end' in act_connmark, thanks a lot to Vlad Buslov.
- avoid dereferencing 'goto_chain' in tcf_gact_goto_chain_index(), so
we don't have to care about the grace period there.
- let actions respect the grace period when they release chains, thanks
to Cong Wang and Vlad Buslov.
Changes since RFC:
- include a fix for all TC actions
- add a selftest for each TC action
- squash fix for refcount leaks into a single patch, the first in the
series, thanks to Cong Wang
- ensure that chain refcount is released without tcfa_lock held, thanks
to Vlad Buslov
Notes:
(1) act_ipt didn't need any fix, as the control action is constantly equal
to TC_ACT_OK.
(2) the selftest for act_simple fails because userspace tc backend for
'simple' does not parse the control action correctly (and hardcodes it
to TC_ACT_PIPE).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:
# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto udp action pass index 4
# tc filter add dev dd0 ingress protocol ip flower \
> ip_proto udp action csum udp goto chain 42 index 66
# tc chain del dev dd0 chain 42 ingress
(start UDP traffic towards dd0)
# tc action replace action csum udp pass index 66
was run repeatedly for several hours.
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
callers of tcf_gact_goto_chain_index() can potentially read an old value
of the chain index, or even dereference a NULL 'goto_chain' pointer,
because 'goto_chain' and 'tcfa_action' are read in the traffic path
without caring of concurrent write in the control path. The most recent
value of chain index can be read also from a->tcfa_action (it's encoded
there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
dereference 'goto_chain': just read the chain id from the control action.
Fixes: e457d86ada ("net: sched: add couple of goto_chain helpers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- pass a pointer to struct tcf_proto in each actions's init() handler,
to allow validating the control action, checking whether the chain
exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
to the action's init() handler, and replace it with a test that forbids
addition of actions having 'goto_chain' and NULL goto_chain pointer at
the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
and eventually allocate the action 'goto_chain' within the init()
handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
swap the current 'goto_chain' pointer with the new given one.
This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.
By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:
# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto tcp action drop
# tc chain add dev dd0 chain 43 ingress protocol ip flower \
> ip_proto udp action drop
# tc filter add dev dd0 ingress matchall \
> action gact goto chain 42 index 66
# tc filter replace dev dd0 ingress matchall \
> action gact goto chain 43 index 66
# echo scan >/sys/kernel/debug/kmemleak
<...>
unreferenced object 0xffff93c0ee09f000 (size 1024):
comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
[<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
[<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
[<000000006126a348>] netlink_unicast+0x1a0/0x250
[<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
[<00000000a25a2171>] sock_sendmsg+0x36/0x40
[<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
[<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
[<000000007a6c61f9>] do_syscall_64+0x5b/0x180
[<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0000000013eaa334>] 0xffffffffffffffff
Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both PCLK and HCLK are "required" clocks according to macb devicetree
documentation. There is a chance that devm_clk_get doesn't return a
negative error but just a NULL clock structure instead. In such a case
the driver proceeds as usual and uses pclk value 0 to calculate MDC
divisor which is incorrect. Hence fix the same in clock initialization.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the DP83825I ethernet PHY to the DP83822 driver.
These devices share the same WoL register bits and addresses.
The phy_driver init was made into a macro as there may be future
devices appended to this driver that will share the register space.
http://www.ti.com/lit/gpn/dp83825i
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
Black screens or artifacting can occur when enabling FreeSync outside
of the supported range of the monitor. This can happen since the
supported range isn't always the min/max vrefresh range available for
the monitor.
[How]
There was previously a fix that prevented this from happening in the
low range but it didn't cover the upper range. Expand the condition
to include both.
Cc: Sun peng Li <Sunpeng.Li@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree:
1) Remove a direct dependency with IPv6 introduced by the
sip_external_media feature, from Alin Nastac.
2) Fix bogus ENOENT when removing interval elements from set.
3) Set transport_header from br_netfilter to mimic the stack
behaviour, this partially fixes a checksum validation bug
from the SCTP connection tracking, from Xin Long.
4) Fix undefined reference to symbol in xt_TEE, due to missing
Kconfig dependencies, from Arnd Bergmann.
5) Check for NULL in skb_header_pointer() calls in ip6t_shr,
from Kangjie Lu.
6) Fix bogus EBUSY when removing an existing conntrack helper from
a transaction.
7) Fix module autoload of the redirect extension.
8) Remove duplicated transition in flowtable diagram in the existing
documentation.
9) Missing .release_ops call from error path in newrule() which
results module refcount leak, from Taehee Yoo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In genl_register_family(), when idr_alloc() fails,
we forget to free the memory we possibly allocate for
family->attrbuf.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 2ae0f17df1 ("genetlink: use idr to track families")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cancelling a subscription, we have to clear the cancel bit in the
request before iterating over any established subscriptions with memcmp.
Otherwise no subscription will ever be found, and it will not be
possible to explicitly unsubscribe individual subscriptions.
Fixes: 8985ecc7c1 ("tipc: simplify endianness handling in topology subscriber")
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gmac ethernet driver uses the "altr,sysmgr-syscon" property to
configure phy settings for the gmac controller.
Add the "altr,sysmgr-syscon" property to all gmac nodes.
This patch fixes:
[ 0.917530] socfpga-dwmac ff800000.ethernet: No sysmgr-syscon node found
[ 0.924209] socfpga-dwmac ff800000.ethernet: Unable to parse OF data
Cc: stable@vger.kernel.org
Reported-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
There is very low possibility ( < 0.1% ) that channel swap happened
in beginning when multi output/input pin is enabled. The issue is
that hardware can't send data to correct pin in the beginning with
the normal enable flow.
This is hardware issue, but there is no errata, the workaround flow
is that: Each time playback/recording, firstly clear the xSMA/xSMB,
then enable TE/RE, then enable xSMB and xSMA (xSMB must be enabled
before xSMA). Which is to use the xSMA as the trigger start register,
previously the xCR_TE or xCR_RE is the bit for starting.
Fixes commit 43d24e76b6 ("ASoC: fsl_esai: Add ESAI CPU DAI driver")
Cc: <stable@vger.kernel.org>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
There is a constraint for the channel number setting on the
asrc of older version (e.g. imx35), the channel number should
be even, odd number isn't valid.
So add this constraint when the asrc of older version is used.
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The CS4270 does not by default increment the register address on
consecutive writes. During normal operation it doesn't matter as all
register accesses are done individually. At resume time after suspend,
however, the regcache code gathers the biggest possible block of
registers to sync and sends them one on one go.
To fix this, set the INCR bit in all cases.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
dev is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
sound/core/seq/oss/seq_oss_synth.c:626 snd_seq_oss_synth_make_info() warn: potential spectre issue 'dp->synths' [w] (local cap)
Fix this by sanitizing dev before using it to index dp->synths.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
info->stream is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
sound/core/rawmidi.c:604 __snd_rawmidi_info_select() warn: potential spectre issue 'rmidi->streams' [r] (local cap)
Fix this by sanitizing info->stream before using it to index
rmidi->streams.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Removed info log-message if ipip tunnel registration fails during
module-initialization: it adds nothing to the error message that is
written on all failures.
Fixes: dd9ee34440 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If tunnel registration failed during module initialization, the module
would fail to deregister the IPPROTO_COMP protocol and would attempt to
deregister the tunnel.
The tunnel was not deregistered during module-exit.
Fixes: dd9ee34440 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug.
Call trace:
dump_stack+0xbf/0x12e
print_address_description+0x6a/0x280
kasan_report+0x237/0x360
sanitize_ptr_alu+0x85a/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fault injection trace:
kfree+0xea/0x290
free_func_state+0x4a/0x60
free_verifier_state+0x61/0xe0
push_stack+0x216/0x2f0 <- inject failslab
sanitize_ptr_alu+0x2b1/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
When kzalloc() fails in push_stack(), free_verifier_state() will free
current verifier state. As push_stack() returns, dst_reg was restored
if ptr_is_dst_reg is false. However, as member of the cur_state,
dst_reg is also freed, and error occurs when dereferencing dst_reg.
Simply fix it by testing ret of push_stack() before restoring dst_reg.
Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
phydm.internal is allocated using kzalloc which is used multiple
times without a check for NULL pointer. This patch avoids such a
scenario by returning 0, consistent with the failure case.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin King reported a bug in read_bbreg_hdl():
memcpy(pcmd->rsp, (u8 *)&val, pcmd->rspsz);
The problem is that "val" is uninitialized.
This code is obviously not useful, but so far as I can tell
"pcmd->cmdcode" is never GEN_CMD_CODE(_Read_BBREG) so it's not harmful
either. For now the easiest fix is to just call r8712_free_cmd_obj()
and return.
Fixes: 2865d42c78 ("staging: r8712u: Add the new driver to the mainline kernel")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
skb allocated via dev_alloc_skb can fail and return a NULL pointer.
This patch avoids such a scenario and returns, consistent with other
invocations.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hwxmits is allocated via kcalloc and not checked for failure before its
dereference. The patch fixes this problem by returning error upstream
in rtl8723bs, rtl8188eu.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-by: Mukesh Ojha <mojha@codeaurora.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An munmap() on a binder device causes binder_vma_close() to be called
which clears the alloc->vma pointer.
If direct reclaim causes binder_alloc_free_page() to be called, there
is a race where alloc->vma is read into a local vma pointer and then
used later after the mm->mmap_sem is acquired. This can result in
calling zap_page_range() with an invalid vma which manifests as a
use-after-free in zap_page_range().
The fix is to check alloc->vma after acquiring the mmap_sem (which we
were acquiring anyway) and skip zap_page_range() if it has changed
to NULL.
Signed-off-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The selinux-testsuite found an issue resulting in a BUG_ON()
where a conditional relied on a size_t going negative when
checking the validity of a buffer offset.
Fixes: 7a67a39320 ("binder: add function to copy binder object from buffer")
Reported-by: Paul Moore <paul@paul-moore.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On some systems /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
or /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
return a file error because of bad ACPI LPIT data from a misconfigured BIOS.
turbostat interprets this failure as a fatal error and outputs
turbostat: CPU LPI: No data available
If the ACPI LPIT sysfs files return an error output a warning instead of
a fatal error, disable the ACPI LPIT evaluation code, and continue.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Most calls to fgets() and fscanf() are followed by error checks.
Add an exit-on-error in the remaining cases.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
The package power can also be read from an MSR. It's not clear exactly
what is included, and whether it's aggregated over all nodes or
reported separately.
It does look like this is reported separately per CCX (I get a single
value on the Ryzen R7 1700), but it might be reported separately per-
die (node?) on larger processors. If that's the case, it would have to
be recorded per node and aggregated for the socket.
Note that although Zen has these MSRs reporting power, it looks like
the actual RAPL configuration (power limits, configured TDP) is done
through PCI configuration space. I have not yet found any public
documentation for this.
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Based on the Open-Source Register Reference for AMD Family 17h
Processors Models 00h-2Fh:
https://support.amd.com/TechDocs/56255_OSRR.pdf
These processors report RAPL support in bit 14 of CPUID 0x80000007 EDX,
and the following MSRs are present:
0xc0010299 (RAPL_PWR_UNIT), like Intel's RAPL_POWER_UNIT
0xc001029a (CORE_ENERGY_STAT), kind of like Intel's PP0_ENERGY_STATUS
0xc001029b (PKG_ENERGY_STAT), like Intel's PKG_ENERGY_STATUS
A notable difference from the Intel implementation is that AMD reports
the "Cores" energy usage separately for each core, rather than a
per-package total. The code has been adjusted to handle either case in a
generic way.
I haven't yet enabled collection of package power, due to being unable
to test it on multi-node systems (TR, EPYC).
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Running without a cpufreq driver is a valid case so warnings output in
this case should not be to stderr.
Use outf instead of stderr for these warnings.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat executes on CPUs in "topology order".
This is an optimization for measuring profoundly idle systems --
as the closest hardware is woken next...
Fix a typo that was added with the sub-die-node support,
that broke topology ordering on multi-node systems.
Signed-off-by: Len Brown <len.brown@intel.com>
Naresh reported that test_align fails because of the mismatch at the
verbose printout of the register states. The reason is due to the newly
added ref_obj_id.
ref_obj_id is only useful for refcounted reg. Thus, this patch fixes it
by only printing ref_obj_id for refcounted reg. While at it, it also uses
comma instead of space to separate between "id" and "ref_obj_id".
Fixes: 1b98658968 ("bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_release")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 2b50f7ab63 ("kbuild: add workaround for Debian make-kpkg")
annoyed people who want to wrap the top Makefile with GNUmakefile
to customize it for their use.
On second thought, we do not need to run the sub-make for in-tree
build with Make 4.x because the 'MAKEFLAGS += -rR' issue only happens
on GNU Make 3.x.
With this commit, people will get back their workflow, and the Debian
make-kpkg will still work.
Fixes: 2b50f7ab63 ("kbuild: add workaround for Debian make-kpkg")
Reported-by: Andreas Schwab <schwab@suse.de>
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Andreas Schwab <schwab@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Make sure we don't try to enqueue XDP_REDIRECT frames to an
inexistent FQ.
While it is guaranteed not to have more than one queue per core,
having fewer queues than CPUs on an interface is a valid
configuration.
Fixes: d678be1dc1 ("dpaa2-eth: add XDP_REDIRECT support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Wunner says:
====================
ks8851 fixes & cleanups
Four fixes and two cleanups for the Microchip (formerly Micrel) KSZ8851
SPI Ethernet driver.
Some of the fixes might even pass as stable material, but I haven't marked
them as such for cautiousness: Doesn't hurt letting them bake in linux-next
for a few weeks to raise the confidence, even though we've tested them
extensively on our Revolution Pi open source PLCs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ks8851 chip is sold either with an SPI interface (KSZ8851SNL) or
with a so-called non-PCI interface (KSZ8851-16MLL). When the driver
for the latter was introduced with commit a55c0a0ed4 ("drivers/net:
ks8851_mll ethernet network driver"), it duplicated the register macros
introduced by the driver for the former with commit 3ba81f3ece ("net:
Micrel KS8851 SPI network driver").
The chips are almost identical, so the duplication seems unwarranted.
There are a handful of bits which are in use on the KSZ8851-16MLL but
reserved on the KSZ8851SNL, and vice-versa, but there are no actual
collisions.
Thus, remove the duplicate definitions from the KSZ8851-16MLL driver.
Mark all bits which differ between the two chips. Move the SPI frame
opcodes, which are specific to KSZ8851SNL, to its driver.
The KSZ8851-16MLL driver added a RXFCTR_THRESHOLD_MASK macro which is a
duplication of the RXFCTR_RXFCT_MASK macro, rename it where it's used.
Same for P1MBCR_FORCE_FDX, which duplicates the BMCR_FULLDPLX macro and
OBCR_ODS_16MA, which duplicates OBCR_ODS_16mA.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ks8851 chip's initial carrier state is down. A Link Change Interrupt
is signaled once interrupts are enabled if the carrier is up.
The ks8851 driver has it backwards by assuming that the initial carrier
state is up. The state is therefore misrepresented if the interface is
opened with no cable attached. Fix it.
The Link Change interrupt is sometimes not signaled unless the P1MBSR
register (which contains the Link Status bit) is read on ->ndo_open().
This might be a hardware erratum. Read the register by calling
mii_check_link(), which has the desirable side effect of setting the
carrier state to down if the cable was detached while the interface was
closed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ks8851 driver currently requests the IRQ before registering the
net_device. Because the net_device name is used as IRQ name and is
still "eth%d" when the IRQ is requested, it's impossibe to tell IRQs
apart if multiple ks8851 chips are present. Most other drivers delay
requesting the IRQ until the net_device is opened. Do the same.
The driver doesn't enable interrupts on the chip before opening the
net_device and disables them when closing it, so there doesn't seem to
be a need to request the IRQ already on probe.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 73fdeb82e9 ("net: ks8851: Add optional vdd_io regulator and
reset gpio") amended the ks8851 driver to briefly assert the chip's
reset pin on probe. It also amended the probe routine's error path to
reassert the reset pin if a subsequent initialization step fails.
However the commit misplaced reassertion of the reset pin in the error
path such that it is not performed if the check of the Chip ID and
Enable Register (CIDER) fails. The error path is therefore slightly
asymmetrical to the probe routine's body. Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Nishanth Menon <nm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ks8851 driver lets the chip auto-dequeue received packets once they
have been read in full. It achieves that by setting the ADRFE flag in
the RXQCR register ("Auto-Dequeue RXQ Frame Enable").
However if allocation of a packet's socket buffer or retrieval of the
packet over the SPI bus fails, the packet will not have been read in
full and is not auto-dequeued. Such partial retrieval of a packet
confuses the chip's RX queue management: On the next RX interrupt,
the first packet read from the queue will be the one left there
previously and this one can be retrieved without issues. But for any
newly received packets, the frame header status and byte count registers
(RXFHSR and RXFHBCR) contain bogus values, preventing their retrieval.
The chip allows explicitly dequeueing a packet from the RX queue by
setting the RRXEF flag in the RXQCR register ("Release RX Error Frame").
This could be used to dequeue the packet in case of an error, but if
that error is a failed SPI transfer, it is unknown if the packet was
transferred in full and was auto-dequeued or if it was only transferred
in part and requires an explicit dequeue. The safest approach is thus
to always dequeue packets explicitly and forgo auto-dequeueing.
Without this change, I've witnessed packet retrieval break completely
when an SPI DMA transfer fails, requiring a chip reset. Explicit
dequeueing magically fixes this and makes packet retrieval absolutely
robust for me.
The chip's documentation suggests auto-dequeuing and uses the RRXEF
flag only to dequeue error frames which the driver doesn't want to
retrieve. But that seems to be a fair-weather approach.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in commit a89ca6f24f ("Btrfs: fix fsync after truncate when
no_holes feature is enabled") I added an assertion that is triggered when
an inline extent is found to assert that the length of the (uncompressed)
data the extent represents is the same as the i_size of the inode, since
that is true most of the time I couldn't find or didn't remembered about
any exception at that time. Later on the assertion was expanded twice to
deal with a case of a compressed inline extent representing a range that
matches the sector size followed by an expanding truncate, and another
case where fallocate can update the i_size of the inode without adding
or updating existing extents (if the fallocate range falls entirely within
the first block of the file). These two expansion/fixes of the assertion
were done by commit 7ed586d0a8 ("Btrfs: fix assertion on fsync of
regular file when using no-holes feature") and commit 6399fb5a0b
("Btrfs: fix assertion failure during fsync in no-holes mode").
These however missed the case where an falloc expands the i_size of an
inode to exactly the sector size and inline extent exists, for example:
$ mkfs.btrfs -f -O no-holes /dev/sdc
$ mount /dev/sdc /mnt
$ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar
wrote 1096/1096 bytes at offset 0
1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec)
$ xfs_io -c "falloc 1096 3000" /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
Segmentation fault
$ dmesg
[701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727
[701253.602962] ------------[ cut here ]------------
[701253.603224] kernel BUG at fs/btrfs/ctree.h:3533!
[701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G W 5.0.0-rc8-btrfs-next-45 #1
[701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs]
(...)
[701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286
[701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000
[701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868
[701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000
[701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18
[701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80
[701253.607769] FS: 00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000
[701253.608163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0
[701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[701253.609608] Call Trace:
[701253.609994] btrfs_log_inode+0xdfb/0xe40 [btrfs]
[701253.610383] btrfs_log_inode_parent+0x2be/0xa60 [btrfs]
[701253.610770] ? do_raw_spin_unlock+0x49/0xc0
[701253.611150] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[701253.611537] btrfs_sync_file+0x3b2/0x440 [btrfs]
[701253.612010] ? do_sysinfo+0xb0/0xf0
[701253.612552] do_fsync+0x38/0x60
[701253.612988] __x64_sys_fsync+0x10/0x20
[701253.613360] do_syscall_64+0x60/0x1b0
[701253.613733] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[701253.614103] RIP: 0033:0x7f14da4e66d0
(...)
[701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0
[701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003
[701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010
[701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260
[701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10
(...)
[701253.619941] ---[ end trace e088d74f132b6da5 ]---
Updating the assertion again to allow for this particular case would result
in a meaningless assertion, plus there is currently no risk of logging
content that would result in any corruption after a log replay if the size
of the data encoded in an inline extent is greater than the inode's i_size
(which is not currently possibe either with or without compression),
therefore just remove the assertion.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates
memory with addrs_size which is passed from userspace. We used flag
GFP_USER to put some more restrictions on it in Commit cacc062152
("sctp: use GFP_USER for user-controlled kmalloc").
However, since Commit c981f254cc ("sctp: use vmemdup_user() rather
than badly open-coding memdup_user()"), vmemdup_user() has been used,
which doesn't check GFP_USER flag when goes to vmalloc_*(). So when
addrs_size is a huge value, it could exhaust memory and even trigger
oom killer.
This patch is to use memdup_user() instead, in which GFP_USER would
work to limit the memory allocation with a huge addrs_size.
Note we can't fix it by limiting 'addrs_size', as there's no demand
for it from RFC.
Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
Fixes: c981f254cc ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianlin reported a crash:
[ 381.484332] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[ 381.619802] RIP: 0010:fib6_rule_lookup+0xa3/0x160
[ 382.009615] Call Trace:
[ 382.020762] <IRQ>
[ 382.030174] ip6_route_redirect.isra.52+0xc9/0xf0
[ 382.050984] ip6_redirect+0xb6/0xf0
[ 382.066731] icmpv6_notify+0xca/0x190
[ 382.083185] ndisc_redirect_rcv+0x10f/0x160
[ 382.102569] ndisc_rcv+0xfb/0x100
[ 382.117725] icmpv6_rcv+0x3f2/0x520
[ 382.133637] ip6_input_finish+0xbf/0x460
[ 382.151634] ip6_input+0x3b/0xb0
[ 382.166097] ipv6_rcv+0x378/0x4e0
It was caused by the lookup function __ip6_route_redirect() returns NULL in
fib6_rule_lookup() when ip6_create_rt_rcu() returns NULL.
So we fix it by simply making ip6_create_rt_rcu() return ip6_null_entry
instead of NULL.
v1->v2:
- move down 'fallback:' to make it more readable.
Fixes: e873e4b9cc ("ipv6: use fib6_info_hold_safe() when necessary")
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[maz: fixed subject]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We rely on the mmu_notifier call backs to handle the split/merge
of huge pages and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.
However, we miss a case where the block mapping is split for dirty
logging case and then could later be made block mapping, if we cancel the
dirty logging. This not only creates inconsistent TLB entries for
the pages in the the block, but also leakes the table pages for
PMD level.
Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table. So we need to restart the table walk
once we unmap the range.
Fixes : ad361f093c ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Register platform component with a prefix, to avoid warnings
on debugfs entries creation, due to component name
redundancy.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The DFSDM must be stopped when a new setting is applied.
restart systematically DFSDM on multiple prepare calls,
to apply changes.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This reverts commit f10e0010fa.
This commit was just wrong. It caused a lot of
syzbot warnings, so just revert it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
hidpp_scroll_counter_handle_scroll() doesn't expect a 0-value scroll event, it
gets interpreted as a negative scroll direction event. This can cause scroll
direction resets and thus broken scrolling.
Fixes: 4435ff2f09 ("HID: logitech: Enable high-resolution scrolling on Logitech mice")
Cc: stable@vger.kernel.org # v5.0
Reported-and-tested-by: Aimo Metsälä <aimetsal@outlook.com>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
The clang option -Oz enables *aggressive* optimization for size,
which doesn't necessarily result in smaller images, but can have
negative impact on performance. Switch back to the less aggressive
-Os.
This reverts commit 6748cb3c29.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The write_parport_reg_nonblock() helper takes a reference to the struct
mos_parport, but failed to release it in a couple of error paths after
allocation failures, leading to a memory leak.
Johan said that move the kref_get() and mos_parport assignment to the
end of urbtrack initialisation is a better way, so move it. and
mos_parport do not used until urbtrack initialisation.
Signed-off-by: Lin Yi <teroincn@163.com>
Fixes: b69578df7e ("USB: usbserial: mos7720: add support for parallel port on moschip 7715")
Cc: stable <stable@vger.kernel.org> # 2.6.35
Signed-off-by: Johan Hovold <johan@kernel.org>
Increase the reset duration to ensure correct phy functionality. The
reset duration is taken from barebox commit 52fdd510de ("ARM: dts:
pfla02: use long enough reset for ethernet phy"):
Use a longer reset time for ethernet phy Micrel KSZ9031RNX. Otherwise a
small percentage of modules have 'transmission timeouts' errors like
barebox@Phytec phyFLEX-i.MX6 Quad Carrier-Board:/ ifup eth0
warning: No MAC address set. Using random address 7e:94:4d:02:f8:f3
eth0: 1000Mbps full duplex link detected
eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
Cc: Stefan Christ <s.christ@phytec.de>
Cc: Christian Hemp <c.hemp@phytec.de>
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Fixes: 3180f95666 ("ARM: dts: Phytec imx6q pfla02 and pbab01 support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When no alarm has been programmed on RSK-RZA1, an error message is
printed during boot:
rtc rtc0: invalid alarm value: 2019-03-14T255:255:255
sh_rtc_read_alarm_value() returns 0xff when querying a hardware alarm
field that is not enabled. __rtc_read_alarm() validates the received
alarm values, and fills in missing fields when needed.
While 0xff is handled fine for the year, month, and day fields, and
corrected as considered being out-of-range, this is not the case for the
hour, minute, and second fields, where -1 is expected for missing
fields.
Fix this by returning -1 instead, as this value is handled fine for all
fields.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
If we encounter a failure during suspend where this RTC was programmed
to wakeup the system from suspend, but that wakeup couldn't be
configured because the system didn't support wakeup interrupts, we'll
run into the following warning:
Unbalanced IRQ 166 wake disable
WARNING: CPU: 7 PID: 3071 at kernel/irq/manage.c:669 irq_set_irq_wake+0x108/0x278
This happens because the suspend process isn't aborted when the RTC
fails to configure the wakeup IRQ. Instead, we continue suspending the
system and then another suspend callback fails the suspend process and
"unwinds" the previously suspended drivers by calling their resume
callbacks. When we get back to resuming this RTC driver, we'll call
disable_irq_wake() on an IRQ that hasn't been configured for wake.
Let's just fail suspend/resume here if we can't configure the system to
wake and the user has chosen to wakeup with this device. This fixes this
warning and makes the code more robust in case there are systems out
there that can't wakeup from suspend on this line but the user has
chosen to do so.
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-By: Benson Leung <bleung@chromium.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
In f_hidg_write() the write_spinlock is acquired before calling
usb_ep_queue() which causes a deadlock when dummy_hcd is being used.
This is because dummy_queue() callbacks into f_hidg_req_complete() which
tries to acquire the same spinlock. This is (part of) the backtrace when
the deadlock occurs:
0xffffffffc06b1410 in f_hidg_req_complete
0xffffffffc06a590a in usb_gadget_giveback_request
0xffffffffc06cfff2 in dummy_queue
0xffffffffc06a4b96 in usb_ep_queue
0xffffffffc06b1eb6 in f_hidg_write
0xffffffff8127730b in __vfs_write
0xffffffff812774d1 in vfs_write
0xffffffff81277725 in SYSC_write
Fix this by releasing the write_spinlock before calling usb_ep_queue()
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@vger.kernel.org # 4.11+
Fixes: 749494b6bd ("usb: gadget: f_hid: fix: Move IN request allocation to set_alt()")
Signed-off-by: Radoslav Gerganov <rgerganov@vmware.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
->release_ops() callback releases resources and this is used in error path.
If nf_tables_newrule() fails after ->select_ops(), it should release
resources. but it can not call ->destroy() because that should be called
after ->init().
At this point, ->release_ops() should be used for releasing resources.
Test commands:
modprobe -rv xt_tcpudp
iptables-nft -I INPUT -m tcp <-- error command
lsmod
Result:
Module Size Used by
xt_tcpudp 20480 2 <-- it should be 0
Fixes: b8e2040063 ("netfilter: nft_compat: use .release_ops and remove list of extension")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Restore the status of ep->stopped in function net2272_dequeue().
When the given request is not found in the endpoint queue
the function returns -EINVAL without restoring the state of
ep->stopped. Thus the endpoint keeps blocked and does not transfer
any data anymore.
This fix is only compile-tested, since we do not have a
corresponding hardware. An analogous fix was tested in the sibling
driver. See "usb: gadget: net2280: Fix net2280_dequeue()"
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guido Kiener <guido.kiener@rohde-schwarz.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
When a request must be dequeued with net2280_dequeue() e.g. due
to a device clear action and the same request is finished by the
function scan_dma_completions() then the function net2280_dequeue()
does not find the request in the following search loop and
returns the error -EINVAL without restoring the status ep->stopped.
Thus the endpoint keeps blocked and does not receive any data
anymore.
This fix restores the status and does not issue an error message.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guido Kiener <guido.kiener@rohde-schwarz.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The OUT endpoint normally blocks (NAK) subsequent packets when a
short packet was received and returns an incomplete queue entry to
the gadget driver. Thereby the gadget driver can detect a short packet
when reading queue entries with a length that is not equal to a
multiple of packet size.
The start_queue() function enables receiving OUT packets regardless of
the content of the OUT FIFO. This results in a race: With the current
code, it's possible that the "!ep->is_in && (readl(&ep->regs->ep_stat)
& BIT(NAK_OUT_PACKETS))" test in start_dma() will fail, then a short
packet will be received, and then start_queue() will call
stop_out_naking(). That's what we don't want (OUT naking gets turned
off while there is data in the FIFO) because then the next driver
request might receive a mixture of old and new packets.
With the patch, this race can't occur because the FIFO's state is
tested after we know that OUT naking is already turned on, and OUT
naking is stopped only when both of the conditions are met. This
ensures that all received data is delivered to the gadget driver,
which can detect a short packet now before new packets are appended
to the last short packet.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guido Kiener <guido.kiener@rohde-schwarz.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
This patch adds support for 6PE (RFC 4798) which uses IPv4-mapped IPv6
nexthop to connect IPv6 islands over IPv4 only MPLS network core.
Prior to this fix, to find the link-layer destination mac address, 6PE
enabled host/router was sending IPv6 ND requests for IPv4-mapped IPv6
nexthop address over the interface facing the IPv4 only core which
wouldn't success as the core is IPv6 free.
This fix changes that behavior on 6PE host to treat the nexthop as IPv4
address and send ARP requests whenever the next-hop address is an
IPv4-mapped IPv6 address.
Below topology illustrates the issue and how the patch addresses it.
abcd::1.1.1.1 (lo) abcd::2.2.2.2 (lo)
R0 (PE/host)------------------------R1--------------------------------R2 (PE/host)
<--- IPv4 MPLS core ---> <------ IPv4 MPLS core -------->
eth1 eth2 eth3 eth4
172.18.0.10 172.18.0.11 172.19.0.11 172.19.0.12
ffff::172.18.0.10 ffff::172.19.0.12
<------------------IPv6 MPLS tunnel ---------------------->
R0 and R2 act as 6PE routers of IPv6 islands. R1 is IPv4 only with MPLS tunnels
between R0,R1 and R1,R2.
docker exec r0 ip -f inet6 route add abcd::2.2.2.2/128 nexthop encap mpls 100 via ::ffff:172.18.0.11 dev eth1
docker exec r2 ip -f inet6 route add abcd::1.1.1.1/128 nexthop encap mpls 200 via ::ffff:172.19.0.11 dev eth4
docker exec r1 ip -f mpls route add 100 via inet 172.19.0.12 dev eth3
docker exec r1 ip -f mpls route add 200 via inet 172.18.0.10 dev eth2
With the change, when R0 sends an IPv6 packet over MPLS tunnel to abcd::2.2.2.2,
using ::ffff:172.18.0.11 as the nexthop, it does neighbor discovery for
172.18.18.0.11.
Signed-off-by: Vinay K Nallamothu <nvinay@juniper.net>
Tested-by: Avinash Lingala <ar977m@att.com>
Tested-by: Aravind Srinivas Srinivasa Prabhakar <aprabh@juniper.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang points out a harmless signed integer overflow:
drivers/net/ethernet/3com/3c515.c:1530:66: error: implicit conversion from 'int' to 'short' changes value from 32783 to -32753 [-Werror,-Wconstant-conversion]
new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast | RxProm;
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
drivers/net/ethernet/3com/3c515.c:1532:52: error: implicit conversion from 'int' to 'short' changes value from 32775 to -32761 [-Werror,-Wconstant-conversion]
new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast;
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
drivers/net/ethernet/3com/3c515.c:1534:38: error: implicit conversion from 'int' to 'short' changes value from 32773 to -32763 [-Werror,-Wconstant-conversion]
new_mode = SetRxFilter | RxStation | RxBroadcast;
~ ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
Make the variable unsigned to avoid the overflow.
Fixes: Linux-2.1.128pre1
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull a round of fixes for meson clocks from Neil Armstrong:
- g12a: Fix VPU clock parents and mux mask
- gxbb: Add CLK_DIVIDER_ROUND_CLOSEST to video decoder clocks
* tag 'meson-clk-fixes-for-5.1' of https://github.com/BayLibre/clk-meson:
clk: meson-g12a: fix VPU clock parents
clk: meson: g12a: fix VPU clock muxes mask
clk: meson-gxbb: round the vdec dividers to closest
When a dual stack dccp listener accepts an ipv4 flow,
it should not attempt to use an ipv6 header or
inet6_iif() helper.
Fixes: 3df80d9320 ("[DCCP]: Introduce DCCPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a dual stack tcp listener accepts an ipv4 flow,
it should not attempt to use an ipv6 header or tcp_v6_iif() helper.
Fixes: 1397ed35f2 ("ipv6: add flowinfo for tcp6 pkt_options for all cases")
Fixes: df3687ffc6 ("ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of kmemdup failure while setting the service name the patch
returns -ENOMEM upstream for processing.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netdev_queue_add_kobject and rx_queue_add_kobject,
if sysfs_create_group failed, kobject_put will call
netdev_queue_release to decrease dev refcont, however
dev_hold has not be called. So we will see this while
unregistering dev:
unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d0d6683716 ("net: don't decrement kobj reference count on init failure")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using 16K DMA buffers and ring mode, the DES3 refill is not working
correctly as the function is using a bogus pointer for checking the
private data. As a result stale pointers will remain in the RX descriptor
ring, so DMA will now likely overwrite/corrupt some already freed memory.
As simple reproducer, just receive some UDP traffic:
# ifconfig eth0 down; ifconfig eth0 mtu 9000; ifconfig eth0 up
# iperf3 -c 192.168.253.40 -u -b 0 -R
If you didn't crash by now check the RX descriptors to find non-contiguous
RX buffers:
cat /sys/kernel/debug/stmmaceth/eth0/descriptors_status
[...]
1 [0x2be5020]: 0xa3220321 0x9ffc1ffc 0x72d70082 0x130e207e
^^^^^^^^^^^^^^^^^^^^^
2 [0x2be5040]: 0xa3220321 0x9ffc1ffc 0x72998082 0x1311a07e
^^^^^^^^^^^^^^^^^^^^^
A simple ping test will now report bad data:
# ping -s 8200 192.168.253.40
PING 192.168.253.40 (192.168.253.40) 8200(8228) bytes of data.
8208 bytes from 192.168.253.40: icmp_seq=1 ttl=64 time=1.00 ms
wrong data byte #8144 should be 0xd0 but was 0x88
Fix the wrong pointer. Also we must refill DES3 only if the DMA buffer
size is 16K.
Fixes: 54139cf3bb ("net: stmmac: adding multiple buffers for rx")
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recently added function in mlxsw triggers a harmless compiler warning:
In file included from drivers/net/ethernet/mellanox/mlxsw/core.h:17,
from drivers/net/ethernet/mellanox/mlxsw/core_env.c:7:
drivers/net/ethernet/mellanox/mlxsw/core_env.c: In function 'mlxsw_env_module_temp_thresholds_get':
drivers/net/ethernet/mellanox/mlxsw/reg.h:8015:45: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
#define MLXSW_REG_MTMP_TEMP_TO_MC(val) (val * 125)
~~~~~^~~~~~
drivers/net/ethernet/mellanox/mlxsw/core_env.c:116:8: note: in expansion of macro 'MLXSW_REG_MTMP_TEMP_TO_MC'
if (!MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)) {
^~~~~~~~~~~~~~~~~~~~~~~~~
The warning is normally disabled, but it would be nice to enable
it to find real bugs, and there are no other known instances at
the moment.
Replace the negation with a zero-comparison, which also matches
the comment above it.
Fixes: d93c19a1d9 ("mlxsw: core: Add API for QSFP module temperature thresholds reading")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for 5.1
First set of fixes for 5.1. Lots of fixes for mt76 this time.
iwlwifi
* fix warning with do_div()
mt7601u
* avoid using hardware which is supported by mt76
mt76
* more fixes for hweight8() usage
* fix hardware restart for mt76x2
* fix writing txwi on USB devices
* fix (and disable by default) ED/CCA support on 76x2
* fix powersave issues on 7603
* fix return value check for ioremap on 7603
* fix duplicate USB device IDs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6794ad5443 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
made the checks to skip huge mappings, stricter. However it introduced
a bug where we still use huge mappings, ignoring the flag to
use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE.
Also, the checks do not cover the PUD huge pages, that was
under review during the same period. This patch fixes both
the issues.
Fixes : 6794ad5443 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2019-03-19
An update from ieee802154 for your *net* tree.
Kangjie Lu fixed a potential NULL pointer deref in the adf7242 driver and Li
RongQing make sure we propagate a netlink return code to the caller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.
When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:
[ 69.680416] =============================
[ 69.680819] WARNING: suspicious RCU usage
[ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
[ 69.682096] -----------------------------
[ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[ 69.683225]
[ 69.683225] other info that might help us debug this:
[ 69.683225]
[ 69.683975]
[ 69.683975] rcu_scheduler_active = 2, debug_locks = 1
[ 69.684598] 6 locks held by qemu-system-aar/4097:
[ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.690729]
[ 69.690729] stack backtrace:
[ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
[ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[ 69.692831] Call trace:
[ 69.694072] lockdep_rcu_suspicious+0xcc/0x110
[ 69.694490] gfn_to_memslot+0x174/0x190
[ 69.694853] kvm_write_guest+0x50/0xb0
[ 69.695209] vgic_its_save_tables_v0+0x248/0x330
[ 69.695639] vgic_its_set_attr+0x298/0x3a0
[ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8
[ 69.696424] kvm_device_ioctl+0x8c/0xf8
[ 69.696788] do_vfs_ioctl+0xc8/0x960
[ 69.697128] ksys_ioctl+0x8c/0xa0
[ 69.697445] __arm64_sys_ioctl+0x28/0x38
[ 69.697817] el0_svc_common+0xd8/0x138
[ 69.698173] el0_svc_handler+0x38/0x78
[ 69.698528] el0_svc+0x8/0xc
The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf308242ab. One wonders why this wasn't
fixed at the same time, but hey...
Fixes: bf308242ab ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The normal interrupt flow is not to enable the vgic when no virtual
interrupt is to be injected (i.e. the LRs are empty). But when a guest
is likely to use GICv4 for LPIs, we absolutely need to switch it on
at all times. Otherwise, VLPIs only get delivered when there is something
in the LRs, which doesn't happen very often.
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We've become very cautious to now always reset the vcpu when nothing
is loaded on the physical CPU. To do so, we now disable preemption
and do a kvm_arch_vcpu_put() to make sure we have all the state
in memory (and that it won't be loaded behind out back).
This now causes issues with resetting the PMU, which calls into perf.
Perf itself uses mutexes, which clashes with the lack of preemption.
It is worth realizing that the PMU is fully emulated, and that
no PMU state is ever loaded on the physical CPU. This means we can
perfectly reset the PMU outside of the non-preemptible section.
Fixes: e761a927bc ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded")
Reported-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When dev_exception_add() returns an error (due to a failed memory
allocation), make sure that we move the RCU preemption count back to where
it was before we were called. We dropped the RCU read lock inside the loop
body, so we can't just "break".
sparse complains about this, too:
$ make -s C=2 security/device_cgroup.o
./include/linux/rcupdate.h:647:9: warning: context imbalance in
'propagate_exception' - unexpected unlock
Fixes: d591fb5661 ("device_cgroup: simplify cgroup tree walk in propagate_exception()")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Allow the async rpc task for finish and update the open state if needed,
then free the slot. Otherwise, the async rpc unable to decode the reply.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: ae55e59da0 ("pnfs: Don't release the sequence slot...")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
My commit 26a7b54731 ("mt76x02: set protection according to ht
operation element") enabled by default RTS/CTS protection for OFDM
and CCK traffic, because MT_TX_RTS_CFG_THRESH is configured to non
0xffff by initvals and .set_rts_threshold callback is not called by
mac80211 on initialization, only on user request or during
ieee80211_reconfig() (suspend/resuem or restart_hw).
Enabling RTS/CTS cause some problems when sending probe request
frames by hcxdumptool penetration tool, but I expect it can cause
other issues on different scenarios.
Restore previous setting of RTS/CTS being disabled by default for
OFDM/CCK by changing MT_TX_RTS_CFG_THRESH initvals to 0xffff.
Fixes: 26a7b54731 ("mt76x02: set protection according to ht operation element")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
__sw_hweight8() is only defined if CONFIG_GENERIC_HWEIGHT is enabled.
The function that works on all architectures is hweight8().
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Since some USB device IDs are duplicated between mt76x0u, mt7601u
and mt76x2u device, check chip version on probe and return error if
not match the driver.
Don't think this is serious issue, probe most likely will fail at
some other point for wrong device, but we do not have to configure
it if we know is not our device.
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Since some USB device IDs are duplicated between mt7601u and mt76x0u
devices, check chip version on probe and return error if not match
0x7601.
Don't think this is serious issue, probe most likely will fail at
some other point for wrong device, but we do not have to configure
it if we know is not mt7601u device.
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Always init the tp/ip fields of bma in xfs_bmapi_write so that the
bmapi_finish at the bottom never trips over null transaction or inode
pointers.
Coverity-id: 1443964
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
In xchk_btree_check_owner, we can be passed a null buffer pointer. This
should only happen for the root of a root-in-inode btree type, but we
should program defensively in case the btree cursor state ever gets
screwed up and we get a null buffer anyway.
Coverity-id: 1438713
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Make sure scrub's dabtree iterator function checks that we're not
going deeper in the stack than our cursor permits.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
There is a race condition that could happen if hid_debug_rdesc_show()
is running while hdev is in the process of going away (device removal,
system suspend, etc) which could result in NULL pointer dereference:
BUG: unable to handle kernel paging request at 0000000783316040
CPU: 1 PID: 1512 Comm: getevent Tainted: G U O 4.19.20-quilt-2e5dc0ac-00029-gc455a447dd55 #1
RIP: 0010:hid_dump_device+0x9b/0x160
Call Trace:
hid_debug_rdesc_show+0x72/0x1d0
seq_read+0xe0/0x410
full_proxy_read+0x5f/0x90
__vfs_read+0x3a/0x170
vfs_read+0xa0/0x150
ksys_read+0x58/0xc0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x55/0x110
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Grab driver_input_lock to make sure the input device exists throughout the
whole process of dumping the rdesc.
[jkosina@suse.cz: update changelog a bit]
Signed-off-by: he, bo <bo.he@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We disable transmission interrupt (clear SCSCR_TIE) after all data has been transmitted
(if uart_circ_empty(xmit)). While transmitting, if the data is still in the tty buffer,
re-enable the SCSCR_TIE bit, which was done at sci_start_tx().
This is unnecessary processing, wasting CPU operation if the data transmission length is large.
And further, transmit end, FIFO empty bits disabling have also been performed in the step above.
Signed-off-by: Hoan Nguyen An <na-hoan@jinso.co.jp>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey has reported on OpenWrt's bug tracking system[1], that he
currently can't use ar93xx_uart as pure serial UART without console
(CONFIG_SERIAL_8250_CONSOLE and CONFIG_SERIAL_AR933X_CONSOLE undefined),
because compilation ends with following error:
ar933x_uart.c: In function 'ar933x_uart_console_write':
ar933x_uart.c:550:14: error: 'struct uart_port' has no
member named 'sysrq'
So this patch moves all the code related to console handling behind
series of CONFIG_SERIAL_AR933X_CONSOLE ifdefs.
1. https://bugs.openwrt.org/index.php?do=details&task_id=2152
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Andrey Batyiev <batyiev@gmail.com>
Reported-by: Andrey Batyiev <batyiev@gmail.com>
Tested-by: Andrey Batyiev <batyiev@gmail.com>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the missing uart_unregister_driver() and i2c_del_driver() before return
from sc16is7xx_init() in the error handling case.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case ioremap fails, the fix returns -ENOMEM to avoid NULL
pointer dereferences.
Multiple places use port.membase.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_match_device can return a NULL pointer when matching device is not
found. This patch avoids a scenario causing NULL pointer derefernce.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_match_device on failure to find a matching device can return a NULL
pointer. The patch checks for such a scenrio and passes the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no need to compare *port* with >= 0 because such comparison
of an unsigned value is always true.
Fix this by removing such comparison.
Addresses-Coverity-ID: 1443949 ("Unsigned compared against 0")
Fixes: 02a50b8750 ("usb: usb251xb: add usb data lane port swap feature")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No direct transition from prerouting to forward hook, routing lookup
needs to happen first.
Fixes: 19b351f16f ("netfilter: add flowtable documentation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are cases where multiple device tree nodes point to the
same phy node by means of the "phys" property, but we should
only consider those nodes that are marked as available rather
than just any node.
Fixes: 98bfb39466 ("usb: of: add an api to get dr_mode by the phy node")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PD 2.0 sinks are supposed to accept src-capabilities with a 3.0 header and
simply ignore any src PDOs which the sink does not understand such as PPS
but some 2.0 sinks instead ignore the entire PD_DATA_SOURCE_CAP message,
causing contract negotiation to fail.
This commit fixes such sinks not working by re-trying the contract
negotiation with PD-2.0 source-caps messages if we don't have a contract
after PD_N_HARD_RESET_COUNT hard-reset attempts.
The problem fixed by this commit was noticed with a Type-C to VGA dongle.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently there is no check on platform_get_irq() return value
in case it fails, hence never actually reporting any errors and
causing unexpected behavior when using such value as argument
for function regmap_irq_get_virq().
Fix this by adding a proper check, a message error and return
*irq* in case platform_get_irq() fails.
Addresses-Coverity-ID: 1443899 ("Improper use of negative value")
Fixes: d2061f9cc3 ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qgroup_rsv_size is calculated as the product of
outstanding_extent * fs_info->nodesize. The product is calculated with
32 bit precision since both variables are defined as u32. Yet
qgroup_rsv_size expects a 64 bit result.
Avoid possible multiplication overflow by casting outstanding_extent to
u64. Such overflow would in the worst case (64K nodesize) require more
than 65536 extents, which is quite large and i'ts not likely that it
would happen in practice.
Fixes-coverity-id: 1435101
Fixes: ff6bc37eb7 ("btrfs: qgroup: Use independent and accurate per inode qgroup rsv")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If 'cur_level' is 7 then the bound checking at the top of the function
will actually pass. Later on, it's possible to dereference
ds_path->nodes[cur_level+1] which will be an out of bounds.
The correct check will be cur_level >= BTRFS_MAX_LEVEL - 1 .
Fixes-coverty-id: 1440918
Fixes-coverty-id: 1440911
Fixes: ea49f3e73c ("btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree")
CC: stable@vger.kernel.org # 4.20+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If a watchdog timeout is received from the DSP it is safe to assume the
DSP is not functioning anymore and as such any active compressed streams
should be put into an error state.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Best to lock across handling the bus error to ensure the DSP doesn't
change power state as we are reading the status registers.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
During recent logging improvements it seems two error messages lost
their updates during patch application/rebasing. Add these back in.
Fixes: 0d3fba3e7a ("ASoC: wm_adsp: Improve logging messages")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Previously support was added to allow streams to be stopped and
started again without the DSP being power cycled and this was done
by clearing the buffer state in trigger start. Another supported
use-case is using the DSP for a trigger event then opening the
compressed stream later to receive the audio, unfortunately clearing
the buffer state in trigger start destroys the data received
from such a trigger. Correct this issue by moving the call to
wm_adsp_buffer_clear to be in trigger stop instead.
Fixes: 61fc060c40 ("ASoC: wm_adsp: Support streams which can start/stop with DSP active")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
create_singlethread_workqueue may fail and return NULL. The fix checks if it is
NULL to avoid NULL pointer dereference. Also, the fix moves the call of
create_singlethread_workqueue earlier to avoid resource-release issues.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The SIMCom SIM5218 and compatible devices have 5 USB interfaces, only 4
of which are serial ports. The fifth is a network interface supported
by the qmi-wwan driver. Furthermore, the serial ports do not support
modem control signals. Add driver_info flags to reflect this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Fixes: ec0cd94d88 ("usb: option: add SIMCom SIM5218")
Cc: stable <stable@vger.kernel.org> # 3.2
Signed-off-by: Johan Hovold <johan@kernel.org>
The Quectel EM12 is a Cat. 12 LTE modem. It behaves in the exactly the
same way as the EP06 (including the dynamic configuration behavior), so
the same checks on reserved interfaces, etc. are needed.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
In the current cpuidle implementation for i.MX6q, the CPU that sets
'WAIT_UNCLOCKED' and the CPU that returns to 'WAIT_CLOCKED' are always
the same. While the CPU that sets 'WAIT_UNCLOCKED' is in IDLE state of
"WAIT", if the other CPU wakes up and enters IDLE state of "WFI"
istead of "WAIT", this CPU can not wake up at expired time.
Because, in the case of "WFI", the CPU must be waked up by the local
timer interrupt. But, while 'WAIT_UNCLOCKED' is set, the local timer
is stopped, when all CPUs execute "wfi" instruction. As a result, the
local timer interrupt is not fired.
In this situation, this CPU will wake up by IRQ different from local
timer. (e.g. broacast timer)
So, this fix changes CPU to return to 'WAIT_CLOCKED'.
Signed-off-by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Fixes: e5f9dec8ff ("ARM: imx6q: support WAIT mode using cpuidle")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Use rgmii-id phy mode for the CPU port (MAC0) of the QCA8334 switch
to add delays to both Tx and Rx clock.
It worked with the rgmii mode before because the qca8k driver
(incorrectly) enabled delays in that mode and rgmii-id was not
implemented at all.
Commit 5ecdd77c61 ("net: dsa: qca8k: disable delay for RGMII mode")
removed the delays from the RGMII mode and hence broke the networking.
To fix the problem, commit a968b5e9d5 ("net: dsa: qca8k: Enable delay
for RGMII_ID mode") was introduced.
Now the correct phy mode is available so use it.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Provide an explanation of what is expected with respect to sending new
versions of specific patches within a patch series, as well as what
happens if an earlier patch series accidentally gets merged).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the last NFSv3 unmount from a given host races with a mount from the
same host, we can destroy an nlm_host that is still in use.
Specifically nlmclnt_lookup_host() can increment h_count on
an nlm_host that nlmclnt_release_host() has just successfully called
refcount_dec_and_test() on.
Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
will be called to destroy the nlmclnt which is now in use again.
The cause of the problem is that the dec_and_test happens outside the
locked region. This is easily fixed by using
refcount_dec_and_mutex_lock().
Fixes: 8ea6ecc8b0 ("lockd: Create client-side nlm_host cache")
Cc: stable@vger.kernel.org (v2.6.38+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
For some application which need kernel virtual address, such as fbcon,
implement these function to map/unmap kernel virtual address of prime
buffer.
Signed-off-by: CK Hu <ck.hu@mediatek.com>
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-03-18
please apply the following three patches to -net. The first two are fixes
for minor race conditions in the probe code, while the third one gets
dropwatch working (again).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the TX completion path, qeth_release_skbs() frees the completed
skbs with __skb_queue_purge(). This ends in kfree_skb(), reporting every
completed skb as dropped.
On the other hand when dropping an skb in .ndo_start_xmit, we end up
calling consume_skb()... where we should be using kfree_skb() so that
drop monitors get notified.
Switch the drop/consume logic around, and also don't accumulate dropped
packets in the tx_errors statistics.
Fixes: dc149e3764 ("s390/qeth: replace open-coded skb_queue_walk()")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ucast IP table is utilized by some of the L3-specific sysfs attributes
that qeth_l3_create_device_attributes() provides. So initialize the table
_before_ registering the attributes.
Fixes: ebccc7397e ("s390/qeth: add missing hash table initializations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW trap and VNICC configuration is exposed via sysfs, and may have
already been modified when qeth_l?_probe_device() attempts to initialize
them. So (1) initialize the VNICC values a little earlier, and (2) don't
bother about the HW trap mode, it was already initialized before.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device ID alone does not uniquely identify a device. Test both the
vendor and device ID to make sure we don't mistakenly think some other
vendor's 0xB410 device is a Digium HFC4S. Also, instead of the bare hex
ID, use the same constant (PCI_DEVICE_ID_DIGIUM_HFC4S) used in the device
ID table.
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: fix ignoring asoc_id for tcp-style sockets on some setsockopts
This is a patchset to fix ignoring asoc_id for tcp-style sockets on
some setsockopts, introduced by SCTP_CURRENT_ASSOC of the patchset:
[net-next,00/24] sctp: support SCTP_FUTURE/CURRENT/ALL_ASSOC
(https://patchwork.ozlabs.org/cover/1031706/)
As Marcelo suggested, we fix it on each setsockopt that is using
SCTP_CURRENT_ASSOC one by one by adding the check:
if (sctp_style(sk, TCP))
xxx.xxx_assoc_id = SCTP_FUTURE_ASSOC;
so that assoc_id will be completely ingored for tcp-style socket on
setsockopts, and works as SCTP_FUTURE_ASSOC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_STREAM_SCHEDULER sockopt.
Fixes: 7efba10d6b ("sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_EVENT sockopt.
Fixes: d251f05e3b ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_ENABLE_STREAM_RESET sockopt.
Fixes: 99a62135e1 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_PRINFO sockopt.
Fixes: 3a583059d1 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DEACTIVATE_KEY sockopt.
Fixes: 2af66ff3ed ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DELETE_KEY sockopt.
Fixes: 3adcc30060 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_ACTIVE_KEY sockopt.
Fixes: bf9fb6ad4f ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_KEY sockopt.
Fixes: 7fb3be13a2 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_MAX_BURST sockopt.
Fixes: e0651a0dc8 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_CONTEXT sockopt.
Fixes: 49b037acca ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_SNDINFO sockopt.
Fixes: 92fc3bd928 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on
SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DELAYED_SACK sockopt.
Fixes: 9c5829e1c4 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the user pass an invalid asoc_id to SCTP_DEFAULT_SEND_PARAM
on a TCP-style socket, it will silently ignore the new parameters.
That's because after not finding an asoc, it is checking asoc_id against
the known values of CURRENT/FUTURE/ALL values and that fails to match.
IOW, if the user supplies an invalid asoc id or not, it should either
match the current asoc or the socket itself so that it will inherit
these later. Fixes it by forcing asoc_id to SCTP_FUTURE_ASSOC in case it
is a TCP-style socket without an asoc, so that the values get set on the
socket.
Fixes: 707e45b3dc ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp_copy_descendant() copies pd_lobby from old sctp scok to new
sctp sock. If sctp_sock_migrate() returns error, it will panic when
releasing new sock and trying to purge pd_lobby due to the incorrect
pointers in pd_lobby.
[ 120.485116] kasan: CONFIG_KASAN_INLINE enabled
[ 120.486270] kasan: GPF could be caused by NULL-ptr deref or user
[ 120.509901] Call Trace:
[ 120.510443] sctp_ulpevent_free+0x1e8/0x490 [sctp]
[ 120.511438] sctp_queue_purge_ulpevents+0x97/0xe0 [sctp]
[ 120.512535] sctp_close+0x13a/0x700 [sctp]
[ 120.517483] inet_release+0xdc/0x1c0
[ 120.518215] __sock_release+0x1d2/0x2a0
[ 120.519025] sctp_do_peeloff+0x30f/0x3c0 [sctp]
We fix it by not copying sctp_sock pd_lobby in sctp_copy_descendan(),
and skb_queue_head_init() can also be removed in sctp_sock_migrate().
Reported-by: syzbot+85e0b422ff140b03672a@syzkaller.appspotmail.com
Fixes: 89664c6236 ("sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() fails")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_hdr(skb) only works when skb->transport_header is set properly.
But in Netfilter, skb->transport_header for ipv6 is not guaranteed
to be right value for sctphdr. It would cause to fail to check the
checksum for sctp packets.
So fix it by using offset, which is always right in all places.
v1->v2:
- Fix the changelog.
Fixes: e6d8b64b34 ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I am using "protocol ip" filters in TC to manipulate TC flower
classifiers, which are only available with "protocol ip". However,
I faced an issue that packets sent via raw sockets with ETH_P_ALL
did not match the ip filters even if they did satisfy the condition
(e.g., DHCP offer from dhcpd).
I have determined that the behavior was caused by an unexpected
value stored in skb->protocol, namely, ETH_P_ALL instead of ETH_P_IP,
when packets were sent via raw sockets with ETH_P_ALL set.
IMHO, storing ETH_P_ALL in skb->protocol is not appropriate for
packets sent via raw sockets because ETH_P_ALL is not a real ether
type used on wire, but a virtual one.
This patch fixes the tx protocol selection in cases of transmission
via raw sockets created with ETH_P_ALL so that it asks the driver to
extract protocol from the Ethernet header.
Fixes: 75c65772c3 ("net/packet: Ask driver for protocol if not provided by user")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.
The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.
However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.
This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.
In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.
This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.
Note that this changes the order in which sockets are listed in /proc and
with sock_diag.
Fixes: dc99f60069 ("packet: Add fanout support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ad6c9986bc ("vxlan: Fix GRO cells race condition between
receive and link delete") fixed a race condition for the typical case a vxlan
device is dismantled from the current netns. But if a netns is dismantled,
vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue()
of all the vxlan tunnels that are related to this netns.
In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before
unregister_netdevice_queue(). This means that the gro_cells_destroy() call is
done too soon, for the same reasons explained in above commit.
So we need to fully respect the RCU rules, and thus must remove the
gro_cells_destroy() call or risk use after-free.
Fixes: 58ce31cca1 ("vxlan: GRO support at tunnel layer")
Signed-off-by: Suanming.Mou <mousuanming@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP/UDP checksum validity was propagated to skb
only if IP checksum is valid.
But for IPv6 there is no validity as there is no checksum in IPv6.
This patch propagates TCP/UDP checksum validity regardless of IP checksum.
Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bug that Stan reported is as follows. After a restart, a 16-bit NIC
may be incorrectly identified as a 32-bit NIC and stop working.
mac8390 slot.E: Memory length resource not found, probing
mac8390 slot.E: Farallon EtherMac II-C (type farallon)
mac8390 slot.E: MAC 00:00:c5:30:c2:99, IRQ 61, 32 KB shared memory at 0xfeed0000, 32-bit access.
The bug never arises after a cold start and only intermittently after a
warm start. (I didn't investigate why the bug is intermittent.)
It turns out that memcpy_toio() is deprecated and memcmp_withio() also
has issues. Replacing these calls with mmio accessors fixes the problem.
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 2964db0f59 ("m68k: Mac DP8390 update")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to commit a7603ac1fc ("geneve: change NET_UDP_TUNNEL
dependency to select"), GTP has a dependency on NET_UDP_TUNNEL which
makes impossible to compile it if no other protocol depending on
NET_UDP_TUNNEL is selected.
Fix this by changing the depends to a select, and drop NET_IP_TUNNEL from
the select list, as it already depends on NET_UDP_TUNNEL.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
a reference counter bug on i386, i.e.:
[ 157.662401] kernel BUG at mm/highmem.c:349!
[ 157.666725] invalid opcode: 0000 [#1] SMP PTI
The reason is that kunmap(p_page) was completely left out, so we never
did an unmap for the p_page and the loop unmapping the rbio page was
iterating over the wrong number of stripes: unmapping should be done
with nr_data instead of rbio->real_stripes.
Test case to reproduce the bug:
- create a raid5 btrfs filesystem:
# mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde
- mount it:
# mount /dev/sdb /mnt
- run btrfs scrub in a loop:
# while :; do btrfs scrub start -BR /mnt; done
BugLink: https://bugs.launchpad.net/bugs/1812845
Fixes: 5a6ac9eacb ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
NV12 framebuffers produced by the VPU shows distorted on RK3288
after win has been disabled when scaling is active.
This issue can be reproduced using a 1080p modeset by:
- Scale a 1280x720 NV12 framebuffer to 1920x1080 on win0
- Disable win0
- Display a 1920x1080 NV12 framebuffer without scaling on win0
- Output will now show the framebuffer distorted
And by:
- Scale a 1280x720 NV12 framebuffer to 1920x1080
- Change to a 720p modeset (win gets disabled and scaling reset to none)
- Output will now show the framebuffer distorted
Fix this by setting scale mode to none when win is disabled.
Fixes: 4c156c21c7 ("drm/rockchip: vop: support plane scale")
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/AM3PR03MB0966DE3E19BACE07328CD637AC7D0@AM3PR03MB0966.eurprd03.prod.outlook.com
Since commit ad67b74d24 ("printk: hash addresses printed with %p"),
at boot "____ptrval____" is printed instead of actual addresses:
percpu: Embedded 38 pages/cpu @(____ptrval____) s124376 r0 d31272 u524288
Instead of changing the print to "%px", and leaking kernel addresses,
just remove the print completely, cfr. e.g. commit 071929dbdd
("arm64: Stop printing the virtual memory layout").
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
This pull request brings in a build fix for arm64 with bcm2835
enabled, and fixes the driver in the presence of -EPROBE_DEFER.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
It has been observed that sometimes a higher order memory allocation
for BPF maps fails when there is no obvious memory pressure in a system.
E.g. the map (BPF_MAP_TYPE_LRU_HASH, key=38, value=56, max_elems=524288)
could not be created due to vmalloc unable to allocate 75497472B,
when the system's memory consumption (in MB) was the following:
Total: 3942 Used: 837 (21.24%) Free: 138 Buffers: 239 Cached: 2727
Later analysis [1] by Michal Hocko showed that the vmalloc was not trying
to reclaim memory from the page cache and was failing prematurely due to
__GFP_NORETRY.
Considering dcda9b0471 ("mm, tree wide: replace __GFP_REPEAT by
__GFP_RETRY_MAYFAIL with more useful semantic") and [1], we can replace
__GFP_NORETRY with __GFP_RETRY_MAYFAIL, as it won't invoke OOM killer
and will try harder to fulfil allocation requests.
Unfortunately, replacing the body of the BPF map memory allocation
function with the kvmalloc_node helper function is not an option at
this point in time, given 1) kmalloc is non-optional for higher order
allocations, and 2) passing __GFP_RETRY_MAYFAIL to the kmalloc would
stress the slab allocator too much for large requests.
The change has been tested with the workloads mentioned above and by
observing oom_kill value from /proc/vmstat.
[1]: https://lore.kernel.org/bpf/20190310071318.GW5232@dhcp22.suse.cz/
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20190318153940.GL8924@dhcp22.suse.cz/
AF_INET4 does not exist.
Fixes: c78efc99c7 ("netfilter: nf_tables: nat: merge nft_redir protocol specific modules)"
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Proper use counter updates when activating and deactivating the object,
otherwise, this hits bogus EBUSY error.
Fixes: cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Laura Garcia <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
skb_header_pointer may return NULL. The current code dereference
its return values without a NULL check.
The fix inserts the checks to avoid NULL pointer dereferences.
Fixes: 202a8ff545 ("netfilter: add IPv6 segment routing header 'srh' match")
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With NETFILTER_XT_TARGET_TEE=y and IP6_NF_IPTABLES=m, we get a link
error when referencing the NF_DUP_IPV6 module:
net/netfilter/xt_TEE.o: In function `tee_tg6':
xt_TEE.c:(.text+0x14): undefined reference to `nf_dup_ipv6'
The problem here is the 'select NF_DUP_IPV6 if IP6_NF_IPTABLES'
that forces NF_DUP_IPV6 to be =m as well rather than setting it
to =y as was intended here. Adding a soft dependency on
IP6_NF_IPTABLES avoids that broken configuration.
Fixes: 5d400a4933 ("netfilter: Kconfig: Change select IPv6 dependencies")
Cc: Máté Eckl <ecklm94@gmail.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Link: https://patchwork.ozlabs.org/patch/999498/
Link: https://lore.kernel.org/patchwork/patch/960062/
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since Commit 21d1196a35 ("ipv4: set transport header earlier"),
skb->transport_header has been always set before entering INET
netfilter. This patch is to set skb->transport_header for bridge
before entering INET netfilter by bridge-nf-call-iptables.
It also fixes an issue that sctp_error() couldn't compute a right
csum due to unset skb->transport_header.
Fixes: e6d8b64b34 ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang <shuali@redhat.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Otherwise, we hit bogus ENOENT when removing elements.
Fixes: e701001e7c ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If ASRC turns on, HW will use clk_dac as the reference clock
whether recording or playback.
Both of clk_dac and clk_adc should set proper clock while using ASRC.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The jack type detection needs the main bias power of analog.
The modification makes sure the main bias power on/off while jack plug/unplug.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The IRQ function may not work when system suspend.
We remove snd_soc_dapm_force_enable_pin function call to
make sure the bias off when idle and run into suspend/resume function.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Skip for i2s5 in mck_disable which is also bypassed in mck_enable.
Signed-off-by: Tzung-Bi Shih <tzungbi@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 71f6fa90a3 ("HID: increase maximum global item tag report size
to 256") increases the max report size from 128 to 256.
We also need to update the report size in hid_field_extract() otherwise
it complains and truncates now valid report size:
[ 406.165461] hid-sensor-hub 001F:8086:22D8.0002: hid_field_extract() called with n (192) > 32! (kworker/5:1)
BugLink: https://bugs.launchpad.net/bugs/1818547
Fixes: 71f6fa90a3 ("HID: increase maximum global item tag report size to 256")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When using this driver with the wireless dongle and some usermode
program that monitors every input device (acpid, for example), while
another usermode client opens and closes the low-level device
repeadedly, the system eventually deadlocks.
The reason is that steam_input_register_device() must not be called with
the mutex held, because the input subsystem has its own synchronization
that clashes with this one: it is possible that steam_input_open() is
called before input_register_device() returns, and since
steam_input_open() needs to lock the mutex, it deadlocks.
However we must hold the mutex when calling any function that sends
commands to the controller. If not, random commands end up falling fail.
Reported-by: Simon Gene Gottlieb <simon@gottliebtfreitag.de>
Signed-off-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com>
Tested-by: Simon Gene Gottlieb <simon@gottliebtfreitag.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Various rk3328 based boards experience occasional sdmmc0 write errors.
This is due to the rk3328.dtsi tx drive levels being set to 4ma, vs
8ma per the rk3328 datasheet default settings.
Fix this by setting the tx signal pins to 8ma.
Inspiration from tonymac32's patch,
dc1212b347
Fixes issues on the rk3328-roc-cc and the rk3328-rock64 (as per the
above commit message).
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Several rk3328 based boards experience high rgmii tx error rates.
This is due to several pins in the rk3328.dtsi rgmii pinmux that are
missing a defined pull strength setting.
This causes the pinmux driver to default to 2ma (bit mask 00).
These pins are only defined in the rk3328.dtsi, and are not listed in
the rk3328 specification.
The TRM only lists them as "Reserved"
(RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX,
GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX).
However, removal of these pins from the rgmii pinmux definition causes
the interface to fail to transmit.
Also, the rgmii tx and rx pins defined in the dtsi are not consistent
with the rk3328 specification, with tx pins currently set to 12ma and
rx pins set to 2ma.
Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent
with the specification.
Defining the drive strength for the undefined pins eliminated the high
tx packet error rate observed under heavy data transfers.
Aligning the drive strength to the TRM values eliminated the occasional
packet retry errors under iperf3 testing.
This allows much higher data rates with no recorded tx errors.
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The Problem:
On ASUS Tinker Board S, when booting from the eMMC, and there is card
in the sd slot, there are constant errors.
Also when warm reboot, uboot can not access the sd slot
Cause:
Identified by Robin Murphy @ ARM. The Card Detect on rk3288
devices is pulled up by vccio-sd; so when the regulator powers this
off, card detect gives spurious errors. A second problem, is during
power down, vccio-sd apprears to be powered down. This causes a
problem when warm rebooting from the sd card. This was identified by
Jonas Karlman.
History:
A common fault on these rk3288 board, which impliment the reference
design.
When this arose before:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/281153.html
And Ulf and Jaehoon clearly said this was a broken card detect design,
which should be solved via polling
Solution:
Hence broken-cd is set as a property. This cures the errors. The
powering down of vccio-sd during reboot is cured by adding
regulator-boot-on.
This solutions has been fairly widely reviewed and tested.
Fixes: e58c5e739d ("ARM: dts: rockchip: move shared tinker-board nodes to a common dtsi")
Cc: stable@vger.kernel.org
[Heiko: slightly inaccurate fixes but tinker is a sbc (aka like a Pi) where
we can hopefully expect people not to rely on overly old stable kernels]
Signed-off-by: David Summers <beagleboard@davidjohnsummers.uk>
Reviewed-by: Jonas Karlman <jonas@kwiboo.se>
Tested-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The following error can be seen during boot:
of: /cpus/cpu@501: Couldn't find opp node
Change cpu nodes to use operating-points-v2 in order to fix this.
Fixes: ce76de9846 ("ARM: dts: rockchip: convert rk3288 to operating-points-v2")
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
A DDC I2C bus specifier is required for DDC EDID probing to work
properly.
Fixes: 1b5715c602 ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The rk3328-roc-cc board exhibits tx stability issues with large packets,
as does the rock64 board, which was fixed with this patch
https://patchwork.kernel.org/patch/10178969/
A similar patch was merged for the rk3328-roc-cc here
https://patchwork.kernel.org/patch/10804863/
but it doesn't include the tx/rx_delay tweaks, and I find that they
help with an issue where large transfers would bring the ethernet
link down, causing a link reset regularly.
Signed-off-by: Leonidas P. Papadakos <papadakospan@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
When switching from speakup_soft to another synth, speakup_soft would
keep calling synth_buffer_getc() from softsynthx_read.
Let's thus make synth.c export the knowledge of the current synth, so
that speakup_soft can determine whether it should be running.
speakup_soft also needs to set itself alive, otherwise the switch would
let it remain silent.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building without CONFIG_OF, the compiler loses track of the flow
control in axis_fifo_probe(), and thinks that many variables are used
without an initialization even though we actually leave the function
before the first use:
drivers/staging/axis-fifo/axis-fifo.c: In function 'axis_fifo_probe':
drivers/staging/axis-fifo/axis-fifo.c:900:5: error: 'rxd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (rxd_tdata_width != 32) {
^
drivers/staging/axis-fifo/axis-fifo.c:907:5: error: 'txd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (txd_tdata_width != 32) {
^
drivers/staging/axis-fifo/axis-fifo.c:914:5: error: 'has_tdest' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (has_tdest) {
^
drivers/staging/axis-fifo/axis-fifo.c:919:5: error: 'has_tid' may be used uninitialized in this function [-Werror=maybe-uninitialized]
When CONFIG_OF is set, this does not happen, and since the driver cannot
work without it, just add that option as a Kconfig dependency.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc noticed a mismatch between the type qualifiers after a recent
cleanup:
drivers/staging/olpc_dcon/olpc_dcon_xo_1.c: In function 'dcon_init_xo_1':
drivers/staging/olpc_dcon/olpc_dcon_xo_1.c:48:26: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
Add the 'const' keyword that should have been there all along.
Fixes: 2159fb3729 ("staging: olpc_dcon: olpc_dcon_xo_1.c: Switch to the gpio descriptor interface")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`ni_cdio_cmdtest()` validates Comedi asynchronous commands for the DIO
subdevice (subdevice 2) of supported National Instruments M-series
cards. It is called when handling the `COMEDI_CMD` and `COMEDI_CMDTEST`
ioctls for this subdevice. There are two causes for a possible
divide-by-zero error when validating that the `stop_arg` member of the
passed-in command is not too large.
The first cause for the divide-by-zero is that calls to
`comedi_bytes_per_scan()` are only valid once the command has been
copied to `s->async->cmd`, but that copy is only done for the
`COMEDI_CMD` ioctl. For the `COMEDI_CMDTEST` ioctl, it will use
whatever was left there by the previous `COMEDI_CMD` ioctl, if any.
(This is very likely, as it is usual for the application to use
`COMEDI_CMDTEST` before `COMEDI_CMD`.) If there has been no previous,
valid `COMEDI_CMD` for this subdevice, then `comedi_bytes_per_scan()`
will return 0, so the subsequent division in `ni_cdio_cmdtest()` of
`s->async->prealloc_bufsz / comedi_bytes_per_scan(s)` will be a
divide-by-zero error. To fix this error, call a new function
`comedi_bytes_per_scan_cmd(s, cmd)`, based on the existing
`comedi_bytes_per_scan(s)` but using a specified `struct comedi_cmd` for
its calculations. (Also refactor `comedi_bytes_per_scan()` to call the
new function.)
Once the first cause for the divide-by-zero has been fixed, the second
cause is that `comedi_bytes_per_scan_cmd()` can legitimately return 0 if
the `scan_end_arg` member of the `struct comedi_cmd` being tested is 0.
Fix it by only performing the division (and validating that `stop_arg`
is no more than the maximum value) if `comedi_bytes_per_scan_cmd()`
returns a non-zero value.
The problem was reported on the COMEDI mailing list here:
https://groups.google.com/forum/#!topic/comedi_list/4t9WlHzMhKM
Reported-by: Ivan Vasilyev <grabesstimme@gmail.com>
Tested-by: Ivan Vasilyev <grabesstimme@gmail.com>
Fixes: f164cbf98f ("staging: comedi: ni_mio_common: add finite regeneration to dio output")
Cc: <stable@vger.kernel.org> # 4.6+
Cc: Spencer E. Olson <olsonse@umich.edu>
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erofs_vmap() wrapped vmap() and vm_map_ram() to return virtual
continuous memory, but both of them can failed due to a lot of
reason, previously, erofs_vmap()'s callers didn't handle them,
which can potentially cause NULL pointer access, fix it.
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Fixes: 0d40d6e399 ("staging: erofs: add a generic z_erofs VLE decompressor")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ethernet in mt7621 is now supported by
drivers/net/ethernet/mediatek/
which provides support for the integrated switch through DSA.
This requires some devicetree changes, and particularly allows
a board dts to identify which switch ports are present.
The second CPU interface - gmac1 - doesn't work yet, so the device
tree information may not be correct. The phy (which is present on the
gnubee-pc2) can negotiate and report connection speed etc, but no
traffic flows.
The gnubee-pc1 has two network ports which are 'black' and 'blue'.
There are connected to switch ports 0 and 4 respectively.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
driver/net/ethernet/mediatek/ now supports this hardware,
so we don't need a separate driver.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are now no in-kernel users of BUS_ATTR() so drop it from device.h
Everyone should use BUS_ATTR_RO/RW/WO() from now on.
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We move the check that prevents connecting service ranges to after
the RDM/DGRAM check, and move address sanity control to a separate
function that also validates the service range.
Fixes: 23998835be ("tipc: improve address sanity check in tipc_connect()")
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix documentation markup warnings in snmp_counter.rst:
Documentation/networking/snmp_counter.rst:416: WARNING: Title underline too short.
Documentation/networking/snmp_counter.rst:684: WARNING: Bullet list ends without a blank line; unexpected unindent.
Documentation/networking/snmp_counter.rst:693: WARNING: Title underline too short.
Documentation/networking/snmp_counter.rst:707: WARNING: Bullet list ends without a blank line; unexpected unindent.
Documentation/networking/snmp_counter.rst:712: WARNING: Bullet list ends without a blank line; unexpected unindent.
Documentation/networking/snmp_counter.rst:722: WARNING: Title underline too short.
Documentation/networking/snmp_counter.rst:733: WARNING: Bullet list ends without a blank line; unexpected unindent.
Documentation/networking/snmp_counter.rst:736: WARNING: Bullet list ends without a blank line; unexpected unindent.
Documentation/networking/snmp_counter.rst:739: WARNING: Bullet list ends without a blank line; unexpected unindent.
Fixes: 80cc49507b ("net: Add part of TCP counts explanations in snmp_counters.rst")
Fixes: 8e2ea53a83 ("add snmp counters document")
Fixes: a6c7c7aac2 ("net: add document for several snmp counters")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: yupeng <yupeng0921@gmail.com>
that ssize_t is a rudiment of earlier calling conventions; it's been
used only to pass 0 and -E... since last autumn.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
aio_poll() has to cope with several unpleasant problems:
* requests that might stay around indefinitely need to
be made visible for io_cancel(2); that must not be done to
a request already completed, though.
* in cases when ->poll() has placed us on a waitqueue,
wakeup might have happened (and request completed) before ->poll()
returns.
* worse, in some early wakeup cases request might end
up re-added into the queue later - we can't treat "woken up and
currently not in the queue" as "it's not going to stick around
indefinitely"
* ... moreover, ->poll() might have decided not to
put it on any queues to start with, and that needs to be distinguished
from the previous case
* ->poll() might have tried to put us on more than one queue.
Only the first will succeed for aio poll, so we might end up missing
wakeups. OTOH, we might very well notice that only after the
wakeup hits and request gets completed (all before ->poll() gets
around to the second poll_wait()). In that case it's too late to
decide that we have an error.
req->woken was an attempt to deal with that. Unfortunately, it was
broken. What we need to keep track of is not that wakeup has happened -
the thing might come back after that. It's that async reference is
already gone and won't come back, so we can't (and needn't) put the
request on the list of cancellables.
The easiest case is "request hadn't been put on any waitqueues"; we
can tell by seeing NULL apt.head, and in that case there won't be
anything async. We should either complete the request ourselves
(if vfs_poll() reports anything of interest) or return an error.
In all other cases we get exclusion with wakeups by grabbing the
queue lock.
If request is currently on queue and we have something interesting
from vfs_poll(), we can steal it and complete the request ourselves.
If it's on queue and vfs_poll() has not reported anything interesting,
we either put it on the cancellable list, or, if we know that it
hadn't been put on all queues ->poll() wanted it on, we steal it and
return an error.
If it's _not_ on queue, it's either been already dealt with (in which
case we do nothing), or there's aio_poll_complete_work() about to be
executed. In that case we either put it on the cancellable list,
or, if we know it hadn't been put on all queues ->poll() wanted it on,
simulate what cancel would've done.
It's a lot more convoluted than I'd like it to be. Single-consumer APIs
suck, and unfortunately aio is not an exception...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead of having aio_complete() set ->ki_res.{res,res2}, do that
explicitly in its callers, drop the reference (as aio_complete()
used to do) and delay the rest until the final iocb_put().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
aio_poll() is not the only case that needs file pinned; worse, while
aio_read()/aio_write() can live without pinning iocb itself, the
proof is rather brittle and can easily break on later changes.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We've had rather rare reports of bmap btree block corruption where
the bmap root block has a level count of zero. The root cause of the
corruption is so far unknown. We do have verifier checks to detect
this form of on-disk corruption, but this doesn't cover a memory
corruption variant of the problem. The latter is a reasonable
possibility because the root block is part of the inode fork and can
reside in-core for some time before inode extents are read.
If this occurs, it leads to a system crash such as the following:
BUG: unable to handle kernel paging request at ffffffff00000221
PF error: [normal kernel read fault]
...
RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
...
Call Trace:
xfs_iread_extents+0x379/0x540 [xfs]
xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
? xfs_attr_get+0xd1/0x120 [xfs]
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
? __vfs_getxattr+0x53/0x70
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_apply+0x63/0x130
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_file_buffered_write+0x62/0x90
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
__vfs_write+0x150/0x1b0
vfs_write+0xba/0x1c0
ksys_pwrite64+0x64/0xa0
do_syscall_64+0x5a/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The crash occurs because xfs_iread_extents() attempts to release an
uninitialized buffer pointer as the level == 0 value prevented the
buffer from ever being allocated or read. Change the level > 0
assert to an explicit error check in xfs_iread_extents() to avoid
crashing the kernel in the event of localized, in-core inode
corruption.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
nla_nest_start may fail. The fix check its status and returns
-EMSGSIZE in case it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.
2) Fix BTF to properly resolve forward-declared enums into their corresponding
full enum definition types during deduplication, from Andrii.
3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.
4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.
5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.
6) Various fixes in BPF helper function documentation in bpf.h UAPI header
used to bpf-helpers(7) man page, from Quentin.
7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start could fail and requires a check. The fix returns
-EMSGSIZE if it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start may fail and thus deserves a check.
The fix returns -EMSGSIZE in case it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_nest_start may fail and thus deserves a check.
The fix returns -EMSGSIZE when it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
upcall is dereferenced even when genlmsg_put fails. The fix
goto out to avoid the NULL pointer dereference in this case.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without IIO_TRIGGERED_BUFFER, this driver fails to link:
drivers/iio/chemical/pms7003.o: In function `pms7003_probe':
pms7003.c:(.text+0x21c): undefined reference to `devm_iio_triggered_buffer_setup'
pms7003.c:(.text+0x21c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `devm_iio_triggered_buffer_setup'
Fixes: a1d642266c ("iio: chemical: add support for Plantower PMS7003 sensor")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tomasz Duszynski <tduszyns@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Calculation did not use IIO_DEGREE_TO_RAD and implemented a variant to
avoid precision loss as we aim a nano value. The offset added to avoid
rounding error, though, doesn't give us a close result to the expected
value. E.g.
For 1000dps, the result should be:
(1000 * pi ) / 180 >> 15 ~= 0.000532632218
But with current calculation we get
$ cat scale
0.000547890
Fix the calculation by just doing the maths involved for a nano value
val * pi * 10e12 / (180 * 2^15)
so we get a closer result.
$ cat scale
0.000532632
Fixes: c14dca07a3 ("iio: cros_ec_sensors: add ChromeOS EC Contiguous Sensors driver")
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
In remove, the clock is disabled before canceling the
delayed work. This means that the delayed work may be
touching unclocked hardware.
Fix by disabling the clock after the delayed work is
fully canceled. This is consistent with the probe error
path order.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
If probe errors out after request_irq(), its error path
does not explicitly cancel the delayed work, which may
have been scheduled by the interrupt handler.
This means the delayed work may still be running when
the core frees the private structure (struct xadc).
This is a potential use-after-free.
Fix by inserting cancel_delayed_work_sync() in the probe
error path.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When cancel_delayed_work() returns, the delayed work may still
be running. This means that the core could potentially free
the private structure (struct xadc) while the delayed work
is still using it. This is a potential use-after-free.
Fix by calling cancel_delayed_work_sync(), which waits for
any residual work to finish before returning.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When issuing the write DAC register and write eeprom command, the two
powerdown bits (PD0 and PD1) are assumed by the chip to be present in
the bytes sent. Leaving them at 0 implies "powerdown disabled" which is
a different state that the current one. By adding the current state of
the powerdown in the i2c write, the chip will correctly power-on exactly
like as it is at the moment of store_eeprom call.
This is documented in MCP4725's datasheet, FIGURE 6-2: "Write Commands
for DAC Input Register and EEPROM" and MCP4726's datasheet, FIGURE 6-3:
"Write All Memory Command".
Signed-off-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com>
Acked-by: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test
was failing on powerpc64 BE, and rightfully indicated that the PPC_LD()
macro is not masking away the last two bits of the offset per the ISA,
resulting in the generation of 'lwa' instruction instead of the intended
'ld' instruction.
Segher also pointed out that we can't simply mask away the last two bits
as that will result in loading/storing from/to a memory location that
was not intended.
This patch addresses this by using ldx/stdx if the offset is not
word-aligned. We load the offset into a temporary register (TMP_REG_2)
and use that as the index register in a subsequent ldx/stdx. We fix
PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL()
and PPC_BPF_STL() to factor in the offset value and generate the proper
instruction sequence. We also convert all existing users of PPC_LD() and
PPC_STD() to use these macros. All existing uses of these macros have
been audited to ensure that TMP_REG_2 can be clobbered.
Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Same reasons than the ones explained in commit 4179cb5a4c
("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
A similar protocol is used for gro layer.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Fixes: 1bd4978a88 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an interrupt is already pending when the interrupt is enabled on the
GXL phy, no IRQ will ever be triggered.
The fix is simply to make sure pending IRQs are cleared before setting
up the irq mask.
Fixes: cf127ff20a ("net: phy: meson-gxl: add interrupt support")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds missing sphinx documentation to the
socket.c's functions. Also fixes some whitespaces.
I also changed the style of older documentation as an
effort to have an uniform documentation style.
Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case create_singlethread_workqueue fails, the check returns
an error to callers to avoid potential NULL pointer dereferences.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several statements that contain extra spacing in
the indentation; clean this up by removing spaces. Also
add { } braces on if statement to keep to kernel coding
style.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a statement that is indented incorrectly; replace
spaces with a tab.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We initially interpreted the fwmark parameter as a flag that simply turned
on the feature, using the whole skb->mark field as the index into the CAKE
tin_order array. However, it is quite common for different applications to
use different parts of the mask field for their own purposes, each using a
different mask.
Support this use of subsets of the mark by interpreting the TCA_CAKE_FWMARK
parameter as a bitmask to apply to the fwmark field when reading it. The
result will be right-shifted by the number of unset lower bits of the mask
before looking up the tin.
In the original commit message we also failed to credit Felix Resch with
originally suggesting the fwmark feature back in 2017; so the Suggested-By
in this commit covers the whole fwmark feature.
Fixes: 0b5c7efdfc ("sch_cake: Permit use of connmarks as tin classifiers")
Suggested-by: Felix Resch <fuller@beif.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_alloc_skb can fail and return a NULL pointer which is
dereferenced without a check. The patch avoids such a scenario.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending non-linear skbs with jumbo frames, we set up the non-paged
data and mark that as a last segment, although the paged fragments are
also prepared. This will stall the TX queue and trigger a watchdog warning
(a simple reproducer is to run an iperf client mode TCP test with a large
MTU - networking fails instantly).
Fix by checking if the skb is non-linear.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0e80bdc9a7 ("stmmac: first frame prep at the end of xmit
routine") overlooked jumbo frames when re-ordering the code, and as a
result the own bit was not getting set anymore for the first jumbo frame
descriptor. Commit 487e2e22ab ("net: stmmac: Set OWN bit for jumbo
frames") tried to fix this, but now the bit is getting set too early and
the DMA may start while we are still setting up the remaining descriptors.
And with the chain mode the own bit remains still unset.
Fix by setting the own bit at the end of xmit also with jumbo frames.
Fixes: 0e80bdc9a7 ("stmmac: first frame prep at the end of xmit routine")
Fixes: 487e2e22ab ("net: stmmac: Set OWN bit for jumbo frames")
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_snap_client may return NULL, all the callers
check it, but only print a warning. This will result in
NULL pointer dereference in unregister_snap_client and other
places.
It has always been used like this since v2.6
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When unloading xfrm6_tunnel module, xfrm6_tunnel_fini directly
frees the xfrm6_tunnel_spi_kmem. Maybe someone has gotten the
xfrm6_tunnel_spi, so need to wait it.
Fixes: 91cc3bb0b04ff("xfrm6_tunnel: RCU conversion")
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Quentin Monnet says:
====================
Hi,
This set is an update to the documentation for the BPF helper functions in
the UAPI header bpf.h, used to generate the bpf-helpers(7) man page.
First patch contains fixes to the current documentation. The second patch
adds documentation for the two helpers bpf_spin_lock() and
bpf_spin_unlock(). The last patch simply reports the changes to the version
of that header in tools/.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Synchronise the bpf.h header under tools, to report the latest fixes and
additions to the documentation for the BPF helpers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add documentation for the BPF spinlock-related helpers to the doc in
bpf.h. I added the constraints and restrictions coming with the use of
spinlocks for BPF: not all of it is directly related to the use of the
helper, but I thought it would be nice for users to find them in the man
page.
This list of restrictions is nearly a verbatim copy of the list in
Alexei's commit log for those helpers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Another round of minor fixes for the documentation of the BPF helpers
located in the UAPI bpf.h header file. Changes include:
- Moving around description of some helpers, to keep the descriptions in
the same order as helpers are declared (bpf_map_push_elem(), leftover
from commit 90b1023f68 ("bpf: fix documentation for eBPF helpers"),
bpf_rc_keydown(), and bpf_skb_ancestor_cgroup_id()).
- Fixing typos ("contex" -> "context").
- Harmonising return types ("void* " -> "void *", "uint64_t" -> "u64").
- Addition of the "bpf_" prefix to bpf_get_storage().
- Light additions of RST markup on some keywords.
- Empty line deletion between description and return value for
bpf_tcp_sock().
- Edit for the description for bpf_skb_ecn_set_ce() (capital letters,
acronym expansion, no effect if ECT not set, more details on return
value).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko says:
====================
This patchset adds ability to resolve forward-declared enums into their
corresponding full enum definition types during type deduplication,
eliminating one of the reasons for having duplicated graphs of types.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds test verifying new btf_dedup logic of resolving
forward-declared enums.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
GCC and clang support enum forward declarations as an extension. Such
forward-declared enums will be represented as normal BTF_KIND_ENUM types with
vlen=0. This patch adds ability to resolve such enums to their corresponding
fully defined enums. This helps to avoid duplicated BTF type graphs which only
differ by some types referencing forward-declared enum vs full enum.
One such example in kernel is enum irqchip_irq_state, defined in
include/linux/interrupt.h and forward-declared in include/linux/irq.h. This
causes entire struct task_struct and all referenced types to be duplicated in
btf_dedup output. This patch eliminates such duplication cases.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Architectures like ppc64 use the deposited page table to store hardware
page table slot information. Make sure we deposit a page table when
using zero page at the pmd level for hash.
Without this we hit
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000082a74
Oops: Kernel access of bad area, sig: 11 [#1]
....
NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
Call Trace:
hash_page_mm+0x43c/0x740
do_hash_page+0x2c/0x3c
copy_from_iter_flushcache+0xa4/0x4a0
pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
dax_copy_from_iter+0x40/0x70
dax_iomap_actor+0x134/0x360
iomap_apply+0xfc/0x1b0
dax_iomap_rw+0xac/0x130
ext4_file_write_iter+0x254/0x460 [ext4]
__vfs_write+0x120/0x1e0
vfs_write+0xd8/0x220
SyS_write+0x6c/0x110
system_call+0x3c/0x130
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Martin KaFai Lau says:
====================
This set addresses issue about accessing invalid
ptr returned from bpf_tcp_sock() and bpf_sk_fullsock()
after bpf_sk_release().
v4:
- Tried the one "id" approach. It does not work well and the reason is in
the Patch 1 commit message.
- Rename refcount_id to ref_obj_id.
- With ref_obj_id, resetting reg->id to 0 is fine in mark_ptr_or_null_reg()
because ref_obj_id is passed to release_reference() instead of reg->id.
- Also reset reg->ref_obj_id in mark_ptr_or_null_reg() when is_null == true
- sk_to_full_sk() is removed from bpf_sk_fullsock() and bpf_tcp_sock().
- bpf_get_listener_sock() is added to do sk_to_full_sk() in Patch 2.
- If tp is from bpf_tcp_sock(sk) and sk is a refcounted ptr,
bpf_sk_release(tp) is also allowed.
v3:
- reset reg->refcount_id for the is_null case in mark_ptr_or_null_reg()
v2:
- Remove refcount_id arg from release_reference() because
id == refcount_id
- Add a WARN_ON_ONCE to mark_ptr_or_null_regs() to catch
an internal verifier bug.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds an example in using the new helper
bpf_get_listener_sock().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding verifier tests to ensure the ptr returned from bpf_tcp_sock() and
bpf_sk_fullsock() cannot be accessed after bpf_sk_release() is called.
A few of the tests are derived from a reproducer test by Lorenz Bauer.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new helper "struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)"
which returns a bpf_sock in TCP_LISTEN state. It will trace back to
the listener sk from a request_sock if possible. It returns NULL
for all other cases.
No reference is taken because the helper ensures the sk is
in SOCK_RCU_FREE (where the TCP_LISTEN sock should be in).
Hence, bpf_sk_release() is unnecessary and the verifier does not
allow bpf_sk_release(listen_sk) to be called either.
The following is also allowed because the bpf_prog is run under
rcu_read_lock():
sk = bpf_sk_lookup_tcp();
/* if (!sk) { ... } */
listen_sk = bpf_get_listener_sock(sk);
/* if (!listen_sk) { ... } */
bpf_sk_release(sk);
src_port = listen_sk->src_port; /* Allowed */
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lorenz Bauer [thanks!] reported that a ptr returned by bpf_tcp_sock(sk)
can still be accessed after bpf_sk_release(sk).
Both bpf_tcp_sock() and bpf_sk_fullsock() have the same issue.
This patch addresses them together.
A simple reproducer looks like this:
sk = bpf_sk_lookup_tcp();
/* if (!sk) ... */
tp = bpf_tcp_sock(sk);
/* if (!tp) ... */
bpf_sk_release(sk);
snd_cwnd = tp->snd_cwnd; /* oops! The verifier does not complain. */
The problem is the verifier did not scrub the register's states of
the tcp_sock ptr (tp) after bpf_sk_release(sk).
[ Note that when calling bpf_tcp_sock(sk), the sk is not always
refcount-acquired. e.g. bpf_tcp_sock(skb->sk). The verifier works
fine for this case. ]
Currently, the verifier does not track if a helper's return ptr (in REG_0)
is "carry"-ing one of its argument's refcount status. To carry this info,
the reg1->id needs to be stored in reg0.
One approach was tried, like "reg0->id = reg1->id", when calling
"bpf_tcp_sock()". The main idea was to avoid adding another "ref_obj_id"
for the same reg. However, overlapping the NULL marking and ref
tracking purpose in one "id" does not work well:
ref_sk = bpf_sk_lookup_tcp();
fullsock = bpf_sk_fullsock(ref_sk);
tp = bpf_tcp_sock(ref_sk);
if (!fullsock) {
bpf_sk_release(ref_sk);
return 0;
}
/* fullsock_reg->id is marked for NOT-NULL.
* Same for tp_reg->id because they have the same id.
*/
/* oops. verifier did not complain about the missing !tp check */
snd_cwnd = tp->snd_cwnd;
Hence, a new "ref_obj_id" is needed in "struct bpf_reg_state".
With a new ref_obj_id, when bpf_sk_release(sk) is called, the verifier can
scrub all reg states which has a ref_obj_id match. It is done with the
changes in release_reg_references() in this patch.
While fixing it, sk_to_full_sk() is removed from bpf_tcp_sock() and
bpf_sk_fullsock() to avoid these helpers from returning
another ptr. It will make bpf_sk_release(tp) possible:
sk = bpf_sk_lookup_tcp();
/* if (!sk) ... */
tp = bpf_tcp_sock(sk);
/* if (!tp) ... */
bpf_sk_release(tp);
A separate helper "bpf_get_listener_sock()" will be added in a later
patch to do sk_to_full_sk().
Misc change notes:
- To allow bpf_sk_release(tp), the arg of bpf_sk_release() is changed
from ARG_PTR_TO_SOCKET to ARG_PTR_TO_SOCK_COMMON. ARG_PTR_TO_SOCKET
is removed from bpf.h since no helper is using it.
- arg_type_is_refcounted() is renamed to arg_type_may_be_refcounted()
because ARG_PTR_TO_SOCK_COMMON is the only one and skb->sk is not
refcounted. All bpf_sk_release(), bpf_sk_fullsock() and bpf_tcp_sock()
take ARG_PTR_TO_SOCK_COMMON.
- check_refcount_ok() ensures is_acquire_function() cannot take
arg_type_may_be_refcounted() as its argument.
- The check_func_arg() can only allow one refcount-ed arg. It is
guaranteed by check_refcount_ok() which ensures at most one arg can be
refcounted. Hence, it is a verifier internal error if >1 refcount arg
found in check_func_arg().
- In release_reference(), release_reference_state() is called
first to ensure a match on "reg->ref_obj_id" can be found before
scrubbing the reg states with release_reg_references().
- reg_is_refcounted() is no longer needed.
1. In mark_ptr_or_null_regs(), its usage is replaced by
"ref_obj_id && ref_obj_id == id" because,
when is_null == true, release_reference_state() should only be
called on the ref_obj_id obtained by a acquire helper (i.e.
is_acquire_function() == true). Otherwise, the following
would happen:
sk = bpf_sk_lookup_tcp();
/* if (!sk) { ... } */
fullsock = bpf_sk_fullsock(sk);
if (!fullsock) {
/*
* release_reference_state(fullsock_reg->ref_obj_id)
* where fullsock_reg->ref_obj_id == sk_reg->ref_obj_id.
*
* Hence, the following bpf_sk_release(sk) will fail
* because the ref state has already been released in the
* earlier release_reference_state(fullsock_reg->ref_obj_id).
*/
bpf_sk_release(sk);
}
2. In release_reg_references(), the current reg_is_refcounted() call
is unnecessary because the id check is enough.
- The type_is_refcounted() and type_is_refcounted_or_null()
are no longer needed also because reg_is_refcounted() is removed.
Fixes: 655a51e536 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As readahead is an optimization, all errors are usually filtered out,
but still properly handled when the real read call is done. The commit
5e9d398240 ("btrfs: readpages() should submit IO as read-ahead") added
REQ_RAHEAD to readpages() because that's only used for readahead
(despite what one would expect from the callback name).
This causes a flood of messages and inflated read error stats, so skip
reporting in case it's readahead.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403
Reported-by: LimeTech <tomm@lime-technology.com>
Fixes: 5e9d398240 ("btrfs: readpages() should submit IO as read-ahead")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: David Sterba <dsterba@suse.com>
When we are mixing buffered writes with direct IO writes against the same
file and snapshotting is happening concurrently, we can end up with a
corrupt file content in the snapshot. Example:
1) Inode/file is empty.
2) Snapshotting starts.
2) Buffered write at offset 0 length 256Kb. This updates the i_size of the
inode to 256Kb, disk_i_size remains zero. This happens after the task
doing the snapshot flushes all existing delalloc.
3) DIO write at offset 256Kb length 768Kb. Once the ordered extent
completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
updates the inode item in the fs tree with a size of 1Mb (which is
the value of disk_i_size).
4) The dealloc for the range [0, 256Kb[ did not start yet.
5) The transaction used in the DIO ordered extent completion, which updated
the inode item, is committed by the snapshotting task.
6) Snapshot creation completes.
7) Dealloc for the range [0, 256Kb[ is flushed.
After that when reading the file from the snapshot we always get zeroes for
the range [0, 256Kb[, the file has a size of 1Mb and the data written by
the direct IO write is found. From an application's point of view this is
a corruption, since in the source subvolume it could never read a version
of the file that included the data from the direct IO write without the
data from the buffered write included as well. In the snapshot's tree,
file extent items are missing for the range [0, 256Kb[.
The issue, obviously, does not happen when using the -o flushoncommit
mount option.
Fix this by flushing delalloc for all the roots that are about to be
snapshotted when committing a transaction. This guarantees total ordering
when updating the disk_i_size of an inode since the flush for dealloc is
done when a transaction is in the TRANS_STATE_COMMIT_START state and wait
is done once no more external writers exist. This is similar to what we
do when using the flushoncommit mount option, but we do it only if the
transaction has snapshots to create and only for the roots of the
subvolumes to be snapshotted. The bulk of the dealloc is flushed in the
snapshot creation ioctl, so the flush work we do inside the transaction
is minimized.
This issue, involving buffered and direct IO writes with snapshotting, is
often triggered by fstest btrfs/078, and got reported by fsck when not
using the NO_HOLES features, for example:
$ cat results/btrfs/078.full
(...)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 258 inode 264 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 65536
ERROR: errors found in fs roots
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When Filipe added the recursive directory logging stuff in
2f2ff0ee5e ("Btrfs: fix metadata inconsistencies after directory
fsync") he specifically didn't take the directory i_mutex for the
children directories that we need to log because of lockdep. This is
generally fine, but can lead to this WARN_ON() tripping if we happen to
run delayed deletion's in between our first search and our second search
of dir_item/dir_indexes for this directory. We expect this to happen,
so the WARN_ON() isn't necessary. Drop the WARN_ON() and add a comment
so we know why this case can happen.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we do a shrinking truncate against an inode which is already present
in the respective log tree and then rename it, as part of logging the new
name we end up logging an inode item that reflects the old size of the
file (the one which we previously logged) and not the new smaller size.
The decision to preserve the size previously logged was added by commit
1a4bcf470c ("Btrfs: fix fsync data loss after adding hard link to
inode") in order to avoid data loss after replaying the log. However that
decision is only needed for the case the logged inode size is smaller then
the current size of the inode, as explained in that commit's change log.
If the current size of the inode is smaller then the previously logged
size, we know a shrinking truncate happened and therefore need to use
that smaller size.
Example to trigger the problem:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 0 8000" /mnt/foo
$ xfs_io -c "fsync" /mnt/foo
$ xfs_io -c "truncate 3000" /mnt/foo
$ mv /mnt/foo /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/bar
0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
*
0008000
Once we rename the file, we log its name (and inode item), and because
the inode was already logged before in the current transaction, we log it
with a size of 8000 bytes because that is the size we previously logged
(with the first fsync). As part of the rename, besides logging the inode,
we do also sync the log, which is done since commit d4682ba03e
("Btrfs: sync log after logging new name"), so the next fsync against our
inode is effectively a no-op, since no new changes happened since the
rename operation. Even if did not sync the log during the rename
operation, the same problem (fize size of 8000 bytes instead of 3000
bytes) would be visible after replaying the log if the log ended up
getting synced to disk through some other means, such as for example by
fsyncing some other modified file. In the example above the fsync after
the rename operation is there just because not every filesystem may
guarantee logging/journalling the inode (and syncing the log/journal)
during the rename operation, for example it is needed for f2fs, but not
for ext4 and xfs.
Fix this scenario by, when logging a new name (which is triggered by
rename and link operations), using the current size of the inode instead
of the previously logged inode size.
A test case for fstests follows soon.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202695
CC: stable@vger.kernel.org # 4.4+
Reported-by: Seulbae Kim <seulbae@gatech.edu>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After commit fbeec965b8 ("ASoC: samsung: odroid: Fix 32000 sample rate
handling") the audio root clock frequency is configured improperly for
44100 sample rate. Due to clock rate rounding it's 20070401 Hz instead
of 22579000 Hz. This results in a too low value of the PSR clock divider
in the CPU DAI driver and too fast actual sample rate for fs=44100. E.g.
1 kHz tone has actual 1780 Hz frequency (1 kHz * 20070401/22579000 * 2).
Fix this by increasing the correction passed to clk_set_rate() to take
into account inaccuracy of the EPLL frequency properly.
Fixes: fbeec965b8 ("ASoC: samsung: odroid: Fix 32000 sample rate handling")
Reported-by: JaeChul Lee <jcsing.lee@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver changes the stream name of DAC and ADC to avoid the issue of
widget with prefixed name. When the machine adds prefixed name for codec,
the stream name of DAI may not find the widgets.
Signed-off-by: John Hsu <KCHSU0@nuvoton.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In case alloc_ordered_workqueue fails, the fix releases
sources and returns -ENOMEM to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
In xsk_socket__create(), the libbpf_flags field was not checked for
setting currently unused/unknown flags. This patch fixes that by
returning -EINVAL if the user has set any flag that is not in use at
this point in time.
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The test_progs subtests, test_spin_lock() and test_map_lock(),
requires BTF present to run successfully.
Currently, when BTF failed to load, test_progs will segfault,
$ ./test_progs
...
12: (bf) r1 = r8
13: (85) call bpf_spin_lock#93
map 'hash_map' has to have BTF in order to use bpf_spin_lock
libbpf: -- END LOG --
libbpf: failed to load program 'map_lock_demo'
libbpf: failed to load object './test_map_lock.o'
test_map_lock:bpf_prog_load errno 13
Segmentation fault
The segfault is caused by uninitialized variable "obj", which
is used in bpf_object__close(obj), when bpf prog failed to load.
Initializing variable "obj" to NULL in two occasions fixed the problem.
$ ./test_progs
...
Summary: 219 PASSED, 2 FAILED
Fixes: b4d4556c32 ("selftests/bpf: add bpf_spin_lock verifier tests")
Fixes: ba72a7b4ba ("selftests/bpf: test for BPF_F_LOCK")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When adding the MFD dependency for power domains and WDT in bcm2835, I
added it only on the arm32 side and missed it for arm64.
Fixes: 5e6acc3e67 ("bcm2835-pm: Move bcm2835-watchdog's DT probe to an MFD.")
Signed-off-by: Eric Anholt <eric@anholt.net>
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
The current AP bus implementation periodically polls the AP configuration
to detect changes. When the AP configuration is dynamically changed via the
SE or an SCLP instruction, the changes will not be reflected to sysfs until
the next time the AP configuration is polled. The CHSC architecture
provides a Store Event Information (SEI) command to make notification of an
AP configuration change. This patch introduces a handler to process
notification from the CHSC SEI command by immediately kicking off an AP bus
scan-after-event.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Harald Freudenberger <FREUDE@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
compiler complains about following declarations
sound/soc/sh/rcar/src.c:174:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 bsdsr_table_pattern1[] = {
^~~~~
sound/soc/sh/rcar/src.c:183:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 bsdsr_table_pattern2[] = {
^~~~~
sound/soc/sh/rcar/src.c:192:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 bsisr_table[] = {
^~~~~
sound/soc/sh/rcar/src.c:201:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 chan288888[] = {
^~~~~
sound/soc/sh/rcar/src.c:210:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 chan244888[] = {
^~~~~
sound/soc/sh/rcar/src.c:219:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static u32 chan222222[] = {
^~~~~
This patch moves the 'static' keyword to the front of the
declaration to fix the compiler warnings
Fixes: linux-next commit 7674bec4fc ("ASoC: rsnd: update BSDSR/BSDISR handling")
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
lockdep warns us that priv->lock and k->k_lock can cause a
deadlock when after acquire of k->k_lock, process is interrupted
by src, while in another routine of src .init, k->k_lock is
acquired with priv->lock held.
This patch avoids a potential deadlock by not calling soc_device_match()
in SRC .init callback, instead it adds new soc fields in priv->flags to
differentiate SoCs.
Fixes: linux-next commit 7674bec4fc ("ASoC: rsnd: update BSDSR/BSDISR handling")
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
- Declare SR as volatile, as it is changed by hardware.
- Remove TXDR from readable and volatile register list,
as it is intended for write accesses only.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver has two issues when machine add prefix name for codec.
(1)The stream name of DAI can't find the AIF widgets.
(2)The drivr can enable/disalbe the MICBIAS and SAR widgets.
The patch will fix these issues caused by prefixed name added.
Signed-off-by: John Hsu <KCHSU0@nuvoton.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The dpcm get from fe_clients/be_clients
may be free before use
Add a spin lock at snd_soc_card level,
to protect the dpcm instance.
The lock may be used in atomic context, so use spin lock.
Use irq spin lock version,
since the lock may be used in interrupts.
possible race condition between
void dpcm_be_disconnect(
...
list_del(&dpcm->list_be);
list_del(&dpcm->list_fe);
kfree(dpcm);
...
and
for_each_dpcm_fe()
for_each_dpcm_be*()
race condition example
Thread 1:
snd_soc_dapm_mixer_update_power()
-> soc_dpcm_runtime_update()
-> dpcm_be_disconnect()
-> kfree(dpcm);
Thread 2:
dpcm_fe_dai_trigger()
-> dpcm_be_dai_trigger()
-> snd_soc_dpcm_can_be_free_stop()
-> if (dpcm->fe == fe)
Excpetion Scenario:
two FE link to same BE
FE1 -> BE
FE2 ->
Thread 1: switch of mixer between FE2 -> BE
Thread 2: pcm_stop FE1
Exception:
Unable to handle kernel paging request at virtual address dead0000000000e0
pc=<> [<ffffff8960e2cd10>] dpcm_be_dai_trigger+0x29c/0x47c
sound/soc/soc-pcm.c:3226
if (dpcm->fe == fe)
lr=<> [<ffffff8960e2f694>] dpcm_fe_dai_do_trigger+0x94/0x26c
Backtrace:
[<ffffff89602dba80>] notify_die+0x68/0xb8
[<ffffff896028c7dc>] die+0x118/0x2a8
[<ffffff89602a2f84>] __do_kernel_fault+0x13c/0x14c
[<ffffff89602a27f4>] do_translation_fault+0x64/0xa0
[<ffffff8960280cf8>] do_mem_abort+0x4c/0xd0
[<ffffff8960282ad0>] el1_da+0x24/0x40
[<ffffff8960e2cd10>] dpcm_be_dai_trigger+0x29c/0x47c
[<ffffff8960e2f694>] dpcm_fe_dai_do_trigger+0x94/0x26c
[<ffffff8960e2edec>] dpcm_fe_dai_trigger+0x3c/0x44
[<ffffff8960de5588>] snd_pcm_do_stop+0x50/0x5c
[<ffffff8960dded24>] snd_pcm_action+0xb4/0x13c
[<ffffff8960ddfdb4>] snd_pcm_drop+0xa0/0x128
[<ffffff8960de69bc>] snd_pcm_common_ioctl+0x9d8/0x30f0
[<ffffff8960de1cac>] snd_pcm_ioctl_compat+0x29c/0x2f14
[<ffffff89604c9d60>] compat_SyS_ioctl+0x128/0x244
[<ffffff8960283740>] el0_svc_naked+0x34/0x38
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: KaiChieh Chuang <kaichieh.chuang@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
If playback and capture are enabled concurrently, when the capture stops
the output becomes inaudile. The playback application will become stuck
and underrun after a timeout.
This is caused by mistaken use of the stream_id, which should only be
set for playback and not for capture
Tested on Apollolake and Kabylake with SST driver.
Signed-off-by: Rander Wang <rander.wang@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The current implementation of the hdac_hda codec results in zero-valued
samples on capture and noise with headset playback when SOF is used on
platforms with an on-board HDaudio codec. This is root-caused to SOF
using be_hw_params_fixup, and the prepare() call using invalid runtime
fields to determine the format.
This patch moves the format handling to the hw_params() callback, as
done already for hdac_hdmi, to make sure the fixed-up information is
taken into account but keeps the codec initialization in prepare() as
the stream_tag is only available at that time. Moving everything in the
prepare() callback is possible but the code is less elegant so this
two-step solution was chosen.
The solution was tested with the SST driver with no regressions, and all
the issues with SOF playback and capture are solved.
Signed-off-by: Rander Wang <rander.wang@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
On HDaudio platforms, if playback is started when capture is working,
there is no audible output.
This can be root-caused to the use of the rx|tx_mask to store an HDaudio
stream tag.
If capture is stared before playback, rx_mask would be non-zero on HDaudio
platform, then the channel number of playback, which is in the same codec
dai with the capture, would be changed by soc_pcm_codec_params_fixup based
on the tx_mask at first, then overwritten by this function based on rx_mask
at last.
According to the author of tx|rx_mask, tx_mask is for playback and rx_mask
is for capture. And stream direction is checked at all other references of
tx|rx_mask in ASoC, so here should be an error. This patch checks stream
direction for tx|rx_mask for fixup function.
This issue would affect not only HDaudio+ASoC, but also I2S codecs if the
channel number based on rx_mask is not equal to the one for tx_mask. It could
be rarely reproduecd because most drivers in kernel set the same channel number
to tx|rx_mask or rx_mask is zero.
Tested on all platforms using stream_tag & HDaudio and intel I2S platforms.
Signed-off-by: Rander Wang <rander.wang@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch sets missing stream_name of capture part of the DAI driver
so we can define DAPM routing properly also for the capture stream.
While at it "Playback" suffix is added to the playback stream names
to clearly identify playback/capture.
Together with related dts patch this fixes NULL pointer dereference
when opening ALSA device for recording on Odroid XU3.
Fixes: 64aba9bca5 ("ASoC: samsung: i2s: Add widgets and routes for DPCM support")
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently ver_ptr is being null checked twice, once before calling
usb_string and once afterwards. The second null check is redundant
and can be removed, remove it.
Detected by CoverityScan, CID#1477308 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
The kernel-doc annotation is misused for hid_mouse_ignore_list. The script
complains about it:
drivers/hid/hid-quirks.c:894: warning: cannot understand function prototype: 'const struct hid_device_id hid_mouse_ignore_list[] = '
Drop the annotation to make script happy.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
The newly added power supply code fails to link when the power supply core
code is disabled:
drivers/hid/hid-asus.o: In function `asus_battery_get_property':
hid-asus.c:(.text+0x11de): undefined reference to `power_supply_get_drvdata'
drivers/hid/hid-asus.o: In function `asus_probe':
hid-asus.c:(.text+0x170c): undefined reference to `devm_power_supply_register'
hid-asus.c:(.text+0x1734): undefined reference to `power_supply_powers'
drivers/hid/hid-asus.o: In function `asus_raw_event':
hid-asus.c:(.text+0x1914): undefined reference to `power_supply_changed'
Select the subsystem from Kconfig as we do for other hid drivers already.
Fixes: 6311d329e1 ("HID: hid-asus: Add BT keyboard dock battery monitoring support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
When building with -Wformat, clang warns:
drivers/hid/hid-quirks.c:1075:27: warning: format specifies type
'unsigned short' but the argument has type '__u32' (aka 'unsigned int')
[-Wformat]
bl_entry->driver_data, bl_entry->vendor,
^~~~~~~~~~~~~~~~
./include/linux/hid.h:1170:48: note: expanded from macro 'dbg_hid'
printk(KERN_DEBUG "%s: " format, __FILE__, ##arg); \
~~~~~~ ^~~
drivers/hid/hid-quirks.c:1076:4: warning: format specifies type
'unsigned short' but the argument has type '__u32' (aka 'unsigned int')
[-Wformat]
bl_entry->product);
^~~~~~~~~~~~~~~~~
./include/linux/hid.h:1170:48: note: expanded from macro 'dbg_hid'
printk(KERN_DEBUG "%s: " format, __FILE__, ##arg); \
~~~~~~ ^~~
drivers/hid/hid-quirks.c:1242:12: warning: format specifies type
'unsigned short' but the argument has type '__u32' (aka 'unsigned int')
[-Wformat]
quirks, hdev->vendor, hdev->product);
^~~~~~~~~~~~
./include/linux/hid.h:1170:48: note: expanded from macro 'dbg_hid'
printk(KERN_DEBUG "%s: " format, __FILE__, ##arg); \
~~~~~~ ^~~
drivers/hid/hid-quirks.c:1242:26: warning: format specifies type
'unsigned short' but the argument has type '__u32' (aka 'unsigned int')
[-Wformat]
quirks, hdev->vendor, hdev->product);
^~~~~~~~~~~~~
./include/linux/hid.h:1170:48: note: expanded from macro 'dbg_hid'
printk(KERN_DEBUG "%s: " format, __FILE__, ##arg); \
~~~~~~ ^~~
4 warnings generated.
This patch fixes the format strings to use the correct format type for unsigned
ints.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
When we get an interrupt for a channel program, it is not
necessarily the final interrupt; for example, the issuing
guest may request an intermediate interrupt by specifying
the program-controlled-interrupt flag on a ccw.
We must not switch the state to idle if the interrupt is not
yet final; even more importantly, we must not free the translated
channel program if the interrupt is not yet final, or the host
can crash during cp rewind.
Fixes: e5f84dbaea ("vfio: ccw: return I/O results asynchronously")
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This patch splits and cleans up error handling logic for loading BTF data.
Previously, if BTF data was parsed successfully, but failed to load into
kernel, we'd report nonsensical error code, instead of error returned from
btf__load(). Now btf__new() and btf__load() are handled separately with proper
cleanup and warning reporting.
Fixes: d29d87f7e6 ("btf: separate btf creation and loading")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The SPI interface implementation was completely broken.
When using the SPI interface, there are only 7 address bits, the upper bit
is controlled by a page select register. The core needs access to both
ranges, so implement register read/write for both regions. The regmap
paging functionality didn't agree with a register that needs to be read
and modified, so I implemented a custom paging algorithm.
This fixes that the device wouldn't even probe in SPI mode.
The SPI interface then isn't different from I2C, merged them into the core,
and the I2C/SPI named registers are no longer needed.
Implemented register value caching for the registers to reduce the I2C/SPI
data transfers considerably.
The calibration set reads as all zeroes until some undefined point in time,
and I couldn't determine what makes it valid. The datasheet mentions these
registers but does not provide any hints on when they become valid, and they
aren't even enumerated in the memory map. So check the calibration and
retry reading it from the device after each measurement until it provides
something valid.
Despite the size this is suitable for a stable backport given that
it seems the SPI support never worked.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Fixes: 1b3bd85927 ("iio: chemical: Add support for Bosch BME680 sensor");
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The standard unit for temperature is millidegrees Celcius. Adapt the
driver to report in millidegrees instead of degrees.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Fixes: 1b3bd85927 ("iio: chemical: Add support for Bosch BME680 sensor");
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
I clearly messed up applying this patch. Not sure
how but the entire Kconfig block is missing. This
patch puts it back as it was in the original patch.
Reported-by: Andreas Brauchli <a.brauchli@elementarea.net>
Fixes: ce51412416 ("iio: chemical: sgp30: Support Sensirion SGP30/SGPC3 sensors")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Having a brief look at at91_adc_read_raw() it is obvious that in the case
of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR registers is
omitted. If 2 different channels are queried we can end up with a
situation where two interrupts are enabled, but only one interrupt is
cleared in the interrupt handler. Resulting in a interrupt loop and a
system hang.
Signed-off-by: Georg Ottinger <g.ottinger@abatec.at>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
According to the datasheet, the last bit of CHIP_ID register controls
I2C bus, and the first one is unused. Handle this correctly.
Note that there are chips out there that have a value such that
the id check currently fails.
Signed-off-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The trialmask is expected to have all bits set to 0 after allocation.
Currently kmalloc_array() is used which does not zero the memory and so
random bits are set. This results in random channels being enabled when
they shouldn't. Replace kmalloc_array() with kcalloc() which has the same
interface but zeros the memory.
Note the fix is actually required earlier than the below fixes tag, but
will require a manual backport due to move from kmalloc to kmalloc_array.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes commit 057ac1acdf ("iio: Use kmalloc_array() in iio_scan_mask_set()").
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
For rcu protected pointers, we'd better add '__rcu' for them.
Once added '__rcu' tag for rcu protected pointer, the sparse tool reports
warnings.
net/xfrm/xfrm_user.c:1198:39: sparse: expected struct sock *sk
net/xfrm/xfrm_user.c:1198:39: sparse: got struct sock [noderef] <asn:4> *nlsk
[...]
So introduce a new wrapper function of nlmsg_unicast to handle type
conversions.
This patch also fixes a direct access of a rcu protected socket.
Fixes: be33690d8fcf("[XFRM]: Fix aevent related crash")
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In esp4_gro_receive() and esp6_gro_receive(), secpath can be allocated
without adding xfrm state to xvec. Then, sp->xvec[sp->len - 1] would
fail and result in dereferencing invalid pointer in esp4_gso_segment()
and esp6_gso_segment(). Reset secpath if xfrm function returns error.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Reported-by: syzbot+b69368fd933c6c592f4c@syzkaller.appspotmail.com
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
do_div() expects unsigned operands and otherwise triggers a warning like:
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:465:2: error: comparison of distinct pointer types ('typeof ((rtt_avg)) *' (aka 'long long *') and 'uint64_t *' (aka 'unsigned long long *')) [-Werror,-Wcompare-distinct-pointer-types]
do_div(rtt_avg, 6666);
^~~~~~~~~~~~~~~~~~~~~
include/asm-generic/div64.h:222:28: note: expanded from macro 'do_div'
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~
1 error generated.
Change the do_div() to the simpler div_s64() that can handle
negative inputs correctly.
Fixes: 937b10c0de ("iwlwifi: mvm: add debug prints for FTM")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
mt76 patches for 5.1
* fix hardware restart for mt76x2
* fix writing txwi on USB devices
* fix (and disable by default) ED/CCA support on 76x2
* fix powersave issues on 7603
* fix return value check for ioremap on 7603
* fix duplicate USB device IDs
Remove duplicated entry in mt76x2u_device_table since Alfa AWUS036ACM
and Aukey USB-AC1200 have the same ids
Fixes: 62a25dc569 ("mt76x2u: Add support for Alfa AWUS036ACM")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Fixes: c8846e1015 ("mt76: add driver for MT7603E and MT7628/7688")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
This feature has been reported to cause stability issues on several systems.
Disable it until it has been fixed and verified. It can still be enabled
through debugfs
Signed-off-by: Felix Fietkau <nbd@nbd.name>
These packets have no txwi entry in the ring, so tracking via tx status does
not work. To prevent PS poll requests from being unanswered, end the service
period right away
Signed-off-by: Felix Fietkau <nbd@nbd.name>
AGC register 35, 37 override for the low gain setting should only be done
on 5 GHz. Also, 2.4 GHz needs a different value for register 35
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Use the correct variable in the check. Fixes an uninitialized variable warning
Reported-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Fixes: c8846e1015 ("mt76: add driver for MT7603E and MT7628/7688")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Full tx blocking (as opposed to CCA blocking) should only happen if there
is a continuous non-802.11 signal above the energy detect threshold.
Unfortunately the ED/CCA counter can't detect that, as it also counts 802.11
signals as busy.
Similar to the vendor code, implement a learning mode that waits until the AGC
gain has already been adjusted to the lowest value (due to false CCA events),
and the number of false CCA events still remains high, and the blocking
threshold is exceeded for more than 5 seconds.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Also update the mask first before calculating the vif index.
Fixes an issue where adding back the same interfaces in a different order
fails because of duplicate vif index use
Fixes: 06662264ce ("mt76x02: use mask for vifs")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Since we add txwi at the begining of skb->data, it no longer point
to ieee80211_hdr. This breaks settings TS bit for probe response and
beacons.
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Introduce mt76_queue stopped parameter in order to run
ieee80211_wake_queue only when mac80211 queues have been
previously stopped and avoid to disable interrupts when
it is not necessary
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
To be able to judge the current overcommitment ratio for a CPU add
a lowcore field with the exponential moving average of the steal time.
The average is updated every tick.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Working with the vfio-ap driver let to some revisit of the way
how an ap (queue) device is removed from the driver.
With the current implementation all the cleanup was done before
the driver even got notified about the removal. Now the ap
queue removal is done in 3 steps:
1) A preparation step, all ap messages within the queue
are flushed and so the driver does 'receive' them.
Also a new state AP_STATE_REMOVE assigned to the queue
makes sure there are no new messages queued in.
2) Now the driver's remove function is invoked and the
driver should do the job of cleaning up it's internal
administration lists or whatever. After 2) is done
it is guaranteed, that the driver is not invoked any
more. On the other hand the driver has to make sure
that the APQN is not accessed any more after step 2
is complete.
3) Now the ap bus code does the job of total cleanup of the
APQN. A reset with zero is triggered and the state of
the queue goes to AP_STATE_UNBOUND.
After step 3) is complete, the ap queue has no pending
messages and the APQN is cleared and so there are no
requests and replies lingering around in the firmware
queue for this APQN. Also the interrupts are disabled.
After these remove steps the ap queue device may be assigned
to another driver.
Stress testing this remove/probe procedure showed a problem with the
correct module reference counting. The actual receive of an reply in
the driver is done asynchronous with completions. So with a driver
change on an ap queue the message flush triggers completions but the
threads waiting for the completions may run at a time where the queue
already has the new driver assigned. So the module_put() at receive
time needs to be done on the driver module which queued the ap
message. This change is also part of this patch.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For a 64-bit process the randomization of the program break is quite
large with 1GB. That is as big as the randomization of the anonymous
mapping base, for a test case started with '/lib/ld64.so.1 <exec>'
it can happen that the heap is placed after the stack. To avoid
this limit the program break randomization to 32MB for 64-bit and
keep 8MB for 31-bit.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Raspberry pi board model B revison 2 have the hot plug detector gpio
active high (and not low as it was in the dts).
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Fixes: 49ac67e0c3 ("ARM: bcm2835: Add VC4 to the device tree.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
The clock driver may probe after ours and so we need to pass the
-EPROBE_DEFER out. Fix the other error path while we're here.
v2: Use dom->name instead of dom->gov as the flag for initialized
domains, since we aren't setting up a governor. Make sure to
clear ->clk when no clk is present in the DT.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 670c672608 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.")
We don't have ASB master/slave regs for this domain, so just skip that
step.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 670c672608 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.")
Commit 78a24e10cd ("ASoC: soc-core: clear platform pointers on error")
re-worked the clean-up of any platform pointers that may have been
initialised by the function snd_soc_init_platform(). This commit missed
one error path where if any of the prelinks for a soundcard failed to
initialise, then these platform pointers would not be cleaned-up. This
then prevents the soundcard from being initialised following a probe
deferral when any of the soundcard prelinks cannot be found.
Fix this by ensuring that soc_cleanup_platform() is called when
initialising the soundcard prelinks fails.
Fixes: 78a24e10cd ("ASoC: soc-core: clear platform pointers on error")
Signed-off-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Limiting the value of the passed in params->msbits in the hw_params()
callback is redundant on three counts:
1. We already specify in the DAI driver that we can only handle up to
24 bits. This means msbits will be limited to 24 via the ALSA
constraints imposed by the ASoC core, unless we have multiple codecs
that can handle more bits.
2. Nothing in our hw_params() implementation uses this value.
3. The copy of the params that we are passed by the ASoC core never
reads back the msbits value.
Consequently, this code is unnecessary and does nothing useful. Remove
it.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This moves ppgtt root hook out of scan and shadow function,
as it's only required at dispatch time. Also make sure this
checks against shadow mm to be ready, otherwise bail to fail
earlier.
Reviewed-by: Xiong Zhang <xiong.y.zhang@intel.com>
Cc: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Add error check on set_sync function return.
Add of_node_put() as of_get_parent() takes a reference
which has to be released.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When snd_pcm_stop_xrun() is called in interrupt routine,
substream context may have already been released.
Add protection on substream context.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Change capabilities exposed in SAI S/PDIF mode, to match
actually supported formats.
In S/PDIF mode only 32 bits stereo is supported.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Allow indexation of sai iec958 controls according
to device id.
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warning:
In file included from sound/soc/codecs/ab8500-codec.c:24:
sound/soc/codecs/ab8500-codec.c: In function ‘ab8500_codec_set_dai_fmt’:
./include/linux/device.h:1485:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
_dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/soc/codecs/ab8500-codec.c:2129:3: note: in expansion of macro ‘dev_err’
dev_err(dai->component->dev,
^~~~~~~
sound/soc/codecs/ab8500-codec.c:2132:2: note: here
default:
^~~~~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When using the S/PDIF DAI, there is no requirement to call
snd_soc_dai_set_fmt() as there is no DAI format definition that defines
S/PDIF. In any case, S/PDIF does not have separate clocks, this is
embedded into the data stream.
Consequently, when attempting to use TDA998x in S/PDIF mode, the attempt
to configure TDA998x via the hw_params callback fails as the
hdmi_codec_daifmt is left initialised to zero.
Since the S/PDIF DAI will only be used by S/PDIF, prepare the
hdmi_codec_daifmt structure for this format.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch fixes the implementation of suspend/resume of the device so that
upon resume of the device, the host won't crash due to PCI completion
timeout.
Upon suspend, the device is being reset due to PERST. Therefore, upon
resume, the driver must initialize the PCI controller as if the driver was
loaded. If the controller is not initialized and the device tries to
access the device through the PCI bars, the host will crash with PCI
completion timeout error.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds accounting for active CS. Active means that the CS was
submitted to the H/W queues and was not completed yet.
This is necessary to support suspend operation. Because the device will be
reset upon suspend, we can only suspend after all active CS have been
completed. Hence, we need to perform accounting on their number.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes the mapping of virtual address to physical addresses on
architectures where PAGE_SIZE is bigger than 4KB.
The break down to the device page size was done only for the virtual
address while it should have been done for the physical address as well.
As a result virtual addresses were mapped to wrong physical address.
The fix is to apply the break down for the physical addresses as well in
order to get correct mappings.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes a bug which led to a crash during hard reset flow.
Before a hard reset is executed, we wait a few seconds for the user
context cleanup to complete.
If it wasn't completed, we kill the user process and move on to the reset
flow.
Upon killing the user process, the context cleanup flow begins and may
take a while due to MMU unmaps.
Meanwhile, in the driver reset flow, we change the PCI DRAM bar location
which can interfere with the MMU that uses the bar.
If the context cleanup flow didn't finish quickly, a crash may occur due
to PCI DRAM bar mislocation during the MMU unmap.
Hence adding a wait between killing the user process and the start of the
reset flow.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes a bug of allocating a too big memory size with kmalloc,
which causes a failure.
In case of mapping a large memory block, an array of the relevant physical
page addresses is allocated. If there are many pages the array might be
too big to allocate with kmalloc, hence changing to kvmalloc.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The requested allocation size is 64bit, hence the number of requested
pages and the total requested size should 64bit as well.
This patch fixes all places where these are treated as 32bit.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
UBSAN report this:
UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
index 6 is out of range for type 'unsigned int [6]'
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4
0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80
ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8
Call Trace:
<IRQ> [<ffffffff81cb35f4>] __dump_stack lib/dump_stack.c:15 [inline]
<IRQ> [<ffffffff81cb35f4>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
[<ffffffff81d94225>] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
[<ffffffff81d954db>] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382
[<ffffffff82a25acd>] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289
[<ffffffff82a2e572>] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309
[<ffffffff82a3319b>] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243
[<ffffffff813d3927>] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
[<ffffffff813d8e7e>] __run_timers kernel/time/timer.c:1218 [inline]
[<ffffffff813d8e7e>] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
[<ffffffff8120d6f9>] __do_softirq+0x299/0xe10 kernel/softirq.c:273
[<ffffffff8120e676>] invoke_softirq kernel/softirq.c:350 [inline]
[<ffffffff8120e676>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
[<ffffffff82c5edab>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
[<ffffffff82c5edab>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
[<ffffffff82c5c985>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735
<EOI> [<ffffffff81188096>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52
[<ffffffff810834d7>] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline]
[<ffffffff810834d7>] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
[<ffffffff81085f05>] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
[<ffffffff8132abc3>] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
[<ffffffff8132b32d>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
[<ffffffff8132b32d>] cpu_idle_loop kernel/sched/idle.c:251 [inline]
[<ffffffff8132b32d>] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
[<ffffffff8113e119>] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245
The issue is triggered as this:
xfrm_add_policy
-->verify_newpolicy_info //check the index provided by user with XFRM_POLICY_MAX
//In my case, the index is 0x6E6BB6, so it pass the check.
-->xfrm_policy_construct //copy the user's policy and set xfrm_policy_timer
-->xfrm_policy_insert
--> __xfrm_policy_link //use the orgin dir, in my case is 2
--> xfrm_gen_index //generate policy index, there is 0x6E6BB6
then xfrm_policy_timer be fired
xfrm_policy_timer
--> xfrm_policy_id2dir //get dir from (policy index & 7), in my case is 6
--> xfrm_policy_delete
--> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access
Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is
valid, to fix the issue.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e682adf021 ("xfrm: Try to honor policy index if it's supplied by user")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
As vGPU shadow ctx is loaded with guest context state, arbitrarily
submitting request in error workload dispatch path would cause trouble.
So don't try to submit in error path now like in previous code.
This is to fix VM failure when GPU hang happens.
Fixes: f0e9943725 ("drm/i915/gvt: Fix workload request allocation before request add")
Reviewed-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
There is one corner case that workload_thread may pick and dispatch one
workload of vgpu after it's already deactivated. Below is the scenario:
1. deactive_vgpu got the vgpu_lock, it found pending workload was
submitted, then it released the vgpu_lock and wait for vgpu idle.
2. before deactive_vgpu got the vgpu_lock back, workload_thread might pick
one new valid workload, then it was blocked by the vgpu_lock.
3. deactive_vgpu got the vgpu_lock again, finished the last processes of
deactivating, then release the vgpu_lock.
4. workload_thread got the vgpu_lock, then it will try to dispatch the
fetched workload. It's not expected one workload of deactivated vgpu is
dispatched.
The solution is to add condition check of the vgpu's active flag and stop
to schedule when it's inactive.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch fixes a bug that prevents freeing the reset gpio on unloading
the module.
aic3x_i2c_probe is called when loading the module and it calls list_add
with a probably uninitialized list entry aic3x->list (next = prev = NULL)).
So even if list_del is called it does nothing and in the end the gpio_reset
is not freed. Then a repeated module probing fails silently because
gpio_request fails.
When moving INIT_LIST_HEAD to aic3x_i2c_probe we also have to move
list_del to aic3x_i2c_remove because aic3x_remove may be called
multiple times without aic3x_i2c_remove being called which leads to
a NULL pointer dereference.
Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
According to the R-Car Gen3 Hardware Manual Errata for Rev 1.50 of Feb
12, 2019, the DMA channels for SCIF5 are corrected from 16..47 to 0..15
on R-Car E3.
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
Fixes: a5ebe5e49a ("arm64: dts: renesas: r8a77990: Add SCIF-{0,1,3,4,5} device nodes")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Depends on GEN family and I915_PARAM_HAS_CONTEXT_ISOLATION, Mesa driver
will decide whether constant buffer 0 address is relative or absolute,
and load GPU initial state by lri to context mmio INSTPM (GEN8)
or 0x20D8 (>=GEN9).
Mesa Commit fa8a764b62
("i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.")
INSTPM is already added to gen8_engine_mmio_list, but 0x20D8 is missed
in gen9_engine_mmio_list. From GVT point of view, different guest could
have different context so should switch those mmio accordingly.
v2: Update fixes commit ID.
Fixes: 1786571393 ("drm/i915/gvt: vGPU context switch")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When MI_FLUSH_DW post write hw status page in index mode, the index
value is in dword step and turned into address offset in cmd dword1.
As status page size is 4K, so can't exceed that.
This fixed upper bound check in cmd parser code which incorrectly
stopped VM for reason of invalid MI_FLUSH_DW write index.
v2:
- Fix upper bound as 4K page size because index value is address offset.
Fixes: be1da7070a ("drm/i915/gvt: vGPU command scanner")
Cc: stable@vger.kernel.org # v4.10+
Cc: "Zhao, Yan Y" <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-02-21 12:19:19 +08:00
2619 changed files with 30019 additions and 19046 deletions
* Copyright (C) 2016 Freescale Semiconductor, Inc.
* Copyright (C) 2017 NXP
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.