commit df8fed831c upstream.
Commit 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to
chip->read_xxx() hooks") modified the prototype of the struct nand_chip
read_buf function pointer. In the au1550nd driver we have 2
implementations of read_buf. The previously mentioned commit modified
the au_read_buf() implementation to match the function pointer, but not
au_read_buf16(). This results in a compiler warning for MIPS
db1xxx_defconfig builds:
drivers/mtd/nand/raw/au1550nd.c:443:57:
warning: pointer type mismatch in conditional expression
Fix this by updating the prototype of au_read_buf16() to take a struct
nand_chip pointer as its first argument, as is expected after commit
7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx()
hooks").
Note that this shouldn't have caused any functional issues at runtime,
since the offset of the struct mtd_info within struct nand_chip is 0
making mtd_to_nand() effectively a type-cast.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 7e534323c4 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks")
Cc: stable@vger.kernel.org # v4.20+
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 194c2c74f5 upstream.
As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.
Cc: stable@vger.kernel.org
Fixes: 607e2ea167 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ef16693af upstream.
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.
It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.
Cc: stable@vger.kernel.org
Fixes: 673feb9d76 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4585fc59c0 upstream.
The system which has SVE feature crashed because of
the memory pointed by task->thread.sve_state was destroyed
by someone.
That is because sve_state is freed while the forking the
child process. The child process has the pointer of sve_state
which is same as the parent's because the child's task_struct
is copied from the parent's one. If the copy_process()
fails as an error on somewhere, for example, copy_creds(),
then the sve_state is freed even if the parent is alive.
The flow is as follows.
copy_process
p = dup_task_struct
=> arch_dup_task_struct
*dst = *src; // copy the entire region.
:
retval = copy_creds
if (retval < 0)
goto bad_fork_free;
:
bad_fork_free:
...
delayed_free_task(p);
=> free_task
=> arch_release_task_struct
=> fpsimd_release_task
=> __sve_free
=> kfree(task->thread.sve_state);
// free the parent's sve_state
Move child's sve_state = NULL and clearing TIF_SVE flag
to arch_dup_task_struct() so that the child doesn't free the
parent's one.
There is no need to wait until copy_process() to clear TIF_SVE for
dst, because the thread flags for dst are initialized already by
copying the src task_struct.
This change simplifies the code, so get rid of comments that are no
longer needed.
As a note, arm64 used to have thread_info on the stack. So it
would not be possible to clear TIF_SVE until the stack is initialized.
From commit c02433dd6d ("arm64: split thread_info from task stack"),
the thread_info is part of the task, so it should be valid to modify
the flag from arch_dup_task_struct().
Cc: stable@vger.kernel.org # 4.15.x-
Fixes: bc0ee47603 ("arm64/sve: Core task context handling")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reported-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Suggested-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30045f2174 upstream.
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Note that runtime PM has never actually been enabled for this driver
since the support_autosuspend flag in its usb_driver struct is not set.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-5-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc7890995e upstream.
The officially validated plane width limit is 4k on skl+, however
we already had people using 5k displays before we started to enforce
the limit. Also it seems Windows allows 5k resolutions as well
(though not sure if they do it with one plane or two).
According to hw folks 5k should work with the possible
exception of the following features:
- Ytile (already limited to 4k)
- FP16 (already limited to 4k)
- render compression (already limited to 4k)
- KVMR sprite and cursor (don't care)
- horizontal panning (need to verify this)
- pipe and plane scaling (need to verify this)
So apart from last two items on that list we are already
fine. We should really verify what happens with those last
two items but I don't have a 5k display on hand atm so it'll
have to wait.
In the meantime let's just bump the limit back up to 5k since
several users have already been using it without apparent issues.
At least we'll be no worse off than we were prior to lowering
the limits.
Cc: stable@vger.kernel.org
Cc: Sean Paul <sean@poorly.run>
Cc: José Roberto de Souza <jose.souza@intel.com>
Tested-by: Leho Kraav <leho@kraav.com>
Fixes: 372b9ffb57 ("drm/i915: Fix skl+ max plane width")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111501
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905135044.2001-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit bed34ef544)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4f4de5e5e upstream.
There are two problems in dcache_readdir() - one is that lockless traversal
of the list needs non-trivial cooperation of d_alloc() (at least a switch
to list_add_rcu(), and probably more than just that) and another is that
it assumes that no removal will happen without the directory locked exclusive.
Said assumption had always been there, never had been stated explicitly and
is violated by several places in the kernel (devpts and selinuxfs).
* replacement of next_positive() with different calling conventions:
it returns struct list_head * instead of struct dentry *; the latter is
passed in and out by reference, grabbing the result and dropping the original
value.
* scan is under ->d_lock. If we run out of timeslice, cursor is moved
after the last position we'd reached and we reschedule; then the scan continues
from that place. To avoid livelocks between multiple lseek() (with cursors
getting moved past each other, never reaching the real entries) we always
skip the cursors, need_resched() or not.
* returned list_head * is either ->d_child of dentry we'd found or
->d_subdirs of parent (if we got to the end of the list).
* dcache_readdir() and dcache_dir_lseek() switched to new helper.
dcache_readdir() always holds a reference to dentry passed to dir_emit() now.
Cursor is moved to just before the entry where dir_emit() has failed or into
the very end of the list, if we'd run out.
* move_cursor() eliminated - it had sucky calling conventions and
after fixing that it became simply list_move() (in lseek and scan_positives)
or list_move_tail() (in readdir).
All operations with the list are under ->d_lock now, and we do not
depend upon having all file removals done with parent locked exclusive
anymore.
Cc: stable@vger.kernel.org
Reported-by: "zhengbin (A)" <zhengbin13@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1436a78c63 ]
Since commit ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
the of_match_table is supported but the data shouldn't be a string.
Instead it shall be one of 'enum vcnl4000_device_ids'. Also the matching
logic for the vcnl4020 was wrong. Since the data retrieve mechanism is
still based on the i2c_device_id no failures did appeared till now.
Fixes: ebd457d559 ("iio: light: vcnl4000 add devicetree hooks")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 98dc19902a upstream.
ACPI 6.3 adds a thread flag to represent if a CPU/PE is
actually a thread. Given that the MPIDR_MT bit may not
represent this information consistently on homogeneous machines
we should prefer the PPTT flag if its available.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Robert Richter <rrichter@marvell.com>
[will: made acpi_cpu_is_threaded() return 'bool']
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2f2b4fd674 upstream.
GCC 9.x automatically enables support for Loongson MMI instructions when
using some -march= flags, and then errors out when -msoft-float is
specified with:
cc1: error: ‘-mloongson-mmi’ must be used with ‘-mhard-float’
The kernel shouldn't be using these MMI instructions anyway, just as it
doesn't use floating point instructions. Explicitly disable them in
order to fix the build with GCC 9.x.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 3702bba5eb ("MIPS: Loongson: Add GCC 4.4 support for Loongson2E")
Fixes: 6f7a251a25 ("MIPS: Loongson: Add basic Loongson 2F support")
Fixes: 5188129b8c ("MIPS: Loongson-3: Improve -march option and move it to Platform")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: stable@vger.kernel.org # v2.6.32+
Cc: linux-mips@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 031d73ed76 upstream.
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.
Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.
Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5f4987e86 upstream.
Coverity caught a case where we could return with a uninitialized value
in ret in process_leaf. This is actually pretty likely because we could
very easily run into a block group item key and have a garbage value in
ret and think there was an errror. Fix this by initializing ret to 0.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: fd708b81d9 ("Btrfs: add a extent ref verify tool")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4203e96894 upstream.
We've historically had reports of being unable to mount file systems
because the tree log root couldn't be read. Usually this is the "parent
transid failure", but could be any of the related errors, including
"fsid mismatch" or "bad tree block", depending on which block got
allocated.
The modification of the individual log root items are serialized on the
per-log root root_mutex. This means that any modification to the
per-subvol log root_item is completely protected.
However we update the root item in the log root tree outside of the log
root tree log_mutex. We do this in order to allow multiple subvolumes
to be updated in each log transaction.
This is problematic however because when we are writing the log root
tree out we update the super block with the _current_ log root node
information. Since these two operations happen independently of each
other, you can end up updating the log root tree in between writing out
the dirty blocks and setting the super block to point at the current
root.
This means we'll point at the new root node that hasn't been written
out, instead of the one we should be pointing at. Thus whatever garbage
or old block we end up pointing at complains when we mount the file
system later and try to replay the log.
Fix this by copying the log's root item into a local root item copy.
Then once we're safely under the log_root_tree->log_mutex we update the
root item in the log_root_tree. This way we do not modify the
log_root_tree while we're committing it, fixing the problem.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c67d970f0e upstream.
When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.
Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.
CPU 1 CPU 2
extent_fiemap()
--> it's a full ranged fiemap
range from 0 to LLONG_MAX - 1
(9223372036854775807)
--> locks range in the inode's
io tree
--> after this we have 2 extent
states in the io tree:
--> 1 for range [0, 1Mb[ with
the bits EXTENT_LOCKED and
EXTENT_DELALLOC_BITS set
--> 1 for the range
[1Mb, LLONG_MAX[ with
the EXTENT_LOCKED bit set
--> start buffered write at offset
1Mb with a length of 4Kb
btrfs_file_write_iter()
btrfs_buffered_write()
--> cached_state is NULL
lock_and_cleanup_extent_if_need()
--> returns 0 and does not lock
range because it starts
at current i_size / eof
--> cached_state remains NULL
btrfs_dirty_pages()
btrfs_set_extent_delalloc()
(...)
__set_extent_bit()
--> splits extent state for range
[1Mb, LLONG_MAX[ and now we
have 2 extent states:
--> one for the range
[1Mb, 1Mb + 4Kb[ with
EXTENT_LOCKED set
--> another one for the range
[1Mb + 4Kb, LLONG_MAX[ with
EXTENT_LOCKED set as well
--> sets EXTENT_DELALLOC on the
extent state for the range
[1Mb, 1Mb + 4Kb[
--> caches extent state
[1Mb, 1Mb + 4Kb[ into
@cached_state because it has
the bit EXTENT_LOCKED set
--> btrfs_buffered_write() ends up
with a non-NULL cached_state and
never calls anything to release its
reference on it, resulting in a
memory leak
Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().
The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.
This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:
[77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1
Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.
CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a54789074 upstream.
Currently, the command:
btrfs balance start -dconvert=single,soft .
on a Raspberry Pi produces the following kernel message:
BTRFS error (device mmcblk0p2): balance: invalid convert data profile single
This fails because we use is_power_of_2(unsigned long) to validate
the new data profile, the constant for 'single' profile uses bit 48,
and there are only 32 bits in a long on ARM.
Fix by open-coding the check using u64 variables.
Tested by completing the original balance command on several Raspberry
Pis.
Fixes: 818255feec ("btrfs: use common helper instead of open coding a bit test")
CC: stable@vger.kernel.org # 4.20+
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11a19a9087 upstream.
A user reported a lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.2.11-gentoo #2 Not tainted
------------------------------------------------------
kswapd0/711 is trying to acquire lock:
000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500
but task is already holding lock:
000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}:
kmem_cache_alloc+0x1f/0x1c0
btrfs_alloc_inode+0x1f/0x260
alloc_inode+0x16/0xa0
new_inode+0xe/0xb0
btrfs_new_inode+0x70/0x610
btrfs_symlink+0xd0/0x420
vfs_symlink+0x9c/0x100
do_symlinkat+0x66/0xe0
do_syscall_64+0x55/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (sb_internal){.+.+}:
__sb_start_write+0xf6/0x150
start_transaction+0x3a8/0x500
btrfs_commit_inode_delayed_inode+0x59/0x110
btrfs_evict_inode+0x19e/0x4c0
evict+0xbc/0x1f0
inode_lru_isolate+0x113/0x190
__list_lru_walk_one.isra.4+0x5c/0x100
list_lru_walk_one+0x32/0x50
prune_icache_sb+0x36/0x80
super_cache_scan+0x14a/0x1d0
do_shrink_slab+0x131/0x320
shrink_node+0xf7/0x380
balance_pgdat+0x2d5/0x640
kswapd+0x2ba/0x5e0
kthread+0x147/0x160
ret_from_fork+0x24/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sb_internal);
lock(fs_reclaim);
lock(sb_internal);
commit 1fac4a5437 upstream.
[BUG]
One user reported a reproducible KASAN report about use-after-free:
BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
==================================================================
BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579
CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P OE 5.2.11-arch1-1-kasan #4
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Call Trace:
dump_stack+0x7b/0xba
print_address_description+0x6c/0x22e
? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
__kasan_report.cold+0x1b/0x3b
? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
kasan_report+0x12/0x17
__asan_report_store8_noabort+0x17/0x20
btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
record_root_in_trans+0x2a0/0x370 [btrfs]
btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
start_transaction+0x1ab/0xe90 [btrfs]
btrfs_join_transaction+0x1d/0x20 [btrfs]
btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
? lock_repin_lock+0x400/0x400
? __kmem_cache_shutdown.cold+0x140/0x1ad
? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
finish_ordered_fn+0x15/0x20 [btrfs]
normal_work_helper+0x1bd/0xca0 [btrfs]
? process_one_work+0x819/0x1720
? kasan_check_read+0x11/0x20
btrfs_endio_write_helper+0x12/0x20 [btrfs]
process_one_work+0x8c9/0x1720
? pwq_dec_nr_in_flight+0x2f0/0x2f0
? worker_thread+0x1d9/0x1030
worker_thread+0x98/0x1030
kthread+0x2bb/0x3b0
? process_one_work+0x1720/0x1720
? kthread_park+0x120/0x120
ret_from_fork+0x35/0x40
Allocated by task 369692:
__kasan_kmalloc.part.0+0x44/0xc0
__kasan_kmalloc.constprop.0+0xba/0xc0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x138/0x260
btrfs_read_tree_root+0x92/0x360 [btrfs]
btrfs_read_fs_root+0x10/0xb0 [btrfs]
create_reloc_root+0x47d/0xa10 [btrfs]
btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
record_root_in_trans+0x2a0/0x370 [btrfs]
btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
start_transaction+0x1ab/0xe90 [btrfs]
btrfs_start_transaction+0x1e/0x20 [btrfs]
__btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
btrfs_prealloc_file_range+0x13/0x20 [btrfs]
prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
relocate_file_extent_cluster+0x193/0xc30 [btrfs]
relocate_data_extent+0x1f8/0x490 [btrfs]
relocate_block_group+0x600/0x1060 [btrfs]
btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
btrfs_relocate_chunk+0x9e/0x180 [btrfs]
btrfs_balance+0x14e4/0x2fc0 [btrfs]
btrfs_ioctl_balance+0x47f/0x640 [btrfs]
btrfs_ioctl+0x119d/0x8380 [btrfs]
do_vfs_ioctl+0x9f5/0x1060
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x73/0xb0
do_syscall_64+0xa5/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 369692:
__kasan_slab_free+0x14f/0x210
kasan_slab_free+0xe/0x10
kfree+0xd8/0x270
btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
clean_dirty_subvols+0x227/0x340 [btrfs]
relocate_block_group+0x972/0x1060 [btrfs]
btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
btrfs_relocate_chunk+0x9e/0x180 [btrfs]
btrfs_balance+0x14e4/0x2fc0 [btrfs]
btrfs_ioctl_balance+0x47f/0x640 [btrfs]
btrfs_ioctl+0x119d/0x8380 [btrfs]
do_vfs_ioctl+0x9f5/0x1060
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x73/0xb0
do_syscall_64+0xa5/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88856f671100
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 1552 bytes inside of
4096-byte region [ffff88856f671100, ffff88856f672100)
The buggy address belongs to the page:
page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
flags: 0x2ffff0000010200(slab|head)
raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
BTRFS info (device sdi1): 1 enospc errors during balance
BTRFS info (device sdi1): balance: ended with status: -28
[CAUSE]
The problem happens when finish_ordered_io() get called with balance
still running, while the reloc root of that subvolume is already dead.
(Tree is swap already done, but tree not yet deleted for possible qgroup
usage.)
That means root->reloc_root still exists, but that reloc_root can be
under btrfs_drop_snapshot(), thus we shouldn't access it.
The following race could cause the use-after-free problem:
CPU1 | CPU2
--------------------------------------------------------------------------
| relocate_block_group()
| |- unset_reloc_control(rc)
| |- btrfs_commit_transaction()
btrfs_finish_ordered_io() | |- clean_dirty_subvols()
|- btrfs_join_transaction() | |
|- record_root_in_trans() | |
|- btrfs_init_reloc_root() | |
|- if (root->reloc_root) | |
| | |- root->reloc_root = NULL
| | |- btrfs_drop_snapshot(reloc_root);
|- reloc_root->last_trans|
= trans->transid |
^^^^^^^^^^^^^^^^^^^^^^
Use after free
[FIX]
Fix it by the following modifications:
- Test if the root has dead reloc tree before accessing root->reloc_root
If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to
create or update root->reloc_tree
- Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped
reloc tree
To co-operate with above modification, so as long as
BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create
reloc tree at record_root_in_trans().
Reported-by: Cebtenzzre <cebtenzzre@gmail.com>
Fixes: d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e735244e2c ]
When emulating open-drain/open-source by not actively driving the output
lines - we're simply changing their mode to input. This is wrong as it
will then make it impossible to change the value of such line - it's now
considered to actually be in input mode. If we want to still use the
direction_input() callback for simplicity then we need to set FLAG_IS_OUT
manually in gpiod_direction_output() and not clear it in
gpio_set_open_drain_value_commit() and
gpio_set_open_source_value_commit().
Fixes: c663e5f567 ("gpio: support native single-ended hardware drivers")
Cc: stable@vger.kernel.org
Reported-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[Bartosz: backported to v5.3, v4.19]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be7ae45cfe ]
Since commit ec757001c8 ("gpio: Enable nonexclusive gpiods from DT
nodes") we are able to get GPIOD_FLAGS_BIT_NONEXCLUSIVE marked gpios.
Currently the gpiolib uses the wrong flags variable for the check. We
need to check the gpiod_flags instead of the of_gpio_flags else we
return -EBUSY for GPIOD_FLAGS_BIT_NONEXCLUSIVE marked and requested
gpiod's.
Fixes: ec757001c8 gpio: Enable nonexclusive gpiods from DT nodes
Cc: stable@vger.kernel.org
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
[Bartosz: the function was moved to gpiolib-of.c so updated the patch]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[Bartosz: backported to v5.3.y]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 442f1e746e ]
Commit 4b708b7b1a ("firmware: google: check if size is valid when
decoding VPD data") adds length checks, but the new vpd_decode_entry()
function botched the logic -- it adds the key length twice, instead of
adding the key and value lengths separately.
On my local system, this means vpd.c's vpd_section_create_attribs() hits
an error case after the first attribute it parses, since it's no longer
looking at the correct offset. With this patch, I'm back to seeing all
the correct attributes in /sys/firmware/vpd/...
Fixes: 4b708b7b1a ("firmware: google: check if size is valid when decoding VPD data")
Cc: <stable@vger.kernel.org>
Cc: Hung-Te Lin <hungte@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20190930214522.240680-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit b0f53dbc4b upstream.
Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.
set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.
Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.
This became more severe when we switched x86 from 4k to 8k kernel
stacks. Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.
In the particular case
3.12
kernel.threads-max = 515561
4.4
kernel.threads-max = 200000
Neither of the two values is really insane on 32GB machine.
I am not sure we want/need to tune the max_thread value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b3d0ef984 upstream.
Mark inode for force revalidation if LOOKUP_REVAL flag is set.
This tells the client to actually send a QueryInfo request to
the server to obtain the latest metadata in case a directory
or a file were changed remotely. Only do that if the client
doesn't have a lease for the file to avoid unneeded round
trips to the server.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c82e5ac7fe upstream.
Currently the client indicates that a dentry is stale when inode
numbers or type types between a local inode and a remote file
don't match. If this is the case attributes is not being copied
from remote to local, so, it is already known that the local copy
has stale metadata. That's why the inode needs to be marked for
revalidation in order to tell the VFS to lookup the dentry again
before openning a file. This prevents unexpected stale errors
to be returned to the user space when openning a file.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30573a82fb upstream.
Currently if the client identifies problems when processing
metadata returned in CREATE response, the open handle is being
leaked. This causes multiple problems like a file missing a lease
break by that client which causes high latencies to other clients
accessing the file. Another side-effect of this is that the file
can't be deleted.
Fix this by closing the file after the client hits an error after
the file was opened and the open descriptor wasn't returned to
the user space. Also convert -ESTALE to -EOPENSTALE to allow
the VFS to revalidate a dentry and retry the open.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a5243937c upstream.
string_to_context_struct() may garble the context string, so we need to
copy back the contents again from the old context struct to avoid
storing the corrupted context.
Since string_to_context_struct() tokenizes (and therefore truncates) the
context string and we are later potentially copying it with kstrdup(),
this may eventually cause pieces of uninitialized kernel memory to be
disclosed to userspace (when copying to userspace based on the stored
length and not the null character).
How to reproduce on Fedora and similar:
# dnf install -y memcached
# systemctl start memcached
# semodule -d memcached
# load_policy
# load_policy
# systemctl stop memcached
# ausearch -m AVC
type=AVC msg=audit(1570090572.648:313): avc: denied { signal } for pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=process permissive=0 trawcon=73797374656D5F75007400000000000070BE6E847296FFFF726F6D000096FFFF76
Cc: stable@vger.kernel.org
Reported-by: Milos Malik <mmalik@redhat.com>
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b84477d3eb upstream.
scale_up wakes up waiters after scaling up. But after scaling max, it
should not wake up more waiters as waiters will not have anything to
do. This patch fixes this by making scale_up (and also scale_down)
return when threshold is reached.
This bug causes increased fdatasync latency when fdatasync and dd
conv=sync are performed in parallel on 4.19 compared to 4.14. This
bug was introduced during refactoring of blk-wbt code.
Fixes: a79050434b ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9a997bd4d upstream.
We need to perform a reset a start up to make sure that the chip is in a
consistent state. This reset also disables all the interrupts which
should only be enabled together with the iio buffer. Not doing this, was
sometimes causing unwanted interrupts to trigger.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d202ce4787 upstream.
Currently, the driver sets the FIFO_SAMPLES register with the number of
sample sets (maximum of 170 for 3 axis data, 256 for 2-axis and 512 for
single axis). However, the FIFO_SAMPLES register should store the number
of samples, regardless of how the FIFO format is configured.
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Fixes: f4f55ce38e ("iio:adxl372: Add FIFO and interrupts support")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fd1c26065 upstream.
Commit 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040
proximity and light sensor") added the support for the vcnl4040 but
forgot to add the of_compatible. Fix this by adding it now.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Fixes: 5a441aade5 ("iio: light: vcnl4000 add support for the VCNL4040 proximity and light sensor")
Reviewed-by: Angus Ainslie (Purism) angus@akkea.ca
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82f3015635 upstream.
When an end-of-conversion interrupt is received after performing a
single-shot reading of the light sensor, the driver was waking up the
result ready queue before checking opt->ok_to_ignore_lock to determine
if it should unlock the mutex. The problem occurred in the case where
the other thread woke up and changed the value of opt->ok_to_ignore_lock
to false prior to the interrupt thread performing its read of the
variable. In this case, the mutex would be unlocked twice.
Signed-off-by: David Frey <dpfrey@gmail.com>
Reviewed-by: Andreas Dannenberg <dannenberg@ti.com>
Fixes: 94a9b7b180 ("iio: light: add support for TI's opt3001 light sensor")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcb1092017 upstream.
End of conversion may be handled by using IRQ or DMA. There may be a
race when two conversions complete at the same time on several ADCs.
EOC can be read as 'set' for several ADCs, with:
- an ADC configured to use IRQs. EOCIE bit is set. The handler is normally
called in this case.
- an ADC configured to use DMA. EOCIE bit isn't set. EOC triggers the DMA
request instead. It's then automatically cleared by DMA read. But the
handler gets called due to status bit is temporarily set (IRQ triggered
by the other ADC).
So both EOC status bit in CSR and EOCIE control bit must be checked
before invoking the interrupt handler (e.g. call ISR only for
IRQ-enabled ADCs).
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31922f62bb upstream.
Move STM32 ADC registers definitions to common header.
This is precursor patch to:
- iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
It keeps registers definitions as a whole block, to ease readability and
allow simple access path to EOC bits (readl) in stm32-adc-core driver.
Fixes: 2763ea0585 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 972917419a upstream.
Since commit 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling") we
preserve the bias current set by the firmware at boot. This fixes issues
we were seeing on various models, but it seems our old hardcoded 80ųA bias
current was working around a firmware bug on at least one model laptop.
In order to both have our cake and eat it, this commit adds a dmi based
list of models where we need to override the firmware set bias current and
adds the one model we now know needs this to it: The Lenovo Ideapad 100S
(11 inch version).
Fixes: 9bcf15f75c ("iio: adc: axp288: Fix TS-pin handling")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203829
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4043ecfb5f upstream.
Fix bug in sampling function hx711_cycle() when interrupt occures while
PD_SCK is high. If PD_SCK is high for at least 60 us power down mode of
the sensor is entered which in turn leads to a wrong measurement.
Switch off interrupts during a PD_SCK high period and move query of DOUT
to the latest point of time which is at the end of PD_SCK low period.
This bug exists in the driver since it's initial addition. The more
interrupts on the system the higher is the probability that it happens.
Fixes: c3b2fdd0ea ("iio: adc: hx711: Add IIO driver for AVIA HX711")
Signed-off-by: Andreas Klinger <ak@it-klinger.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2eed19b99c upstream.
The PCM draining behavior got broken since the recent refactoring, and
this turned out to be the incorrect expectation of the firmware
behavior regarding "draining". While I expected the "drain" flag at
the stop operation would do processing the queued samples, it seems
rather dropping the samples.
As a quick fix, just drop the SNDRV_PCM_INFO_DRAIN_TRIGGER flag, so
that the driver uses the normal PCM draining procedure. Also, put
some caution comment to the function for future readers not to fall
into the same pitfall.
Fixes: d7ca3a7154 ("staging: bcm2835-audio: Operate non-atomic PCM ops")
BugLink: https://github.com/raspberrypi/linux/issues/2983
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20190914152405.7416-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63f2b1677f upstream.
Commit c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor
interface") removed setting gpios via platform data. This means that
fbtft will now only work with Device Tree so set the dependency.
This also prevents a NULL pointer deref on non-DT platform because
fbtftops.request_gpios is not set in that case anymore.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20190917171843.10334-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd81e6fa8e upstream.
The driver is using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg and dev_err statements in the
completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 9d974b2a06 ("USB: legousbtower.c: remove err() usage")
Fixes: fef526cae7 ("USB: legousbtower: remove custom debug macro")
Fixes: 4dae996380 ("USB: legotower: remove custom debug macro and module parameter")
Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190919083039.30898-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8530e4e20e upstream.
The "run_isr" flag is used for preventing the driver from
calling the interrupt service routine in its runtime resume
callback when the driver is expecting completion to a
command, but what that basically does is that it hides the
real problem. The real problem is that the controller is
allowed to suspend in the middle of command execution.
As a more appropriate fix for the problem, using autosuspend
delay time that matches UCSI_TIMEOUT_MS (5s). That prevents
the controller from suspending while still in the middle of
executing a command.
This fixes a potential deadlock. Both ccg_read() and
ccg_write() are called with the mutex already taken at least
from ccg_send_command(). In ccg_read() and ccg_write, the
mutex is only acquired so that run_isr flag can be set.
Fixes: f0e4cd948b ("usb: typec: ucsi: ccg: add runtime pm workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20191004100219.71152-2-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d51bdb93ca upstream.
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58ecf131e7 upstream.
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg, dev_warn and dev_err statements in
the completion handlers which relies on said pointer.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 2824bd250f ("[PATCH] USB: add ldusb driver")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a31535859 upstream.
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edc4746f25 upstream.
A recent fix addressing a deadlock on disconnect introduced a new bug
by moving the present flag out of the critical section protected by the
driver-data mutex. This could lead to a racing release() freeing the
driver data before disconnect() is done with it.
Due to insufficient locking a related use-after-free could be triggered
also before the above mentioned commit. Specifically, the driver needs
to hold the driver-data mutex also while checking the opened flag at
disconnect().
Fixes: c468a8aa79 ("usb: iowarrior: fix deadlock on disconnect")
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Reported-by: syzbot+0761012cebf7bdb38137@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009104846.5925-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2fa7baee7 upstream.
The driver was using its struct usb_device pointer as an inverted
disconnected flag, but was setting it to NULL before making sure all
completion handlers had run. This could lead to a NULL-pointer
dereference in a number of dev_dbg statements in the completion handlers
which relies on said pointer.
The pointer was also dereferenced unconditionally in a dev_dbg statement
release() something which would lead to a NULL-deref whenever a device
was disconnected before the final character-device close if debugging
was enabled.
Fix this by unconditionally stopping all I/O and preventing
resubmissions by poisoning the interrupt URBs at disconnect and using a
dedicated disconnected flag.
This also makes sure that all I/O has completed by the time the
disconnect callback returns.
Fixes: 1ef37c6047 ("USB: adutux: remove custom debug macro and module parameter")
Fixes: 66d4bc30d1 ("USB: adutux: remove custom debug macro")
Cc: stable <stable@vger.kernel.org> # 3.12
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20190925092913.8608-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8de66b0e6a upstream.
The system can hit a deadlock if an xhci adapter breaks while initializing.
The deadlock is between two threads: thread 1 is tearing down the
adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
hcd->handwidth_mutex. Thread 2 is holding this mutex (while still trying
to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
stop or config command to complete. A reboot is required to resolve.
It turns out when calling xhci_queue_stop_endpoint and
xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
not checked for errors. If the timing is right and the adapter dies just
before either of these commands get issued, we hang indefinitely waiting
for a completion on a command that didn't get issued.
This wasn't a problem before the following fix because we didn't send
commands in xhci_endpoint_reset:
commit f5249461b5 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
With the patch I am submitting, a duration test which breaks adapters
during initialization (and which deadlocks with the standard kernel) runs
without issue.
Fixes: f5249461b5 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
Cc: <stable@vger.kernel.org> # v4.17+
Cc: Torez Smith <torez@redhat.com>
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Signed-off-by: Torez Smith <torez@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-7-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c03101ff4f upstream.
The check printing out the "WARN Wrong bounce buffer write length:"
uses incorrect values when comparing bytes written from scatterlist
to bounce buffer. Actual copied lengths are fine.
The used seg->bounce_len will be set to equal new_buf_len a few lines later
in the code, but is incorrect when doing the comparison.
The patch which added this false warning was backported to 4.8+ kernels
so this should be backported as far as well.
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 597c56e372 ("xhci: update bounce buffer with correct sg num")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1570190373-30684-2-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bed5ef2309 upstream.
The driver was using its struct usb_interface pointer as an inverted
disconnected flag and was setting it to NULL before making sure all
completion handlers had run. This could lead to NULL-pointer
dereferences in the dev_err() statements in the completion handlers
which relies on said pointer.
Fix this by using a dedicated disconnected flag.
Note that this is also addresses a NULL-pointer dereference at release()
and a struct usb_interface reference leak introduced by a recent runtime
PM fix, which depends on and should have been submitted together with
this patch.
Fixes: 4212cd74ca ("USB: usb-skeleton.c: remove err() usage")
Fixes: 5c290a5e42 ("USB: usb-skeleton: fix runtime PM after driver unbind")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009170944.30057-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c290a5e42 upstream.
Since commit c2b71462d2 ("USB: core: Fix bug caused by duplicate
interface PM usage counter") USB drivers must always balance their
runtime PM gets and puts, including when the driver has already been
unbound from the interface.
Leaving the interface with a positive PM usage counter would prevent a
later bound driver from suspending the device.
Fixes: c2b71462d2 ("USB: core: Fix bug caused by duplicate interface PM usage counter")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191001084908.2003-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aafb00a977 upstream.
The driver was using its struct usb_interface pointer as an inverted
disconnected flag, but was setting it to NULL without making sure all
code paths that used it were done with it.
Before commit ef61eb43ad ("USB: yurex: Fix protection fault after
device removal") this included the interrupt-in completion handler, but
there are further accesses in dev_err and dev_dbg statements in
yurex_write() and the driver-data destructor (sic!).
Fix this by unconditionally stopping also the control URB at disconnect
and by using a dedicated disconnected flag.
Note that we need to take a reference to the struct usb_interface to
avoid a use-after-free in the destructor whenever the device was
disconnected while the character device was still open.
Fixes: aadd6472d9 ("USB: yurex.c: remove dbg() usage")
Fixes: 45714104b9 ("USB: yurex.c: remove err() usage")
Cc: stable <stable@vger.kernel.org> # 3.5: ef61eb43ad
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20191009153848.8664-6-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32a0721c66 upstream.
According to Greg KH, it has been generally agreed that when a USB
driver encounters an unknown error (or one it can't handle directly),
it should just give up instead of going into a potentially infinite
retry loop.
The three codes -EPROTO, -EILSEQ, and -ETIME fall into this category.
They can be caused by bus errors such as packet loss or corruption,
attempting to communicate with a disconnected device, or by malicious
firmware. Nowadays the extent of packet loss or corruption is
negligible, so it should be safe for a driver to give up whenever one
of these errors occurs.
Although the yurex driver handles -EILSEQ errors in this way, it
doesn't do the same for -EPROTO (as discovered by the syzbot fuzzer)
or other unrecognized errors. This patch adjusts the driver so that
it doesn't log an error message for -EPROTO or -ETIME, and it doesn't
retry after any errors.
Reported-and-tested-by: syzbot+b24d736f18a1541ad550@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1909171245410.1590-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20bb759a66 upstream.
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.
This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:
| sysrq: Trigger a crash
| Kernel panic - not syncing: sysrq triggered crash
| CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xa0/0xc4
| panic+0x140/0x32c
| sysrq_handle_reboot+0x0/0x20
| __handle_sysrq+0x124/0x190
| write_sysrq_trigger+0x64/0x88
| proc_reg_write+0x60/0xa8
| __vfs_write+0x18/0x40
| vfs_write+0xa4/0x1b8
| ksys_write+0x64/0xf0
| __arm64_sys_write+0x14/0x20
| el0_svc_common.constprop.0+0xb0/0x168
| el0_svc_handler+0x28/0x78
| el0_svc+0x8/0xc
| Kernel Offset: disabled
| CPU features: 0x0002,24002004
| Memory Limit: none
| ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
| Ping 2!
| Ping 1!
| Ping 1!
| Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee45197c80 upstream.
As reported by erofs_utils fuzzer, a logical page can belong
to at most 2 compressed clusters, if one compressed cluster
is corrupted, but the other has been ready in submitting chain.
The chain needs to submit anyway in order to keep the page
working properly (page unlocked with PG_error set, PG_uptodate
not set).
Let's fix it now.
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com
[ Gao Xiang: Manually backport to v5.3.y stable. ]
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1004ce4c25 upstream.
Synchronization is recommended before disabling the trace registers
to prevent any start or stop points being speculative at the point
of disabling the unit (section 7.3.77 of ARM IHI 0064D).
Synchronization is also recommended after programming the trace
registers to ensure all updates are committed prior to normal code
resuming (section 4.3.7 of ARM IHI 0064D).
Let's ensure these syncronization points are present in the code
and clearly commented.
Note that we could rely on the barriers in CS_LOCK and
coresight_disclaim_device_unlocked or the context switch to user
space - however coresight may be of use in the kernel.
On armv8 the mb macro is defined as dsb(sy) - Given that the etm4x is
only used on armv8 let's directly use dsb(sy) instead of mb(). This
removes some ambiguity and makes it easier to correlate the code with
the TRM.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Fixed capital letter for "use" in title]
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20190829202842.580-11-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc3a7bfe62 upstream.
Today, put_compat_statfs64() disallows nearly any field value over
2^32 if f_bsize is only 32 bits, but that makes no sense.
compat_statfs64 is there for the explicit purpose of providing 64-bit
fields for f_files, f_ffree, etc. And f_bsize is always only 32 bits.
As a result, 32-bit userspace gets -EOVERFLOW for i.e. large file
counts even with -D_FILE_OFFSET_BITS=64 set.
In reality, only f_bsize and f_frsize can legitimately overflow
(fields like f_type and f_namelen should never be large), so test
only those fields.
This bug was discussed at length some time ago, and this is the proposal
Al suggested at https://lkml.org/lkml/2018/8/6/640. It seemed to get
dropped amid the discussion of other related changes, but this
part seems obviously correct on its own, so I've picked it up and
sent it, for expediency.
Fixes: 64d2ab32ef ("vfs: fix put_compat_statfs64() does not handle errors")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c82dd6d078 ]
When the handle_exception function addresses an exception, the interrupts
will be unconditionally enabled after finishing the context save. However,
It may erroneously enable the interrupts if the interrupts are disabled
before entering the handle_exception.
For example, one of the WARN_ON() condition is satisfied in the scheduling
where the interrupt is disabled and rq.lock is locked. The WARN_ON will
trigger a break exception and the handle_exception function will enable the
interrupts before entering do_trap_break function. During the procedure, if
a timer interrupt is pending, it will be taken when interrupts are enabled.
In this case, it may cause a deadlock problem if the rq.lock is locked
again in the timer ISR.
Hence, the handle_exception() can only enable interrupts when the state of
sstatus.SPIE is 1.
This patch is tested on HiFive Unleashed board.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
[paul.walmsley@sifive.com: updated to apply]
Fixes: bcae803a21 ("RISC-V: Enable IRQ during exception handling")
Cc: David Abdurachmanov <david.abdurachmanov@sifive.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b63fd11cce ]
When using 'perf stat' with repeat and interval option, it shows wrong
values for events.
The wrong values will be shown for the first interval on the second and
subsequent repetitions.
Without the fix:
# perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
2.000282489 53 faults
2.000282489 513 sched:sched_switch
4.005478208 3,721 faults
4.005478208 2,666 sched:sched_switch
5.025470933 395 faults
5.025470933 1,307 sched:sched_switch
2.009602825 1,84,46,74,40,73,70,95,47,520 faults <------
2.009602825 1,84,46,74,40,73,70,95,49,568 sched:sched_switch <------
4.019612206 4,730 faults
4.019612206 2,746 sched:sched_switch
5.039615484 3,953 faults
5.039615484 1,496 sched:sched_switch
2.000274620 1,84,46,74,40,73,70,95,47,520 faults <------
2.000274620 1,84,46,74,40,73,70,95,47,520 sched:sched_switch <------
4.000480342 4,282 faults
4.000480342 2,303 sched:sched_switch
5.000916811 1,322 faults
5.000916811 1,064 sched:sched_switch
#
prev_raw_counts is allocated when using intervals. This is used when
calculating the difference in the counts of events when using interval.
The current counts are stored in prev_raw_counts to calculate the
differences in the next iteration.
On the first interval of the second and subsequent repetitions,
prev_raw_counts would be the values stored in the last interval of the
previous repetitions, while the current counts will only be for the
first interval of the current repetition.
Hence there is a possibility of events showing up as big number.
Fix this by resetting prev_raw_counts whenever perf stat repeats the
command.
With the fix:
# perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
2.019349347 2,597 faults
2.019349347 2,753 sched:sched_switch
4.019577372 3,098 faults
4.019577372 2,532 sched:sched_switch
5.019415481 1,879 faults
5.019415481 1,356 sched:sched_switch
2.000178813 8,468 faults
2.000178813 2,254 sched:sched_switch
4.000404621 7,440 faults
4.000404621 1,266 sched:sched_switch
5.040196079 2,458 faults
5.040196079 556 sched:sched_switch
2.000191939 6,870 faults
2.000191939 1,170 sched:sched_switch
4.000414103 541 faults
4.000414103 902 sched:sched_switch
5.000809863 450 faults
5.000809863 364 sched:sched_switch
#
Committer notes:
This was broken since the cset introducing the --interval feature, i.e.
--repeat + --interval wasn't tested at that point, add the Fixes tag so
that automatic scripts can pick this up.
Fixes: 13370a9b5b ("perf stat: Add interval printing")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org # v3.9+
Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
[ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9023b91dd ]
When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.
The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.
In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().
In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings
Here is a depiction of the race condition
CPU #1 (Running timer callback) CPU #2 (Enter idle
and subscribe to
tick broadcast)
--------------------- ---------------------
__run_hrtimer() tick_broadcast_enter()
bc_handler() __tick_broadcast_oneshot_control()
tick_handle_oneshot_broadcast()
raw_spin_lock(&tick_broadcast_lock);
dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock
//next_event for tick broadcast clock
set to KTIME_MAX since no other cores
subscribed to tick broadcasting
raw_spin_unlock(&tick_broadcast_lock);
if (dev->next_event == KTIME_MAX)
return HRTIMER_NORESTART
// callback function exits without
restarting the hrtimer //tick_broadcast_lock acquired
raw_spin_lock(&tick_broadcast_lock);
tick_broadcast_set_event()
clockevents_program_event()
dev->next_event = expires;
bc_set_next()
hrtimer_try_to_cancel()
//returns -1 since the timer
callback is active. Exits without
restarting the timer
cpu_base->running = NULL;
The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.
Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 567926cca9 ]
Current versions of Intel's SDM incorrectly state that "bits 31:15 of
the VM-Entry exception error-code field" must be zero. In reality, bits
31:16 must be zero, i.e. error codes are 16-bit values.
The bogus error code check manifests as an unexpected VM-Entry failure
due to an invalid code field (error number 7) in L1, e.g. when injecting
a #GP with error_code=0x9f00.
Nadav previously reported the bug[*], both to KVM and Intel, and fixed
the associated kvm-unit-test.
[*] https://patchwork.kernel.org/patch/11124749/
Reported-by: Nadav Amit <namit@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f7fec0ba8 ]
Some of the self tests create a test inode, setup some extents and then do
calls to btrfs_get_extent() to test that the corresponding extent maps
exist and are correct. However btrfs_get_extent(), since the 5.2 merge
window, now errors out when it finds a regular or prealloc extent for an
inode that does not correspond to a regular file (its ->i_mode is not
S_IFREG). This causes the self tests to fail sometimes, specially when
KASAN, slub_debug and page poisoning are enabled:
$ modprobe btrfs
modprobe: ERROR: could not insert 'btrfs': Invalid argument
$ dmesg
[ 9414.691648] Btrfs loaded, crc32c=crc32c-intel, debug=on, assert=on, integrity-checker=on, ref-verify=on
[ 9414.692655] BTRFS: selftest: sectorsize: 4096 nodesize: 4096
[ 9414.692658] BTRFS: selftest: running btrfs free space cache tests
[ 9414.692918] BTRFS: selftest: running extent only tests
[ 9414.693061] BTRFS: selftest: running bitmap only tests
[ 9414.693366] BTRFS: selftest: running bitmap and extent tests
[ 9414.696455] BTRFS: selftest: running space stealing from bitmap to extent tests
[ 9414.697131] BTRFS: selftest: running extent buffer operation tests
[ 9414.697133] BTRFS: selftest: running btrfs_split_item tests
[ 9414.697564] BTRFS: selftest: running extent I/O tests
[ 9414.697583] BTRFS: selftest: running find delalloc tests
[ 9415.081125] BTRFS: selftest: running find_first_clear_extent_bit test
[ 9415.081278] BTRFS: selftest: running extent buffer bitmap tests
[ 9415.124192] BTRFS: selftest: running inode tests
[ 9415.124195] BTRFS: selftest: running btrfs_get_extent tests
[ 9415.127909] BTRFS: selftest: running hole first btrfs_get_extent test
[ 9415.128343] BTRFS critical (device (efault)): regular/prealloc extent found for non-regular inode 256
[ 9415.131428] BTRFS: selftest: fs/btrfs/tests/inode-tests.c:904 expected a real extent, got 0
This happens because the test inodes are created without ever initializing
the i_mode field of the inode, and neither VFS's new_inode() nor the btrfs
callback btrfs_alloc_inode() initialize the i_mode. Initialization of the
i_mode is done through the various callbacks used by the VFS to create
new inodes (regular files, directories, symlinks, tmpfiles, etc), which
all call btrfs_new_inode() which in turn calls inode_init_owner(), which
sets the inode's i_mode. Since the tests only uses new_inode() to create
the test inodes, the i_mode was never initialized.
This always happens on a VM I used with kasan, slub_debug and many other
debug facilities enabled. It also happened to someone who reported this
on bugzilla (on a 5.3-rc).
Fix this by setting i_mode to S_IFREG at btrfs_new_test_inode().
Fixes: 6bf9e4bd6a ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204397
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78beef629f ]
In nfp_abm_u32_knode_replace if the allocation for match fails it should
go to the error handling instead of returning. Updated other gotos to
have correct errno returned, too.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52feb8b588 ]
The ASIC can only mirror a packet to one port, but when user is trying
to set more than one mirror action, it doesn't fail.
Add a check if more than one mirror action was specified per rule and if so,
fail for not being supported.
Fixes: d0d13c1858 ("mlxsw: spectrum_acl: Add support for mirror action")
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 127068abe8 ]
We have a production-level laptop (Lenovo Yoga C630) which is exhibiting
a rather horrific bug. When I2C HID devices are being scanned for at
boot-time the QCom Geni based I2C (Serial Engine) attempts to use DMA.
When it does, the laptop reboots and the user never sees the OS.
Attempts are being made to debug the reason for the spontaneous reboot.
No luck so far, hence the requirement for this hot-fix. This workaround
will be removed once we have a viable fix.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 284b94be19 ]
Commit c48dac137a ("block: don't hold q->sysfs_lock in elevator_init_mq")
removes q->sysfs_lock from elevator_init_mq(), but forgot to deal with
lockdep_assert_held() called in blk_mq_sched_free_requests() which is
run in failure path of elevator_init_mq().
blk_mq_sched_free_requests() is called in the following 3 functions:
elevator_init_mq()
elevator_exit()
blk_cleanup_queue()
In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly
by 'mutex_lock(&q->sysfs_lock)'.
So moving the lockdep_assert_held() from blk_mq_sched_free_requests()
into elevator_exit() for fixing the report by syzbot.
Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com
Fixed: c48dac137a ("block: don't hold q->sysfs_lock in elevator_init_mq")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aef70a1f44 ]
Some compilers emit warning for potential uninitialized next_id usage.
The code is correct, but control flow is too complicated for some
compilers to figure this out. Re-initialize next_id to satisfy
compiler.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f74914071 ]
When building with W=1, gcc properly complains that there's no prototypes:
CC kernel/elfcore.o
kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
7 | Elf_Half __weak elf_core_extra_phdrs(void)
| ^~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
22 | size_t __weak elf_core_extra_data_size(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~
Provide the include file so gcc is happy, and we don't have potential code drift
Link: http://lkml.kernel.org/r/29875.1565224705@turing-police
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4670d68b92 ]
Some recent changes in latest Clang started causing the following
warning when unrolling strobemeta test case main loop:
progs/strobemeta.h:416:2: warning: loop not unrolled: the optimizer was
unable to perform the requested transformation; the transformation might
be disabled or specified as part of an unsupported transformation
ordering [-Wpass-failed=transform-warning]
This patch simplifies loop's exit condition to depend only on constant
max iteration number (STROBE_MAX_MAP_ENTRIES), while moving early
termination logic inside the loop body. The changes are equivalent from
program logic standpoint, but fixes the warning. It also appears to
improve generated BPF code, as it fixes previously failing non-unrolled
strobemeta test cases.
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1a445d3b8 ]
There are many of those warnings.
In file included from ./arch/powerpc/include/asm/paca.h:15,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from fs/fs-writeback.c:19:
In function 'strncpy',
inlined from 'perf_trace_writeback_page_template' at
./include/trace/events/writeback.h:56:1:
./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified
bound 32 equals destination size [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix it by using the new strscpy_pad() which was introduced in "lib/string:
Add strscpy_pad() function" and will always be NUL-terminated instead of
strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for
consistency.
Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw
Fixes: 455b286468 ("writeback: Initial tracing support")
Fixes: 028c2dd184 ("writeback: Add tracing to balance_dirty_pages")
Fixes: e84d0a4f8e ("writeback: trace event writeback_queue_io")
Fixes: b48c104d22 ("writeback: trace event bdi_dirty_ratelimit")
Fixes: cc1676d917 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()")
Fixes: 9fb0a7da0c ("writeback: add more tracepoints")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nitin Gote <nitin.r.gote@intel.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Stephen Kitt <steve@sk2.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 815c1560bf ]
With Java 11 there is no seperate JRE anymore.
Details:
https://coderanch.com/t/701603/java/JRE-JDK
Therefore the detection of the JRE needs to be adapted.
This change works for s390 and x86. I have not tested other platforms.
Committer testing:
Continues to work with the OpenJDK 8:
$ rm -f ~acme/lib64/libperf-jvmti.so
$ rpm -qa | grep jdk-devel
java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64
$ git log --oneline -1
a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>1
$ ls -la ~acme/lib64/libperf-jvmti.so
-rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so
$
Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 714e501e16 ]
An oops can be triggered in the scheduler when running qemu on arm64:
Unable to handle kernel paging request at virtual address ffff000008effe40
Internal error: Oops: 96000007 [#1] SMP
Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
pstate: 20000085 (nzCv daIf -PAN -UAO)
pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
lr : move_queued_task.isra.21+0x124/0x298
...
Call trace:
__ll_sc___cmpxchg_case_acq_4+0x4/0x20
__migrate_task+0xc8/0xe0
migration_cpu_stop+0x170/0x180
cpu_stopper_thread+0xec/0x178
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
__set_cpus_allowed_ptr() will choose an active dest_cpu in affinity mask to
migrage the process if process is not currently running on any one of the
CPUs specified in affinity mask. __set_cpus_allowed_ptr() will choose an
invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
CPUS in an affinity mask are deactived by cpu_down after cpumask_intersects
check. cpumask_test_cpu() of dest_cpu afterwards is overflown and may pass if
corresponding bit is coincidentally set. As a consequence, kernel will
access an invalid rq address associate with the invalid CPU in
migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
The reproduce the crash:
1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
sched_setaffinity.
2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
3) Oops appears if the invalid CPU is set in memory after tested cpumask.
Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59f08896f0 ]
After commit 62974fc389 ("libnvdimm: Enable unit test infrastructure
compile checks"), clang warns:
In file included from
../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15:
../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15:
warning: redefinition of typedef 'acpi_handle' is a C11 feature
[-Wtypedef-redefinition]
typedef void *acpi_handle;
^
../include/acpi/actypes.h:424:15: note: previous definition is here
typedef void *acpi_handle; /* Actually a ptr to a NS Node */
^
1 warning generated.
The include chain:
iomap.c ->
linux/acpi.h ->
acpi/acpi.h ->
acpi/actypes.h
nfit_test.h
Avoid this by including linux/acpi.h in nfit_test.h, which allows us to
remove both the typedef and the forward declaration of acpi_object.
Link: https://github.com/ClangBuiltLinux/linux/issues/660
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c42adf87e4 ]
We do check for a bad block during namespace init and that use
region bad block list. We need to initialize the bad block
for volatile regions for this to work. We also observe a lockdep
warning as below because the lock is not initialized correctly
since we skip bad block init for volatile regions.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
Call Trace:
[c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
[c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
[c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
[c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
[c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
[c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
[c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
[c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
[c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
[c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
[c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
[c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
[c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
[c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
[c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
[c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
[c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
[c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
[c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
[c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
[c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
[c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
[c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ccb72f837 ]
Downgrading an existing large mapping to a mapping using smaller
page-sizes works only for the mappings created with page-mode 7 (i.e.
non-default page size).
Treat large mappings created with page-mode 0 (i.e. default page size)
like a non-present mapping and allow to overwrite it in alloc_pte().
While around, make sure that we flush the TLB only if we change an
existing mapping, otherwise we might end up acting on garbage PTEs.
Fixes: 6d568ef9a6 ("iommu/amd: Allow downgrading page-sizes in alloc_pte()")
Signed-off-by: Andrei Dulea <adulea@amazon.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c7aa18428 ]
When calling thermal_add_hwmon_sysfs(), the device type is sanitized by
replacing '-' with '_'. However tz->type remains unsanitized. Thus
calling thermal_hwmon_lookup_by_type() returns no device. And if there is
no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup
failed!".
The result is unregisted hwmon devices in the sysfs.
Fixes: 409ef0baca ("thermal_hwmon: Sanitize attribute name passed to hwmon")
Signed-off-by: Stefan Mavrodiev <stefan@olimex.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae89339b08 ]
second parameter of ntb_peer_mw_get_addr is pointing to wrong memory
window index by passing "peer gidx" instead of "local gidx".
For ex, "local gidx" value is '0' and "peer gidx" value is '1', then
on peer side ntb_mw_set_trans() api is used as below with gidx pointing to
local side gidx which is '0', so memroy window '0' is chosen and XLAT '0'
will be programmed by peer side.
ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat,
peer->inbuf_size);
Now, on local side ntb_peer_mw_get_addr() is been used as below with gidx
pointing to "peer gidx" which is '1', so pointing to memory window '1'
instead of memory window '0'.
ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr,
&peer->outbuf_size);
So this patch pass "local gidx" as parameter to ntb_peer_mw_get_addr().
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88282297ff ]
The seccomp selftest goes to some length to build against older kernel
headers, viz. all the #ifdefs at the beginning of the file.
Commit 201766a20e ("ptrace: add PTRACE_GET_SYSCALL_INFO request")
introduces some additional macros, but doesn't do the #ifdef dance.
Let's add that dance here to avoid:
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
In file included from seccomp_bpf.c:51:
seccomp_bpf.c: In function ‘tracer_ptrace’:
seccomp_bpf.c:1787:20: error: ‘PTRACE_EVENTMSG_SYSCALL_ENTRY’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_CLONE’?
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
seccomp_bpf.c:1787:20: note: each undeclared identifier is reported only once for each function it appears in
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
seccomp_bpf.c:1788:6: error: ‘PTRACE_EVENTMSG_SYSCALL_EXIT’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_EXIT’?
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
make: *** [Makefile:12: seccomp_bpf] Error 1
[skhan@linuxfoundation.org: Fix checkpatch error in commit log]
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Fixes: 201766a20e ("ptrace: add PTRACE_GET_SYSCALL_INFO request")
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c91e3234c6 ]
LPTimer can use a 32KHz clock for counting. It depends on clock tree
configuration. In such a case, PWM output frequency range is limited.
Although unlikely, nothing prevents user from requesting a PWM frequency
above counting clock (32KHz for instance):
- This causes (prd - 1) = 0xffff to be written in ARR register later in
the apply() routine.
This results in badly configured PWM period (and also duty_cycle).
Add a check to report an error is such a case.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ba828861c ]
If the copy of the RPC reply into our buffers did not complete, and
we could end up with a truncated message. In that case, just resend
the call.
Fixes: a0584ee9ae ("SUNRPC: Use struct xdr_stream when decoding...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c47b18cf7 ]
IF the server rejected our layout return with a state error such as
NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
to clear out all the remaining layout segments and mark that stateid
as invalid.
Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e6124d9d6 ]
Since add_probe_trace_event() can reuse tf->tevs[i] after calling
clear_probe_trace_event(), this can make perf-probe crash if the 1st
attempt of probe event finding fails to find an event argument, and the
2nd attempt fails to find probe point.
E.g.
$ perf probe -D "task_pid_nr tsk"
Failed to find 'tsk' in this function.
Failed to get entry address of warn_bad_vsyscall
Segmentation fault (core dumped)
Committer testing:
After the patch:
$ perf probe -D "task_pid_nr tsk"
Failed to find 'tsk' in this function.
Failed to get entry address of warn_bad_vsyscall
Failed to get entry address of signal_fault
Failed to get entry address of show_signal
Failed to get entry address of umip_printk
Failed to get entry address of __bad_area_nosemaphore
<SNIP>
Failed to get entry address of sock_set_timeout
Failed to get entry address of tcp_recvmsg
Probe point 'task_pid_nr' not found.
Error: Failed to add events.
$
Fixes: 092b1f0b5f ("perf probe: Clear probe_trace_event when add_probe_trace_event() fails")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/156856587999.25775.5145779959474477595.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcafbd50f2 ]
Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.
Fixes: eb3961a574 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acab713177 ]
This un-breaks lookups in sets that have the 'dynamic' flag set.
Given this active example configuration:
table filter {
set set1 {
type ipv4_addr
size 64
flags dynamic,timeout
timeout 1m
}
chain input {
type filter hook input priority 0; policy accept;
}
}
... this works:
nft add rule ip filter input add @set1 { ip saddr }
-> whenever rule is triggered, the source ip address is inserted
into the set (if it did not exist).
This won't work:
nft add rule ip filter input ip saddr @set1 counter
Error: Could not process rule: Operation not supported
In other words, we can add entries to the set, but then can't make
matching decision based on that set.
That is just wrong -- all set backends support lookups (else they would
not be very useful).
The failure comes from an explicit rejection in nft_lookup.c.
Looking at the history, it seems like NFT_SET_EVAL used to mean
'set contains expressions' (aka. "is a meter"), for instance something like
nft add rule ip filter input meter example { ip saddr limit rate 10/second }
or
nft add rule ip filter input meter example { ip saddr counter }
The actual meaning of NFT_SET_EVAL however, is
'set can be updated from the packet path'.
'meters' and packet-path insertions into sets, such as
'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
and thus require a set backend that provides the ->update() function.
The only set that provides this also is the only one that has the
NFT_SET_EVAL feature flag.
Removing the wrong check makes the above example work.
While at it, also fix the flag check during set instantiation to
allow supported combinations only.
Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 714fbc7388 ]
Ensure that we set task->tk_rpc_status for all RPC level errors so that
the caller can distinguish between those and server reply status errors.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71a228bc8d ]
If client mds session is evicted in CEPH_MDS_SESSION_OPENING state,
mds won't send session msg to client, and delayed_work skip
CEPH_MDS_SESSION_OPENING state session, the session hang forever.
Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
session hang. Also, ensure that we skip sessions in RESTARTING and
REJECTED states since those states can't be resurrected by issuing
a keepalive.
Link: https://tracker.ceph.com/issues/41551
Signed-off-by: Erqi Chen chenerqi@gmail.com
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 606d102327 ]
It's protected by the s_gen_ttl_lock, so we should fetch under it
and ensure that we're using the same generation in both places.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 750670341a ]
When filling an inode with info from the MDS, i_blkbits is being
initialized using fl_stripe_unit, which contains the stripe unit in
bytes. Unfortunately, this doesn't make sense for directories as they
have fl_stripe_unit set to '0'. This means that i_blkbits will be set
to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
shift exponent 255 is too large for 32-bit type 'int'
Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
is zero.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f22f812d5c ]
The size of struct fuse_req was reduced from 392B to 144B on a non-debug
config, thus the sanitize_global_limit() helper was setting a larger
default limit. This doesn't really reflect reduction in the memory used by
requests, since the fields removed from fuse_req were added to fuse_args
derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus
resulting in slightly more memory being used for writepage requests
overalll (due to using 256B slabs).
Make the calculatation ignore the size of fuse_req and use the old 392B
value.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4098bc6ee ]
If MCFG area is not reserved in E820, Xen by default will defer its usage
until Dom0 registers it explicitly after ACPI parser recognizes it as
a reserved resource in DSDT. Having it reserved in E820 is not
mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
and firmware is free to keep a hole in E820 in that place. Xen doesn't know
what exactly is inside this hole since it lacks full ACPI view of the
platform therefore it's potentially harmful to access MCFG region
without additional checks as some machines are known to provide
inconsistent information on the size of the region.
Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
PCI enumeration starts exactly there as well. Trying to register a device
prior to MCFG reservation causes multiple problems with PCIe extended
capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
no convenient hooks for us to subscribe to so register MCFG areas earlier
upon the first invocation of xen_add_device(). It should be safe to do once
since all the boot time buses must have their MCFG areas in MCFG table
already and we don't support PCI bus hot-plug.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ce772fe79 ]
The p9_tag_alloc() does not initialize the transport error t_err field.
The struct p9_req_t *req is allocated and stored in a struct p9_client
variable. The field t_err is never initialized before p9_conn_cancel()
checks its value.
KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
reports this bug.
==================================================================
BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0
Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216
CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events p9_write_work
Call Trace:
dump_stack+0x75/0xae
__kumsan_report+0x17c/0x3e6
kumsan_report+0xe/0x20
p9_conn_cancel+0x2d9/0x3b0
p9_write_work+0x183/0x4a0
process_one_work+0x4d1/0x8c0
worker_thread+0x6e/0x780
kthread+0x1ca/0x1f0
ret_from_fork+0x35/0x40
Allocated by task 1979:
save_stack+0x19/0x80
__kumsan_kmalloc.constprop.3+0xbc/0x120
kmem_cache_alloc+0xa7/0x170
p9_client_prepare_req.part.9+0x3b/0x380
p9_client_rpc+0x15e/0x880
p9_client_create+0x3d0/0xac0
v9fs_session_init+0x192/0xc80
v9fs_mount+0x67/0x470
legacy_get_tree+0x70/0xd0
vfs_get_tree+0x4a/0x1c0
do_mount+0xba9/0xf90
ksys_mount+0xa8/0x120
__x64_sys_mount+0x62/0x70
do_syscall_64+0x6d/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at ffff88805f9b6008
which belongs to the cache p9_req_t of size 144
The buggy address is located 4 bytes inside of
144-byte region [ffff88805f9b6008, ffff88805f9b6098)
The buggy address belongs to the page:
page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740
raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000
page dumped because: kumsan: bad access detected
==================================================================
Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.com
Signed-off-by: Lu Shuaibing <shuaibinglu@126.com>
[dominique.martinet@cea.fr: grouped the added init with the others]
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98ef77d1aa ]
Eli Dorfman reports that after a series of idle disconnects, an
RPC/RDMA transport becomes unusable (rdma_create_qp returns
-ENOMEM). Problem was tracked down to increasing Send Queue size
after each reconnect.
The rdma_create_qp() API does not promise to leave its @qp_init_attr
parameter unaltered. In fact, some drivers do modify one or more of
its fields. Thus our calls to rdma_create_qp must use a fresh copy
of ib_qp_init_attr each time.
This fix is appropriate for kernels dating back to late 2007, though
it will have to be adapted, as the connect code has changed over the
years.
Reported-by: Eli Dorfman <eli@vastdata.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 395790566e ]
Commit 48be539dd4 ("xprtrdma: Introduce ->alloc_slot call-out for
xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA
transport. Later, commit 75891f502f ("SUNRPC: Support for
congestion control when queuing is enabled") modified the generic
alloc/free_slot methods, but neglected the methods in xprtrdma.
Found via code review.
Fixes: 75891f502f ("SUNRPC: Support for congestion control ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2751463ea ]
In encode_attrs(), there is an if statement on line 1145 to check
whether label is NULL:
if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))
When label is NULL, it is used on lines 1178-1181:
*p++ = cpu_to_be32(label->lfs);
*p++ = cpu_to_be32(label->pi);
*p++ = cpu_to_be32(label->len);
p = xdr_encode_opaque_fixed(p, label->label, label->len);
To fix these bugs, label is checked before being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ece3125f2 ]
integrity_kernel_read() can fail in which case we forward to call
ahash_request_free() on a currently running request. We have to wait
for its completion before we can free the request.
This was observed by interrupting a "find / -type f -xdev -print0 | xargs -0
cat 1>/dev/null" with ctrl-c on an IMA enabled filesystem.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5e1040196 ]
integrity_kernel_read() returns the number of bytes read. If this is
a short read then this positive value is returned from
ima_calc_file_hash_atfm(). Currently this is only indirectly called from
ima_calc_file_hash() and this function only tests for the return value
being zero or nonzero and also doesn't forward the return value.
Nevertheless there's no point in returning a positive value as an error,
so translate a short read into -EINVAL.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b8249abb0 ]
memory returned as part of nvmem_read via qfprom_read should be
freed by the consumer once done.
Existing code is not doing it so fix it.
Below memory leak detected by kmemleak
[<ffffff80088b7658>] kmemleak_alloc+0x50/0x84
[<ffffff80081df120>] __kmalloc+0xe8/0x168
[<ffffff80086db350>] nvmem_cell_read+0x30/0x80
[<ffffff8008632790>] qfprom_read+0x4c/0x7c
[<ffffff80086335a4>] calibrate_v1+0x34/0x204
[<ffffff8008632518>] tsens_probe+0x164/0x258
[<ffffff80084e0a1c>] platform_drv_probe+0x80/0xa0
[<ffffff80084de4f4>] really_probe+0x208/0x248
[<ffffff80084de2c4>] driver_probe_device+0x98/0xc0
[<ffffff80084dec54>] __device_attach_driver+0x9c/0xac
[<ffffff80084dca74>] bus_for_each_drv+0x60/0x8c
[<ffffff80084de634>] __device_attach+0x8c/0x100
[<ffffff80084de6c8>] device_initial_probe+0x20/0x28
[<ffffff80084dcbb8>] bus_probe_device+0x34/0x7c
[<ffffff80084deb08>] deferred_probe_work_func+0x6c/0x98
[<ffffff80080c3da8>] process_one_work+0x160/0x2f8
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a8fabb3852 upstream.
In case a user process using xenbus has open transactions and is killed
e.g. via ctrl-C the following cleanup of the allocated resources might
result in a deadlock due to trying to end a transaction in the xenbus
worker thread:
[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40
Fix this by doing the cleanup via a workqueue instead.
Reported-by: James Dingwall <james@dingwall.me.uk>
Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: <stable@vger.kernel.org> # 4.11
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1f028ff89 upstream.
commit 6953c57ab1 "gpio: of: Handle SPI chipselect legacy bindings"
did introduce logic to centrally handle the legacy spi-cs-high property
in combination with cs-gpios. This assumes that the polarity
of the CS has to be inverted if spi-cs-high is missing, even
and especially if non-legacy GPIO_ACTIVE_HIGH is specified.
The DTS for the GTA04 was orginally introduced under the assumption
that there is no need for spi-cs-high if the gpio is defined with
proper polarity GPIO_ACTIVE_HIGH.
This was not a problem until gpiolib changed the interpretation of
GPIO_ACTIVE_HIGH and missing spi-cs-high.
The effect is that the missing spi-cs-high is now interpreted as CS being
low (despite GPIO_ACTIVE_HIGH) which turns off the SPI interface when the
panel is to be programmed by the panel driver.
Therefore, we have to add the redundant and legacy spi-cs-high property
to properly activate CS.
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf387d9644 upstream.
With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
area. Some architectures map the memmap area with large page size. On
architectures like ppc64, 16MB page for memap mapping can map 262144 pfns.
This maps a namespace size of 16G.
When populating memmap region with 16MB page from the device area,
make sure the allocated space is not used to map resources outside this
namespace. Such usage of device area will prevent a namespace destroy.
Add resource end pnf in altmap and use that to check if the memmap area
allocation can map pfn outside the namespace. On ppc64 in such case we fallback
to allocation from memory.
This fix kernel crash reported below:
[ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
[ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
[ 133.464760] Faulting instruction address: 0xc00000000007580c
[ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
[ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
.....
[ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
[ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
[ 133.464910] Call Trace:
[ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
[ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
[ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
[ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
[ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
[ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
[ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
[ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
[ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
[ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
[ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
[ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
[ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
[ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250
djbw: Aneesh notes that this crash can likely be triggered in any kernel that
supports 'papr_scm', so flagging that commit for -stable consideration.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Santosh Sivaraj <santosh@fossix.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89340d0935 upstream.
This patch reverts commit 75437bb304 (locking/pvqspinlock: Don't
wait if vCPU is preempted). A large performance regression was caused
by this commit. on over-subscription scenarios.
The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
with three VMs of 80 vCPUs each. The score of ebizzy -M is reduced from
13000-14000 records/s to 1700-1800 records/s:
Host Guest score
vanilla w/o kvm optimizations upstream 1700-1800 records/s
vanilla w/o kvm optimizations revert 13000-14000 records/s
vanilla w/ kvm optimizations upstream 4500-5000 records/s
vanilla w/ kvm optimizations revert 14000-15500 records/s
Exit from aggressive wait-early mechanism can result in premature yield
and extra scheduling latency.
Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
being true. However, when one vCPU voluntarily releases its vCPU, all
the subsequently waiters in the queue will do the same and the cascading
effect leads to bad performance.
kvm optimizations:
[1] commit d73eb57b80 (KVM: Boost vCPUs that are delivering interrupts)
[2] commit 266e85a5ec (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org
Fixes: 75437bb304 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 121bd08b02 upstream.
We must not unconditionally set the DMA snoop bit; if the DMA API is
assuming that the device is not DMA coherent, and the device snoops the
CPU caches, the device can see stale cache lines brought in by
speculative prefetch.
This leads to the device seeing stale data, potentially resulting in
corrupted data transfers. Commonly, this results in a descriptor fetch
error such as:
mmc0: ADMA error
mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00002202
mmc0: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000001
mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
mmc0: sdhci: Present: 0x01f50008 | Host ctl: 0x00000038
mmc0: sdhci: Power: 0x00000003 | Blk gap: 0x00000000
mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x000040d8
mmc0: sdhci: Timeout: 0x00000003 | Int stat: 0x00000001
mmc0: sdhci: Int enab: 0x037f108f | Sig enab: 0x037f108b
mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
mmc0: sdhci: Caps: 0x35fa0000 | Caps_1: 0x0000af00
mmc0: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
mmc0: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x001d8a33
mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x3f400e00
mmc0: sdhci: Host ctl2: 0x00000000
mmc0: sdhci: ADMA Err: 0x00000009 | ADMA Ptr: 0x000000236d43820c
mmc0: sdhci: ============================================
mmc0: error -5 whilst initialising SD card
but can lead to other errors, and potentially direct the SDHCI
controller to read/write data to other memory locations (e.g. if a valid
descriptor is visible to the device in a stale cache line.)
Fix this by ensuring that the DMA snoop bit corresponds with the
behaviour of the DMA API. Since the driver currently only supports DT,
use of_dma_is_coherent(). Note that device_get_dma_attr() can not be
used as that risks re-introducing this bug if/when the driver is
converted to ACPI.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1c536e317 upstream.
ADMA errors are potentially data corrupting events; although we print
the register state, we do not usefully print the ADMA descriptors.
Worse than that, we print them by referencing their virtual address
which is meaningless when the register state gives us the DMA address
of the failing descriptor.
Print the ADMA descriptors giving their DMA addresses rather than their
virtual addresses, and print them using SDHCI_DUMP() rather than DBG().
We also do not show the correct value of the interrupt status register;
the register dump shows the current value, after we have cleared the
pending interrupts we are going to service. What is more useful is to
print the interrupts that _were_ pending at the time the ADMA error was
encountered. Fix that too.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b960bc448a upstream.
The SDHCI controller on Tegra186 supports 40-bit addressing, which is
usually enough to address all of system memory. However, if the SDHCI
controller is behind an IOMMU, the address space can go beyond. This
happens on Tegra186 and later where the ARM SMMU has an input address
space of 48 bits. If the DMA API is backed by this ARM SMMU, the top-
down IOVA allocator will cause IOV addresses to be returned that the
SDHCI controller cannot access.
Unfortunately, prior to the introduction of the ->set_dma_mask() host
operation, the SDHCI core would set either a 64-bit DMA mask if the
controller claimed to support 64-bit addressing, or a 32-bit DMA mask
otherwise.
Since the full 64 bits cannot be addressed on Tegra, this had to be
worked around in commit 68481a7e1c ("mmc: tegra: Mark 64 bit dma
broken on Tegra186") by setting the SDHCI_QUIRK2_BROKEN_64_BIT_DMA
quirk, which effectively restricts the DMA mask to 32 bits.
One disadvantage of this is that dma_map_*() APIs will now try to use
the swiotlb to bounce DMA to addresses beyond of the controller's DMA
mask. This in turn caused degraded performance and can lead to
situations where the swiotlb buffer is exhausted, which in turn leads
to DMA transfers to fail.
With the recent introduction of the ->set_dma_mask() host operation,
this can now be properly fixed. For each generation of Tegra, the exact
supported DMA mask can be configured. This kills two birds with one
stone: it avoids the use of bounce buffers because system memory never
exceeds the addressable memory range of the SDHCI controllers on these
devices, and at the same time when an IOMMU is involved, it prevents
IOV addresses from being allocated beyond the addressible range of the
controllers.
Since the DMA mask is now properly handled, the 64-bit DMA quirk can be
removed.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
[treding@nvidia.com: provide more background in commit message]
Tested-by: Nicolin Chen <nicoleotsuka@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: stable@vger.kernel.org # v4.15 +
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a3242bdb4 upstream.
when creating a vGPU workload, the guest context head pointer should
be updated correctly by comparing with the exsiting workload in the
guest worklod queue including the current running context.
in some situation, there is a running context A and then received 2 new
vGPU workload context B and A. in the new workload context A, it's head
pointer should be updated with the running context A's tail.
v2: walk through guest workload list in backward way.
Cc: stable@vger.kernel.org
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 698c1aa9f8 upstream.
On the ThinkPad P71, we have one eDP connector exposed along with 5 DP
connectors, resulting in a total of 11 TMDS encoders. Since the GPU on
this system is also capable of MST, we create an additional 4 fake MST
encoders for each DP port. Unfortunately, we also do this for the eDP
port as well, resulting in:
1 eDP port: +1 TMDS encoder
+4 DPMST encoders
5 DP ports: +2 TMDS encoders
+4 DPMST encoders
*5 ports
== 35 encoders
Which breaks things, since DRM has a hard coded limit of 32 encoders.
So, fix this by not creating MSTMs for any eDP connectors. This brings
us down to 31 encoders, although we can do better.
This fixes driver probing for nouveau on the ThinkPad P71.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28ba1b1da4 upstream.
Now that -Wimplicit-fallthrough is passed to GCC by default, the
following warnings shows up:
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_format_get_bpp’:
../drivers/gpu/drm/arm/malidp_hw.c:387:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
bpp = 30;
~~~~^~~~
../drivers/gpu/drm/arm/malidp_hw.c:388:3: note: here
case DRM_FORMAT_YUV420_10BIT:
^~~~
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_se_irq’:
../drivers/gpu/drm/arm/malidp_hw.c:1311:4: warning: this statement may fall
through [-Wimplicit-fallthrough=]
drm_writeback_signal_completion(&malidp->mw_connector, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/arm/malidp_hw.c:1313:3: note: here
case MW_START:
^~~~
Rework to add a 'break;' in a case that didn't have it so that
the compiler doesn't warn about fall-through.
Cc: stable@vger.kernel.org # v5.2+
Fixes: b8207562ab ("drm/arm/malidp: Specified the rotation memory requirements for AFBC YUV formats")
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730153056.3606-1-anders.roxell@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cffb4c3ea3 upstream.
There was a integer wraparound when mode_clock became too high,
and we didn't correct for the FEC overhead factor when dividing,
with the calculations breaking at HBR3.
As a result our calculated bpp was way too high, and the link width
limitation never came into effect.
Print out the resulting bpp calcululations as a sanity check, just
in case we ever have to debug it later on again.
We also used the wrong factor for FEC. While bspec mentions 2.4%,
all the calculations use 1/0.972261, and the same ratio should be
applied to data M/N as well, so use it there when FEC is enabled.
This fixes the FIFO underrun we are seeing with FEC enabled.
Changes since v2:
- Handle fec_enable in intel_link_compute_m_n, so only data M/N is adjusted. (Ville)
- Fix initial hardware readout for FEC. (Ville)
Changes since v3:
- Remove bogus fec_to_mode_clock. (Ville)
Changes since v4:
- Use the correct register for icl. (Ville)
- Split hw readout to a separate patch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: d9218c8f6c ("drm/i915/dp: Add helpers for Compressed BPP and Slice Count for DSC")
Cc: <stable@vger.kernel.org> # v5.0+
Cc: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190925082110.17439-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit ed06efb801)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 443f2d5ba1 upstream.
Observe a segmentation fault when 'perf stat' is asked to repeat forever
with the interval option.
Without fix:
# perf stat -r 0 -I 5000 -e cycles -a sleep 10
# time counts unit events
5.000211692 3,13,89,82,34,157 cycles
10.000380119 1,53,98,52,22,294 cycles
10.040467280 17,16,79,265 cycles
Segmentation fault
This problem was only observed when we use forever option aka -r 0 and
works with limited repeats. Calling print_counter with ts being set to
NULL, is not a correct option when interval is set. Hence avoid
print_counter(NULL,..) if interval is set.
With fix:
# perf stat -r 0 -I 5000 -e cycles -a sleep 10
# time counts unit events
5.019866622 3,15,14,43,08,697 cycles
10.039865756 3,15,16,31,95,261 cycles
10.059950628 1,26,05,47,158 cycles
5.009902655 3,14,52,62,33,932 cycles
10.019880228 3,14,52,22,89,154 cycles
10.030543876 66,90,18,333 cycles
5.009848281 3,14,51,98,25,437 cycles
10.029854402 3,15,14,93,04,918 cycles
5.009834177 3,14,51,95,92,316 cycles
Committer notes:
Did the 'git bisect' to find the cset introducing the problem to add the
Fixes tag below, and at that time the problem reproduced as:
(gdb) run stat -r0 -I500 sleep 1
<SNIP>
Program received signal SIGSEGV, Segmentation fault.
print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
866 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
(gdb) bt
#0 print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
#1 0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
#2 0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
#3 0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
#4 0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
#5 0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
#6 0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
(gdb)
Mostly the same as just before this patch:
Program received signal SIGSEGV, Segmentation fault.
0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
964 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
(gdb) bt
#0 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
#1 0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
at util/stat-display.c:1172
#2 0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
#3 0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
#4 0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
#5 0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
#6 0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
#7 0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
(gdb)
Fixes: d4f63a4741 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0216234c2e upstream.
We release wrong pointer on error path in cpu_cache_level__read
function, leading to segfault:
(gdb) r record ls
Starting program: /root/perf/tools/perf/perf record ls
...
[ perf record: Woken up 1 times to write data ]
double free or corruption (out)
Thread 1 "perf" received signal SIGABRT, Aborted.
0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
(gdb) bt
#0 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6
#1 0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6
#2 0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6
#3 0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6
#4 0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6
#5 0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc..
#6 0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac..
#7 0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz..
...
Releasing the proper pointer.
Fixes: 720e98b5fa ("perf tools: Add perf data cache feature")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org: # v4.6+
Link: http://lore.kernel.org/lkml/20190912105235.10689-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 144783a80c upstream.
Converting from ms to s requires dividing by 1000, not multiplying. So
this is currently taking the smaller of new_timeout and 1.28e8,
i.e. effectively new_timeout.
The driver knows what it set max_hw_heartbeat_ms to, so use that
value instead of doing a division at run-time.
FWIW, this can easily be tested by booting into a busybox shell and
doing "watchdog -t 5 -T 130 /dev/watchdog" - without this patch, the
watchdog fires after 130&127 == 2 seconds.
Fixes: b07e228eee "watchdog: imx2_wdt: Fix set_timeout for big timeout values"
Cc: stable@vger.kernel.org # 5.2 plus anything the above got backported to
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20190812131356.23039-1-linux@rasmusvillemoes.dk
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3dffa4f6c upstream.
VMD maps child device config spaces to the VMD Config BAR linearly
regardless of the starting bus offset. Because of this, the config
address decode must ignore starting bus offsets when mapping the BDF to
the config space address.
Fixes: 2a5a9c9a20 ("PCI: vmd: Add offset to bus numbers if necessary")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e430d802d6 upstream.
The timer delayed for more than 3 seconds warning was triggered during
testing.
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xee/0x100
...
Call Trace:
process_one_work+0x18c/0x3a0
worker_thread+0x30/0x380
kthread+0x113/0x130
ret_from_fork+0x22/0x40
The reason is that the code in collect_expired_timers() uses jiffies
unprotected:
if (next_event > jiffies)
base->clk = jiffies;
As the compiler is allowed to reload the value base->clk can advance
between the check and the store and in the worst case advance farther than
next event. That causes the timer expiry to be delayed until the wheel
pointer wraps around.
Convert the code to use READ_ONCE()
Fixes: 236968383c ("timers: Optimize collect_expired_timers() for NOHZ")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 314eed30ed upstream.
When running on a system with >512MB RAM with a 32-bit kernel built with:
CONFIG_DEBUG_VIRTUAL=y
CONFIG_HIGHMEM=y
CONFIG_HARDENED_USERCOPY=y
all execve()s will fail due to argv copying into kmap()ed pages, and on
usercopy checking the calls ultimately of virt_to_page() will be looking
for "bad" kmap (highmem) pointers due to CONFIG_DEBUG_VIRTUAL=y:
------------[ cut here ]------------
kernel BUG at ../arch/x86/mm/physaddr.c:83!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6
Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009
EIP: __phys_addr+0xaf/0x100
...
Call Trace:
__check_object_size+0xaf/0x3c0
? __might_sleep+0x80/0xa0
copy_strings+0x1c2/0x370
copy_strings_kernel+0x2b/0x40
__do_execve_file+0x4ca/0x810
? kmem_cache_alloc+0x1c7/0x370
do_execve+0x1b/0x20
...
The check is from arch/x86/mm/physaddr.c:
VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);
Due to the kmap() in fs/exec.c:
kaddr = kmap(kmapped_page);
...
if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ...
Now we can fetch the correct page to avoid the pfn check. In both cases,
hardened usercopy will need to walk the page-span checker (if enabled)
to do sanity checking.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: f5509cc18d ("mm: Hardened usercopy")
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17f8607a16 upstream.
Original changelog from Steve Rostedt (except last sentence which
explains the problem, and the Fixes: tag):
I performed a three way histogram with the following commands:
echo 'irq_lat u64 lat pid_t pid' > synthetic_events
echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events
echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger
echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger
echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger
echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable
Basically I wanted to see:
hrtimer_start (calling function tick_sched_timer)
Note:
# grep tick_sched_timer /proc/kallsyms
ffffffff81200580 t tick_sched_timer
And save the time of that, and then record sched_waking if it is called
in interrupt context and with the same pid as the hrtimer_start, it
will record the latency between that and the waking event.
I then look at when the task that is woken is scheduled in, and record
the latency between the wakeup and the task running.
At the end, the wake_lat synthetic event will show the wakeup to
scheduled latency, as well as the irq latency in from hritmer_start to
the wakeup. The problem is that I found this:
<idle>-0 [007] d... 190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
<idle>-0 [005] d... 190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
<idle>-0 [002] d... 190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
<idle>-0 [005] d... 190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
<idle>-0 [003] d... 190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
<idle>-0 [005] d... 190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
<idle>-0 [005] d... 190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
<idle>-0 [005] d... 190.501319: wake_lat: lat=50 irqlat=190501264 pid=10
The irqlat seemed quite large! Investigating this further, if I had
enabled the irq_lat synthetic event, I noticed this:
<idle>-0 [002] d.s. 249.429308: irq_lat: lat=164968 pid=335
<idle>-0 [002] d... 249.429369: wake_lat: lat=55 irqlat=249429308 pid=335
Notice that the timestamp of the irq_lat "249.429308" is awfully
similar to the reported irqlat variable. In fact, all instances were
like this. It appeared that:
irqlat=$irqlat
Wasn't assigning the old $irqlat to the new irqlat variable, but
instead was assigning the $irqts to it.
The issue is that assigning the old $irqlat to the new irqlat variable
creates a variable reference alias, but the alias creation code
forgets to make sure the alias uses the same var_ref_idx to access the
reference.
Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org
Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7e8b88a30b ("tracing: Add hist trigger support for variable reference aliases")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe55e77032 upstream.
when the battery is set to sbs-mode and no gpio detection is enabled
"health" is always returning a value even when the battery is not present.
All other fields return "not present".
This leads to a scenario where the driver is constantly switching between
"present" and "not present" state. This generates a lot of constant
traffic on the i2c.
This commit changes the response of "health" to an error when the battery
is not responding leading to a consistent "not present" state.
Fixes: 76b16f4cdf ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Nosthoff <committed@heine.so>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99956a9e08 upstream.
the type flag is stored in the chip->flags field not in the
client->flags field. This currently leads to never using the ti
specific health function as client->flags doesn't use that bit.
So it's always falling back to the general one.
Fixes: 76b16f4cdf ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Nosthoff <committed@heine.so>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76a95bd8f9 upstream.
When ccree driver runs it checks the state of the Trusted Execution
Environment CryptoCell driver before proceeding. We did not account
for cases where the TEE side is not ready or not available at all.
Fix it by only considering TEE error state after sync with the TEE
side driver.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Fixes: ab8ec9658f ("crypto: ccree - add FIPS support")
CC: stable@vger.kernel.org # v4.17+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48f89d2a29 upstream.
IV transfer from ofifo to class2 (set up at [29][30]) is not guaranteed
to be scheduled before the data transfer from ofifo to external memory
(set up at [38]:
[29] 10FA0004 ld: ind-nfifo (len=4) imm
[30] 81F00010 <nfifo_entry: ofifo->class2 type=msg len=16>
[31] 14820004 ld: ccb2-datasz len=4 offs=0 imm
[32] 00000010 data:0x00000010
[33] 8210010D operation: cls1-op aes cbc init-final enc
[34] A8080B04 math: (seqin + math0)->vseqout len=4
[35] 28000010 seqfifold: skip len=16
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[38] 69300000 seqfifostr: msg len=vseqoutsz
[39] 5C20000C seqstr: ccb2 ctx len=12 offs=0
If ofifo -> external memory transfer happens first, DECO will hang
(issuing a Watchdog Timeout error, if WDOG is enabled) waiting for
data availability in ofifo for the ofifo -> c2 ififo transfer.
Make sure IV transfer happens first by waiting for all CAAM internal
transfers to end before starting payload transfer.
New descriptor with jump command inserted at [37]:
[..]
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] A1000401 jump: jsl1 all-match[!nfifopend] offset=[01] local->[38]
[38] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[39] 69300000 seqfifostr: msg len=vseqoutsz
[40] 5C20000C seqstr: ccb2 ctx len=12 offs=0
[Note: the issue is present in the descriptor from the very beginning
(cf. Fixes tag). However I've marked it v4.19+ since it's the oldest
maintained kernel that the patch applies clean against.]
Cc: <stable@vger.kernel.org> # v4.19+
Fixes: 1acebad3d8 ("crypto: caam - faster aead implementation")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51fab3d730 upstream.
ERN handler calls the caam/qi frontend "done" callback with a status
of -EIO. This is incorrect, since the callback expects a status value
meaningful for the crypto engine - hence the cryptic messages
like the one below:
platform caam_qi: 15: unknown error source
Fix this by providing the callback with:
-the status returned by the crypto engine (fd[status]) in case
it contains an error, OR
-a QI "No error" code otherwise; this will trigger the message:
platform caam_qi: 50000000: Queue Manager Interface: No error
which is fine, since QMan driver provides details about the cause of
failure
Cc: <stable@vger.kernel.org> # v5.1+
Fixes: 67c2315def ("crypto: caam - add Queue Interface (QI) backend support")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ba3c026e6 upstream.
skcipher_walk_done may be called with an error by internal or
external callers. For those internal callers we shouldn't unmap
pages but for external callers we must unmap any pages that are
in use.
This patch distinguishes between the two cases by checking whether
walk->nbytes is zero or not. For internal callers, we now set
walk->nbytes to zero prior to the call. For external callers,
walk->nbytes has always been non-zero (as zero is used to indicate
the termination of a walk).
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 5cde0af2a9 ("[CRYPTO] cipher: Added block cipher type")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b82feb6c5 upstream.
It seems that smp_processor_id() is only used for a best-effort
load-balancing, refer to qat_crypto_get_instance_node(). It's not feasible
to disable preemption for the duration of the crypto requests. Therefore,
just silence the warning. This commit is similar to e7a9b05ca4
("crypto: cavium - Fix smp_processor_id() warnings").
Silences the following splat:
BUG: using smp_processor_id() in preemptible [00000000] code: cryptomgr_test/2904
caller is qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
CPU: 1 PID: 2904 Comm: cryptomgr_test Tainted: P O 4.14.69 #1
...
Call Trace:
dump_stack+0x5f/0x86
check_preemption_disabled+0xd3/0xe0
qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
skcipher_setkey_ablkcipher+0x2b/0x40
__test_skcipher+0x1f3/0xb20
? cpumask_next_and+0x26/0x40
? find_busiest_group+0x10e/0x9d0
? preempt_count_add+0x49/0xa0
? try_module_get+0x61/0xf0
? crypto_mod_get+0x15/0x30
? __kmalloc+0x1df/0x1f0
? __crypto_alloc_tfm+0x116/0x180
? crypto_skcipher_init_tfm+0xa6/0x180
? crypto_create_tfm+0x4b/0xf0
test_skcipher+0x21/0xa0
alg_test_skcipher+0x3f/0xa0
alg_test.part.6+0x126/0x2a0
? finish_task_switch+0x21b/0x260
? __schedule+0x1e9/0x800
? __wake_up_common+0x8d/0x140
cryptomgr_test+0x40/0x50
kthread+0xff/0x130
? cryptomgr_notify+0x540/0x540
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x24/0x50
Fixes: ed8ccaef52 ("crypto: qat - Add support for SRIOV")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 047e6575ae upstream.
On POWER9, under some circumstances, a broadcast TLB invalidation will
fail to invalidate the ERAT cache on some threads when there are
parallel mtpidr/mtlpidr happening on other threads of the same core.
This can cause stores to continue to go to a page after it's unmapped.
The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie
flush. This additional TLB flush will cause the ERAT cache
invalidation. Since we are using PID=0 or LPID=0, we don't get
filtered out by the TLB snoop filtering logic.
We need to still follow this up with another tlbie to take care of
store vs tlbie ordering issue explained in commit:
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9"). The presence of ERAT cache implies we can still get new
stores and they may miss store queue marking flush.
Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 677733e296 upstream.
The store ordering vs tlbie issue mentioned in commit
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't
need to apply the fixup if we are running on them
We can only do this on PowerNV. On pseries guest with KVM we still
don't support redoing the feature fixup after migration. So we should
be enabling all the workarounds needed, because whe can possibly
migrate between DD 2.3 and DD 2.2
Fixes: a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56090a3902 upstream.
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero,
it cannot become zero again.
Fixes: a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
CC: stable@vger.kernel.org # v4.19+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c784be435d upstream.
The calls to arch_add_memory()/arch_remove_memory() are always made
with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin().
On pSeries, arch_add_memory()/arch_remove_memory() eventually call
resize_hpt() which in turn calls stop_machine() which acquires the
read-side cpu_hotplug_lock again, thereby resulting in the recursive
acquisition of this lock.
In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of the
writer, which in our case is a CPU-hotplug operation) simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.
However, we can hit this problem in practice if there is a concurrent
CPU-Hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second recursive read to
block until the writer finishes. While the writer is blocked since the
first read holds the lock. Thus both the reader as well as the writers
fail to make any progress thereby blocking both CPU-Hotplug as well as
Memory Hotplug operations.
Memory-Hotplug CPU-Hotplug
CPU 0 CPU 1
------ ------
1. down_read(cpu_hotplug_lock.rw_sem)
[memory_hotplug_begin]
2. down_write(cpu_hotplug_lock.rw_sem)
[cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
[stop_machine()]
Lockdep complains as follows in these code-paths.
swapper/0/1 is trying to acquire lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
but task is already holding lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock.rw_sem);
lock(cpu_hotplug_lock.rw_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by swapper/0/1:
#0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
#1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
#2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
Call Trace:
dump_stack+0xe8/0x164 (unreliable)
__lock_acquire+0x1110/0x1c70
lock_acquire+0x240/0x290
cpus_read_lock+0x64/0xf0
stop_machine+0x2c/0x60
pseries_lpar_resize_hpt+0x19c/0x2c0
resize_hpt_for_hotplug+0x70/0xd0
arch_add_memory+0x58/0xfc
devm_memremap_pages+0x5e8/0x8f0
pmem_attach_disk+0x764/0x830
nvdimm_bus_probe+0x118/0x240
really_probe+0x230/0x4b0
driver_probe_device+0x16c/0x1e0
__driver_attach+0x148/0x1b0
bus_for_each_dev+0x90/0x130
driver_attach+0x34/0x50
bus_add_driver+0x1a8/0x360
driver_register+0x108/0x170
__nd_driver_register+0xd0/0xf0
nd_pmem_driver_init+0x34/0x48
do_one_initcall+0x1e0/0x45c
kernel_init_freeable+0x540/0x64c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0x68
Fix this issue by
1) Requiring all the calls to pseries_lpar_resize_hpt() be made
with cpu_hotplug_lock held.
2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
as a consequence of 1)
3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
with cpu_hotplug_lock held.
Fixes: dbcf929c00 ("powerpc/pseries: Add support for hash table resizing")
Cc: stable@vger.kernel.org # v4.11+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99ead78afd upstream.
The current code would fail on huge pages addresses, since the shift would
be incorrect. Use the correct page shift value returned by
__find_linux_pte() to get the correct physical address. The code is more
generic and can handle both regular and compound pages.
Fixes: ba41e1e1cc ("powerpc/mce: Hookup derror (load/store) UE errors")
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[arbab@linux.ibm.com: Fixup pseries_do_memory_failure()]
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da15c03b04 upstream.
Testing has revealed the existence of a race condition where a XIVE
interrupt being shut down can be in one of the XIVE interrupt queues
(of which there are up to 8 per CPU, one for each priority) at the
point where free_irq() is called. If this happens, can return an
interrupt number which has been shut down. This can lead to various
symptoms:
- irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt
function gets called, resulting in the CPU's elevated interrupt
priority (numerically lowered CPPR) never gets reset. That then
means that the CPU stops processing interrupts, causing device
timeouts and other errors in various device drivers.
- The irq descriptor or related data structures can be in the process
of being freed as the interrupt code is using them. This typically
leads to crashes due to bad pointer dereferences.
This race is basically what commit 62e0468650 ("genirq: Add optional
hardware synchronization for shutdown", 2019-06-28) is intended to
fix, given a get_irqchip_state() method for the interrupt controller
being used. It works by polling the interrupt controller when an
interrupt is being freed until the controller says it is not pending.
With XIVE, the PQ bits of the interrupt source indicate the state of
the interrupt source, and in particular the P bit goes from 0 to 1 at
the point where the hardware writes an entry into the interrupt queue
that this interrupt is directed towards. Normally, the code will then
process the interrupt and do an end-of-interrupt (EOI) operation which
will reset PQ to 00 (assuming another interrupt hasn't been generated
in the meantime). However, there are situations where the code resets
P even though a queue entry exists (for example, by setting PQ to 01,
which disables the interrupt source), and also situations where the
code leaves P at 1 after removing the queue entry (for example, this
is done for escalation interrupts so they cannot fire again until
they are explicitly re-enabled).
The code already has a 'saved_p' flag for the interrupt source which
indicates that a queue entry exists, although it isn't maintained
consistently. This patch adds a 'stale_p' flag to indicate that
P has been left at 1 after processing a queue entry, and adds code
to set and clear saved_p and stale_p as necessary to maintain a
consistent indication of whether a queue entry may or may not exist.
With this, we can implement xive_get_irqchip_state() by looking at
stale_p, saved_p and the ESB PQ bits for the interrupt.
There is some additional code to handle escalation interrupts
properly; because they are enabled and disabled in KVM assembly code,
which does not have access to the xive_irq_data struct for the
escalation interrupt. Hence, stale_p may be incorrect when the
escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
with some careful attention to barriers in order to ensure the correct
result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
Finally, this adds code to make noise on the console (pr_crit and
WARN_ON(1)) if we find an interrupt queue entry for an interrupt
which does not have a descriptor. While this won't catch the race
reliably, if it does get triggered it will be an indication that
the race is occurring and needs to be debugged.
Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1f373a11d upstream.
VAG power control is improved to fit the manual [1]. This patch fixes as
minimum one bug: if customer muxes Headphone to Line-In right after boot,
the VAG power remains off that leads to poor sound quality from line-in.
I.e. after boot:
- Connect sound source to Line-In jack;
- Connect headphone to HP jack;
- Run following commands:
$ amixer set 'Headphone' 80%
$ amixer set 'Headphone Mux' LINE_IN
Change VAG power on/off control according to the following algorithm:
- turn VAG power ON on the 1st incoming event.
- keep it ON if there is any active VAG consumer (ADC/DAC/HP/Line-In).
- turn VAG power OFF when there is the latest consumer's pre-down event
come.
- always delay after VAG power OFF to avoid pop.
- delay after VAG power ON if the initiative consumer is Line-In, this
prevents pop during line-in muxing.
According to the data sheet [1], to avoid any pops/clicks,
the outputs should be muted during input/output
routing changes.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Cc: stable@vger.kernel.org
Fixes: 9b34e6cc3b ("ASoC: Add Freescale SGTL5000 codec support")
Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Reviewed-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20190719100524.23300-3-oleksandr.suvorov@toradex.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62bacb06b9 upstream.
The kHz to Hz is incorrectly converted in a few places in the code,
this results in a wrong frequency being calculated because devfreq core
uses OPP frequencies that are given in Hz to clamp the rate, while
tegra-devfreq gives to the core value in kHz and then it also expects to
receive value in kHz from the core. In a result memory freq is always set
to a value which is close to ULONG_MAX because of the bug. Hence the EMC
frequency is always capped to the maximum and the driver doesn't do
anything useful. This patch was tested on Tegra30 and Tegra124 SoC's, EMC
frequency scaling works properly now.
Cc: <stable@vger.kernel.org> # 4.14+
Tested-by: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9e006f5fc upstream.
This fixes a bug added in 4.10 with commit:
commit 9561a7ade0
Author: Josef Bacik <jbacik@fb.com>
Date: Tue Nov 22 14:04:40 2016 -0500
nbd: add multi-connection support
that limited the number of devices to 256. Before the patch we could
create 1000s of devices, but the patch switched us from using our
own thread to using a work queue which has a default limit of 256
active works.
The problem is that our recv_work function sits in a loop until
disconnection but only handles IO for one connection. The work is
started when the connection is started/restarted, but if we end up
creating 257 or more connections, the queue_work call just queues
connection257+'s recv_work and that waits for connection 1 - 256's
recv_work to be disconnected and that work instance completing.
Instead of reverting back to kthreads, this has us allocate a
workqueue_struct per device, so we can block in the work.
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ca9419227 upstream.
Reported by syzkaller:
WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4
RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
Call Trace:
vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
vcpu_enter_guest+0x4dc/0x18d0 [kvm]
kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
do_vfs_ioctl+0xa2/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x74/0x720
entry_SYSCALL_64_after_hwframe+0x49/0xbe
When CR4.UMIP is set, guest should have UMIP cpuid flag. Current
kvm set_sregs function doesn't have such check when userspace inputs
sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP
in vmx_set_cr4 though guest doesn't have UMIP cpuid flag. The testcast
triggers handle_desc warning when executing ltr instruction since
guest architectural CR4 doesn't set UMIP. This patch fixes it by
adding valid CR4 and CPUID combination checking in __set_sregs.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff42df49e7 upstream.
On POWER9, when userspace reads the value of the DPDES register on a
vCPU, it is possible for 0 to be returned although there is a doorbell
interrupt pending for the vCPU. This can lead to a doorbell interrupt
being lost across migration. If the guest kernel uses doorbell
interrupts for IPIs, then it could malfunction because of the lost
interrupt.
This happens because a newly-generated doorbell interrupt is signalled
by setting vcpu->arch.doorbell_request to 1; the DPDES value in
vcpu->arch.vcore->dpdes is not updated, because it can only be updated
when holding the vcpu mutex, in order to avoid races.
To fix this, we OR in vcpu->arch.doorbell_request when reading the
DPDES value.
Cc: stable@vger.kernel.org # v4.13+
Fixes: 579006944e ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d28eafc5a6 upstream.
When we are running multiple vcores on the same physical core, they
could be from different VMs and so it is possible that one of the
VMs could have its arch.mmu_ready flag cleared (for example by a
concurrent HPT resize) when we go to run it on a physical core.
We currently check the arch.mmu_ready flag for the primary vcore
but not the flags for the other vcores that will be run alongside
it. This adds that check, and also a check when we select the
secondary vcores from the preempted vcores list.
Cc: stable@vger.kernel.org # v4.14+
Fixes: 38c53af853 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 959c5d5134 upstream.
Escalation interrupts are interrupts sent to the host by the XIVE
hardware when it has an interrupt to deliver to a guest VCPU but that
VCPU is not running anywhere in the system. Hence we disable the
escalation interrupt for the VCPU being run when we enter the guest
and re-enable it when the guest does an H_CEDE hypercall indicating
it is idle.
It is possible that an escalation interrupt gets generated just as we
are entering the guest. In that case the escalation interrupt may be
using a queue entry in one of the interrupt queues, and that queue
entry may not have been processed when the guest exits with an H_CEDE.
The existing entry code detects this situation and does not clear the
vcpu->arch.xive_esc_on flag as an indication that there is a pending
queue entry (if the queue entry gets processed, xive_esc_irq() will
clear the flag). There is a comment in the code saying that if the
flag is still set on H_CEDE, we have to abort the cede rather than
re-enabling the escalation interrupt, lest we end up with two
occurrences of the escalation interrupt in the interrupt queue.
However, the exit code doesn't do that; it aborts the cede in the sense
that vcpu->arch.ceded gets cleared, but it still enables the escalation
interrupt by setting the source's PQ bits to 00. Instead we need to
set the PQ bits to 10, indicating that an interrupt has been triggered.
We also need to avoid setting vcpu->arch.xive_esc_on in this case
(i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
xive_esc_irq() will run at some point and clear it, and if we race with
that we may end up with an incorrect result (i.e. xive_esc_on set when
the escalation interrupt has just been handled).
It is extremely unlikely that having two queue entries would cause
observable problems; theoretically it could cause queue overflow, but
the CPU would have to have thousands of interrupts targetted to it for
that to be possible. However, this fix will also make it possible to
determine accurately whether there is an unhandled escalation
interrupt in the queue, which will be needed by the following patch.
Fixes: 9b9b13a6d1 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d4ba9c931 upstream.
At present, when running a guest on POWER9 using HV KVM but not using
an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
is run with the kernel_irqchip=off option, the guest entry code goes
ahead and tries to load the guest context into the XIVE hardware, even
though no context has been set up.
To fix this, we check that the "CAM word" is non-zero before pushing
it to the hardware. The CAM word is initialized to a non-zero value
in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
Fixes: 5af5099385 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 237aed48c6 upstream.
When a vCPU is brought done, the XIVE VP (Virtual Processor) is first
disabled and then the event notification queues are freed. When freeing
the queues, we check for possible escalation interrupts and free them
also.
But when a XIVE VP is disabled, the underlying XIVE ENDs also are
disabled in OPAL. When an END (Event Notification Descriptor) is
disabled, its ESB pages (ESn and ESe) are disabled and loads return all
1s. Which means that any access on the ESB page of the escalation
interrupt will return invalid values.
When an interrupt is freed, the shutdown handler computes a 'saved_p'
field from the value returned by a load in xive_do_source_set_mask().
This value is incorrect for escalation interrupts for the reason
described above.
This has no impact on Linux/KVM today because we don't make use of it
but we will introduce in future changes a xive_get_irqchip_state()
handler. This handler will use the 'saved_p' field to return the state
of an interrupt and 'saved_p' being incorrect, softlockup will occur.
Fix the vCPU cleanup sequence by first freeing the escalation interrupts
if any, then disable the XIVE VP and last free the queues.
Fixes: 90c73795af ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
Fixes: 5af5099385 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ad7a27dea upstream.
There are some POWER9 machines where the OPAL firmware does not support
the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls.
The impact of this is that a guest using XIVE natively will not be able
to be migrated successfully. On the source side, the get_attr operation
on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute
will fail; on the destination side, the set_attr operation for the same
attribute will fail.
This adds tests for the existence of the OPAL get/set queue state
functions, and if they are not supported, the XIVE-native KVM device
is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false.
Userspace can then either provide a software emulation of XIVE, or
else tell the guest that it does not have a XIVE controller available
to it.
Cc: stable@vger.kernel.org # v5.2+
Fixes: 3fab2d1058 ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode")
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c41ac3ce upstream.
The inline assembly constraints of __insn32_query() tell the compiler
that only the first byte of "query" is being written to. Intended was
probably that 32 bytes are written to.
Fix and simplify the code and just use a "memory" clobber.
Fixes: d668139718 ("KVM: s390: provide query function for instructions returning 32 byte")
Cc: stable@vger.kernel.org # v5.2+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 964ce509e2 upstream.
This reverts commit 7e64db1597.
The thin provisioning feature introduces an IOCTL and the discard support
to allow userspace tools and filesystems to release unused and previously
allocated space respectively.
During some internal performance improvements and further tests, the
release of allocated space revealed some issues that may lead to data
corruption in some configurations when filesystems are mounted with
discard support enabled.
While we're working on a fix and trying to clarify the situation,
this commit reverts the discard support for ESE volumes to prevent
potential data corruption.
Cc: <stable@vger.kernel.org> # 5.3
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd45483981 upstream.
It is possible that the CCW commands for reading volume and extent pool
information are not supported, either by the storage server (for
dedicated DASDs) or by z/VM (for virtual devices, such as MDISKs).
As a command reject will occur in such a case, the current error
handling leads to a failing online processing and thus the DASD can't be
used at all.
Since the data being read is not essential for an fully operational
DASD, the error handling can be removed. Information about the failing
command is sent to the s390dbf debug feature.
Fixes: c729696bcf ("s390/dasd: Recognise data for ESE volumes")
Cc: <stable@vger.kernel.org> # 5.3
Reported-by: Frank Heimes <frank.heimes@canonical.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab57588480 upstream.
ccw console is created early in start_kernel and used before css is
initialized or ccw console subchannel is registered. Until then console
subchannel does not have a parent. For that reason assume subchannels
with no parent are not pseudo subchannels. This fixes the following
kasan finding:
BUG: KASAN: global-out-of-bounds in sch_is_pseudo_sch+0x8e/0x98
Read of size 8 at addr 00000000000005e8 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc8-07370-g6ac43dd12538 #2
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Call Trace:
([<000000000012cd76>] show_stack+0x14e/0x1e0)
[<0000000001f7fb44>] dump_stack+0x1a4/0x1f8
[<00000000007d7afc>] print_address_description+0x64/0x3c8
[<00000000007d75f6>] __kasan_report+0x14e/0x180
[<00000000018a2986>] sch_is_pseudo_sch+0x8e/0x98
[<000000000189b950>] cio_enable_subchannel+0x1d0/0x510
[<00000000018cac7c>] ccw_device_recognition+0x12c/0x188
[<0000000002ceb1a8>] ccw_device_enable_console+0x138/0x340
[<0000000002cf1cbe>] con3215_init+0x25e/0x300
[<0000000002c8770a>] console_init+0x68a/0x9b8
[<0000000002c6a3d6>] start_kernel+0x4fe/0x728
[<0000000000100070>] startup_continue+0x70/0xd0
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea298e6ee8 upstream.
Fix the following kasan finding:
BUG: KASAN: global-out-of-bounds in ccwgroup_create_dev+0x850/0x1140
Read of size 1 at addr 0000000000000000 by task systemd-udevd.r/561
CPU: 30 PID: 561 Comm: systemd-udevd.r Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000231b3db7e>] show_stack+0x14e/0x1a8)
[<0000000233826410>] dump_stack+0x1d0/0x218
[<000000023216fac4>] print_address_description+0x64/0x380
[<000000023216f5a8>] __kasan_report+0x138/0x168
[<00000002331b8378>] ccwgroup_create_dev+0x850/0x1140
[<00000002332b618a>] group_store+0x3a/0x50
[<00000002323ac706>] kernfs_fop_write+0x246/0x3b8
[<00000002321d409a>] vfs_write+0x132/0x450
[<00000002321d47da>] ksys_write+0x122/0x208
[<0000000233877102>] system_call+0x2a6/0x2c8
Triggered by:
openat(AT_FDCWD, "/sys/bus/ccwgroup/drivers/qeth/group",
O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 16
write(16, "0.0.bd00,0.0.bd01,0.0.bd02", 26) = 26
The problem is that __get_next_id in ccwgroup_create_dev might set "buf"
buffer pointer to NULL and explicit check for that is required.
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3122a79a1 upstream.
arch_update_cpu_topology is first called from:
kernel_init_freeable->sched_init_smp->sched_init_domains
even before cpus has been registered in:
kernel_init_freeable->do_one_initcall->s390_smp_init
Do not trigger kobject_uevent change events until cpu devices are
actually created. Fixes the following kasan findings:
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
[<0000000145956498>] dump_stack+0x1d0/0x218
[<000000014429fb4c>] print_address_description+0x64/0x380
[<000000014429f630>] __kasan_report+0x138/0x168
[<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
[<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
[<0000000143df9e22>] sched_init_domains+0x62/0xe8
[<000000014644c94a>] sched_init_smp+0x3a/0xc0
[<0000000146433a20>] kernel_init_freeable+0x558/0x958
[<000000014599002a>] kernel_init+0x22/0x160
[<00000001459a71d4>] ret_from_fork+0x28/0x30
[<00000001459a71dc>] kernel_thread_starter+0x0/0x10
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a13b03bbb4 upstream.
If the KVM_S390_MEM_OP ioctl is called with an access register >= 16,
then there is certainly a bug in the calling userspace application.
We check for wrong access registers, but only if the vCPU was already
in the access register mode before (i.e. the SIE block has recorded
it). The check is also buried somewhere deep in the calling chain (in
the function ar_translation()), so this is somewhat hard to find.
It's better to always report an error to the userspace in case this
field is set wrong, and it's safer in the KVM code if we block wrong
values here early instead of relying on a check somewhere deep down
the calling chain, so let's add another check to kvm_s390_guest_mem_op()
directly.
We also should check that the "size" is non-zero here (thanks to Janosch
Frank for the hint!). If we do not check the size, we could call vmalloc()
with this 0 value, and this will cause a kernel warning.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lkml.kernel.org/r/20190829122517.31042-1-thuth@redhat.com
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8769f610fe upstream.
With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage
is refcounted and should always be protected by get/put when touching
other task's stack to avoid race conditions with task's destruction code.
Fixes: d5c352cdd0 ("s390: move thread_info into task_struct")
Cc: stable@vger.kernel.org # v4.10+
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a073d7e3ad upstream.
Reported by syzkaller:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029
Call Trace:
kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558
stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline]
stimer_expiration arch/x86/kvm/hyperv.c:659 [inline]
kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686
vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896
vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152
kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360
kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765
The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT,
in addition, there is no lapic in the kernel, the counters value are small
enough in order that kvm_hv_process_stimers() inject this already-expired
timer interrupt into the guest through lapic in the kernel which triggers
the NULL deferencing. This patch fixes it by don't advertise direct mode
synthetic timers and discarding the inject when lapic is not in kernel.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=1752fe0a600000
Reported-by: syzbot+dff25ee91f0c7d5c1695@syzkaller.appspotmail.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18917d5147 upstream.
nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
attribute being present, but doesn't check whether it is actually
provided by the user. Same goes for nfc_genl_fw_download() and
NFC_ATTR_FIRMWARE_NAME.
This patch adds appropriate checks.
Found with syzkaller.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8156fc77d upstream.
Unit of 'chunk_size' is byte, instead of sector, so fix it by setting
the queue_limits' max_discard_sectors to rs->md.chunk_sectors. Also,
rename chunk_size to chunk_size_bytes.
Without this fix, too big max_discard_sectors is applied on the request
queue of dm-raid, finally raid code has to split the bio again.
This re-split done by raid causes the following nested clone_endio:
1) one big bio 'A' is submitted to dm queue, and served as the original
bio
2) one new bio 'B' is cloned from the original bio 'A', and .map()
is run on this bio of 'B', and B's original bio points to 'A'
3) raid code sees that 'B' is too big, and split 'B' and re-submit
the remainded part of 'B' to dm-raid queue via generic_make_request().
4) now dm will handle 'B' as new original bio, then allocate a new
clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
points to 'B'.
5) suppose now 'C' is completed by raid directly, then the following
clone_endio() is called recursively:
clone_endio(C)
->clone_endio(B) #B is original bio of 'C'
->bio_endio(A)
'A' can be big enough to make hundreds of nested clone_endio(), then
stack can be corrupted easily.
Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3675f052b4 upstream.
There is a logic bug in the current smack_bprm_set_creds():
If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be
acceptable (e.g. because the ptracer detached in the meantime), the other
->unsafe flags aren't checked. As far as I can tell, this means that
something like the following could work (but I haven't tested it):
- task A: create task B with fork()
- task B: set NO_NEW_PRIVS
- task B: install a seccomp filter that makes open() return 0 under some
conditions
- task B: replace fd 0 with a malicious library
- task A: attach to task B with PTRACE_ATTACH
- task B: execve() a file with an SMACK64EXEC extended attribute
- task A: while task B is still in the middle of execve(), exit (which
destroys the ptrace relationship)
Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in
bprm->unsafe, we reject the execve().
Cc: stable@vger.kernel.org
Fixes: 5663884caa ("Smack: unify all ptrace accesses in the smack")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9a9251a353 ]
The check in taprio_set_picos_per_byte is currently not robust enough
and will trigger this division by zero, due to e.g. PHYLINK not setting
kset->base.speed when there is no PHY connected:
[ 27.109992] Division by zero in kernel.
[ 27.113842] CPU: 1 PID: 198 Comm: tc Not tainted 5.3.0-rc5-01246-gc4006b8c2637-dirty #212
[ 27.121974] Hardware name: Freescale LS1021A
[ 27.126234] [<c03132e0>] (unwind_backtrace) from [<c030d8b8>] (show_stack+0x10/0x14)
[ 27.133938] [<c030d8b8>] (show_stack) from [<c10b21b0>] (dump_stack+0xb0/0xc4)
[ 27.141124] [<c10b21b0>] (dump_stack) from [<c10af97c>] (Ldiv0_64+0x8/0x18)
[ 27.148052] [<c10af97c>] (Ldiv0_64) from [<c0700260>] (div64_u64+0xcc/0xf0)
[ 27.154978] [<c0700260>] (div64_u64) from [<c07002d0>] (div64_s64+0x4c/0x68)
[ 27.161993] [<c07002d0>] (div64_s64) from [<c0f3d890>] (taprio_set_picos_per_byte+0xe8/0xf4)
[ 27.170388] [<c0f3d890>] (taprio_set_picos_per_byte) from [<c0f3f614>] (taprio_change+0x668/0xcec)
[ 27.179302] [<c0f3f614>] (taprio_change) from [<c0f2bc24>] (qdisc_create+0x1fc/0x4f4)
[ 27.187091] [<c0f2bc24>] (qdisc_create) from [<c0f2c0c8>] (tc_modify_qdisc+0x1ac/0x6f8)
[ 27.195055] [<c0f2c0c8>] (tc_modify_qdisc) from [<c0ee9604>] (rtnetlink_rcv_msg+0x268/0x2dc)
[ 27.203449] [<c0ee9604>] (rtnetlink_rcv_msg) from [<c0f4fef0>] (netlink_rcv_skb+0xe0/0x114)
[ 27.211756] [<c0f4fef0>] (netlink_rcv_skb) from [<c0f4f6cc>] (netlink_unicast+0x1b4/0x22c)
[ 27.219977] [<c0f4f6cc>] (netlink_unicast) from [<c0f4fa84>] (netlink_sendmsg+0x284/0x340)
[ 27.228198] [<c0f4fa84>] (netlink_sendmsg) from [<c0eae5fc>] (sock_sendmsg+0x14/0x24)
[ 27.235988] [<c0eae5fc>] (sock_sendmsg) from [<c0eaedf8>] (___sys_sendmsg+0x214/0x228)
[ 27.243863] [<c0eaedf8>] (___sys_sendmsg) from [<c0eb015c>] (__sys_sendmsg+0x50/0x8c)
[ 27.251652] [<c0eb015c>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[ 27.259524] Exception stack(0xe8045fa8 to 0xe8045ff0)
[ 27.264546] 5fa0: b6f608c8 000000f8 00000003 bed7e2f0 00000000 00000000
[ 27.272681] 5fc0: b6f608c8 000000f8 004ce54c 00000128 5d3ce8c7 00000000 00000026 00505c9c
[ 27.280812] 5fe0: 00000070 bed7e298 004ddd64 b6dd1e64
Russell King points out that the ethtool API says zero is a valid return
value of __ethtool_get_link_ksettings:
* If it is enabled then they are read-only; if the link
* is up they represent the negotiated link mode; if the link is down,
* the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
* @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
So, it seems that taprio is not following the API... I'd suggest either
fixing taprio, or getting agreement to change the ethtool API.
The chosen path was to fix taprio.
Fixes: 7b9eba7ba0 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 83c8c3cf45 ]
As explained in the "net: sched: taprio: Avoid division by zero on
invalid link speed" commit, it is legal for the ethtool API to return
zero as a link speed. So guard against it to ensure we don't perform a
division by zero in kernel.
Fixes: e0a7683d30 ("net/sched: cbs: fix port_rate miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55131dec2b ]
Always acquire tx descriptor spinlock even if a xdp program is not loaded
on the netsec device since ndo_xdp_xmit can run concurrently with
netsec_netdev_start_xmit and netsec_clean_tx_dring. This can happen
loading a xdp program on a different device (e.g virtio-net) and
xdp_do_redirect_map/xdp_do_redirect_slow can redirect to netsec even if
we do not have a xdp program on it.
Fixes: ba2b232108 ("net: netsec: add XDP support")
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 68501df92d ]
In sja1105_static_config_upload, in two cases memory is leaked: when
static_config_buf_prepare_for_upload fails and when sja1105_inhibit_tx
fails. In both cases config_buf should be released.
Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Fixes: 1a4c69406c ("net: dsa: sja1105: Prevent PHY jabbering during switch reset")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6f2494d31 ]
Sometimes the PTP synchronization on the switch 'jumps':
ptp4l[11241.155]: rms 8 max 16 freq -21732 +/- 11 delay 742 +/- 0
ptp4l[11243.157]: rms 7 max 17 freq -21731 +/- 10 delay 744 +/- 0
ptp4l[11245.160]: rms 33592410 max 134217731 freq +192422 +/- 8530253 delay 743 +/- 0
ptp4l[11247.163]: rms 811631 max 964131 freq +10326 +/- 557785 delay 743 +/- 0
ptp4l[11249.166]: rms 261936 max 533876 freq -304323 +/- 126371 delay 744 +/- 0
ptp4l[11251.169]: rms 48700 max 57740 freq -20218 +/- 30532 delay 744 +/- 0
ptp4l[11253.171]: rms 14570 max 30163 freq -5568 +/- 7563 delay 742 +/- 0
ptp4l[11255.174]: rms 2914 max 3440 freq -22001 +/- 1667 delay 744 +/- 1
ptp4l[11257.177]: rms 811 max 1710 freq -22653 +/- 451 delay 744 +/- 1
ptp4l[11259.180]: rms 177 max 218 freq -21695 +/- 89 delay 741 +/- 0
ptp4l[11261.182]: rms 45 max 92 freq -21677 +/- 32 delay 742 +/- 0
ptp4l[11263.186]: rms 14 max 32 freq -21733 +/- 11 delay 742 +/- 0
ptp4l[11265.188]: rms 9 max 14 freq -21725 +/- 12 delay 742 +/- 0
ptp4l[11267.191]: rms 9 max 16 freq -21727 +/- 13 delay 742 +/- 0
ptp4l[11269.194]: rms 6 max 15 freq -21726 +/- 9 delay 743 +/- 0
ptp4l[11271.197]: rms 8 max 15 freq -21728 +/- 11 delay 743 +/- 0
ptp4l[11273.200]: rms 6 max 12 freq -21727 +/- 8 delay 743 +/- 0
ptp4l[11275.202]: rms 9 max 17 freq -21720 +/- 11 delay 742 +/- 0
ptp4l[11277.205]: rms 9 max 18 freq -21725 +/- 12 delay 742 +/- 0
Background: the switch only offers partial RX timestamps (24 bits) and
it is up to the driver to read the PTP clock to fill those timestamps up
to 64 bits. But the PTP clock readout needs to happen quickly enough (in
0.135 seconds, in fact), otherwise the PTP clock will wrap around 24
bits, condition which cannot be detected.
Looking at the 'max 134217731' value on output line 3, one can see that
in hex it is 0x8000003. Because the PTP clock resolution is 8 ns,
that means 0x1000000 in ticks, which is exactly 2^24. So indeed this is
a PTP clock wraparound, but the reason might be surprising.
What is going on is that sja1105_tstamp_reconstruct(priv, now, ts)
expects a "now" time that is later than the "ts" was snapshotted at.
This, of course, is obvious: we read the PTP time _after_ the partial RX
timestamp was received. However, the workqueue is processing frames from
a skb queue and reuses the same PTP time, read once at the beginning.
Normally the skb queue only contains one frame and all goes well. But
when the skb queue contains two frames, the second frame that gets
dequeued might have been partially timestamped by the RX MAC _after_ we
had read our PTP time initially.
The code was originally like that due to concerns that SPI access for
PTP time readout is a slow process, and we are time-constrained anyway
(aka: premature optimization). But some timing analysis reveals that the
time spent until the RX timestamp is completely reconstructed is 1 order
of magnitude lower than the 0.135 s deadline even under worst-case
conditions. So we can afford to read the PTP time for each frame in the
RX timestamping queue, which of course ensures that the full PTP time is
in the partial timestamp's future.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e8db7e560 ]
Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y:
[ 41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l
[ 41.583212] INFO: lockdep is turned off.
[ 41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827
[ 41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14)
[ 41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100)
[ 41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4)
[ 41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8)
[ 41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24)
[ 41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c)
[ 41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc)
[ 41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330)
[ 41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8)
[ 41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8)
[ 41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10)
[ 41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58)
[ 41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[ 41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0)
[ 41.707796] 5fa0: beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80
[ 41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000
[ 41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c
[ 41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002
[ 41.735682] INFO: lockdep is turned off.
Enabling RX timestamping will logically disturb the fastpath (processing
of meta frames). Replace bool hwts_rx_en with a bit that is checked
atomically from the fastpath and temporarily unset from the sleepable
context during a change of the RX timestamping process (a destructive
operation anyways, requires switch reset).
If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any
received meta frame and not take the meta_lock at all.
Fixes: a602afd200 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a761129e36 ]
xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
to cache extra fragments. This is incorrect because the return type of
xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
ring buffer index.
In the situation when the rsp_cons is approaching 0xffffffff, the return
value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
caller) would regard as error. As a result, queue->rx.rsp_cons is set
incorrectly because it is updated only when there is error. If there is no
error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
This leads to NULL pointer access in the next iteration to process rx ring
buffer entries.
The symptom is similar to the one fixed in
commit 00b368502d ("xen-netfront: do not assume sk_buff_head list is
empty in error handling").
This patch changes the return type of xennet_fill_frags() to indicate
whether it is successful or failed. The queue->rx.rsp_cons will be
always updated inside this function.
Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d64bf89a75 ]
rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
after allocation some resources such as protection domain.
If allocation of such resources fail, then these uninitialized
variables are accessed in rds_ib_dev_free() in failure path. This
can potentially crash the system. The code has been updated to
initialize these variables very early in the function.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4094871db1 ]
Prior to this change an application sending <= 1MSS worth of data and
enabling UDP GSO would fail if the system had SW GSO enabled, but the
same send would succeed if HW GSO offload is enabled. In addition to this
inconsistency the error in the SW GSO case does not get back to the
application if sending out of a real device so the user is unaware of this
failure.
With this change we only perform GSO if the # of segments is > 1 even
if the application has enabled segmentation. I've also updated the
relevant udpgso selftests.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3256a2d6ab ]
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)
This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.
Fixes: a41e8a88b0 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8521e53cc ]
There has been some confusion between the port number and
the VLAN ID in this driver. What we need to check for
validity is the VLAN ID, nothing else.
The current confusion came from assigning a few default
VLANs for default routing and we need to rewrite that
properly.
Instead of checking if the port number is a valid VLAN
ID, check the actual VLAN IDs passed in to the callback
one by one as expected.
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d9138ffac ]
Lockdep is unhappy if two locks from the same class are held.
Fix the below warning for hyperv and virtio sockets (vmci socket code
doesn't have the issue) by using lock_sock_nested() when __vsock_release()
is called recursively:
============================================
WARNING: possible recursive locking detected
5.3.0+ #1 Not tainted
--------------------------------------------
server/1795 is trying to acquire lock:
ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
but task is already holding lock:
ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_VSOCK);
lock(sk_lock-AF_VSOCK);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by server/1795:
#0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
#1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
stack backtrace:
CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.67+0xd2/0x20b
lock_acquire+0xb5/0x1c0
lock_sock_nested+0x6d/0x90
hvs_release+0x10/0x120 [hv_sock]
__vsock_release+0x24/0xf0 [vsock]
__vsock_release+0xa0/0xf0 [vsock]
vsock_release+0x12/0x30 [vsock]
__sock_release+0x37/0xa0
sock_close+0x14/0x20
__fput+0xc1/0x250
task_work_run+0x98/0xc0
do_exit+0x344/0xc60
do_group_exit+0x47/0xb0
get_signal+0x15c/0xc50
do_signal+0x30/0x720
exit_to_usermode_loop+0x50/0xa0
do_syscall_64+0x24e/0x270
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4184e85f31
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 44b321e502 ]
Commit dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
added gso_segs calculation, but incorrectly got sizeof() the pointer and
not the underlying data type. In addition let's fix the v6 case.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Fixes: dfec0ee22c ("udp: Record gso_segs when supporting UDP segmentation offload")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e95584a889 ]
We have identified a problem with the "oversubscription" policy in the
link transmission code.
When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.
This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.
We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.
By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;
When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...
With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.
Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...
We will solve this issue radically by another patch.
Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit db9b2e0af6 ]
Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
parameter.
Fixes: a25e21f0bc ("rxrpc, afs: Use debug_ids rather than pointers in traces")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c7138b33e ]
The "reuse->sock[]" array is shared by multiple sockets. The going away
sk must unpublish itself from "reuse->sock[]" before making call_rcu()
call. However, this unpublish-action is currently done after a grace
period and it may cause use-after-free.
The fix is to move reuseport_detach_sock() to sk_destruct().
Due to the above reason, any socket with sk_reuseport_cb has
to go through the rcu grace period before freeing it.
It is a rather old bug (~3 yrs). The Fixes tag is not necessary
the right commit but it is the one that introduced the SOCK_RCU_FREE
logic and this fix is depending on it.
Fixes: a4298e4522 ("net: add SOCK_RCU_FREE socket flag")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 68ce6688a5 ]
The speed divisor is used in a context expecting an s64, but it is
evaluated using 32-bit arithmetic.
To avoid that happening, instead of multiplying by 1,000,000 in the
first place, simplify the fraction and do a standard 32 bit division
instead.
Fixes: f04b514c0c ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte")
Reported-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1acb8f2a7a ]
In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
This skb should be released if pci_dma_mapping_error fails.
Fixes: 0f8ab89e82 ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b406472b5a ]
Since commit c09551c6ff ("net: ipv4: use a dedicated counter
for icmp_v4 redirect packets") we use 'n_redirects' to account
for redirect packets, but we still use 'rate_tokens' to compute
the redirect packets exponential backoff.
If the device sent to the relevant peer any ICMP error packet
after sending a redirect, it will also update 'rate_token' according
to the leaking bucket schema; typically 'rate_token' will raise
above BITS_PER_LONG and the redirect packets backoff algorithm
will produce undefined behavior.
Fix the issue using 'n_redirects' to compute the exponential backoff
in ip_rt_send_redirect().
Note that we still clear rate_tokens after a redirect silence period,
to avoid changing an established behaviour.
The root cause predates git history; before the mentioned commit in
the critical scenario, the kernel stopped sending redirects, after
the mentioned commit the behavior more randomic.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: c09551c6ff ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d819d250a ]
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in __ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
Since the DAD sequence can not be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6af1799aaf ]
This began with a syzbot report. syzkaller was injecting
IPv6 TCP SYN packets having a v4mapped source address.
After an unsuccessful 4-tuple lookup, TCP creates a request
socket (SYN_RECV) and calls reqsk_queue_hash_req()
reqsk_queue_hash_req() calls sk_ehashfn(sk)
At this point we have AF_INET6 sockets, and the heuristic
used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
For the particular spoofed packet, we end up hashing V4 addresses
which were not initialized by the TCP IPv6 stack, so KMSAN fired
a warning.
I first fixed sk_ehashfn() to test both source and destination addresses,
but then faced various problems, including user-space programs
like packetdrill that had similar assumptions.
Instead of trying to fix the whole ecosystem, it is better
to admit that we have a dual stack behavior, and that we
can not build linux kernels without V4 stack anyway.
The dual stack API automatically forces the traffic to be IPv4
if v4mapped addresses are used at bind() or connect(), so it makes
no sense to allow IPv6 traffic to use the same v4mapped class.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8353da9fa6 ]
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:
BUG: kernel NULL pointer dereference, address: 0000000000000006
...
RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
...
Call Trace:
hso_start_serial_device+0xdc/0x140 [hso]
hso_serial_open+0x118/0x1b0 [hso]
tty_open+0xf1/0x490
Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e141f757b ]
erspan driver calls ether_setup(), after commit 61e84623ac
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ipgre tap device.
Tested:
Before patch:
# ip link set erspan0 mtu 1600
Error: mtu greater than device maximum.
After patch:
# ip link set erspan0 mtu 1600
# ip -d link show erspan0
21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b517374f4 ]
When fetching free MSI-X vectors for ULDs, check for the error code
before accessing MSI-X info array. Otherwise, an out-of-bounds access is
attempted, which results in kernel panic.
Fixes: 94cdb8bb99 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6279eb3dd7 ]
Since 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
"make clean" leaves behind compressed initramfs images. Example:
$ make defconfig
$ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
$ make olddefconfig
$ make -s
$ make -s clean
$ git clean -ndxf | grep initramfs
Would remove usr/initramfs_data.cpio.gz
clean rules do not have CONFIG_* context so they do not know which
compression format was used. Thus they don't know which files to delete.
Tell clean to delete all possible compression formats.
Once patched usr/initramfs_data.cpio.gz and friends are deleted by
"make clean".
Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com
Fixes: 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24fbf7bad8 ]
There are two problems in sec_free_hw_sgl():
First, when sgl_current->next is valid, @hw_sgl will be freed in the
first loop, but it free again after the loop.
Second, sgl_current and sgl_current->next_sgl is not match when
dma_pool_free() is invoked, the third parameter should be the dma
address of sgl_current, but sgl_current->next_sgl is the dma address
of next chain, so use sgl_current->next_sgl is wrong.
Fix this by deleting the last dma_pool_free() in sec_free_hw_sgl(),
modifying the condition for while loop, and matching the address for
dma_pool_free().
Fixes: 915e4e8413 ("crypto: hisilicon - SEC security accelerator driver")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 44460efe44 ]
If the CPU package has the less logical CPU than topo_max_cpus, but un-present
CPU's punit_cpu_core will be initiated to 0 and they will be count to core 0
Like below, there are only 10 high priority cores (20 logical CPUs) in the CPU
package, but it count to 27 logic CPUs.
./intel-speed-select base-freq info -l 0 | grep mask
high-priority-cpu-mask:7f000179,f000179f
With the fix patch:
./intel-speed-select base-freq info -l 0
high-priority-cpu-mask:00000179,f000179f
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b54c64f7ad ]
In hypfs_fill_super(), if hypfs_create_update_file() fails,
sbi->update_file is left holding an error number. This is passed to
hypfs_kill_super() which doesn't check for this.
Fix this by not setting sbi->update_value until after we've checked for
error.
Fixes: 24bbb1faf3 ("[PATCH] s390_hypfs filesystem")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
cc: Heiko Carstens <heiko.carstens@de.ibm.com>
cc: linux-s390@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07bfa4415a ]
If userspace reads the buffer via blockdev while mounting,
sb_getblk()+modify can race with buffer read via blockdev.
For example,
FS userspace
bh = sb_getblk()
modify bh->b_data
read
ll_rw_block(bh)
fill bh->b_data by on-disk data
/* lost modified data by FS */
set_buffer_uptodate(bh)
set_buffer_uptodate(bh)
Userspace should not use the blockdev while mounting though, the udev
seems to be already doing this. Although I think the udev should try to
avoid this, workaround the race by small overhead.
Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58494c980f ]
If equal to 0, the injection limit for a bfq_queue is pushed to 1
after a first sample of the total service time of the I/O requests of
the queue is computed (to allow injection to start). Yet, because of a
mistake in the branch that performs this action, the push may happen
also in some other case. This commit fixes this issue.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ebf15e9c8 ]
Commit acc8abcb2a ("i2c: tegra: Add suspend-resume support") added
suspend support for the Tegra I2C driver and following this change on
Tegra30 the following WARNING is seen on entering suspend ...
WARNING: CPU: 2 PID: 689 at /dvs/git/dirty/git-master_l4t-upstream/kernel/drivers/i2c/i2c-core.h:54 __i2c_transfer+0x35c/0x70c
i2c i2c-4: Transfer while suspended
Modules linked in: brcmfmac brcmutil
CPU: 2 PID: 689 Comm: rtcwake Not tainted 5.3.0-rc7-g089cf7f6ecb2 #1
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[<c0112264>] (unwind_backtrace) from [<c010ca94>] (show_stack+0x10/0x14)
[<c010ca94>] (show_stack) from [<c0a77024>] (dump_stack+0xb4/0xc8)
[<c0a77024>] (dump_stack) from [<c0124198>] (__warn+0xe0/0xf8)
[<c0124198>] (__warn) from [<c01241f8>] (warn_slowpath_fmt+0x48/0x6c)
[<c01241f8>] (warn_slowpath_fmt) from [<c06f6c40>] (__i2c_transfer+0x35c/0x70c)
[<c06f6c40>] (__i2c_transfer) from [<c06f7048>] (i2c_transfer+0x58/0xf4)
[<c06f7048>] (i2c_transfer) from [<c06f7130>] (i2c_transfer_buffer_flags+0x4c/0x70)
[<c06f7130>] (i2c_transfer_buffer_flags) from [<c05bee78>] (regmap_i2c_write+0x14/0x30)
[<c05bee78>] (regmap_i2c_write) from [<c05b9cac>] (_regmap_raw_write_impl+0x35c/0x868)
[<c05b9cac>] (_regmap_raw_write_impl) from [<c05b984c>] (_regmap_update_bits+0xe4/0xec)
[<c05b984c>] (_regmap_update_bits) from [<c05bad04>] (regmap_update_bits_base+0x50/0x74)
[<c05bad04>] (regmap_update_bits_base) from [<c04d453c>] (regulator_disable_regmap+0x44/0x54)
[<c04d453c>] (regulator_disable_regmap) from [<c04cf9d4>] (_regulator_do_disable+0xf8/0x268)
[<c04cf9d4>] (_regulator_do_disable) from [<c04d1694>] (_regulator_disable+0xf4/0x19c)
[<c04d1694>] (_regulator_disable) from [<c04d1770>] (regulator_disable+0x34/0x64)
[<c04d1770>] (regulator_disable) from [<c04d2310>] (regulator_bulk_disable+0x28/0xb4)
[<c04d2310>] (regulator_bulk_disable) from [<c0495cd4>] (tegra_pcie_power_off+0x64/0xa8)
[<c0495cd4>] (tegra_pcie_power_off) from [<c0495f74>] (tegra_pcie_pm_suspend+0x25c/0x3f4)
[<c0495f74>] (tegra_pcie_pm_suspend) from [<c05af48c>] (dpm_run_callback+0x38/0x1d4)
[<c05af48c>] (dpm_run_callback) from [<c05afe30>] (__device_suspend_noirq+0xc0/0x2b8)
[<c05afe30>] (__device_suspend_noirq) from [<c05b1c24>] (dpm_noirq_suspend_devices+0x100/0x37c)
[<c05b1c24>] (dpm_noirq_suspend_devices) from [<c05b1ebc>] (dpm_suspend_noirq+0x1c/0x48)
[<c05b1ebc>] (dpm_suspend_noirq) from [<c017d2c0>] (suspend_devices_and_enter+0x1d0/0xa00)
[<c017d2c0>] (suspend_devices_and_enter) from [<c017dd10>] (pm_suspend+0x220/0x74c)
[<c017dd10>] (pm_suspend) from [<c017c2c8>] (state_store+0x6c/0xc8)
[<c017c2c8>] (state_store) from [<c02ef398>] (kernfs_fop_write+0xe8/0x1c4)
[<c02ef398>] (kernfs_fop_write) from [<c0271e38>] (__vfs_write+0x2c/0x1c4)
[<c0271e38>] (__vfs_write) from [<c02748dc>] (vfs_write+0xa4/0x184)
[<c02748dc>] (vfs_write) from [<c0274b7c>] (ksys_write+0x9c/0xdc)
[<c0274b7c>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xe9f21fa8 to 0xe9f21ff0)
1fa0: 0000006c 004b2438 00000004 004b2438 00000004 00000000
1fc0: 0000006c 004b2438 004b1228 00000004 00000004 00000004 0049e78c 004b1228
1fe0: 00000004 be9809b8 b6f0bc0b b6e96206
The problem is that the Tegra PCIe driver indirectly uses I2C for
controlling some regulators and the I2C driver is now being suspended
before the PCIe driver causing the PCIe suspend to fail. The Tegra PCIe
driver is suspended during the NOIRQ phase and this cannot be changed
due to other dependencies. Therefore, we also need to move the suspend
handling for the Tegra I2C driver to the NOIRQ phase as well.
In order to move the I2C suspend handling to the NOIRQ phase we also
need to avoid calling pm_runtime_get/put() because per commit
1e2ef05bb8 ("PM: Limit race conditions between runtime PM and system
sleep (v2)") these cannot be called early in resume. The function
tegra_i2c_init(), called during resume, calls pm_runtime_get/put() and
so move these calls outside of tegra_i2c_init(), so this function can
be used during the NOIRQ resume phase.
Fixes: acc8abcb2a ("i2c: tegra: Add suspend-resume support")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00d2ec1e6b ]
The calculation of memblock_limit in adjust_lowmem_bounds() assumes that
bank 0 starts from a PMD-aligned address. However, the beginning of the
first bank may be NOMAP memory and the start of usable memory
will be not aligned to PMD boundary. In such case the memblock_limit will
be set to the end of the NOMAP region, which will prevent any memblock
allocations.
Mark the region between the end of the NOMAP area and the next PMD-aligned
address as NOMAP as well, so that the usable memory will start at
PMD-aligned address.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0fe66cf09 ]
Currently, multi_v7_defconfig + CONFIG_FUNCTION_TRACER fails to build
with clang:
arm-linux-gnueabi-ld: kernel/softirq.o: in function `_local_bh_enable':
softirq.c:(.text+0x504): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `__local_bh_enable_ip':
softirq.c:(.text+0x58c): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `do_softirq':
softirq.c:(.text+0x6c8): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_enter':
softirq.c:(.text+0x75c): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_exit':
softirq.c:(.text+0x840): undefined reference to `mcount'
arm-linux-gnueabi-ld: kernel/softirq.o:softirq.c:(.text+0xa50): more undefined references to `mcount' follow
clang can emit a working mcount symbol, __gnu_mcount_nc, when
'-meabi gnu' is passed to it. Until r369147 in LLVM, this was
broken and caused the kernel not to boot with '-pg' because the
calling convention was not correct. Always build with '-meabi gnu'
when using clang but ensure that '-pg' (which is added with
CONFIG_FUNCTION_TRACER and its prereq CONFIG_HAVE_FUNCTION_TRACER)
cannot be added with it unless this is fixed (which means using
clang 10.0.0 and newer).
Link: https://github.com/ClangBuiltLinux/linux/issues/35
Link: https://bugs.llvm.org/show_bug.cgi?id=33845
Link: 16fa8b0970
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8050f3f664 ]
Move the static keyword to the front of declarations of pci_regs_behavior[]
and pcie_cap_regs_behavior[], which resolves compiler warnings when
building with "W=1":
drivers/pci/pci-bridge-emul.c:41:1: warning: ‘static’ is not at beginning of
declaration [-Wold-style-declaration]
const static struct pci_bridge_reg_behavior pci_regs_behavior[] = {
^
drivers/pci/pci-bridge-emul.c:176:1: warning: ‘static’ is not at beginning of
declaration [-Wold-style-declaration]
const static struct pci_bridge_reg_behavior pcie_cap_regs_behavior[] = {
^
Link: https://lore.kernel.org/r/20190826151436.4672-1-kw@linux.com
Link: https://lore.kernel.org/r/20190828131733.5817-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f4287e7d9 ]
In smack_socket_sock_rcv_skb(), there is an if statement
on line 3920 to check whether skb is NULL:
if (skb && skb->secmark != 0)
This check indicates skb can be NULL in some cases.
But on lines 3931 and 3932, skb is used:
ad.a.u.net->netif = skb->skb_iif;
ipv6_skb_to_auditdata(skb, &ad.a, NULL);
Thus, possible null-pointer dereferences may occur when skb is NULL.
To fix these possible bugs, an if statement is added to check skb.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddd6960087 ]
devm_of_phy_get() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate devres structures. Propagating only -EPROBE_DEFER
is problematic because it results in these legitimately fatal errors
being treated as "PHY not specified in DT".
What we really want is to ignore the optional PHYs only if they have not
been specified in DT. devm_of_phy_get() returns -ENODEV in this case, so
that's the special case that we need to handle. So we propagate all
errors, except -ENODEV, so that real failures will still cause the
driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2170a09fb4 ]
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Cc: Richard Zhu <hongxing.zhu@nxp.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: kernel@pengutronix.de
Cc: linux-imx@nxp.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f9e1641ba ]
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e3ff0ac5f ]
regulator_get_optional() can fail for a number of reasons besides probe
deferral. It can for example return -ENOMEM if it runs out of memory as
it tries to allocate data structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "regulator not specified in DT".
What we really want is to ignore the optional regulators only if they
have not been specified in DT. regulator_get_optional() returns -ENODEV
in this case, so that's the special case that we need to handle. So we
propagate all errors, except -ENODEV, so that real failures will still
cause the driver to fail probe.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aec256d0ec ]
This fixes an issue in which key down events for function keys would be
repeatedly emitted even after the user has raised the physical key. For
example, the driver fails to emit the F5 key up event when going through
the following steps:
- fnmode=1: hold FN, hold F5, release FN, release F5
- fnmode=2: hold F5, hold FN, release F5, release FN
The repeated F5 key down events can be easily verified using xev.
Signed-off-by: Joao Moreno <mail@joaomoreno.com>
Co-developed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f1c62c443 ]
Do not use printk_ratelimit() in drivers/pci/pci.c as it shares the rate
limiting state with all other callers to the printk_ratelimit().
Add pci_info_ratelimited() (similar to pci_notice_ratelimited() added in
the commit a88a7b3eb0 ("vfio: Use dev_printk() when possible")) and use
it instead of printk_ratelimit() + pci_info().
Link: https://lore.kernel.org/r/20190825224616.8021-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 169ce0c081 ]
We need to use selinux_cred() to fetch the SELinux cred blob instead
of directly using current->security or current_security(). There
were a couple of lingering uses of current_security() in the SELinux code
that were apparently missed during the earlier conversions. IIUC, this
would only manifest as a bug if multiple security modules including
SELinux are enabled and SELinux is not first in the lsm order. After
this change, there appear to be no other users of current_security()
in-tree; perhaps we should remove it altogether.
Fixes: bbd3662a83 ("Infrastructure management of the cred security blob")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1b937cc86 ]
With the introduction of the HWMON compatibility layer to the power
supply framework in Linux 5.3, all power supply devices' names can be
used directly to create HWMON devices with the same names.
But HWMON has rules on allowable names that are different from those
used in the power supply framework. The dash character is forbidden, as
it is used by the libsensors library in userspace as a separator,
whereas this character is used in the device names in more than half of
the existing power supply drivers. This last case is consistent with the
typical naming usage with MFD and Device Tree.
This leads to warnings in the kernel log, with the format:
power_supply gpio-charger: hwmon: \
'gpio-charger' is not a valid name attribute, please fix
Add a protection to power_supply_add_hwmon_sysfs() that replaces any
dash in the device name with an underscore when registering with the
HWMON framework. Other forbidden characters (star, slash, space, tab,
newline) are not replaced, as they are not in common use.
Fixes: e67d4dfc9f ("power: supply: Add HWMON compatibility layer")
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ef66122bd ]
Issue:
- # hwclock -w
hwclock: RTC_SET_TIME: Invalid argument
Why:
- Relative commit: 8b9f9d4dc5 ("regmap: verify if register is
writeable before writing operations"), this patch
will always check for unwritable registers, it will compare reg
with max_register in regmap_writeable.
- The pcf85363/pcf85263 has the capability of address wrapping
which means if you access an address outside the allowed range
(0x00-0x2f) hardware actually wraps the access to a lower address.
The rtc-pcf85363 driver will use this feature to configure the time
and execute 2 actions in the same i2c write operation (stopping the
clock and configure the time). However the driver has also
configured the `regmap maxregister` protection mechanism that will
block accessing addresses outside valid range (0x00-0x2f).
How:
- Split of writing regs to two parts, first part writes control
registers about stop_enable and resets, second part writes
RTC time and date registers.
Signed-off-by: Biwen Li <biwen.li@nxp.com>
Link: https://lore.kernel.org/r/20190829021418.4607-1-biwen.li@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df901c85cc ]
Current code erroneously sets-up the CPU base address through the
parameter 'pci_addr', which is passed to initialize the CPU (AXI) base
address of the inbound window where the controller maps the PCI address
space into CPU physical address space; furthermore, it also truncates it
by programming only the lower 32-bit value into the inbound CPU address
register.
Fix both issues by introducing a new parameter 'u64 cpu_addr' to
initialize both lower 32-bit and upper 32-bit of the CPU physical
base address mapping PCI inbound transactions into CPU (AXI) ones.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver")
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Reviewed-by: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 834020366d ]
Translation faults arising from cache maintenance instructions are
rather unhelpfully reported with an FSR value where the WnR field is set
to 1, indicating that the faulting access was a write. Since cache
maintenance instructions on 32-bit ARM do not require any particular
permissions, this can cause our private 'cacheflush' system call to fail
spuriously if a translation fault is generated due to page aging when
targetting a read-only VMA.
In this situation, we will return -EFAULT to userspace, although this is
unfortunately suppressed by the popular '__builtin___clear_cache()'
intrinsic provided by GCC, which returns void.
Although it's tempting to write this off as a userspace issue, we can
actually do a little bit better on CPUs that support LPAE, even if the
short-descriptor format is in use. On these CPUs, cache maintenance
faults additionally set the CM field in the FSR, which we can use to
suppress the write permission checks in the page fault handler and
succeed in performing cache maintenance to read-only areas even in the
presence of a translation fault.
Reported-by: Orion Hodson <oth@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42344113ba ]
Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS
without WEAK_REORDERING_BEYOND_LLSC) fail for:
*x = 1;
atomic_inc(u);
smp_mb__after_atomic();
r0 = *y;
Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:
atomic_inc(u);
*x = 1;
smp_mb__after_atomic();
r0 = *y;
Which the CPU is then allowed to re-order (under TSO rules) like:
atomic_inc(u);
r0 = *y;
*x = 1;
And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.
Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ff96fb52c ]
klp_module_coming() is called for every module appearing in the system.
It sets obj->mod to a patched module for klp_object obj. Unfortunately
it leaves it set even if an error happens later in the function and the
patched module is not allowed to be loaded.
klp_is_object_loaded() uses obj->mod variable and could currently give a
wrong return value. The bug is probably harmless as of now.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd5d16531a ]
The layerscape PCIe controller have 4 BARs.
BAR0 and BAR1 are 32bit, BAR2 and BAR4 are 64bit and that's a
fixed hardware configuration.
Set the bar_fixed_64bit variable accordingly.
Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
[lorenzo.pieralisi@arm.com: commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c6c1ca318 ]
The comment describing the loongson_llsc_mb() reorder case doesn't
make any sense what so ever. Instruction re-ordering is not an SMP
artifact, but rather a CPU local phenomenon. Clarify the comment by
explaining that these issue cause a coherence fail.
For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
needs one at the bne branch target, then surely the normal
__cmpxch_asm() implementation does too. We cannot rely on the
barriers from cmpxchg() because cmpxchg_local() is implemented with
the same macro, and branch prediction and speculation are, too, CPU
local.
Fixes: e02e07e312 ("MIPS: Loongson: Introduce and use loongson_llsc_mb()")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41a8e19f47 ]
With CONFIG_BD70528_WATCHDOG=m, a built-in rtc driver cannot call
into the low-level functions that are part of the watchdog module:
drivers/rtc/rtc-bd70528.o: In function `bd70528_set_time':
rtc-bd70528.c:(.text+0x22c): undefined reference to `bd70528_wdt_lock'
rtc-bd70528.c:(.text+0x2a8): undefined reference to `bd70528_wdt_unlock'
drivers/rtc/rtc-bd70528.o: In function `bd70528_set_rtc_based_timers':
rtc-bd70528.c:(.text+0x50c): undefined reference to `bd70528_wdt_set'
Add a Kconfig dependency which forces RTC to be a module if watchdog is a
module. If watchdog is not compiled at all the stub functions for watchdog
control are used. compiling the RTC without watchdog is fine.
Fixes: 32a4a4ebf7 ("rtc: bd70528: Initial support for ROHM bd70528 RTC")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Link: https://lore.kernel.org/r/84462e01e43d39024948a3bdd24087ff87dc2255.1565591387.git.matti.vaittinen@fi.rohmeurope.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 073b50bccb ]
Addresses a few issues that were noticed when compiling with non-default
warnings enabled. The trimmed-down warnings in the order they are fixed
below are:
* declaration of 'size' shadows a parameter
* '%s' directive output may be truncated writing up to 5 bytes into a
region of size between 1 and 64
* pointer targets in initialization of 'char *' from 'unsigned char *'
differ in signedness
* left shift of negative value
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e38e690ac ]
Each iteration of for_each_child_of_node() executes of_node_put() on the
previous node, but in some return paths in the middle of the loop
of_node_put() is missing thus causing a reference leak.
Hence stash these mid-loop return values in a variable 'err' and add a
new label err_node_put which executes of_node_put() on the previous node
and returns 'err' on failure.
Change mid-loop return statements to point to jump to this label to
fix the reference leak.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
[lorenzo.pieralisi@arm.com: rewrote commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76380a607b ]
Goodix touchpad may drop its first couple input events when
i2c-designware-platdrv and intel-lpss it connects to took too long to
runtime resume from runtime suspended state.
This issue happens becuase the touchpad has a rather small buffer to
store up to 13 input events, so if the host doesn't read those events in
time (i.e. runtime resume takes too long), events are dropped from the
touchpad's buffer.
The bottleneck is D3cold delay it waits when transitioning from D3cold
to D0, hence remove the delay to make the resume faster. I've tested
some systems with intel-lpss and haven't seen any regression.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202683
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 232219b9a4 ]
When the kernel is build with lockdep support and the i2c-cht-wc driver is
used, the following warning is shown:
[ 66.674334] ======================================================
[ 66.674337] WARNING: possible circular locking dependency detected
[ 66.674340] 5.3.0-rc4+ #83 Not tainted
[ 66.674342] ------------------------------------------------------
[ 66.674345] systemd-udevd/1232 is trying to acquire lock:
[ 66.674349] 00000000a74dab07 (intel_soc_pmic_chtwc:167:(&cht_wc_regmap_cfg)->lock){+.+.}, at: regmap_write+0x31/0x70
[ 66.674360]
but task is already holding lock:
[ 66.674362] 00000000d44a85b7 (i2c_register_adapter){+.+.}, at: i2c_smbus_xfer+0x49/0xf0
[ 66.674370]
which lock already depends on the new lock.
[ 66.674371]
the existing dependency chain (in reverse order) is:
[ 66.674374]
-> #1 (i2c_register_adapter){+.+.}:
[ 66.674381] rt_mutex_lock_nested+0x46/0x60
[ 66.674384] i2c_smbus_xfer+0x49/0xf0
[ 66.674387] i2c_smbus_read_byte_data+0x45/0x70
[ 66.674391] cht_wc_byte_reg_read+0x35/0x50
[ 66.674394] _regmap_read+0x63/0x1a0
[ 66.674396] _regmap_update_bits+0xa8/0xe0
[ 66.674399] regmap_update_bits_base+0x63/0xa0
[ 66.674403] regmap_irq_update_bits.isra.0+0x3b/0x50
[ 66.674406] regmap_add_irq_chip+0x592/0x7a0
[ 66.674409] devm_regmap_add_irq_chip+0x89/0xed
[ 66.674412] cht_wc_probe+0x102/0x158
[ 66.674415] i2c_device_probe+0x95/0x250
[ 66.674419] really_probe+0xf3/0x380
[ 66.674422] driver_probe_device+0x59/0xd0
[ 66.674425] device_driver_attach+0x53/0x60
[ 66.674428] __driver_attach+0x92/0x150
[ 66.674431] bus_for_each_dev+0x7d/0xc0
[ 66.674434] bus_add_driver+0x14d/0x1f0
[ 66.674437] driver_register+0x6d/0xb0
[ 66.674440] i2c_register_driver+0x45/0x80
[ 66.674445] do_one_initcall+0x60/0x2f4
[ 66.674450] kernel_init_freeable+0x20d/0x2b4
[ 66.674453] kernel_init+0xa/0x10c
[ 66.674457] ret_from_fork+0x3a/0x50
[ 66.674459]
-> #0 (intel_soc_pmic_chtwc:167:(&cht_wc_regmap_cfg)->lock){+.+.}:
[ 66.674465] __lock_acquire+0xe07/0x1930
[ 66.674468] lock_acquire+0x9d/0x1a0
[ 66.674472] __mutex_lock+0xa8/0x9a0
[ 66.674474] regmap_write+0x31/0x70
[ 66.674480] cht_wc_i2c_adap_smbus_xfer+0x72/0x240 [i2c_cht_wc]
[ 66.674483] __i2c_smbus_xfer+0x1a3/0x640
[ 66.674486] i2c_smbus_xfer+0x67/0xf0
[ 66.674489] i2c_smbus_read_byte_data+0x45/0x70
[ 66.674494] bq24190_probe+0x26b/0x410 [bq24190_charger]
[ 66.674497] i2c_device_probe+0x189/0x250
[ 66.674500] really_probe+0xf3/0x380
[ 66.674503] driver_probe_device+0x59/0xd0
[ 66.674506] device_driver_attach+0x53/0x60
[ 66.674509] __driver_attach+0x92/0x150
[ 66.674512] bus_for_each_dev+0x7d/0xc0
[ 66.674515] bus_add_driver+0x14d/0x1f0
[ 66.674518] driver_register+0x6d/0xb0
[ 66.674521] i2c_register_driver+0x45/0x80
[ 66.674524] do_one_initcall+0x60/0x2f4
[ 66.674528] do_init_module+0x5c/0x230
[ 66.674531] load_module+0x2707/0x2a20
[ 66.674534] __do_sys_init_module+0x188/0x1b0
[ 66.674537] do_syscall_64+0x5c/0xb0
[ 66.674541] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 66.674543]
other info that might help us debug this:
[ 66.674545] Possible unsafe locking scenario:
[ 66.674547] CPU0 CPU1
[ 66.674548] ---- ----
[ 66.674550] lock(i2c_register_adapter);
[ 66.674553] lock(intel_soc_pmic_chtwc:167:(&cht_wc_regmap_cfg)->lock);
[ 66.674556] lock(i2c_register_adapter);
[ 66.674559] lock(intel_soc_pmic_chtwc:167:(&cht_wc_regmap_cfg)->lock);
[ 66.674561]
*** DEADLOCK ***
The problem is that the CHT Whiskey Cove PMIC's builtin i2c-adapter is
itself a part of an i2c-client (the PMIC). This means that transfers done
through it take adapter->bus_lock twice, once for the parent i2c-adapter
and once for its own bus_lock. Lockdep does not like this nested locking.
To make lockdep happy in the case of busses with muxes, the i2c-core's
i2c_adapter_lock_bus function calls:
rt_mutex_lock_nested(&adapter->bus_lock, i2c_adapter_depth(adapter));
But i2c_adapter_depth only works when the direct parent of the adapter is
another adapter, as it is only meant for muxes. In this case there is an
i2c-client and MFD instantiated platform_device in the parent->child chain
between the 2 devices.
This commit overrides the default i2c_lock_operations, passing a hardcoded
depth of 1 to rt_mutex_lock_nested, making lockdep happy.
Note that if there were to be a mux attached to the i2c-wc-cht adapter,
this would break things again since the i2c-mux code expects the
root-adapter to have a locking depth of 0. But the i2c-wc-cht adapter
always has only 1 client directly attached in the form of the charger IC
paired with the CHT Whiskey Cove PMIC.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7727ae5297 ]
Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4 file system to two
mountpoints with default mount options, and then remount one of them
with "noblock_validity", it may trigger a use after free problem when
someone accessing the other one.
# mount /dev/sda foo
# mount /dev/sda bar
User access mountpoint "foo" | Remount mountpoint "bar"
|
ext4_map_blocks() | ext4_remount()
check_block_validity() | ext4_setup_system_zone()
ext4_data_block_valid() | ext4_release_system_zone()
| free system_blks rb nodes
access system_blks rb nodes |
trigger use after free |
This problem can also be reproduced by one mountpint, At the same time,
add_system_zone() can get called during remount as well so there can be
racing ext4_data_block_valid() reading the rbtree at the same time.
This patch add RCU to protect system zone from releasing or building
when doing a remount which inverse current "noblock_validity" mount
option. It assign the rbtree after the whole tree was complete and
do actual freeing after rcu grace period, avoid any intermediate state.
Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8933b6b68 ]
As reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204193
A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.
kasan_report.cold+0x5/0x32
f2fs_write_end_io+0x215/0x650
bio_endio+0x26e/0x320
blk_update_request+0x209/0x5d0
blk_mq_end_request+0x2e/0x230
lo_complete_rq+0x12c/0x190
blk_done_softirq+0x14a/0x1a0
__do_softirq+0x119/0x3e5
irq_exit+0x94/0xe0
call_function_single_interrupt+0xf/0x20
During umount, we will access NULL sbi->node_inode pointer in
f2fs_write_end_io():
f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
page->index != nid_of_node(page));
The reason is if disable_checkpoint mount option is on, meta dirty
pages can remain during umount, and then be flushed by iput() of
meta_inode, however node_inode has been iput()ed before
meta_inode's iput().
Since checkpoint is disabled, all meta/node datas are useless and
should be dropped in next mount, so in umount, let's adjust
drop_inode() to give a hint to iput_final() to drop all those dirty
datas correctly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7ca44ed3b ]
Since commit 4388c9b3a6 ("powerpc: Do not send system reset request
through the oops path"), pstore dmesg file is not updated when dump is
triggered from HMC. This commit modified system reset (sreset) handler
to invoke fadump or kdump (if configured), without pushing dmesg to
pstore. This leaves pstore to have old dmesg data which won't be much
of a help if kdump fails to capture the dump. This patch fixes that by
calling kmsg_dump() before heading to fadump ot kdump.
Fixes: 4388c9b3a6 ("powerpc: Do not send system reset request through the oops path")
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9aa830607 ]
When registering the PLL, unbypass the PLL.
The PLL has two bypass control bit, BYPASS and EXT_BYPASS.
we will expose EXT_BYPASS to clk driver for mux usage, and keep
BYPASS inside pll14xx usage. The PLL has a restriction that
when M/P change, need to RESET/BYPASS pll to avoid glitch, so
we could not expose BYPASS.
To make it easy for clk driver usage, unbypass PLL which does
not hurt current function.
Fixes: 8646d4dcc7 ("clk: imx: Add PLLs driver for imx8mm soc")
Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lkml.kernel.org/r/1568043491-20680-3-git-send-email-peng.fan@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dee1bc9c23 ]
According to PLL1443XA and PLL1416X spec,
"When BYPASS is 0 and RESETB is changed from 0 to 1, FOUT starts to
output unstable clock until lock time passes. PLL1416X/PLL1443XA may
generate a glitch at FOUT."
So set BYPASS when RESETB is changed from 0 to 1 to avoid glitch.
In the end of set rate, BYPASS will be cleared.
When prepare clock, also need to take care to avoid glitch. So
we also follow Spec to set BYPASS before RESETB changed from 0 to 1.
And add a check if the RESETB is already 0, directly return 0;
Fixes: 8646d4dcc7 ("clk: imx: Add PLLs driver for imx8mm soc")
Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lkml.kernel.org/r/1568043491-20680-2-git-send-email-peng.fan@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 920fdab7b3 ]
On arm64 build with clang, sometimes the __cmpxchg_mb is not inlined
when CONFIG_OPTIMIZE_INLINING is set.
Clang then fails a compile-time assertion, because it cannot tell at
compile time what the size of the argument is:
mm/memcontrol.o: In function `__cmpxchg_mb':
memcontrol.c:(.text+0x1a4c): undefined reference to `__compiletime_assert_175'
memcontrol.c:(.text+0x1a4c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__compiletime_assert_175'
Mark all of the cmpxchg() style functions as __always_inline to
ensure that the compiler can see the result.
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/648
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Tested-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a7326caab ]
The D-Link DIR-685 had its clock polarity set as active
low using the special SPI "spi-cpol" property.
This is not correct: the datasheet clearly states:
"Fix SCL to GND level when not in use" which is
indicative that this line is active high.
After a recent fix making the GPIO-based SPI driver
force the clock line de-asserted at the beginning of
each SPI transaction this reared its ugly head: now
de-asserted was taken to mean the line should be
driven high, but it should be driven low.
Fix this up in the DTS file and the display works again.
Link: https://lore.kernel.org/r/20190915135444.11066-1-linus.walleij@linaro.org
Cc: Mark Brown <broonie@kernel.org>
Fixes: 2922d1cc16 ("spi: gpio: Add SPI_MASTER_GPIO_SS flag")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6058f11870 ]
GCE hardware stored event information in own internal sysram,
if the initial value in those sysram is not zero value
it will cause a situation that gce can wait the event immediately
after client ask gce to wait event but not really trigger the
corresponding hardware.
In order to make sure that the wait event function is
exactly correct, we need to clear the sysram value in
cmdq initial flow.
Fixes: 623a6143a8 ("mailbox: mediatek: Add Mediatek CMDQ driver")
Signed-off-by: Bibby Hsieh <bibby.hsieh@mediatek.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92c94dfb69 ]
prep_irq_for_idle() is intended to be called before entering
H_CEDE (and it is used by the pseries cpuidle driver). However the
default pseries idle routine does not call it, leading to mismanaged
lazy irq state when the cpuidle driver isn't in use. Manifestations of
this include:
* Dropped IPIs in the time immediately after a cpu comes
online (before it has installed the cpuidle handler), making the
online operation block indefinitely waiting for the new cpu to
respond.
* Hitting this WARN_ON in arch_local_irq_restore():
/*
* We should already be hard disabled here. We had bugs
* where that wasn't the case so let's dbl check it and
* warn if we are wrong. Only do that when IRQ tracing
* is enabled as mfmsr() can be costly.
*/
if (WARN_ON_ONCE(mfmsr() & MSR_EE))
__hard_irq_disable();
Call prep_irq_for_idle() from pseries_lpar_idle() and honor its
result.
Fixes: 363edbe261 ("powerpc: Default arch idle could cede processor on pseries")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e4b7e82d4 ]
Some MMC cards fail to enumerate properly when inserted into an MMC slot
on sdm845 devices. This is because the clk ops for qcom clks round the
frequency up to the nearest rate instead of down to the nearest rate.
For example, the MMC driver requests a frequency of 52MHz from
clk_set_rate() but the qcom implementation for these clks rounds 52MHz
up to the next supported frequency of 100MHz. The MMC driver could be
modified to request clk rate ranges but for now we can fix this in the
clk driver by changing the rounding policy for this clk to be round down
instead of round up.
Fixes: 06391eddb6 ("clk: qcom: Add Global Clock controller (GCC) driver for SDM845")
Reported-by: Douglas Anderson <dianders@chromium.org>
Cc: Taniya Das <tdas@codeaurora.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lkml.kernel.org/r/20190830195142.103564-1-swboyd@chromium.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 799abe283e ]
When the last device in an eeh_pe is removed the eeh_pe structure itself
(and any empty parents) are freed since they are no longer needed. This
results in a crash when a hotplug driver is involved since the following
may occur:
1. Device is suprise removed.
2. Driver performs an MMIO, which fails and queues and eeh_event.
3. Hotplug driver receives a hotplug interrupt and removes any
pci_devs that were under the slot.
4. pci_dev is torn down and the eeh_pe is freed.
5. The EEH event handler thread processes the eeh_event and crashes
since the eeh_pe pointer in the eeh_event structure is no
longer valid.
Crashing is generally considered poor form. Instead of doing that use
the fact PEs are marked as EEH_PE_INVALID to keep them around until the
end of the recovery cycle, at which point we can safely prune any empty
PEs.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b66370c61 ]
Bare metal machine checks run an "early" handler in real mode before
running the main handler which reports the event.
The main handler runs exactly as a normal interrupt handler, after the
"windup" which sets registers back as they were at interrupt entry.
CFAR does not get restored by the windup code, so that will be wrong
when the handler is run.
Restore the CFAR to the saved value before running the late handler.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e7f100ce8 ]
[Why]
In newer hardware MANUAL_FLOW_CONTROL is not a trigger bit. Due to this
front porch is fixed and in these hardware freesync does not work.
[How]
Change the programming to generate a pulse so that the event will be
triggered, front porch will be cut short and freesync will work.
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89cb561473 ]
[why]
With Scatter Gather enabled, HUBP underflows during MPO enabled video
playback. hubp_init has a register write that fixes this problem, but
the register is cleared when HUBP gets power gated.
[how]
Make a call to hubp_init during enable_plane, so that the fix can
be applied after HUBP powers back on again.
Signed-off-by: Zi Yu Liao <ziyu.liao@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 706feb26f8 ]
fix other navi asic set peak performance level error.
because the navi10_ppt.c will handle navi12 14 asic,
it will use navi10 peak value to set other asic, it is not correct.
after patch:
only navi10 use custom peak value, other asic will used default value.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f787216f33 ]
The CPG/MSSR Clock Domain driver does not implement the
generic_pm_domain.power_{on,off}() callbacks, as the domain itself
cannot be powered down. Hence the domain should be marked as always-on
by setting the GENPD_FLAG_ALWAYS_ON flag, to prevent the core PM Domain
code from considering it for power-off, and doing unnessary processing.
Note that this only affects RZ/A2 SoCs. On R-Car Gen2 and Gen3 SoCs,
the R-Car SYSC driver handles Clock Domain creation, and offloads only
device attachment/detachment to the CPG/MSSR driver.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a459a184c9 ]
The CPG/MSTP Clock Domain driver does not implement the
generic_pm_domain.power_{on,off}() callbacks, as the domain itself
cannot be powered down. Hence the domain should be marked as always-on
by setting the GENPD_FLAG_ALWAYS_ON flag, to prevent the core PM Domain
code from considering it for power-off, and doing unnessary processing.
This also gets rid of a boot warning when the Clock Domain contains an
IRQ-safe device, e.g. on RZ/A1:
sh_mtu2 fcff0000.timer: PM domain cpg_clocks will not be powered off
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d21b8adbd4 ]
When cold-booting Asus X434DA, GPIO 7 is found to be already configured
as an interrupt, and the GPIO level is found to be in a state that
causes the interrupt to fire.
As soon as pinctrl-amd probes, this interrupt fires and invokes
amd_gpio_irq_handler(). The IRQ is acked, but no GPIO-IRQ handler was
invoked, so the GPIO level being unchanged just causes another interrupt
to fire again immediately after.
This results in an interrupt storm causing this platform to hang
during boot, right after pinctrl-amd is probed.
Detect this situation and disable the GPIO interrupt when this happens.
This enables the affected platform to boot as normal. GPIO 7 actually is
the I2C touchpad interrupt line, and later on, i2c-multitouch loads and
re-enables this interrupt when it is ready to handle it.
Instead of this approach, I considered disabling all GPIO interrupts at
probe time, however that seems a little risky, and I also confirmed that
Windows does not seem to have this behaviour: the same 41 GPIO IRQs are
enabled under both Linux and Windows, which is a far larger collection
than the GPIOs referenced by the DSDT on this platform.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Link: https://lore.kernel.org/r/20190814090540.7152-1-drake@endlessm.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1af2afbd2 ]
Some, mostly Fermi, vbioses appear to have zero max voltage. That causes Nouveau to not parse voltage entries, thus users not being able to set higher clocks.
When changing this value Nvidia driver still appeared to ignore it, and I wasn't able to find out why, thus the code is ignoring the value if it is zero.
CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e339ab2ac ]
On Turing, an input LUT is required to transform inputs in fixed-point
formats to FP16 for the internal display pipe. We provide an identity
mapping whenever a window is enabled for this reason.
HW has error checks to ensure when the input is already FP16, that the
input LUT is also disabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e48495017 ]
v2: set num_types based on num_instances
navi1x has 2 sdma engines but commit
"e7b58d03b678 drm/amdgpu: reorganize sdma v4 code to support more instances"
changes the max number of sdma irq types (AMDGPU_SDMA_IRQ_LAST) from 2 to 8
which causes amdgpu_irq_gpu_reset_resume_helper() to recover irq of sdma
engines with following logic:
(enable irq for sdma0) * 1 time
(enable irq for sdma1) * 1 time
(disable irq for sdma1) * 6 times
as a result, after gpu reset, interrupt for sdma1 is lost.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92c8026854 ]
vfio_pci_enable() saves the device's initial configuration information
with the intent that it is restored in vfio_pci_disable(). However,
the commit referenced in Fixes: below replaced the call to
__pci_reset_function_locked(), which is not wrapped in a state save
and restore, with pci_try_reset_function(), which overwrites the
restored device state with the current state before applying it to the
device. Reinstate use of __pci_reset_function_locked() to return to
the desired behavior.
Fixes: 890ed578df ("vfio-pci: Use pci "try" reset interface")
Signed-off-by: hexin <hexin15@baidu.com>
Signed-off-by: Liu Qi <liuqi16@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa06e3d60e ]
The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
use of driver callbacks in drivers that have been bound part way
through the recovery process. This is necessary to prevent later stage
handlers from being called when the earlier stage handlers haven't,
which can be confusing for drivers.
However, the flag is set for all devices that are added after boot
time and only cleared at the end of the EEH recovery process. This
results in hot plugged devices erroneously having the flag set during
the first recovery after they are added (causing their driver's
handlers to be incorrectly ignored).
To remedy this, clear the flag at the beginning of recovery
processing. The flag is still cleared at the end of recovery
processing, although it is no longer really necessary.
Also clear the flag during eeh_handle_special_event(), for the same
reasons.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b8ca5629d27de74c957d4f4b250177d1b6fc4bbd.1565930772.git.sbobroff@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccfb5bd71d ]
After a partition migration, pseries_devicetree_update() processes
changes to the device tree communicated from the platform to
Linux. This is a relatively heavyweight operation, with multiple
device tree searches, memory allocations, and conversations with
partition firmware.
There's a few levels of nested loops which are bounded only by
decisions made by the platform, outside of Linux's control, and indeed
we have seen RCU stalls on large systems while executing this call
graph. Use cond_resched() in these loops so that the cpu is yielded
when needed.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-4-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f51e39294 ]
create_physical_mapping expects physical addresses, but creating and
splitting these mappings after boot is supplying virtual (effective)
addresses. This can be irritated by booting with mem= to limit memory
then probing an unused physical memory range:
echo <addr> > /sys/devices/system/memory/probe
This mostly works by accident, firstly because __va(__va(x)) == __va(x)
so the virtual address does not get corrupted. Secondly because pfn_pte
masks out the upper bits of the pfn beyond the physical address limit,
so a pfn constructed with a 0xc000000000000000 virtual linear address
will be masked back to the correct physical address in the pte.
Fixes: 6cc27341b2 ("powerpc/mm: add radix__create_section_mapping()")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190724084638.24982-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38a0d0cdb4 ]
We see warnings such as:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
return oldval == cmparg;
^
kernel/futex.c:1651:6: note: 'oldval' was declared here
int oldval, ret;
^
This is because arch_futex_atomic_op_inuser() only sets *oval if ret
is 0 and GCC doesn't see that it will only use it when ret is 0.
Anyway, the non-zero ret path is an error path that won't suffer from
setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
it will have no impact.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: reword change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e033829d2a ]
walk_pagetables() always walk the entire pgdir from address 0
but considers PAGE_OFFSET or KERN_VIRT_START as the starting
address of the walk, resulting in a possible mismatch in the
displayed addresses.
Ex: on PPC32, when KERN_VIRT_START was locally defined as
PAGE_OFFSET, ptdump displayed 0x80000000
instead of 0xc0000000 for the first kernel page,
because 0xc0000000 + 0xc0000000 = 0x80000000
Start the walk at st->start_address instead of starting at 0.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5aa2ac513295f594cce8ddb1c649f61947bd063d.1565786091.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6717c01dd ]
The LPAR migration implementation and userspace-initiated cpu hotplug
can interleave their executions like so:
1. Set cpu 7 offline via sysfs.
2. Begin a partition migration, whose implementation requires the OS
to ensure all present cpus are online; cpu 7 is onlined:
rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
This sets cpu 7 online in all respects except for the cpu's
corresponding struct device; dev->offline remains true.
3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
already online and returns success. The driver core (device_online)
sets dev->offline = false.
4. The migration completes and restores cpu 7 to offline state:
rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the cpu device
is manually set offline and then online again via sysfs.
Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.
Fixes: 120496ac2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3e0dbd7f7 ]
Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in
the OPAL logs and also outputs some of the fields of the internal XIVE
structures in Linux. The OPAL calls can only be done on baremetal
(PowerNV) and they crash a pseries machine. Fix by checking the
hypervisor feature of the CPU.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-2-clg@kaod.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7b85ad25a ]
The implementation of clk_hw_get_name() relies on the clk_core
associated with the clk_hw pointer existing. If of_clk_hw_register()
fails, there isn't a clk_core created yet, so calling clk_hw_get_name()
here fails. Extract the name first so we can print it later.
Fixes: 1d80c14248 ("clk: sunxi-ng: Add common infrastructure")
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf9ec1fc6d ]
A future patch is going to change semantics of clk_register() so that
clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid
referencing this member here so that we don't run into NULL pointer
exceptions.
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190731193517.237136-2-sboyd@kernel.org
[sboyd@kernel.org: Move name to after checking for error or NULL hw]
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c37c792dec ]
We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for bare metal.
This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.
As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1].
In practice just a single page is allocated there so chances for failure
are quite low.
To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.
This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.
While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.
[1]:
===
BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
2 locks held by scsi_eh_1/1038:
#0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
#1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
irq event stamp: 500
hardirqs last enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
softirqs last enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
Call Trace:
[c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
[c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
[c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
[c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
[c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
[c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
[c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
[c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
[c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
[c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
[c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
[c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
[c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
[c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
[c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
[c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
[c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
[c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
[c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
[c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
[c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
irq event stamp: 2305
========================================================
hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
WARNING: possible irq lock inversion dependency detected
5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W
softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
but this lock took another, HARDIRQ-unsafe lock in the past:
(fs_reclaim){+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
local_irq_disable();
lock(&(&host->lock)->rlock);
lock(fs_reclaim);
<Interrupt>
lock(&(&host->lock)->rlock);
*** DEADLOCK ***
no locks held by swapper/0/0.
the shortest dependencies between 2nd lock and 1st lock:
-> (fs_reclaim){+.+.} ops: 167579 {
HARDIRQ-ON-W at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
SOFTIRQ-ON-W at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
INITIAL USE at:
lock_acquire+0xf8/0x2a0
fs_reclaim_acquire.part.23+0x44/0x60
kmem_cache_alloc_node_trace+0x80/0x590
alloc_desc+0x64/0x270
__irq_alloc_descs+0x2e4/0x3a0
irq_domain_alloc_descs+0xb0/0x150
irq_create_mapping+0x168/0x2c0
xics_smp_probe+0x2c/0x98
pnv_smp_probe+0x40/0x9c
smp_prepare_cpus+0x524/0x6c4
kernel_init_freeable+0x1b4/0x650
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x70
}
===
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e40837afb9 ]
[Why]
These are needed to send back DRM vblank events in the case where VRR
is on. Without the interrupt enabled we're deferring the events into the
vblank queue and userspace is left waiting forever to get back the
events they need.
Found using igt@kms_vrr - the test fails immediately due to vblank
timeout.
[How]
Register them the same way we're handling it for DCN1.
This fixes igt@kms_vrr for DCN2.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5382701c3 ]
[Why]
The vm config will be clear to 0 when system enter S4. It will
cause hubbub didn't know how to fetch data when system resume.
The flip always pending because earliest_inuse_address and
request_address are different.
[How]
Reprogram VM config when system resume
Signed-off-by: Lewis Huang <Lewis.Huang@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a463b26303 ]
[Why]
The math on deciding on how many
"frames to insert" sometimes sent us over the max refresh rate.
Also integer overflow can occur if we have high refresh rates.
[How]
Instead of clipping the frame duration such that it doesn’t go below the min,
just remove a frame from the number of frames to insert. +
Use unsigned long long for intermediate calculations to prevent
integer overflow.
Signed-off-by: Bayan Zabihiyan <bayan.zabihiyan@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cbcfc9751 ]
[Why]
When endpoint is at the boundary of a region, such as at 2^0=1
we find that the last segment has a sharp slope and some points
are clipped at the top.
[How]
If end point is 1, which is exactly at the 2^0 region boundary, we
need to program an additional region beyond this point.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 720099603d ]
The MMC2 clock slices are currently not defined in V3s CCU driver, which
makes MMC2 not working.
Fix this issue.
Fixes: d0f11d14b0 ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 568b9de48d ]
The code was setting the bit 21 of the CPCCR register to use a divider
of 2 for the "pll half" clock, and clearing the bit to use a divider
of 1.
This is the opposite of how this register field works: a cleared bit
means that the /2 divider is used, and a set bit means that the divider
is 1.
Restore the correct behaviour using the newly introduced .div_table
field.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lkml.kernel.org/r/20190701113606.4130-1-paul@crapouillou.net
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 340ff31ab0 ]
ipmi_thread() uses back-to-back schedule() to poll for command
completion which, on some machines, can push up CPU consumption and
heavily tax the scheduler locks leading to noticeable overall
performance degradation.
This was originally added so firmware updates through IPMI would
complete in a timely manner. But we can't kill the scheduler
locks for that one use case.
Instead, only run schedule() continuously in maintenance mode,
where firmware updates should run.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a502b343eb ]
According to the following tab (coming from STMFX datasheet), updates
have to done in stmfx_pinconf_set function:
-"type" has to be set when "bias" is configured as "pull-up or pull-down"
-PIN_CONFIG_DRIVE_PUSH_PULL should only be used when gpio is configured as
output. There is so no need to check direction.
DIR | TYPE | PUPD | MFX GPIO configuration
----|------|------|---------------------------------------------------
1 | 1 | 1 | OUTPUT open drain with internal pull-up resistor
----|------|------|---------------------------------------------------
1 | 1 | 0 | OUTPUT open drain with internal pull-down resistor
----|------|------|---------------------------------------------------
1 | 0 | 0/1 | OUTPUT push pull no pull
----|------|------|---------------------------------------------------
0 | 1 | 1 | INPUT with internal pull-up resistor
----|------|------|---------------------------------------------------
0 | 1 | 0 | INPUT with internal pull-down resistor
----|------|------|---------------------------------------------------
0 | 0 | 1 | INPUT floating
----|------|------|---------------------------------------------------
0 | 0 | 0 | analog (GPIO not used, default setting)
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Link: https://lore.kernel.org/r/1564053416-32192-1-git-send-email-amelie.delaunay@st.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0df3e42167 ]
When building with -Wsometimes-uninitialized, clang warns:
drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
used uninitialized whenever 'for' loop exits because its condition is
false [-Wsometimes-uninitialized]
for (j = 0; j < entries; j++) {
^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
here
if (fndit)
^~~~~
drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
it is always true
for (j = 0; j < entries; j++) {
^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
'fndit' to silence this warning
int j, fndit;
^
= 0
fndit is only used to gate a sprintf call, which can be moved into the
loop to simplify the code and eliminate the local variable, which will
fix this warning.
Fixes: 2fcf3ae508 ("hotplug/drc-info: Add code to search ibm,drc-info property")
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/504
Link: https://lore.kernel.org/r/20190603221157.58502-1-natechancellor@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b9c60bed5 ]
Initially, the TMU_ROOT clock was marked as critical, which automatically
made the AHB clock to stay always on. Since the TMU_ROOT clock is not
marked as critical anymore, following commit:
"clk: imx8mq: Remove CLK_IS_CRITICAL flag for IMX8MQ_CLK_TMU_ROOT"
all the clocks that derive from ipg_root clock (and implicitly ahb clock)
would also have to enable, along with their own gate, the AHB clock.
But considering that AHB is actually a bus that has to be always on, we mark
it as critical in the clock provider driver and then all the clocks that
derive from it can be controlled through the dedicated per IP gate which
follows after the ipg_root clock.
Signed-off-by: Abel Vesa <abel.vesa@nxp.com>
Tested-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3eb9b8f67 ]
In radeon_connector_set_property(), there is an if statement on line 743
to check whether connector->encoder is NULL:
if (connector->encoder)
When connector->encoder is NULL, it is used on line 755:
if (connector->encoder->crtc)
Thus, a possible null-pointer dereference may occur.
To fix this bug, connector->encoder is checked before being used.
This bug is found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f7fe9a93e ]
During kexec some adapters hit an EEH since they are not properly
shut down in the radeon_pci_shutdown() function. Adding
radeon_suspend_kms() fixes this issue.
Signed-off-by: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d196bbbc28 ]
clang warns:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:336:8:
warning: implicit conversion from enumeration type 'enum smu_clk_type'
to different enumeration type 'enum amd_pp_clock_type'
[-Wenum-conversion]
dc_to_smu_clock_type(clk_type),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:421:14:
warning: implicit conversion from enumeration type 'enum
amd_pp_clock_type' to different enumeration type 'enum smu_clk_type'
[-Wenum-conversion]
dc_to_pp_clock_type(clk_type),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are functions to properly convert between all of these types, use
them so there are no longer any warnings.
Fixes: a43913ea50 ("drm/amd/powerplay: add function get_clock_by_type_with_latency for navi10")
Fixes: e5e4e22391 ("drm/amd/powerplay: add interface to get clock by type with latency for display (v2)")
Link: https://github.com/ClangBuiltLinux/linux/issues/586
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4c4073b01 ]
HW requires for caching to be unset for scanout BO
mappings when the BO placement is in GTT memory.
Usually the flag to unset is passed from user mode
but for FB mode this was missing.
v2:
Keep all BO placement logic in amdgpu_display_supported_domains
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3389669ac5 ]
The mipi_dbi helper is missing a dependency on DRM_KMS_HELPER and putting
that in revealed this problem:
drivers/video/fbdev/Kconfig:12:error: recursive dependency detected!
drivers/video/fbdev/Kconfig:12: symbol FB is selected by DRM_KMS_FB_HELPER
drivers/gpu/drm/Kconfig:75: symbol DRM_KMS_FB_HELPER depends on DRM_KMS_HELPER
drivers/gpu/drm/Kconfig:69: symbol DRM_KMS_HELPER is selected by TINYDRM_MIPI_DBI
drivers/gpu/drm/tinydrm/Kconfig:11: symbol TINYDRM_MIPI_DBI is selected by TINYDRM_HX8357D
drivers/gpu/drm/tinydrm/Kconfig:15: symbol TINYDRM_HX8357D depends on BACKLIGHT_CLASS_DEVICE
drivers/video/backlight/Kconfig:144: symbol BACKLIGHT_CLASS_DEVICE is selected by FB_BACKLIGHT
drivers/video/fbdev/Kconfig:187: symbol FB_BACKLIGHT depends on FB
A symbol that selects DRM_KMS_HELPER can not depend on
BACKLIGHT_CLASS_DEVICE. The reason for this is that DRM_KMS_FB_HELPER
selects FB instead of depending on it.
The tinydrm drivers have somehow gotten away with depending on
BACKLIGHT_CLASS_DEVICE because DRM_TINYDRM selects DRM_KMS_HELPER and the
drivers depend on that symbol.
An audit shows that all DRM drivers that select DRM_KMS_HELPER and use
BACKLIGHT_CLASS_DEVICE, selects it:
DRM_TILCDC, DRM_GMA500, DRM_SHMOBILE, DRM_NOUVEAU, DRM_FSL_DCU,
DRM_I915, DRM_RADEON, DRM_AMDGPU, DRM_PARADE_PS8622
Documentation/kbuild/kconfig-language.txt has a note regarding select:
1. 'select should be used with care since it doesn't visit dependencies.'
This is not a problem since BACKLIGHT_CLASS_DEVICE doesn't have any
dependencies.
2. 'In general use select only for non-visible symbols'
BACKLIGHT_CLASS_DEVICE is user visible.
The real solution to this would be to have DRM_KMS_FB_HELPER depend on the
user visible symbol FB. That is a can of worms I'm not willing to tackle.
I fear that such a change will result in me handling difficult fallouts
for the next weeks. So I'm following DRM suite here.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: David Lechner <david@lechnology.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190722104312.16184-7-noralf@tronnes.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21ffcc94d5 ]
[Why]
DC configures the GSL group for the pipe when pipe_split is enabled
and we're switching flip types (buffered <-> immediate flip) on DCN2.
In order to record what GSL group the pipe is using DC stores it in
the pipe's stream_res. DM is not aware of this internal grouping, nor
is DC resource.
So when DM creates a dc_state context and passes it to DC the current
GSL group is lost - DM never knew about it in the first place.
After 3 immediate flips we run out of GSL groups and we're no longer
able to correctly perform *any* flip for multi-pipe scenarios.
[How]
The gsl_group needs to be copied to the new context.
DM has no insight into GSL grouping and could even potentially create
a brand new context without referencing current hardware state. So this
makes the most sense to have happen in DC.
There are two places where DC can apply a new context:
- dc_commit_state
- dc_commit_updates_for_stream
But what's shared between both of these is apply_ctx_for_surface.
This logic only matters for DCN2, so it can be placed in
dcn20_apply_ctx_for_surface. Before doing any locking (where the GSL
group is setup) we can copy over the GSL groups before committing the
new context.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersen.wu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d68a745417 ]
[why]
As a fail-safe, in case 'set FEC_READY' DPCD write fails, a HW shadow
register should be cleared and the internal FEC stat should be set to
'not ready'. This is to make sure HW settings will be consistent with
FEC_READY state on the RX.
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Joshua Aberback <Joshua.Aberback@amd.com>
Acked-by: Chris Park <Chris.Park@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18b401874a ]
[why]
dcn20_clk_mgr_construct was not initializing pp_smu, and PME call gets
filtered out by the null check
[how]
initialize pp_smu dcn20_clk_mgr_construct
Signed-off-by: Su Sung Chung <Su.Chung@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88eac241a1 ]
[Why]
Specifically to one panel,
TCON is able to accept active video signal quickly, but
the Source Driver requires 2-3 frames of extra time.
It is a Panel issue since TCON needs to take care of
all Sink requirements including Source Driver. But in
this case it does not.
Customer is asking to add fixed T7 delay as panel
workaround.
[How]
Add monitor specific patch to add T7 delay
Signed-off-by: Anthony Koo <anthony.koo@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8c6bfc612 ]
The horizontal blanking periods are too short, as the values are
specified for a single LVDS channel. Since this panel is dual LVDS
they need to be doubled. With this change the panel reaches its
nominal vrefresh rate of 60Fps, instead of the 64Fps with the
current wrong blanking.
Philipp Zabel added:
The datasheet specifies 960 active clocks + 40/128/160 clocks blanking
on each of the two LVDS channels (min/typical/max), so doubled this is
now correct.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1562764060.23869.12.camel@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18d0952a83 ]
The issue we have is that the crc worker might fall behind. We've
tried to handle this by tracking both the earliest frame for which it
still needs to compute a crc, and the last one. Plus when the
crtc_state changes, we have a new work item, which are all run in
order due to the ordered workqueue we allocate for each vkms crtc.
Trouble is there's been a few small issues in the current code:
- we need to capture frame_end in the vblank hrtimer, not in the
worker. The worker might run much later, and then we generate a lot
of crc for which there's already a different worker queued up.
- frame number might be 0, so create a new crc_pending boolean to
track this without confusion.
- we need to atomically grab frame_start/end and clear it, so do that
all in one go. This is not going to create a new race, because if we
race with the hrtimer then our work will be re-run.
- only race that can happen is the following:
1. worker starts
2. hrtimer runs and updates frame_end
3. worker grabs frame_start/end, already reading the new frame_end,
and clears crc_pending
4. hrtimer calls queue_work()
5. worker completes
6. worker gets re-run, crc_pending is false
Explain this case a bit better by rewording the comment.
v2: Demote warning level output to debug when we fail to requeue, this
is expected under high load when the crc worker can't quite keep up.
Cc: Shayenne Moura <shayenneluzmoura@gmail.com>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Tested-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190606222751.32567-2-daniel.vetter@ffwll.ch
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 71cddb7097 ]
Since the rpmsg_endpoint is created before probe is called, it's
possible that a host event is received during cros_ec_register, and
there would be some pending work in the host_event_work workqueue while
cros_ec_register is called.
If cros_ec_register fails, when the leftover work in host_event_work
run, the ec_dev from the drvdata of the rpdev could be already set to
NULL, causing kernel crash when trying to run cros_ec_get_next_event.
Fix this by creating the rpmsg_endpoint by ourself, and when
cros_ec_register fails (or on remove), destroy the endpoint first (to
make sure there's no more new calls to cros_ec_rpmsg_callback), and then
cancel all works in the host_event_work workqueue.
Cc: stable@vger.kernel.org
Fixes: 2de89fd989 ("platform/chrome: cros_ec: Add EC host command support using rpmsg")
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d4d0d06bb ]
mt7615 patch/n9/cr4 firmwares are available in mediatek folder in
linux-firmware repository. Because of this mt7615 won't work on regular
distributions like Ubuntu. Fix path definitions. Moreover remove useless
firmware name pointers and use definitions directly
Fixes: 04b8e65922 ("mt76: add mac80211 driver for MT7615 PCIe-based chipsets")
Cc: stable@vger.kernel.org
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c84a1372df ]
If the drives in a RAID0 are not all the same size, the array is
divided into zones.
The first zone covers all drives, to the size of the smallest.
The second zone covers all drives larger than the smallest, up to
the size of the second smallest - etc.
A change in Linux 3.14 unintentionally changed the layout for the
second and subsequent zones. All the correct data is still stored, but
each chunk may be assigned to a different device than in pre-3.14 kernels.
This can lead to data corruption.
It is not possible to determine what layout to use - it depends which
kernel the data was written by.
So we add a module parameter to allow the old (0) or new (1) layout to be
specified, and refused to assemble an affected array if that parameter is
not set.
Fixes: 20d0189b10 ("block: Introduce new bio_split()")
cc: stable@vger.kernel.org (3.14+)
Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c02d6a1613 upstream.
[Why]
When more than 2 displays are connected to the graphics card,
only the minimum memory clock is needed. However, when more
displays are connected, the minimum memory clock is not
sufficient enough to support the overwhelming bandwidth.
System will hang under this circumstance.
Also, the old code didn't address HBM cards, which has 2
pseudo channels. We need to add the HBM part here.
[How]
When graphics card connects to 2 or more displays,
switch to high memory clock. Also, choose memory
multiplier based on whether its regular DRAM or HBM.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb264220d9 upstream.
Laptops with AMD APU doesn't restore display backlight brightness after
system resume.
This issue started when DC was introduced.
Let's use BL_CORE_SUSPENDRESUME so the backlight core calls
update_status callback after system resume to restore the backlight
level.
Tested on Dell Inspiron 3180 (Stoney Ridge) and Dell Latitude 5495
(Raven Ridge).
Cc: <stable@vger.kernel.org>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a016e2794f upstream.
There may be situations when a server negotiates SMB 2.1
protocol version or higher but responds to a CREATE request
with an oplock rather than a lease.
Currently the client doesn't handle such a case correctly:
when another CREATE comes in the server sends an oplock
break to the initial CREATE and the client doesn't send
an ack back due to a wrong caching level being set (READ
instead of RWH). Missing an oplock break ack makes the
server wait until the break times out which dramatically
increases the latency of the second CREATE.
Fix this by properly detecting oplocks when using SMB 2.1
protocol version and higher.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a71e2ac1f3 upstream.
The NACKF flag should be cleared in INTRIICNAKI interrupt processing as
description in HW manual.
This issue shows up quickly when PREEMPT_RT is applied and a device is
probed that is not plugged in (like a touchscreen controller). The result
is endless interrupts that halt system boot.
Fixes: 310c18a414 ("i2c: riic: add driver")
Cc: stable@vger.kernel.org
Reported-by: Chien Nguyen <chien.nguyen.eb@rvc.renesas.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78887832e7 upstream.
add_early_randomness() is called by hwrng_register() when the
hardware is added. If this hardware and its module are present
at boot, and if there is no data available the boot hangs until
data are available and can't be interrupted.
For instance, in the case of virtio-rng, in some cases the host can be
not able to provide enough entropy for all the guests.
We can have two easy ways to reproduce the problem but they rely on
misconfiguration of the hypervisor or the egd daemon:
- if virtio-rng device is configured to connect to the egd daemon of the
host but when the virtio-rng driver asks for data the daemon is not
connected,
- if virtio-rng device is configured to connect to the egd daemon of the
host but the egd daemon doesn't provide data.
The guest kernel will hang at boot until the virtio-rng driver provides
enough data.
To avoid that, call rng_get_data() in non-blocking mode (wait=0)
from add_early_randomness().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Fixes: d9e7972619 ("hwrng: add randomness to system from rng...")
Cc: <stable@vger.kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6565c18209 upstream.
Quoted from
commit 3da40c7b08 ("ext4: only call ext4_truncate when size <= isize")
" At LSF we decided that if we truncate up from isize we shouldn't trim
fallocated blocks that were fallocated with KEEP_SIZE and are past the
new i_size. This patch fixes ext4 to do this. "
And generic/092 of fstest have covered this case for long time, however
is_quota_modification() didn't adjust based on that rule, so that in
below condition, we will lose to quota block change:
- fallocate blocks beyond EOF
- remount
- truncate(file_path, file_size)
Fix it.
Link: https://lore.kernel.org/r/20190911093650.35329-1-yuchao0@huawei.com
Fixes: 3da40c7b08 ("ext4: only call ext4_truncate when size <= isize")
CC: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1e8220bd3 upstream.
If a program attempts to punch a hole on an inline data file, we need
to convert it to a normal file first.
This was detected using ext4/032 using the adv configuration. Simple
reproducer:
mke2fs -Fq -t ext4 -O inline_data /dev/vdc
mount /vdc
echo "" > /vdc/testfile
xfs_io -c 'truncate 33554432' /vdc/testfile
xfs_io -c 'fpunch 0 1048576' /vdc/testfile
umount /vdc
e2fsck -fy /dev/vdc
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3d550c2c4 upstream.
Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing
first argument. This was introduced in commit ff95ec22cd ("ext4:
add warning to ext4_convert_unwritten_extents_endio") and splitting
extents inside endio would trigger it.
Fixes: ff95ec22cd ("ext4: add warning to ext4_convert_unwritten_extents_endio")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b410f4eb01 upstream.
This patch solves warnings detected by setting W=1 when building.
Warnings type detected:
drivers/mtd/nand/raw/stm32_fmc2_nand.c: In function ‘stm32_fmc2_calc_timings’:
drivers/mtd/nand/raw/stm32_fmc2_nand.c:1417:23: warning: comparison is
always false due to limited range of data type [-Wtype-limits]
else if (tims->twait > FMC2_PMEM_PATT_TIMING_MASK)
Signed-off-by: Christophe Kerello <christophe.kerello@st.com>
Cc: stable@vger.kernel.org
Fixes: 2cd457f328 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 383035211c upstream.
V1->V2: in handle_one_rcv_msg, if data_size > 2, set requeue to zero and
goto out instead of calling ipmi_free_msg.
Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
In the source stack trace below, function set_need_watch tries to
take out the same si_lock that was taken earlier by ipmi_thread.
ipmi_thread() [drivers/char/ipmi/ipmi_si_intf.c:995]
smi_event_handler() [drivers/char/ipmi/ipmi_si_intf.c:765]
handle_transaction_done() [drivers/char/ipmi/ipmi_si_intf.c:555]
deliver_recv_msg() [drivers/char/ipmi/ipmi_si_intf.c:283]
ipmi_smi_msg_received() [drivers/char/ipmi/ipmi_msghandler.c:4503]
intf_err_seq() [drivers/char/ipmi/ipmi_msghandler.c:1149]
smi_remove_watch() [drivers/char/ipmi/ipmi_msghandler.c:999]
set_need_watch() [drivers/char/ipmi/ipmi_si_intf.c:1066]
Upstream commit e1891cffd4 adds code to
ipmi_smi_msg_received() to call smi_remove_watch() via intf_err_seq()
and this seems to be causing the deadlock.
commit e1891cffd4
Author: Corey Minyard <cminyard@mvista.com>
Date: Wed Oct 24 15:17:04 2018 -0500
ipmi: Make the smi watcher be disabled immediately when not needed
The fix is to put all messages in the queue and move the message
checking code out of ipmi_smi_msg_received and into handle_one_recv_msg,
which processes the message checking after ipmi_thread releases its
locks.
Additionally,Kosuke Tatsukawa <tatsu@ab.jp.nec.com> reported that
handle_new_recv_msgs calls ipmi_free_msg when handle_one_rcv_msg returns
zero, so that the call to ipmi_free_msg in handle_one_rcv_msg introduced
another panic when "ipmitool sensor list" was run in a loop. He
submitted this part of the patch.
+free_msg:
+ requeue = 0;
+ goto out;
Reported by: Osamu Samukawa <osa-samukawa@tg.jp.nec.com>
Characterized by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Fixes: e1891cffd4 ("ipmi: Make the smi watcher be disabled immediately when not needed")
Cc: stable@vger.kernel.org # 5.1
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40144e49ff upstream.
Hole puching currently evicts pages from page cache and then goes on to
remove blocks from the inode. This happens under both XFS_IOLOCK_EXCL
and XFS_MMAPLOCK_EXCL which provides appropriate serialization with
racing reads or page faults. However there is currently nothing that
prevents readahead triggered by fadvise() or madvise() from racing with
the hole punch and instantiating page cache page after hole punching has
evicted page cache in xfs_flush_unmap_range() but before it has removed
blocks from the inode. This page cache page will be mapping soon to be
freed block and that can lead to returning stale data to userspace or
even filesystem corruption.
Fix the problem by protecting handling of readahead requests by
XFS_IOLOCK_SHARED similarly as we protect reads.
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 692fe62433 upstream.
Currently handling of MADV_WILLNEED hint calls directly into readahead
code. Handle it by calling vfs_fadvise() instead so that filesystem can
use its ->fadvise() callback to acquire necessary locks or otherwise
prepare for the request.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Boaz Harrosh <boazh@netapp.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8619e5bdee upstream.
syzbot found that a thread can stall for minutes inside read_mem() or
write_mem() after that thread was killed by SIGKILL [1]. Reading from
iomem areas of /dev/mem can be slow, depending on the hardware.
While reading 2GB at one read() is legal, delaying termination of killed
thread for minutes is bad. Thus, allow reading/writing /dev/mem and
/dev/kmem to be preemptible and killable.
[ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
[ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
[ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
[ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
[ 1336.041897][T20577] read_mem: sz=4096 count=2134549248
Theoretically, reading/writing /dev/mem and /dev/kmem can become
"interruptible". But this patch chose "killable". Future patch will make
them "interruptible" so that we can revert to "killable" if some program
regressed.
[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/1566825205-10703-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1d3ad84ea upstream.
Currently frame registrations are not purged, even when changing the
interface type. This can lead to potentially weird situations where
frames possibly not allowed on a given interface type remain registered
due to the type switching happening after registration.
The kernel currently relies on userspace apps to actually purge the
registrations themselves, this is not something that the kernel should
rely on.
Add a call to cfg80211_mlme_purge_registrations() to forcefully remove
any registrations left over prior to switching the iftype.
Cc: stable@vger.kernel.org
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 480523feae upstream.
Since commit 4ad23a9764 ("MD: use per-cpu counter for
writes_pending"), set_in_sync() is substantially more expensive: it
can wait for a full RCU grace period which can be 10s of milliseconds.
So we should only call it when the cost is justified.
md_check_recovery() currently calls set_in_sync() every time it finds
anything to do (on non-external active arrays). For an array
performing resync or recovery, this will be quite often.
Each call will introduce a delay to the md thread, which can noticeable
affect IO submission latency.
In md_check_recovery() we only need to call set_in_sync() if
'safemode' was non-zero at entry, meaning that there has been not
recent IO. So we save this "safemode was nonzero" state, and only
call set_in_sync() if it was non-zero.
This measurably reduces mean and maximum IO submission latency during
resync/recovery.
Reported-and-tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Fixes: 4ad23a9764 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d4b45d6af upstream.
Until revalidate_disk() has completed, the size of a new md array will
appear to be zero.
So we shouldn't report, through array_state, that the array is active
until that time.
udev rules check array_state to see if the array is ready. As soon as
it appear to be zero, fsck can be run. If it find the size to be
zero, it will fail.
So add a new flag to provide an interlock between do_md_run() and
array_state_show(). This flag is set while do_md_run() is active and
it prevents array_state_show() from reporting that the array is
active.
Before do_md_run() is called, ->pers will be NULL so array is
definitely not active.
After do_md_run() is called, revalidate_disk() will have run and the
array will be completely ready.
We also move various sysfs_notify*() calls out of md_run() into
do_md_run() after MD_NOT_READY is cleared. This ensure the
information is ready before the notification is sent.
Prior to v4.12, array_state_show() was called with the
mddev->reconfig_mutex held, which provided exclusion with do_md_run().
Note that MD_NOT_READY cleared twice. This is deliberate to cover
both success and error paths with minimal noise.
Fixes: b7b17c9b67 ("md: remove mddev_lock() from md_attr_show()")
Cc: stable@vger.kernel.org (v4.12++)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 143f6e733b upstream.
7471fb77ce ("md/raid6: Fix anomily when recovering a single device in
RAID6.") avoids rereading P when it can be computed from other members.
However, this misses the chance to re-write the right data to P. This
patch sets R5_ReadError if the re-read fails.
Also, when re-read is skipped, we also missed the chance to reset
rdev->read_errors to 0. It can fail the disk when there are many read
errors on P member disk (other disks don't have read error)
V2: upper layer read request don't read parity/Q data. So there is no
need to consider such situation.
This is Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 7471fb77ce ("md/raid6: Fix anomily when recovering a single device in RAID6.")
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57b3006492 upstream.
My assumption in commit b53548f9d9 ("spi: pxa2xx: Remove LPSS private
register restoring during resume") that Intel Lynxpoint and compatible
based chipsets may not need LPSS private registers saving and restoring
over suspend/resume cycle turned out to be false on Intel Broadwell.
Curtis Malainey sent a patch bringing above change back and reported the
LPSS SPI Chip Select control was lost over suspend/resume cycle on
Broadwell machine.
Instead of reverting above commit lets add LPSS private register
saving/restoring also for all LPSS SPI, I2C and UART controllers on
Lynxpoint and compatible chipset to make sure context is not lost in
case nothing else preserves it like firmware or if LPSS is always on.
Fixes: b53548f9d9 ("spi: pxa2xx: Remove LPSS private register restoring during resume")
Reported-by: Curtis Malainey <cujomalainey@chromium.org>
Tested-by: Curtis Malainey <cujomalainey@chromium.org>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f1bc39979 upstream.
The GSS Message Integrity Check data for krb5i may lie partially in the XDR
reply buffer's pages and tail. If so, we try to copy the entire MIC into
free space in the tail. But as the estimations of the slack space required
for authentication and verification have improved there may be less free
space in the tail to complete this copy -- see commit 2c94b8eca1
("SUNRPC: Use au_rslack when computing reply buffer size"). In fact, there
may only be room in the tail for a single copy of the MIC, and not part of
the MIC and then another complete copy.
The real world failure reported is that `ls` of a directory on NFS may
sometimes return -EIO, which can be traced back to xdr_buf_read_netobj()
failing to find available free space in the tail to copy the MIC.
Fix this by checking for the case of the MIC crossing the boundaries of
head, pages, and tail. If so, shift the buffer until the MIC is contained
completely within the pages or tail. This allows the remainder of the
function to create a sub buffer that directly address the complete MIC.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # v5.1
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fab2735955 upstream.
[BUG]
With v5.3 kernel, we can't convert to SINGLE profile:
# btrfs balance start -f -dconvert=single $mnt
ERROR: error during balancing '/mnt/btrfs': Invalid argument
# dmesg -t | tail
validate_convert_profile: data profile=0x1000000000000 allowed=0x20 is_valid=1 final=0x1000000000000 ret=1
BTRFS error (device dm-3): balance: invalid convert data profile single
[CAUSE]
With the extra debug output added, it shows that the @allowed bit is
lacking the special in-memory only SINGLE profile bit.
Thus we fail at that (profile & ~allowed) check.
This regression is caused by commit 081db89b13 ("btrfs: use raid_attr
to get allowed profiles for balance conversion") and the fact that we
don't use any bit to indicate SINGLE profile on-disk, but uses special
in-memory only bit to help distinguish different profiles.
[FIX]
Add that BTRFS_AVAIL_ALLOC_BIT_SINGLE to @allowed, so the code should be
the same as it was and fix the regression.
Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: 081db89b13 ("btrfs: use raid_attr to get allowed profiles for balance conversion")
CC: stable@vger.kernel.org # 5.3+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13fc1d271a upstream.
There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.
CPU 1 CPU 2 CPU 3
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts the worker
btrfs_qgroup_rescan_worker()
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_flags &=
~BTRFS_QGROUP_STATUS_FLAG_RESCAN
mutex_unlock(&fs_info->qgroup_rescan_lock)
starts transaction, updates qgroup status
item, etc
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts another worker
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_rescan_running = false
mutex_unlock(&fs_info->qgroup_rescan_lock)
complete_all(&fs_info->qgroup_rescan_completion)
Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info->qgroup_flags, which is expected and correct behaviour.
However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info->qgroup_rescan_running is set to false by CPU 2.
This race is making test case btrfs/171 (from fstests) to fail often:
btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
# --- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100
# +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100
# @@ -1,2 +1,3 @@
# QA output created by 171
# +ERROR: quota rescan failed: Operation now in progress
# Silence is golden
# ...
# (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff)
That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.
Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info->qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info->qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.
Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info->qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).
Fixes: 57254b6ebc ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes: d2c609b834 ("btrfs: properly track when rescan worker is running")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4e204948f upstream.
[BUG]
The following script can cause btrfs qgroup data space leak:
mkfs.btrfs -f $dev
mount $dev -o nospace_cache $mnt
btrfs subv create $mnt/subv
btrfs quota en $mnt
btrfs quota rescan -w $mnt
btrfs qgroup limit 128m $mnt/subv
for (( i = 0; i < 3; i++)); do
# Create 3 64M holes for latter fallocate to fail
truncate -s 192m $mnt/subv/file
xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
sync
# it's supposed to fail, and each failure will leak at least 64M
# data space
xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
rm $mnt/subv/file
sync
done
# Shouldn't fail after we removed the file
xfs_io -f -c "falloc 0 64m" $mnt/subv/file
[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.
The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.
However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.
This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
This is the correct behavior, but it prevents
btrfs_qgroup_check_reserved_leak() to catch the leakage as the
detector is purely EXTENT_QGROUP_RESERVED flag based.
- Leak the previously reserved data bytes.
The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.
[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.
Fixes: 5247255370 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bab32fc069 upstream.
[BUG]
Under the following case with qgroup enabled, if some error happened
after we have reserved delalloc space, then in error handling path, we
could cause qgroup data space leakage:
From btrfs_truncate_block() in inode.c:
ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
block_start, blocksize);
if (ret)
goto out;
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
btrfs_delalloc_release_space(inode, data_reserved,
block_start, blocksize, true);
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
ret = -ENOMEM;
goto out;
}
[CAUSE]
In the above case, btrfs_delalloc_reserve_space() will call
btrfs_qgroup_reserve_data() and mark the io_tree range with
EXTENT_QGROUP_RESERVED flag.
In the error handling path, we have the following call stack:
btrfs_delalloc_release_space()
|- btrfs_free_reserved_data_space()
|- btrsf_qgroup_free_data()
|- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
|- qgroup_free_reserved_data(reserved=@reserved)
|- clear_record_extent_bits();
|- freed += changeset.bytes_changed;
However due to a completion bug, qgroup_free_reserved_data() will clear
EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other
than the correct BTRFS_I(inode)->io_tree.
Since io_failure_tree is never marked with that flag,
btrfs_qgroup_free_data() will not free any data reserved space at all,
causing a leakage.
This type of error handling can only be triggered by errors outside of
qgroup code. So EDQUOT error from qgroup can't trigger it.
[FIX]
Fix the wrong target io_tree.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: bc42bda223 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb5b64f142 upstream.
Before, if a eb failed to write out, we would end up triggering a
BUG_ON(). As of f4340622e0 ("btrfs: extent_io: Move the BUG_ON() in
flush_write_bio() one level up"), we no longer BUG_ON(), so we should
make life consistent and add back the unwritten bytes to
dirty_metadata_bytes.
Fixes: f4340622e0 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Filipe Manana <fdmanana@kernel.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6af112b11a upstream.
When doing any form of incremental send the parent and the child trees
need to be compared via btrfs_compare_trees. This can result in long
loop chains without ever relinquishing the CPU. This causes softlockup
detector to trigger when comparing trees with a lot of items. Example
report:
watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153]
CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : __ll_sc_arch_atomic_sub_return+0x14/0x20
lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs]
sp : ffff00001273b7e0
Call trace:
__ll_sc_arch_atomic_sub_return+0x14/0x20
release_extent_buffer+0xdc/0x120 [btrfs]
free_extent_buffer.part.0+0xb0/0x118 [btrfs]
free_extent_buffer+0x24/0x30 [btrfs]
btrfs_release_path+0x4c/0xa0 [btrfs]
btrfs_free_path.part.0+0x20/0x40 [btrfs]
btrfs_free_path+0x24/0x30 [btrfs]
get_inode_info+0xa8/0xf8 [btrfs]
finish_inode_if_needed+0xe0/0x6d8 [btrfs]
changed_cb+0x9c/0x410 [btrfs]
btrfs_compare_trees+0x284/0x648 [btrfs]
send_subvol+0x33c/0x520 [btrfs]
btrfs_ioctl_send+0x8a0/0xaf0 [btrfs]
btrfs_ioctl+0x199c/0x2288 [btrfs]
do_vfs_ioctl+0x4b0/0x820
ksys_ioctl+0x84/0xb8
__arm64_sys_ioctl+0x28/0x38
el0_svc_common.constprop.0+0x7c/0x188
el0_svc_handler+0x34/0x90
el0_svc+0x8/0xc
Fix this by adding a call to cond_resched at the beginning of the main
loop in btrfs_compare_trees.
Fixes: 7069830a9e ("Btrfs: add btrfs_compare_trees function")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3acd48507d upstream.
Various notifications of type "BUG kmalloc-4096 () : Redzone
overwritten" have been observed recently in various parts of the kernel.
After some time, it has been made a relation with the use of BTRFS
filesystem and with SLUB_DEBUG turned on.
[ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten
[ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
[ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
[ 22.811193] __slab_alloc.constprop.26+0x44/0x70
[ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec
[ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs]
[ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs]
[ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs]
[ 22.812321] find_free_extent+0x680/0x12a4 [btrfs]
[ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs]
[ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
[ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs]
[ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs]
[ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs]
[ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs]
[ 22.813973] close_ctree+0xf8/0x2a4 [btrfs]
[ 22.814107] generic_shutdown_super+0x80/0x110
[ 22.814250] kill_anon_super+0x18/0x30
[ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs]
[ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
[ 22.814841] proc_cgroup_show+0xc0/0x248
[ 22.814967] proc_single_show+0x54/0x98
[ 22.815086] seq_read+0x278/0x45c
[ 22.815190] __vfs_read+0x28/0x17c
[ 22.815289] vfs_read+0xa8/0x14c
[ 22.815381] ksys_read+0x50/0x94
[ 22.815475] ret_from_syscall+0x0/0x38
Commit 69d2480456 ("btrfs: use copy_page for copying pages instead of
memcpy") changed the way bitmap blocks are copied. But allthough bitmaps
have the size of a page, they were allocated with kzalloc().
Most of the time, kzalloc() allocates aligned blocks of memory, so
copy_page() can be used. But when some debug options like SLAB_DEBUG are
activated, kzalloc() may return unaligned pointer.
On powerpc, memcpy(), copy_page() and other copying functions use
'dcbz' instruction which provides an entire zeroed cacheline to avoid
memory read when the intention is to overwrite a full line. Functions
like memcpy() are writen to care about partial cachelines at the start
and end of the destination, but copy_page() assumes it gets pages. As
pages are naturally cache aligned, copy_page() doesn't care about
partial lines. This means that when copy_page() is called with a
misaligned pointer, a few leading bytes are zeroed.
To fix it, allocate bitmaps through kmem_cache instead of using kzalloc()
The cache pool is created with PAGE_SIZE alignment constraint.
Reported-by: Erhard F. <erhard_f@mailbox.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371
Fixes: 69d2480456 ("btrfs: use copy_page for copying pages instead of memcpy")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
[ rename to btrfs_free_space_bitmap ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c2e9f346b upstream.
When filtering xattr list for reading, presence of trusted xattr
results in a security audit log. However, if there is other content
no errno will be set, and if there isn't, the errno will be -ENODATA
and not -EPERM as is usually associated with a lack of capability.
The check does not block the request to list the xattrs present.
Switch to ns_capable_noaudit to reflect a more appropriate check.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org # v3.18+
Fixes: a082c6f680 ("ovl: filter trusted xattr for non-admin")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96d9f7ed00 upstream.
An earlier patch "CIFS: fix deadlock in cached root handling"
did not completely address the deadlock in open_shroot. This
patch addresses the deadlock.
In testing the recent patch:
smb3: improve handling of share deleted (and share recreated)
we were able to reproduce the open_shroot deadlock to one
of the target servers in unmount in a delete share scenario.
Fixes: 7e5a70ad88 ("CIFS: fix deadlock in cached root handling")
This is version 2 of this patch. An earlier version of this
patch "smb3: fix unmount hang in open_shroot" had a problem
found by Dan.
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e7a02d478 upstream.
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens. Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d6996630c upstream.
We got a null pointer deference BUG_ON in blk_mq_rq_timed_out()
as following:
[ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 108.827059] PGD 0 P4D 0
[ 108.827313] Oops: 0000 [#1] SMP PTI
[ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[ 108.829503] Workqueue: kblockd blk_mq_timeout_work
[ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[ 108.838191] Call Trace:
[ 108.838406] bt_iter+0x74/0x80
[ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
[ 108.839074] ? __switch_to_asm+0x34/0x70
[ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
[ 108.840732] blk_mq_timeout_work+0x74/0x200
[ 108.841151] process_one_work+0x297/0x680
[ 108.841550] worker_thread+0x29c/0x6f0
[ 108.841926] ? rescuer_thread+0x580/0x580
[ 108.842344] kthread+0x16a/0x1a0
[ 108.842666] ? kthread_flush_work+0x170/0x170
[ 108.843100] ret_from_fork+0x35/0x40
The bug is caused by the race between timeout handle and completion for
flush request.
When timeout handle function blk_mq_rq_timed_out() try to read
'req->q->mq_ops', the 'req' have completed and reinitiated by next
flush request, which would call blk_rq_init() to clear 'req' as 0.
After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
normal requests lifetime are protected by refcount. Until 'rq->ref'
drop to zero, the request can really be free. Thus, these requests
cannot been reused before timeout handle finish.
However, flush request has defined .end_io and rq->end_io() is still
called even if 'rq->ref' doesn't drop to zero. After that, the 'flush_rq'
can be reused by the next flush request handle, resulting in null
pointer deference BUG ON.
We fix this problem by covering flush request with 'rq->ref'.
If the refcount is not zero, flush_end_io() return and wait the
last holder recall it. To record the request status, we add a new
entry 'rq_status', which will be used in flush_end_io().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-------
v2:
- move rq_status from struct request to struct blk_flush_queue
v3:
- remove unnecessary '{}' pair.
v4:
- let spinlock to protect 'fq->rq_status'
v5:
- move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit cb8acabbe3 upstream.
Commit 7211aef86f ("block: mq-deadline: Fix write completion
handling") added a call to blk_mq_sched_mark_restart_hctx() in
dd_dispatch_request() to make sure that write request dispatching does
not stall when all target zones are locked. This fix left a subtle race
when a write completion happens during a dispatch execution on another
CPU:
CPU 0: Dispatch CPU1: write completion
dd_dispatch_request()
lock(&dd->lock);
...
lock(&dd->zone_lock); dd_finish_request()
rq = find request lock(&dd->zone_lock);
unlock(&dd->zone_lock);
zone write unlock
unlock(&dd->zone_lock);
...
__blk_mq_free_request
check restart flag (not set)
-> queue not run
...
if (!rq && have writes)
blk_mq_sched_mark_restart_hctx()
unlock(&dd->lock)
Since the dispatch context finishes after the write request completion
handling, marking the queue as needing a restart is not seen from
__blk_mq_free_request() and blk_mq_sched_restart() not executed leading
to the dispatch stall under 100% write workloads.
Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
dd_dispatch_request() into dd_finish_request() under the zone lock to
ensure full mutual exclusion between write request dispatch selection
and zone unlock on write request completion.
Fixes: 7211aef86f ("block: mq-deadline: Fix write completion handling")
Cc: stable@vger.kernel.org
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7542b8760 upstream.
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable@vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6be6c04bcc upstream.
The tlv targets such as WCN3990 send more data in the chan info event, which is
not sent by the non tlv targets. There is a minimum size check in the wmi event
for non-tlv targets and hence we cannot update the common channel info
structure as it was done in commit 13104929d2 ("ath10k: fill the channel
survey results for WCN3990 correctly"). This broke channel survey results on
10.x firmware versions.
If the common channel info structure is updated, the size check for chan info
event for non-tlv targets will fail and return -EPROTO and we see the below
error messages
ath10k_pci 0000:01:00.0: failed to parse chan info event: -71
Add tlv specific channel info structure and restore the original size of the
common channel info structure to mitigate this issue.
Tested HW: WCN3990
QCA9887
Tested FW: WLAN.HL.3.1-00784-QCAHLSWMTPLZ-1
10.2.4-1.0-00037
Fixes: 13104929d2 ("ath10k: fill the channel survey results for WCN3990 correctly")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee6db78f5d upstream.
Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.
First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):
rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45
When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.
This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.
In addition, to fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f75c82246 upstream.
Commit 0b6cf6b97b ("tpm: pass an array of tpm_extend_digest structures to
tpm_pcr_extend()") modifies tpm_pcr_extend() to accept a digest for each
PCR bank. After modification, tpm_pcr_extend() expects that digests are
passed in the same order as the algorithms set in chip->allocated_banks.
This patch fixes two issues introduced in the last iterations of the patch
set: missing initialization of the TPM algorithm ID in the tpm_digest
structures passed to tpm_pcr_extend() by the trusted key module, and
unreleased locks in the TPM driver due to returning from tpm_pcr_extend()
without calling tpm_put_ops().
Cc: stable@vger.kernel.org
Fixes: 0b6cf6b97b ("tpm: pass an array of tpm_extend_digest structures to tpm_pcr_extend()")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 850e8f6fbd upstream.
When beacon length is not a multiple of 4, the beacon could be sent with
the last 1-3 bytes corrupted. The skb data is guaranteed to have enough
room for reading beyond the end, because it is always followed by
skb_shared_info, so rounding up is safe.
All other callers of mt76_wr_copy have multiple-of-4 length already.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0b444b349 upstream.
In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.
This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.
Fixes: d552a2b9b3 ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51677dfcc1 upstream.
For various reasons, at least with x86 EFI firmwares, the xoffset and
yoffset in the BGRT info are not always reliable.
Extensive testing has shown that when the info is correct, the
BGRT image is always exactly centered horizontally (the yoffset variable
is more variable and not always predictable).
This commit simplifies / improves the bgrt_sanity_check to simply
check that the BGRT image is exactly centered horizontally and skips
(re)drawing it when it is not.
This fixes the BGRT image sometimes being drawn in the wrong place.
Cc: stable@vger.kernel.org
Fixes: 88fe4ceb24 ("efifb: BGRT: Do not copy the boot graphics for non native resolutions")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Peter Jones <pjones@redhat.com>,
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190721131918.10115-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55576cf185 upstream.
The kernel has no way of knowing when we have finished instantiating
drivers, between deferred probe and systems that build key drivers as
modules we might be doing this long after userspace has booted. This has
always been a bit of an issue with regulator_init_complete since it can
power off hardware that's not had it's driver loaded which can result in
user visible effects, the main case is powering off displays. Practically
speaking it's not been an issue in real systems since most systems that
use the regulator API are embedded and build in key drivers anyway but
with Arm laptops coming on the market it's becoming more of an issue so
let's do something about it.
In the absence of any better idea just defer the powering off for 30s
after late_initcall(), this is obviously a hack but it should mask the
issue for now and it's no more arbitrary than late_initcall() itself.
Ideally we'd have some heuristics to detect if we're on an affected
system and tune or skip the delay appropriately, and there may be some
need for a command line option to be added.
Link: https://lore.kernel.org/r/20190904124250.25844-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Lee Jones <lee.jones@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c70010867 upstream.
set_msi_sid_cb() is used to determine whether device aliases share the
same bus, but it can provide false indications that aliases use the same
bus when in fact they do not. The reason is that set_msi_sid_cb()
assumes that pdev is fixed, while actually pci_for_each_dma_alias() can
call fn() when pdev is set to a subordinate device.
As a result, running an VM on ESX with VT-d emulation enabled can
results in the log warning such as:
DMAR: [INTR-REMAP] Request device [00:11.0] fault index 3b [fault reason 38] Blocked an interrupt request due to source-id verification failure
This seems to cause additional ata errors such as:
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
These timeouts also cause boot to be much longer and other errors.
Fix it by checking comparing the alias with the previous one instead.
Fixes: 3f0c625c6a ("iommu/vt-d: Allow interrupts from the entire bus for aliased devices")
Cc: stable@vger.kernel.org
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f18ddc13af upstream.
ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.
On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
"524 Unknown error 524"
Replace it with EOPNOTSUPP which results in the expected "95 Operation not
supported" error.
Fixes: 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no RTC device is present)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190903171802.28314-1-cascardo@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5e86196b8 upstream.
Detecting the ATS capability of the SMMU at probe time introduces a
spinlock into the ->unmap() fast path, even when ATS is not actually
in use. Furthermore, the ATC invalidation that exists is broken, as it
occurs before invalidation of the main SMMU TLB which leaves a window
where the ATC can be repopulated with stale entries.
Given that ATS is both a new feature and a specialist sport, disable it
for now whilst we fix it properly in subsequent patches. Since PRI
requires ATS, disable that too.
Cc: <stable@vger.kernel.org>
Fixes: 9ce27afc08 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03e61929c0 upstream.
150MHz is a fundamental limitation of RK3328 Soc, w/o this limitation,
eMMC, for instance, will run into 200MHz clock rate in HS200 mode, which
makes the RK3328 boards not always boot properly. By adding it in
rk3328.dtsi would also obviate the worry of missing it when adding new
boards.
Fixes: 52e02d377a ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
Cc: stable@vger.kernel.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Liang Chen <cl@rock-chips.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51696d346c upstream.
05f2d2f83b ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
added a new TLB invalidation helper which is used when freeing
intermediate levels of page table used for kernel mappings, but is
missing the required ISB instruction after completion of the TLBI
instruction.
Add the missing barrier.
Cc: <stable@vger.kernel.org>
Fixes: 05f2d2f83b ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7005d4ef4 upstream.
This fixes a kernel panic on memcpy when
FORTIFY_SOURCE is enabled.
The initial smp implementation on commit aa7eb2bb4e
("arm: zynq: Add smp support")
used memcpy, which worked fine until commit ee333554fe
("ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE")
enabled overflow checks at runtime, producing a read
overflow panic.
The computed size of memcpy args are:
- p_size (dst): 4294967295 = (size_t) -1
- q_size (src): 1
- size (len): 8
Additionally, the memory is marked as __iomem, so one of
the memcpy_* functions should be used for read/write.
Fixes: aa7eb2bb4e ("arm: zynq: Add smp support")
Signed-off-by: Luis Araneda <luaraneda@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7be3cb019d upstream.
When brk was moved for binaries without an interpreter, it should have
been limited to ET_DYN only. In other words, the special case was an
ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
The bug manifested for giant static executables, where the brk would end
up in the middle of the text area on 32-bit architectures.
Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in>
Fixes: bbdc6076d2 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79e85d1d2c upstream.
Add an extra condition to add the video output control class when the
device has some hdmi outputs defined. This is required to then always
be able to add the display present control, which is enabled when
there are some hdmi outputs.
This fixes the corner case where no_error_inj is enabled and the
device has no frame buffer but some hdmi outputs, as otherwise the
video output control class would be added anyway. Without this fix,
the sanity checks fail in v4l2_ctrl_new() as name is NULL.
Fixes: c533435ffb ("media: vivid: add display present control")
Cc: stable@vger.kernel.org # for 5.3
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14e3cdbb00 upstream.
A bugfix introduce a link failure in configurations without CONFIG_MODULES:
In file included from drivers/media/usb/dvb-usb/pctv452e.c:20:0:
drivers/media/usb/dvb-usb/pctv452e.c: In function 'pctv452e_frontend_attach':
drivers/media/dvb-frontends/stb0899_drv.h:151:36: error: weak declaration of 'stb0899_attach' being applied to a already existing, static definition
The problem is that the !IS_REACHABLE() declaration of stb0899_attach()
is a 'static inline' definition that clashes with the weak definition.
I further observed that the bugfix was only done for one of the five users
of stb0899_attach(), the other four still have the problem. This reverts
the bugfix and instead addresses the problem by not dropping the reference
count when calling '->detach()', instead we call this function directly
in dvb_frontend_put() before dropping the kref on the front-end.
I first submitted this in early 2018, and after some discussion it
was apparently discarded. While there is a long-term plan in place,
that plan is obviously not nearing completion yet, and the current
kernel is still broken unless this patch is applied.
Link: https://patchwork.kernel.org/patch/10140175/
Link: https://patchwork.linuxtv.org/patch/54831/
Cc: Max Kellermann <max.kellermann@gmail.com>
Cc: Wolfgang Rohdewald <wolfgang@rohdewald.de>
Cc: stable@vger.kernel.org
Fixes: f686c14364 ("[media] stb0899: move code to "detach" callback")
Fixes: 6cdeaed3b1 ("media: dvb_usb_pctv452e: module refcount changes were unbalanced")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92f58b5c01 upstream.
Use the fast invalidate mechasim to zap MMIO sptes on a MMIO generation
wrap. The fast invalidate flow was reintroduced to fix a livelock bug
in kvm_mmu_zap_all() that can occur if kvm_mmu_zap_all() is invoked when
the guest has live vCPUs. I.e. using kvm_mmu_zap_all() to handle the
MMIO generation wrap is theoretically susceptible to the livelock bug.
This effectively reverts commit 4771450c34 ("Revert "KVM: MMU: drop
kvm_mmu_zap_mmio_sptes""), i.e. restores the behavior of commit
a8eca9dcc6 ("KVM: MMU: drop kvm_mmu_zap_mmio_sptes").
Note, this actually fixes commit 571c5af06e ("KVM: x86/mmu:
Voluntarily reschedule as needed when zapping MMIO sptes"), but there
is no need to incrementally revert back to using fast invalidate, e.g.
doing so doesn't provide any bisection or stability benefits.
Fixes: 571c5af06e ("KVM: x86/mmu: Voluntarily reschedule as needed when zapping MMIO sptes")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdcf756213 upstream.
We can easily route hardware interrupts directly into VM context when
they target the "Fixed" or "LowPriority" delivery modes.
However, on modes such as "SMI" or "Init", we need to go via KVM code
to actually put the vCPU into a different mode of operation, so we can
not post the interrupt
Add code in the VMX and SVM PI logic to explicitly refuse to establish
posted mappings for advanced IRQ deliver modes. This reflects the logic
in __apic_accept_irq() which also only ever passes Fixed and LowPriority
interrupts as posted interrupts into the guest.
This fixes a bug I have with code which configures real hardware to
inject virtual SMIs into my guest.
Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 16cfacc808 upstream.
Manually generate the PDPTR reserved bit mask when explicitly loading
PDPTRs. The reserved bits that are being tracked by the MMU reflect the
current paging mode, which is unlikely to be PAE paging in the vast
majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
__set_sregs(), etc... This can cause KVM to incorrectly signal a bad
PDPTR, or more likely, miss a reserved bit check and subsequently fail
a VM-Enter due to a bad VMCS.GUEST_PDPTR.
Add a one off helper to generate the reserved bits instead of sharing
code across the MMU's calculations and the PDPTR emulation. The PDPTR
reserved bits are basically set in stone, and pushing a helper into
the MMU's calculation adds unnecessary complexity without improving
readability.
Oppurtunistically fix/update the comment for load_pdptrs().
Note, the buggy commit also introduced a deliberate functional change,
"Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
effectively (and correctly) reverted by commit cd9ae5fe47 ("KVM: x86:
Fix page-tables reserved bits"). A bit of SDM archaeology shows that
the SDM from late 2008 had a bug (likely a copy+paste error) where it
listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and
always have been reserved.
Fixes: 20c466b561 ("KVM: Use rsvd_bits_mask in load_pdptrs()")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Reported-by: Doug Reiland <doug.reiland@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8530a79c5a upstream.
inject_emulated_exception() returns true if and only if nested page
fault happens. However, page fault can come from guest page tables
walk, either nested or not nested. In both cases we should stop an
attempt to read under RIP and give guest to step over its own page
fault handler.
This is also visible when an emulated instruction causes a #GP fault
and the VMware backdoor is enabled. To handle the VMware backdoor,
KVM intercepts #GP faults; with only the next patch applied,
x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL
instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi()
(or gp_interception() for SVM) to re-inject the original #GP because it
thinks emulation failed due to a non-VMware opcode. This patch prevents
the issue as x86_emulate_instruction() will return EMULATE_DONE after
injecting the #GP.
Fixes: 6ea6e84309 ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: stable@vger.kernel.org
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bd43d0077 upstream.
Commit 871f1f2bcb ("platform/x86: intel_int0002_vgpio: Only implement
irq_set_wake on Bay Trail") removed the irq_set_wake method from the
struct irq_chip used on Cherry Trail, but it did not set
IRQCHIP_SKIP_SET_WAKE causing kernel/irq/manage.c: set_irq_wake_real()
to return -ENXIO.
This causes the kernel to no longer see PME events reported through the
INT0002 device as wakeup events. Which e.g. breaks wakeup by the (USB)
keyboard on many Cherry Trail 2-in-1 devices.
Cc: stable@vger.kernel.org
Fixes: 871f1f2bcb ("platform/x86: intel_int0002_vgpio: Only implement irq_set_wake on Bay Trail")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fa1659105 upstream.
The HP Dino PCI controller chip can be used in two variants: as on-board
controller (e.g. in B160L), or on an Add-On card ("Card-Mode") to bridge
PCI components to systems without a PCI bus, e.g. to a HSC/GSC bus. One
such Add-On card is the HP HSC-PCI Card which has one or more DEC Tulip
PCI NIC chips connected to the on-card Dino PCI controller.
Dino in Card-Mode has a big disadvantage: All PCI memory accesses need
to go through the DINO_MEM_DATA register, so Linux drivers will not be
able to use the ioremap() function. Without ioremap() many drivers will
not work, one example is the tulip driver which then simply crashes the
kernel if it tries to access the ports on the HP HSC card.
This patch disables the HP HSC card if it finds one, and as such
fixes the kernel crash on a HP D350/2 machine.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Phil Scarr <phil.scarr@pm.me>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76e43c8cca upstream.
When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
This may have to wait for fuse_iqueue::waitq.lock to be released by one
of many places that take it with IRQs enabled. Since the IRQ handler
may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
Fix it by protecting the state of struct fuse_iqueue with a separate
spinlock, and only accessing fuse_iqueue::waitq using the versions of
the waitqueue functions which do IRQ-safe locking internally.
Reproducer:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>
int main()
{
char opts[128];
int fd = open("/dev/fuse", O_RDWR);
aio_context_t ctx = 0;
struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
struct iocb *cbp = &cb;
sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
mkdir("mnt", 0700);
mount("foo", "mnt", "fuse", 0, opts);
syscall(__NR_io_setup, 1, &ctx);
syscall(__NR_io_submit, ctx, 1, &cbp);
}
Beginning of lockdep output:
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.3.0-rc5 #9 Not tainted
-----------------------------------------------------
syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
and this task is already holding:
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
which would create a new lock dependency:
(&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&ctx->ctx_lock)->rlock){..-.}
[...]
Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e72 ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e13cd21ffd upstream.
tpm_send() does not give anymore the result back to the caller. This
would require another memcpy(), which kind of tells that the whole
approach is somewhat broken. Instead, as Mimi suggested, this commit
just wraps the data to the tpm_buf, and thus the result will not go to
the garbage.
Obviously this assumes from the caller that it passes large enough
buffer, which makes the whole API somewhat broken because it could be
different size than @buflen but since trusted keys is the only module
using this API right now I think that this fix is sufficient for the
moment.
In the near future the plan is to replace the parameters with a tpm_buf
created by the caller.
Reported-by: Mimi Zohar <zohar@linux.ibm.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 412eb58558 ("use tpm_buf in tpm_transmit_cmd() as the IO parameter")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41ba17f20e upstream.
Commit <684d984038aa> ('powerpc/powernv: Add debugfs interface for
imc-mode and imc') added debugfs interface for the nest imc pmu
devices to support changing of different ucode modes. Primarily adding
this capability for debug. But when doing so, the code did not
consider the case of cpu-less nodes. So when reading the _cmd_ or
_mode_ file of a cpu-less node will create this crash.
Faulting instruction address: 0xc0000000000d0d58
Oops: Kernel access of bad area, sig: 11 [#1]
...
CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19
NIP: c0000000000d0d58 LR: c00000000049aa18 CTR:c0000000000d0d50
REGS: c00020194548f9e0 TRAP: 0300 Not tainted (5.2.0-rc6-next-20190627+)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:28022822 XER: 00000000
CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR:40000000 IRQMASK: 0
...
NIP imc_mem_get+0x8/0x20
LR simple_attr_read+0x118/0x170
Call Trace:
simple_attr_read+0x70/0x170 (unreliable)
debugfs_attr_read+0x6c/0xb0
__vfs_read+0x3c/0x70
vfs_read+0xbc/0x1a0
ksys_read+0x7c/0x140
system_call+0x5c/0x70
Patch fixes the issue with a more robust check for vbase to NULL.
Before patch, ls output for the debugfs imc directory
# ls /sys/kernel/debug/powerpc/imc/
imc_cmd_0 imc_cmd_251 imc_cmd_253 imc_cmd_255 imc_mode_0 imc_mode_251 imc_mode_253 imc_mode_255
imc_cmd_250 imc_cmd_252 imc_cmd_254 imc_cmd_8 imc_mode_250 imc_mode_252 imc_mode_254 imc_mode_8
After patch, ls output for the debugfs imc directory
# ls /sys/kernel/debug/powerpc/imc/
imc_cmd_0 imc_cmd_8 imc_mode_0 imc_mode_8
Actual bug here is that, we have two loops with potentially different
loop counts. That is, in imc_get_mem_addr_nest(), loop count is
obtained from the dt entries. But in case of export_imc_mode_and_cmd(),
loop was based on for_each_nid() count. Patch fixes the loop count in
latter based on the struct mem_info. Ideally it would be better to
have array size in struct imc_pmu.
Fixes: 684d984038 ('powerpc/powernv: Add debugfs interface for imc-mode and imc')
Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190827101635.6942-1-maddy@linux.vnet.ibm.com
Cc: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 226b4fc75c ]
SCSI maintains its own driver private data hooked off of each SCSI
request, and the pridate data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq). Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq. A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd9c10bc66 ]
The laptop has a combined jack to attach headsets on the right.
The BIOS encodes them as two different colored jacks at the front,
but otherwise it seems to be configured ok. But any adaption of
the pins config on its own doesn't fix the jack detection to work
in Linux. Still Windows works correct.
This is somehow fixed by chaining ALC256_FIXUP_ASUS_HEADSET_MODE,
which seems to register the microphone jack as a headset part and
also results in fixing jack sensing, visible in dmesg as:
-snd_hda_codec_realtek hdaudioC0D0: Mic=0x19
+snd_hda_codec_realtek hdaudioC0D0: Headset Mic=0x19
[ Actually the essential change is the location of the jack; the
driver created "Front Mic Jack" without the matching volume / mute
control element due to its jack location, which confused PA.
-- tiwai ]
Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/8f4f9b20-0aeb-f8f1-c02f-fd53c09679f1@fbihome.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 130d9c331b ]
A rather embarrasing mistake had us call sched_setscheduler() before
initializing the parameters passed to it.
Fixes: 1a763fd7c6 ("rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 24cf23276a upstream.
A previous commit removed the panel-dpi driver, which made the
video on the AM3517-evm stop working because it relied on the dpi
driver for setting video timings. Now that the simple-panel driver
is available in omap2plus, this patch migrates the am3517-evm
to use a similar panel and remove the manual timing requirements.
Fixes: 8bf4b16211 ("drm/omap: Remove panel-dpi driver")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2590bdd0b upstream.
When a KDETH packet is subject to fault injection during transmission,
HCRC is supposed to be omitted from the packet so that the hardware on the
receiver side would drop the packet. When creating pbc, the PbcInsertHcrc
field is set to be PBC_IHCRC_NONE if the KDETH packet is subject to fault
injection, but overwritten with PBC_IHCRC_LKDETH when update_hcrc() is
called later.
This problem is fixed by not calling update_hcrc() when the packet is
subject to fault injection.
Fixes: 6b6cf9357f ("IB/hfi1: Set PbcInsertHcrc for TID RDMA packets")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164546.74174.99296.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9dccacfcc upstream.
kmsg_dump_get_buffer() is supposed to select all the youngest log
messages which fit into the provided buffer. It determines the correct
start index by using msg_print_text() with a NULL buffer to calculate
the size of each entry. However, when performing the actual writes,
msg_print_text() only writes the entry to the buffer if the written len
is lesser than the size of the buffer. So if the lengths of the
selected youngest log messages happen to precisely fill up the provided
buffer, the last log message is not included.
We don't want to modify msg_print_text() to fill up the buffer and start
returning a length which is equal to the size of the buffer, since
callers of its other users, such as kmsg_dump_get_line(), depend upon
the current behaviour.
Instead, fix kmsg_dump_get_buffer() to compensate for this.
For example, with the following two final prints:
[ 6.427502] AAAAAAAAAAAAA
[ 6.427769] BBBBBBBB12345
A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
patch:
00000000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37 <0>[ 6.522197
00000010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a ] AAAAAAAAAAAAA.
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
After this patch:
00000000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38 <0>[ 6.456678
00000010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a ] BBBBBBBB12345.
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Link: http://lkml.kernel.org/r/20190711142937.4083-1-vincent.whitchurch@axis.com
Fixes: e2ae715d66 ("kmsg - kmsg_dump() use iterator to receive log buffer content")
To: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b5292bcfc upstream.
Relogin fails to move forward due to scan_state flag indicating device is
not there. Before relogin process, Session delete process accidently
modified the scan_state flag.
[mkp: typos plus corrected Fixes: sha as reported by sfr]
Fixes: 2dee552102 ("scsi: qla2xxx: Fix login state machine freeze")
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1a00b5b25 upstream.
2 bytes in MSB of register for clock status is zero during intermediate
state after changing status of sampling clock in models of TASCAM FireWire
series. The duration of this state differs depending on cases. During the
state, it's better to retry reading the register for current status of
the clock.
In current implementation, the intermediate state is checked only when
getting current sampling transmission frequency, then retry reading.
This care is required for the other operations to read the register.
This commit moves the codes of check and retry into helper function
commonly used for operations to read the register.
Fixes: e453df44f0 ("ALSA: firewire-tascam: add PCM functionality")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20190910135152.29800-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fddbfeece9 upstream.
The intention was to have the GEO_TX_POWER_LIMIT command in FW version
36 as well, but not all 8000 family got this feature enabled. The
8000 family is the only one using version 36, so skip this version
entirely. If we try to send this command to the firmwares that do not
support it, we get a BAD_COMMAND response from the firmware.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=204151.
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4957eccf97 upstream.
When the panel-dpi driver was removed, the simple-panels driver
was never enabled, so anyone who used the panel-dpi driver lost
video, and those who used it inconjunction with simple-panels
would have to manually enable CONFIG_DRM_PANEL_SIMPLE.
This patch makes CONFIG_DRM_PANEL_SIMPLE a module in the same
way the deprecated panel-dpi was.
Fixes: 8bf4b16211 ("drm/omap: Remove panel-dpi driver")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9f5518a38 upstream.
A previous commit removed the panel-dpi driver, which made the
Torpedo video stop working because it relied on the dpi driver
for setting video timings. Now that the simple-panel driver is
available in omap2plus, this patch migrates the Torpedo dev kits
to use a similar panel and remove the manual timing requirements.
Fixes: 8bf4b16211 ("drm/omap: Remove panel-dpi driver")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1cfff4d9a5 ]
On AMD processors, in PAE 32bit mode, nested KVM instances don't
work. The L0 host get a kernel OOPS, which is related to
arch.mmu->pae_root being NULL.
The reason for this is that when setting up nested KVM instance,
arch.mmu is set to &arch.guest_mmu (while normally, it would be
&arch.root_mmu). However, the initialization and allocation of
pae_root only creates it in root_mmu. KVM code (ie. in
mmu_alloc_shadow_roots) then accesses arch.mmu->pae_root, which is the
unallocated arch.guest_mmu->pae_root.
This fix just allocates (and frees) pae_root in both guest_mmu and
root_mmu (and also lm_root if it was allocated). The allocation is
subject to previous restrictions ie. it won't allocate anything on
64-bit and AFAIK not on Intel.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203923
Fixes: 14c07ad89f ("x86/kvm/mmu: introduce guest_mmu")
Signed-off-by: Jiri Palecek <jpalecek@web.de>
Tested-by: Jiri Palecek <jpalecek@web.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 259ee7754b ]
This patch will introduce ROOT_ITEM check, which includes:
- Key->objectid and key->offset check
Currently only some easy check, e.g. 0 as rootid is invalid.
- Item size check
Root item size is fixed.
- Generation checks
Generation, generation_v2 and last_snapshot should not be greater than
super generation + 1
- Level and alignment check
Level should be in [0, 7], and bytenr must be aligned to sector size.
- Flags check
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203261
Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a28468e52 ]
[BUG]
With fuzzed image and MIXED_GROUPS super flag, we can hit the following
BUG_ON():
kernel BUG at fs/btrfs/delayed-ref.c:491!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 1849 Comm: sync Tainted: G O 5.2.0-custom #27
RIP: 0010:update_existing_head_ref.cold+0x44/0x46 [btrfs]
Call Trace:
add_delayed_ref_head+0x20c/0x2d0 [btrfs]
btrfs_add_delayed_tree_ref+0x1fc/0x490 [btrfs]
btrfs_free_tree_block+0x123/0x380 [btrfs]
__btrfs_cow_block+0x435/0x500 [btrfs]
btrfs_cow_block+0x110/0x240 [btrfs]
btrfs_search_slot+0x230/0xa00 [btrfs]
? __lock_acquire+0x105e/0x1e20
btrfs_insert_empty_items+0x67/0xc0 [btrfs]
alloc_reserved_file_extent+0x9e/0x340 [btrfs]
__btrfs_run_delayed_refs+0x78e/0x1240 [btrfs]
? kvm_clock_read+0x18/0x30
? __sched_clock_gtod_offset+0x21/0x50
btrfs_run_delayed_refs.part.0+0x4e/0x180 [btrfs]
btrfs_run_delayed_refs+0x23/0x30 [btrfs]
btrfs_commit_transaction+0x53/0x9f0 [btrfs]
btrfs_sync_fs+0x7c/0x1c0 [btrfs]
? __ia32_sys_fdatasync+0x20/0x20
sync_fs_one_sb+0x23/0x30
iterate_supers+0x95/0x100
ksys_sync+0x62/0xb0
__ia32_sys_sync+0xe/0x20
do_syscall_64+0x65/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
This situation is caused by several factors:
- Fuzzed image
The extent tree of this fs missed one backref for extent tree root.
So we can allocated space from that slot.
- MIXED_BG feature
Super block has MIXED_BG flag.
- No mixed block groups exists
All block groups are just regular ones.
This makes data space_info->block_groups[] contains metadata block
groups. And when we reserve space for data, we can use space in
metadata block group.
Then we hit the following file operations:
- fallocate
We need to allocate data extents.
find_free_extent() choose to use the metadata block to allocate space
from, and choose the space of extent tree root, since its backref is
missing.
This generate one delayed ref head with is_data = 1.
- extent tree update
We need to update extent tree at run_delayed_ref time.
This generate one delayed ref head with is_data = 0, for the same
bytenr of old extent tree root.
Then we trigger the BUG_ON().
[FIX]
The quick fix here is to check block_group->flags before using it.
The problem can only happen for MIXED_GROUPS fs. Regular filesystems
won't have space_info with DATA|METADATA flag, and no way to hit the
bug.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203255
Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 933c22a751 ]
There is one report of fuzzed image which leads to BUG_ON() in
btrfs_delete_delayed_dir_index().
Although that fuzzed image can already be addressed by enhanced
extent-tree error handler, it's still better to hunt down more BUG_ON().
This patch will hunt down two BUG_ON()s in
btrfs_delete_delayed_dir_index():
- One for error from btrfs_delayed_item_reserve_metadata()
Instead of BUG_ON(), we output an error message and free the item.
And return the error.
All callers of this function handles the error by aborting current
trasaction.
- One for possible EEXIST from __btrfs_add_delayed_deletion_item()
That function can return -EEXIST.
We already have a good enough error message for that, only need to
clean up the reserved metadata space and allocated item.
To help above cleanup, also modifiy __btrfs_remove_delayed_item() called
in btrfs_release_delayed_item(), to skip unassociated item.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203253
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 051c78af14 ]
Lenovo ThinkCentre M73 and M93 don't seem to have a proper beep
although the driver tries to probe and set up blindly.
Blacklist these machines for suppressing the beep creation.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204635
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2dbe87c5a ]
We don't need to deal with the unsol events for Intel chips that are
tied with the graphics via audio component notifier. Although the
presence of the audio component is checked at the beginning of
hdmi_unsol_event(), better to short cut by dropping unsol_event ops.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204565
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2ef03fe61 ]
[ This is rather a revival of the patch Tomas sent in months ago, but
applying only with the quirk model option -- tiwai ]
Hard coded coefficients to make Huawuei Matebook X right speaker
work. The Matebook X has a ALC298, please refer to bug 197801 on
how these numbers were reverse engineered from the Windows driver
The reversed engineered sequence represents a repeating pattern
of verbs, and the only values that are changing periodically are
written on indexes 0x23 and 0x25:
0x500, 0x23
0x400, VALUE1
0x500, 0x25
0x400, VALUE2
* skipped reading sequences (0x500 - 0xc00 sequences are ignored)
* static values from reverse engineering are used
NOTE: since a significant risk is still considered, this is provided
as an experimental fix that isn't applied as default for now. For
enabling the fix, you'll have to choose huawei-mbx-stereo via model
option of snd-hda-intel module.
If we get feedback from users that this works stably, we may apply it
per default.
[ Some coding style fixes and replacement with AC_VERB_* by tiwai ]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197801
Signed-off-by: Tomas Espeleta <tomas.espeleta@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c81d69d4c ]
In cases when SDIO IRQs have been enabled, runtime suspend is prevented by
the driver. However, this still means msdc_runtime_suspend|resume() gets
called during system suspend/resume, via pm_runtime_force_suspend|resume().
This means during system suspend/resume, the register context of the mtk-sd
device most likely loses its register context, even in cases when SDIO IRQs
have been enabled.
To re-enable the SDIO IRQs during system resume, the mtk-sd driver
currently relies on the mmc core to re-enable the SDIO IRQs when it resumes
the SDIO card, but this isn't the recommended solution. Instead, it's
better to deal with this locally in the mtk-sd driver, so let's do that.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b76b4715eb ]
While MD continues to count read errors returned by the lower layer.
If those errors are -EILSEQ, instead of -EIO, it should NOT increase
the read_errors count.
When RAID6 is set up on dm-integrity target that detects massive
corruption, the leg will be ejected from the array. Even if the
issue is correctable with a sector re-write and the array has
necessary redundancy to correct it.
The leg is ejected because it runs up the rdev->read_errors beyond
conf->max_nr_stripes. The return status in dm-drypt when there is
a data integrity error is -EILSEQ (BLK_STS_PROTECTION).
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c526608d5 ]
In cases when SDIO IRQs have been enabled, runtime suspend is prevented by
the driver. However, this still means dw_mci_runtime_suspend|resume() gets
called during system suspend/resume, via pm_runtime_force_suspend|resume().
This means during system suspend/resume, the register context of the dw_mmc
device most likely loses its register context, even in cases when SDIO IRQs
have been enabled.
To re-enable the SDIO IRQs during system resume, the dw_mmc driver
currently relies on the mmc core to re-enable the SDIO IRQs when it resumes
the SDIO card, but this isn't the recommended solution. Instead, it's
better to deal with this locally in the dw_mmc driver, so let's do that.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd880b0069 ]
To avoid each host driver supporting SDIO IRQs, from keeping track
internally about if SDIO IRQs has been claimed, let's introduce a common
helper function, sdio_irq_claimed().
The function returns true if SDIO IRQs are claimed, via using the
information about the number of claimed irqs. This is safe, even without
any locks, as long as the helper function is called only from
runtime/system suspend callbacks of the host driver.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c894e33ddc ]
When switching from any MMC speed mode that requires 1.8v
(HS200, HS400 and HS400ES) to High Speed (HS) mode, the system
ends up configured for SDR12 with a 50MHz clock which is an illegal
mode.
This happens because the SDHCI_CTRL_VDD_180 bit in the
SDHCI_HOST_CONTROL2 register is left set and when this bit is
set, the speed mode is controlled by the SDHCI_CTRL_UHS field
in the SDHCI_HOST_CONTROL2 register. The SDHCI_CTRL_UHS field
will end up being set to 0 (SDR12) by sdhci_set_uhs_signaling()
because there is no UHS mode being set.
The fix is to change sdhci_set_uhs_signaling() to set the
SDHCI_CTRL_UHS field to SDR25 (which is the same as HS) for
any switch to HS mode.
This was found on a new eMMC controller that does strict checking
of the speed mode and the corresponding clock rate. It caused the
switch to HS400 mode to fail because part of the sequence to switch
to HS400 requires a switch from HS200 to HS before going to HS400.
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36d57efb4a ]
The sdio_irq_pending flag is used to let host drivers indicate that it has
signaled an IRQ. If that is the case and we only have a single SDIO func
that have claimed an SDIO IRQ, our assumption is that we can avoid reading
the SDIO_CCCR_INTx register and just call the SDIO func irq handler
immediately. This makes sense, but the flag is set/cleared in a somewhat
messy order, let's fix that up according to below.
First, the flag is currently set in sdio_run_irqs(), which is executed as a
work that was scheduled from sdio_signal_irq(). To make it more implicit
that the host have signaled an IRQ, let's instead immediately set the flag
in sdio_signal_irq(). This also makes the behavior consistent with host
drivers that uses the legacy, mmc_signal_sdio_irq() API. This have no
functional impact, because we don't expect host drivers to call
sdio_signal_irq() until after the work (sdio_run_irqs()) have been executed
anyways.
Second, currently we never clears the flag when using the sdio_run_irqs()
work, but only when using the sdio_irq_thread(). Let make the behavior
consistent, by moving the flag to be cleared inside the common
process_sdio_pending_irqs() function. Additionally, tweak the behavior of
the flag slightly, by avoiding to clear it unless we processed the SDIO
IRQ. The purpose with this at this point, is to keep the information about
whether there have been an SDIO IRQ signaled by the host, so at system
resume we can decide to process it without reading the SDIO_CCCR_INTx
register.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ce220dd2f ]
If stripe in batch list is set with STRIPE_HANDLE flag, then the stripe
could be set with STRIPE_ACTIVE by the handle_stripe function. And if
error happens to the batch_head at the same time, break_stripe_batch_list
is called, then below warning could happen (the same report in [1]), it
means a member of batch list was set with STRIPE_ACTIVE.
[7028915.431770] stripe state: 2001
[7028915.431815] ------------[ cut here ]------------
[7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456]
[...]
[7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G O 4.14.86-1-storage #4.14.86-1.2~deb9
[7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[7028915.431888] Workqueue: raid5wq raid5_do_work [raid456]
[7028915.431890] task: ffff9ab0ef36d7c0 task.stack: ffffb72926f84000
[7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456]
[7028915.431898] RSP: 0018:ffffb72926f87ba8 EFLAGS: 00010286
[7028915.431900] RAX: 0000000000000012 RBX: ffff9aaa84a98000 RCX: 0000000000000000
[7028915.431901] RDX: 0000000000000000 RSI: ffff9ab2bfa15458 RDI: ffff9ab2bfa15458
[7028915.431902] RBP: ffff9aaa8fb4e900 R08: 0000000000000001 R09: 0000000000002eb4
[7028915.431903] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff9ab1736f1b00
[7028915.431904] R13: 0000000000000000 R14: ffff9aaa8fb4e900 R15: 0000000000000001
[7028915.431906] FS: 0000000000000000(0000) GS:ffff9ab2bfa00000(0000) knlGS:0000000000000000
[7028915.431907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7028915.431908] CR2: 00007ff953b9f5d8 CR3: 0000000bf4009002 CR4: 00000000003606e0
[7028915.431909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7028915.431910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7028915.431910] Call Trace:
[7028915.431923] handle_stripe+0x8e7/0x2020 [raid456]
[7028915.431930] ? __wake_up_common_lock+0x89/0xc0
[7028915.431935] handle_active_stripes.isra.58+0x35f/0x560 [raid456]
[7028915.431939] raid5_do_work+0xc6/0x1f0 [raid456]
Also commit 59fc630b8b ("RAID5: batch adjacent full stripe write")
said "If a stripe is added to batch list, then only the first stripe
of the list should be put to handle_list and run handle_stripe."
So don't set STRIPE_HANDLE to stripe which is already in batch list,
otherwise the stripe could be put to handle_list and run handle_stripe,
then the above warning could be triggered.
[1]. https://www.spinics.net/lists/raid/msg62552.html
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d24430694 ]
Currently rq->data_len will be decreased by partial completion or
zeroed by completion, so when blk_stat_add() is invoked, data_len
will be zero and there will never be samples in poll_cb because
blk_mq_poll_stats_bkt() will return -1 if data_len is zero.
We could move blk_stat_add() back to __blk_mq_complete_request(),
but that would make the effort of trying to call ktime_get_ns()
once in vain. Instead we can reuse throtl_size field, and use
it for both block stats and block throttle, and adjust the
logic in blk_mq_poll_stats_bkt() accordingly.
Fixes: 4bc6339a58 ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8776f3fa15 ]
Sqo_thread will get sqring in batches, which will cause
ctx->cached_sq_head to be added in batches. if one of these
sqes is set with the DRAIN flag, then he will never get a
chance to process, and finally sqo_thread will not exit.
Fixes: de0617e467 ("io_uring: add support for marking commands as draining")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c524191c0 ]
Commit 3bd7f6589f ("spi: bcm2835: Overcome sglist entry length
limitation") amended the BCM2835 SPI driver with support for DMA
transfers whose buffers are not aligned to 4 bytes and require more than
one sglist entry.
When testing this feature with upcoming commits to speed up TX-only and
RX-only transfers, I noticed that SPI transmission sometimes breaks.
A function introduced by the commit, bcm2835_spi_transfer_prologue(),
performs one or two PIO transmissions as a prologue to the actual DMA
transmission. It turns out that the breakage goes away if the DONE bit
in the CS register is set when ending such a PIO transmission.
The DONE bit signifies emptiness of the TX FIFO. According to the spec,
the bit is of type RO, so writing it should never have any effect.
Perhaps the spec is wrong and the bit is actually of type RW1C.
E.g. the I2C controller on the BCM2835 does have an RW1C DONE bit which
needs to be cleared by the driver. Another, possibly more likely
explanation is that it's a hardware erratum since the issue does not
occur consistently.
Either way, amend bcm2835_spi_transfer_prologue() to always write the
DONE bit.
Usually a transmission is ended by bcm2835_spi_reset_hw(). If the
transmission was successful, the TX FIFO is empty and thus the DONE bit
is set when bcm2835_spi_reset_hw() reads the CS register. The bit is
then written back to the register, so we happen to do the right thing.
However if DONE is not set, e.g. because transmission is aborted with
a non-empty TX FIFO, the bit won't be written by bcm2835_spi_reset_hw()
and it seems possible that transmission might subsequently break. To be
on the safe side, likewise amend bcm2835_spi_reset_hw() to always write
the bit.
Tested-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/edb004dff4af6106f6bfcb89e1a96391e96eb857.1564825752.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b43e41e93 ]
Prior to this commit, removing the intel_pmc_core_pltdrv module
would cause the following warning:
Device 'intel_pmc_core.0' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt.
WARNING: CPU: 0 PID: 2202 at drivers/base/core.c:1238 device_release+0x6f/0x80
This commit hence adds an empty release function for the driver.
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d505758b1 ]
On a Xen-based PVH virtual machine with more than 4 GiB of RAM,
intel_pmc_core fails initialization with the following warning message
from the kernel, indicating that the driver is attempting to ioremap
RAM:
ioremap on RAM at 0x00000000fe000000 - 0x00000000fe001fff
WARNING: CPU: 1 PID: 434 at arch/x86/mm/ioremap.c:186 __ioremap_caller.constprop.0+0x2aa/0x2c0
...
Call Trace:
? pmc_core_probe+0x87/0x2d0 [intel_pmc_core]
pmc_core_probe+0x87/0x2d0 [intel_pmc_core]
This issue appears to manifest itself because of the following fallback
mechanism in the driver:
if (lpit_read_residency_count_address(&slp_s0_addr))
pmcdev->base_addr = PMC_BASE_ADDR_DEFAULT;
The validity of address PMC_BASE_ADDR_DEFAULT (i.e., 0xFE000000) is not
verified by the driver, which is what this patch introduces. With this
patch, if address PMC_BASE_ADDR_DEFAULT is in RAM, then the driver will
not attempt to ioremap the aforementioned address.
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9c96e30ec ]
When allocating a range of LPIs for a Multi-MSI capable device,
this allocation extended to the closest power of 2.
But on the release path, the interrupts are released one by
one. This results in not releasing the "extra" range, leaking
the its_device. Trying to reprobe the device will then fail.
Fix it by releasing the LPIs the same way we allocate them.
Fixes: 8208d1708b ("irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size")
Reported-by: Jiaxing Luo <luojiaxing@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/f5e948aa-e32f-3f74-ae30-31fee06c2a74@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e323d45ba ]
With 'extra run-time crypto self tests' enabled, the selftest
for s390-xts fails with
alg: skcipher: xts-aes-s390 encryption unexpectedly succeeded on
test vector "random: len=0 klen=64"; expected_error=-22,
cfg="random: inplace use_digest nosimd src_divs=[2.61%@+4006,
84.44%@+21, 1.55%@+13, 4.50%@+344, 4.26%@+21, 2.64%@+27]"
This special case with nbytes=0 is not handled correctly and this
fix now makes sure that -EINVAL is returned when there is en/decrypt
called with 0 bytes to en/decrypt.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ce06497c2 ]
When running in M-mode, the S-mode plic handlers are still listed in the
device tree. Ignore them by setting the maximum threshold.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07f1a6850c ]
When run test case:
mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal
mdadm -S /dev/md1
mdadm -A /dev/md1 /dev/sd[b-c] --run --force
mdadm --zero /dev/sda
mdadm /dev/md1 -a /dev/sda
echo offline > /sys/block/sdc/device/state
echo offline > /sys/block/sdb/device/state
sleep 5
mdadm -S /dev/md1
echo running > /sys/block/sdb/device/state
echo running > /sys/block/sdc/device/state
mdadm -A /dev/md1 /dev/sd[a-c] --run --force
mdadm run fail with kernel message as follow:
[ 172.986064] md: kicking non-fresh sdb from array!
[ 173.004210] md: kicking non-fresh sdc from array!
[ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors
[ 173.022406] md1: failed to create bitmap (-5)
In fact, when active disk in raid1 array less than one, we
need to return fail in raid1_run().
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e4d91aa07 ]
At boot time, the acpi_power_meter driver logs the following error level
message: "Ignoring unsafe software power cap". Having read about it from
a few sources, it seems that the error message can be quite misleading.
While the message can imply that Linux is ignoring the fact that the
system is operating in potentially dangerous conditions, the truth is
the driver found an ACPI_PMC object that supports software power
capping. The driver simply decides not to use it, perhaps because it
doesn't support the object.
The best solution is probably changing the log level from error to warning.
All sources I have found, regarding the error, have downplayed its
significance. There is not much of a reason for it to be on error level,
while causing potential confusions or misinterpretations.
Signed-off-by: Wang Shenran <shenran268@gmail.com>
Link: https://lore.kernel.org/r/20190724080110.6952-1-shenran268@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12163cfbfc ]
It would seem like model 70h is behaving in the same way as model 30h,
so let's just add the new F3 PCI ID to the list of compatible devices.
Unlike previous Ryzen/Threadripper, Ryzen gen 3 processors do not need
temperature offsets anymore. This has been reported in the press and
verified on my Ryzen 3700X by checking that the idle temperature
reported by k10temp is matching the temperature reported by the
firmware.
Vicki Pfau sent an identical patch after I checked that no-one had
written this patch. I would have been happy about dropping my patch but
unlike for his patch series, I had already Cc:ed the x86 people and
they already reviewed the changes. Since Vicki has not answered to
any email after his initial series, let's assume she is on vacation
and let's avoid duplication of reviews from the maintainers and merge
my series. To acknowledge Vicki's anteriority, I added her S-o-b to
the patch.
v2, suggested by Guenter Roeck and Brian Woods:
- rename from 71h to 70h
Signed-off-by: Vicki Pfau <vi@endrift.com>
Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com>
Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: "Woods, Brian" <Brian.Woods@amd.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Link: https://lore.kernel.org/r/20190722174653.2391-1-marcel.p.bocu@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a22a9602b8 ]
The race was when a thread using closure_sync() notices cl->s->done == 1
before the thread calling closure_put() calls wake_up_process(). Then,
it's possible for that thread to return and exit just before
wake_up_process() is called - so we're trying to wake up a process that
no longer exists.
rcu_read_lock() is sufficient to protect against this, as there's an rcu
barrier somewhere in the process teardown path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29b49958cf ]
In acpi_pci_irq_enable(), 'entry' is allocated by kzalloc() in
acpi_pci_irq_check_entry() (invoked from acpi_pci_irq_lookup()). However,
it is not deallocated if acpi_pci_irq_valid() returns false, leading to a
memory leak. To fix this issue, free 'entry' before returning 0.
Fixes: e237a55184 ("x86/ACPI/PCI: Recognize that Interrupt Line 255 means "not connected"")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03d1571d95 ]
In cm_write(), 'buf' is allocated through kzalloc(). In the following
execution, if an error occurs, 'buf' is not deallocated, leading to memory
leaks. To fix this issue, free 'buf' before returning the error.
Fixes: 526b4af47f ("ACPI: Split out custom_method functionality into an own driver")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af4e1c5eca ]
The AMD Ryzen gen 3 processors came with a different PCI IDs for the
function 3 & 4 which are used to access the SMN interface. The root
PCI address however remained at the same address as the model 30h.
Adding the F3/F4 PCI IDs respectively to the misc and link ids appear
to be sufficient for k10temp, so let's add them and follow up on the
patch if other functions need more tweaking.
Vicki Pfau sent an identical patch after I checked that no-one had
written this patch. I would have been happy about dropping my patch but
unlike for his patch series, I had already Cc:ed the x86 people and
they already reviewed the changes. Since Vicki has not answered to
any email after his initial series, let's assume she is on vacation
and let's avoid duplication of reviews from the maintainers and merge
my series. To acknowledge Vicki's anteriority, I added her S-o-b to
the patch.
v2, suggested by Guenter Roeck and Brian Woods:
- rename from 71h to 70h
Signed-off-by: Vicki Pfau <vi@endrift.com>
Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com>
Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Brian Woods <brian.woods@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: "Woods, Brian" <Brian.Woods@amd.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Link: https://lore.kernel.org/r/20190722174510.2179-1-marcel.p.bocu@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b0eeeaa37 ]
Commit aff138bf8e ("ARM: dts: exynos: Add TMU nodes regulator supply
for Peach boards") assigned LDO10 to Exynos Thermal Measurement Unit,
but it turned out that it supplies also some other critical parts and
board freezes/crashes when it is turned off.
The mentioned commit made Exynos TMU a consumer of that regulator and in
typical case Exynos TMU driver keeps it enabled from early boot. However
there are such configurations (example is multi_v7_defconfig), in which
some of the regulators are compiled as modules and are not available
from early boot. In such case it may happen that LDO10 is turned off by
regulator core, because it has no consumers yet (in this case consumer
drivers cannot get it, because the supply regulators for it are not yet
available). This in turn causes the board to crash. This patch restores
'always-on' property for the LDO10 regulator.
Fixes: aff138bf8e ("ARM: dts: exynos: Add TMU nodes regulator supply for Peach boards")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d87308cca ]
In commit 14bd9a607f ("iommu/iova: Separate atomic variables
to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
in queue_iova() was causing a performance loss and moved critical fields
so that the false sharing would not impact them.
However, avoiding the false sharing in the first place seems easy.
We should attempt the atomic_cmpxchg() no more than 100 times
per second. Adding an atomic_read() will keep the cache
line mostly shared.
This false sharing came with commit 9a005a800a
("iommu/iova: Add flush timer").
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 9a005a800a ('iommu/iova: Add flush timer')
Cc: Jinyu Qi <jinyuqi@huawei.com>
Cc: Joerg Roedel <jroedel@suse.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c312ef1763 ]
The Linux ahci driver has historically implemented a configuration fixup
for platforms / platform-firmware that fails to enable the ports prior
to OS hand-off at boot. The fixup was originally implemented way back
before ahci moved from drivers/scsi/ to drivers/ata/, and was updated in
2007 via commit 49f2909039 "ahci: update PCS programming". The quirk
sets a port-enable bitmap in the PCS register at offset 0x92.
This quirk could be applied generically up until the arrival of the
Denverton (DNV) platform. The DNV AHCI controller architecture supports
more than 6 ports and along with that the PCS register location and
format were updated to allow for more possible ports in the bitmap. DNV
AHCI expands the register to 32-bits and moves it to offset 0x94.
As it stands there are no known problem reports with existing Linux
trying to set bits at offset 0x92 which indicates that the quirk is not
applicable. Likely it is not applicable on a wider range of platforms,
but it is difficult to discern which platforms if any still depend on
the quirk.
Rather than try to fix the PCS quirk to consider the DNV register layout
instead require explicit opt-in. The assumption is that the OS driver
need not touch this register, and platforms can be added with a new
boad_ahci_pcs7 board-id when / if problematic platforms are found in the
future. The logic in ahci_intel_pcs_quirk() looks for all Intel AHCI
instances with "legacy" board-ids and otherwise skips the quirk if the
board was matched by class-code.
Reported-by: Stephen Douthit <stephend@silicom-usa.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca964edf0d ]
Apart from Haswell machines, all other devices have their private data
set to snd_soc_acpi_mach instance.
Changes for HSW/ BDW boards introduced with series:
https://patchwork.kernel.org/cover/10782035/
added support for dai_link platform_name adjustments within card probe
routines. These take for granted private_data points to
snd_soc_acpi_mach whereas for Haswell, it's sst_pdata instead. Change
private context of platform_device - representing machine board - to
address this.
Fixes: e87055d732 ("ASoC: Intel: haswell: platform name fixup support")
Fixes: 7e40ddcf97 ("ASoC: Intel: bdw-rt5677: platform name fixup support")
Fixes: 2d067b2807 ("ASoC: Intel: broadwell: platform name fixup support")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20190822113616.22702-2-cezary.rojewski@intel.com
Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d70889532 ]
When running heavy memory pressure workloads, the system is throwing
endless warnings,
smartpqi 0000:23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages:
5 reason: -12)
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0,4
Call Trace:
<IRQ>
dump_stack+0x62/0x9a
warn_alloc.cold.43+0x8a/0x148
__alloc_pages_nodemask+0x1a5c/0x1bb0
get_zeroed_page+0x16/0x20
iommu_map_page+0x477/0x540
map_sg+0x1ce/0x2f0
scsi_dma_map+0xc6/0x160
pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
do_IRQ+0x81/0x170
common_interrupt+0xf/0xf
</IRQ>
because the allocation could fail from iommu_map_page(), and the volume
of this call could be huge which may generate a lot of serial console
output and cosumes all CPUs.
Fix it by silencing the warning in this call site, and there is still a
dev_err() later to notify the failure.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6af86bdb8a ]
MOTU 4pre was launched in 2012 by MOTU, Inc. This commit allows userspace
applications can transmit and receive PCM frames and MIDI messages for
this model via ALSA PCM interface and RawMidi/Sequencer interfaces.
The device supports MOTU protocol version 3. Unlike the other devices, the
device is simply designed. The size of data block is fixed to 10 quadlets
during available sampling rates (44.1 - 96.0 kHz). Each data block
includes 1 source packet header, 2 data chunks for messages, 8 data chunks
for PCM samples and 2 data chunks for padding to quadlet alignment. The
device has no MIDI, optical, BNC and AES/EBU interfaces.
Like support for the other MOTU devices, the quality of playback sound
is not enough good with periodical noise yet.
$ python2 crpp < ~/git/am-config-rom/motu/motu-4pre.img
ROM header and bus information block
-----------------------------------------------------------------
400 041078cc bus_info_length 4, crc_length 16, crc 30924
404 31333934 bus_name "1394"
408 20ff7000 irmc 0, cmc 0, isc 1, bmc 0, cyc_clk_acc 255, max_rec 7 (256)
40c 0001f200 company_id 0001f2 |
410 000a41c5 device_id 00000a41c5 | EUI-64 0001f200000a41c5
root directory
-----------------------------------------------------------------
414 0004ef04 directory_length 4, crc 61188
418 030001f2 vendor
41c 0c0083c0 node capabilities per IEEE 1394
420 d1000002 --> unit directory at 428
424 8d000005 --> eui-64 leaf at 438
unit directory at 428
-----------------------------------------------------------------
428 0003ceda directory_length 3, crc 52954
42c 120001f2 specifier id
430 13000045 version
434 17103800 model
eui-64 leaf at 438
-----------------------------------------------------------------
438 0002d248 leaf_length 2, crc 53832
43c 0001f200 company_id 0001f2 |
440 000a41c5 device_id 00000a41c5 | EUI-64 0001f200000a41c5
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e01f91dff9 ]
ANA log parsing invokes nvme_update_ana_state() per ANA group desc.
This updates the state of namespaces with nsids in desc->nsids[].
Both ctrl->namespaces list and desc->nsids[] array are sorted by nsid.
Hence nvme_update_ana_state() performs a single walk over ctrl->namespaces:
- if current namespace matches the current desc->nsids[n],
this namespace is updated, and n is incremented.
- the process stops when it encounters the end of either
ctrl->namespaces end or desc->nsids[]
In case desc->nsids[n] does not match any of ctrl->namespaces,
the remaining nsids following desc->nsids[n] will not be updated.
Such situation was considered abnormal and generated WARN_ON_ONCE.
However ANA log MAY contain nsids not (yet) found in ctrl->namespaces.
For example, lets consider the following scenario:
- nvme0 exposes namespaces with nsids = [2, 3] to the host
- a new namespace nsid = 1 is added dynamically
- also, a ANA topology change is triggered
- NS_CHANGED aen is generated and triggers scan_work
- before scan_work discovers nsid=1 and creates a namespace, a NOTICE_ANA
aen was issues and ana_work receives ANA log with nsids=[1, 2, 3]
Result: ana_work fails to update ANA state on existing namespaces [2, 3]
Solution:
Change the way nvme_update_ana_state() namespace list walk
checks the current namespace against desc->nsids[n] as follows:
a) ns->head->ns_id < desc->nsids[n]: keep walking ctrl->namespaces.
b) ns->head->ns_id == desc->nsids[n]: match, update the namespace
c) ns->head->ns_id >= desc->nsids[n]: skip to desc->nsids[n+1]
This enables correct operation in the scenario described above.
This also allows ANA log to contain nsids currently invisible
to the host, i.e. inactive nsids.
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3bec2e3754 ]
In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
This value is reported in thousands (i.e., a value of 1
corresponds to 1000 units of 512 bytes read) and is rounded up.
However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.
Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 825d0b73cd ]
pti_clone_pmds() assumes that the supplied address is either:
- properly PUD/PMD aligned
or
- the address is actually mapped which means that independently
of the mapping level (PUD/PMD/PTE) the next higher mapping
exists.
If that's not the case the unaligned address can be incremented by PUD or
PMD size incorrectly. All callers supply mapped and/or aligned addresses,
but for the sake of robustness it's better to handle that case properly and
to emit a warning.
[ tglx: Rewrote changelog and added WARN_ON_ONCE() ]
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282352470.1938@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 696d05225c ]
The test case is
arecord -Dhw:0 -d 10 -f S16_LE -r 48000 -c 2 temp.wav &
aplay -Dhw:0 -d 30 -f S16_LE -r 48000 -c 2 test.wav
There will be error after end of arecord:
aplay: pcm_write:2051: write error: Input/output error
Capture and Playback work in parallel in master mode, one
substream stops, the other substream is impacted, the
reason is that clock is disabled wrongly.
The clock's reference count is not increased when second
substream starts, the hw_param() function returns in the
beginning because first substream is enabled, then in end
of first substream, the hw_free() disables the clock.
This patch is to move the clock enablement to the place
before checking of the device enablement in hw_param().
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1567012817-12625-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 990784b577 ]
When PTI is disabled at boot time either because the CPU is not affected or
PTI has been disabled on the command line, the boot code still calls into
pti_finalize() which then unconditionally invokes:
pti_clone_entry_text()
pti_clone_kernel_text()
pti_clone_kernel_text() was called unconditionally before the 32bit support
was added and 32bit added the call to pti_clone_entry_text().
The call has no side effects as cloning the page tables into the available
second one, which was allocated for PTI does not create damage. But it does
not make sense either and in case that this functionality would be extended
later this might actually lead to hard to diagnose issues.
Neither function should be called when PTI is runtime disabled. Make the
invocation conditional.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190828143124.063353972@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f35eaa5f2 ]
On architectures that discard .exit.* sections at runtime, a
warning is printed for each jump label that is used within an
in-kernel __exit annotated function:
can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138
As these functions will never get executed (they are free'd along
with the rest of initmem) - we do not need to patch them and should
not display any warnings.
The warning is displayed because the test required to satisfy
jump_entry_is_init is based on init_section_contains (__init_begin to
__init_end) whereas the test in __jump_label_update is based on
init_kernel_text (_sinittext to _einittext) via kernel_text_address).
Fixes: 1948367768 ("jump_label: Annotate entries that operate on __init code earlier")
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 580fa1b874 ]
The A64 ISA accepts distinct (but overlapping) ranges of immediates for:
* add arithmetic instructions ('I' machine constraint)
* sub arithmetic instructions ('J' machine constraint)
* 32-bit logical instructions ('K' machine constraint)
* 64-bit logical instructions ('L' machine constraint)
... but we currently use the 'I' constraint for many atomic operations
using sub or logical instructions, which is not always valid.
When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates
to be passed to instructions, potentially resulting in a build failure.
When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics
always use a register as they have no visibility of the value passed by
the caller.
This patch adds a constraint parameter to the ATOMIC_xx and
__CMPXCHG_CASE macros so that we can pass appropriate constraints for
each case, with uses updated accordingly.
Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted
'4294967295', so we must instead force the use of a register.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b397f8468f ]
When we started using a thread to catch the PERF_RECORD_BPF_EVENT meta
data events to then ask the kernel for further info (BTF, etc) for BPF
programs shortly after they get loaded, we forgot to use
unshare(CLONE_FS) as was done in:
868a832918 ("perf top: Support lookup of symbols in other mount namespaces.")
Do it so that we can enter the namespaces to read the build-ids at the
end of a 'perf record' session for the DSOs that had hits.
Before:
Starting a 'stress-ng --cpus 8' inside a container and then, outside the
container running:
# perf record -a --namespaces sleep 5
# perf buildid-list | grep stress-ng
#
We would end up with a 'perf.data' file that had no entry in its
build-id table for the /usr/bin/stress-ng binary inside the container
that got tons of PERF_RECORD_SAMPLEs.
After:
# perf buildid-list | grep stress-ng
f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/bin/stress-ng
#
Then its just a matter of making sure that that binary debuginfo package
gets available in a place that 'perf report' will look at build-id keyed
ELF files, which, in my case, on a f30 notebook, was a matter of
installing the debuginfo file for the distro used in the container,
fedora 31:
# rpm -ivh http://fedora.c3sl.ufpr.br/linux/development/31/Everything/x86_64/debug/tree/Packages/s/stress-ng-debuginfo-0.07.29-10.fc31.x86_64.rpm
Then, because perf currently looks for those debuginfo files (richer ELF
symtab) inside that namespace (look at the setns calls):
openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 137
openat(AT_FDCWD, "/proc/13169/ns/mnt", O_RDONLY) = 139
setns(139, CLONE_NEWNS) = 0
stat("/usr/bin/stress-ng", {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
openat(AT_FDCWD, "/usr/bin/stress-ng", O_RDONLY) = 140
fcntl(140, F_GETFD) = 0
fstat(140, {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
mmap(NULL, 3065416, PROT_READ, MAP_PRIVATE, 140, 0) = 0x7ff2fdc5b000
munmap(0x7ff2fdc5b000, 3065416) = 0
close(140) = 0
stat("stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/bin/.debug/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29", 0x7fff45d711e0) = -1 ENOENT (No such file or directory)
To only then go back to the "host" namespace to look just in the users's
~/.debug cache:
setns(137, CLONE_NEWNS) = 0
chdir("/root") = 0
close(137) = 0
close(139) = 0
stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf", 0x7fff45d732e0) = -1 ENOENT (No such file or directory)
It continues to fail to resolve symbols:
# perf report | grep stress-ng | head -5
9.50% stress-ng-cpu stress-ng [.] 0x0000000000021ac1
8.58% stress-ng-cpu stress-ng [.] 0x0000000000021ab4
8.51% stress-ng-cpu stress-ng [.] 0x0000000000021489
7.17% stress-ng-cpu stress-ng [.] 0x00000000000219b6
3.93% stress-ng-cpu stress-ng [.] 0x0000000000021478
#
To overcome that we use:
# perf buildid-cache -v --add /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug
Adding f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug: Ok
#
# ls -la /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
-rw-r--r--. 3 root root 2401184 Jul 27 07:03 /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
# file /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter \004, BuildID[sha1]=f2ed02c68341183a124b9b0f6e2e6c493c465b29, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)
#
Now it finally works:
# perf report | grep stress-ng | head -5
23.59% stress-ng-cpu stress-ng [.] ackermann
23.33% stress-ng-cpu stress-ng [.] is_prime
17.36% stress-ng-cpu stress-ng [.] stress_cpu_sieve
6.08% stress-ng-cpu stress-ng [.] stress_cpu_correlate
3.55% stress-ng-cpu stress-ng [.] queens_try
#
I'll make sure that it looks for the build-id keyed files in both the
"host" namespace (the namespace the user running 'perf record' was a the
time of the recording) and in the container namespace, as it shouldn't
matter where a content based key lookup finds the ELF file to use in
resolving symbols, etc.
Reported-by: Karl Rister <krister@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: 657ee55319 ("perf evlist: Introduce side band thread")
Link: https://lkml.kernel.org/n/tip-g79k0jz41adiaeuqud742t2l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f32c7a8e45 ]
While the MMUs is disabled, I-cache speculation can result in
instructions being fetched from the PoC. During boot we may patch
instructions (e.g. for alternatives and jump labels), and these may be
dirty at the PoU (and stale at the PoC).
Thus, while the MMU is disabled in the KPTI pagetable fixup code we may
load stale instructions into the I-cache, potentially leading to
subsequent crashes when executing regions of code which have been
modified at runtime.
Similarly to commit:
8ec4198743 ("arm64: mm: ensure patched kernel text is fetched from PoU")
... we can invalidate the I-cache after enabling the MMU to prevent such
issues.
The KPTI pagetable fixup code itself should be clean to the PoC per the
boot protocol, so no maintenance is required for this code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 743dac494d ]
On x86, CPUs are limited in the number of interrupts they can have affined
to them as they only support 256 interrupt vectors per CPU. 32 vectors are
reserved for the CPU and the kernel reserves another 22 for internal
purposes. That leaves 202 vectors for assignement to devices.
When an interrupt is set up or the affinity is changed by the kernel or the
administrator, the vector assignment code attempts to honor the requested
affinity mask. If the vector space on the CPUs in that affinity mask is
exhausted the code falls back to a wider set of CPUs and assigns a vector
on a CPU outside of the requested affinity mask silently.
While the effective affinity is reflected in the corresponding
/proc/irq/$N/effective_affinity* files the silent breakage of the requested
affinity can lead to unexpected behaviour for administrators.
Add a pr_warn() when this happens so that adminstrators get at least
informed about it in the syslog.
[ tglx: Massaged changelog and made the pr_warn() more informative ]
Reported-by: djuran@redhat.com
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: djuran@redhat.com
Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9717178b9 ]
This fixes the following DT schemas check errors:
meson-gxbb-odroidc2.dt.yaml: gpio-regulator-tf_io: states:0: Additional items are not allowed (1800000, 1 were unexpected)
meson-gxbb-odroidc2.dt.yaml: gpio-regulator-tf_io: states:0: [3300000, 0, 1800000, 1] is too long
meson-gxbb-nexbox-a95x.dt.yaml: gpio-regulator: states:0: Additional items are not allowed (3300000, 1 were unexpected)
meson-gxbb-nexbox-a95x.dt.yaml: gpio-regulator: states:0: [1800000, 0, 3300000, 1] is too long
meson-gxbb-p200.dt.yaml: gpio-regulator: states:0: Additional items are not allowed (3300000, 1 were unexpected)
meson-gxbb-p200.dt.yaml: gpio-regulator: states:0: [1800000, 0, 3300000, 1] is too long
meson-gxl-s905x-hwacom-amazetv.dt.yaml: gpio-regulator: states:0: Additional items are not allowed (3300000, 1 were unexpected)
meson-gxl-s905x-hwacom-amazetv.dt.yaml: gpio-regulator: states:0: [1800000, 0, 3300000, 1] is too long
meson-gxbb-p201.dt.yaml: gpio-regulator: states:0: Additional items are not allowed (3300000, 1 were unexpected)
meson-gxbb-p201.dt.yaml: gpio-regulator: states:0: [1800000, 0, 3300000, 1] is too long
meson-g12b-odroid-n2.dt.yaml: gpio-regulator-tf_io: states:0: Additional items are not allowed (1800000, 1 were unexpected)
meson-g12b-odroid-n2.dt.yaml: gpio-regulator-tf_io: states:0: [3300000, 0, 1800000, 1] is too long
meson-gxl-s905x-nexbox-a95x.dt.yaml: gpio-regulator: states:0: Additional items are not allowed (3300000, 1 were unexpected)
meson-gxl-s905x-nexbox-a95x.dt.yaml: gpio-regulator: states:0: [1800000, 0, 3300000, 1] is too long
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77c84dd188 ]
Fast switching path only emits an event for the CPU of interest, whereas the
regular path emits an event for all the CPUs that had their frequency changed,
i.e. all the CPUs sharing the same policy.
With the current behavior, looking at cpu_frequency event for a given CPU that
is using the fast switching path will not give the correct frequency signal.
Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c4cdc4c63 ]
According to the ACPI 6.3 specification, the _PSD method is optional
when using CPPC. The underlying assumption is that each CPU can change
frequency independently from all other CPUs; _PSD is provided to tell
the OS that some processors can NOT do that.
However, the acpi_get_psd() function returns ENODEV if there is no _PSD
method present, or an ACPI error status if an error occurs when evaluating
_PSD, if present. This makes _PSD mandatory when using CPPC, in violation
of the specification, and only on Linux.
This has forced some firmware writers to provide a dummy _PSD, even though
it is irrelevant, but only because Linux requires it; other OSPMs follow
the spec. We really do not want to have OS specific ACPI tables, though.
So, correct acpi_get_psd() so that it does not return an error if there
is no _PSD method present, but does return a failure when the method can
not be executed properly. This allows _PSD to be optional as it should
be.
Signed-off-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6559ac3299 ]
Fixed misspelled words, added error check during probe
on the init of the registers, and fixed ALS/I2C control
mode.
Fixes: bc1b8492c7 ("leds: lm3532: Introduce the lm3532 LED driver")
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 093347abc7 ]
As pointed by cppcheck:
[drivers/media/i2c/ov9650.c:706]: (error) Shifting by a negative value is undefined behaviour
[drivers/media/i2c/ov9650.c:707]: (error) Shifting by a negative value is undefined behaviour
[drivers/media/i2c/ov9650.c:721]: (error) Shifting by a negative value is undefined behaviour
Prevent mangling with gains with invalid values.
As pointed by Sylvester, this should never happen in practice,
as min value of V4L2_CID_GAIN control is 16 (gain is always >= 16
and m is always >= 0), but it is too hard for a static analyzer
to get this, as the logic with validates control min/max is
elsewhere inside V4L2 core.
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31b8b0bd6e ]
While this might not occur in practice, if the device is doing
the right thing, it would be teoretically be possible to have
both hsync_counter and vsync_counter negatives.
If this ever happen, ctrl will be undefined, but the driver
will still call:
aspeed_video_update(video, VE_CTRL, 0, ctrl);
Change the code to prevent this to happen.
This was warned by cppcheck:
[drivers/media/platform/aspeed-video.c:653]: (error) Uninitialized variable: ctrl
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c770f0f52 ]
In submit_urbs(), 'cam->sbuf[i].data' is allocated through kmalloc_array().
However, it is not deallocated if the following allocation for urbs fails.
To fix this issue, free 'cam->sbuf[i].data' if usb_alloc_urb() fails.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42e64117d3 ]
If saa7146_register_device() fails, no cleanup is executed, leading to
memory/resource leaks. To fix this issue, perform necessary cleanup work
before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 14d5511691 ]
If cec_notifier_cec_adap_unregister() is called before
cec_unregister_adapter() then everything is OK (and this is the
case today). But if it is the other way around, then
cec_notifier_unregister() is called first, and that doesn't
set n->cec_adap to NULL.
So if e.g. cec_notifier_set_phys_addr() is called after
cec_notifier_unregister() but before cec_unregister_adapter()
then n->cec_adap points to an unregistered and likely deleted
cec adapter. So just set n->cec_adap->notifier and n->cec_adap
to NULL for rubustness.
Eventually cec_notifier_unregister will disappear and this will
be simplified substantially.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c2b20e0da ]
Regulators should be enabled before clocks to avoid h/w hang. This
require change in exynos_bus_probe() to move exynos_bus_parse_of()
after exynos_bus_parent_parse_of() and change in error handling.
Similar change is needed in exynos_bus_exit() where clock should be
disabled before regulators.
Signed-off-by: Kamil Konieczny <k.konieczny@partner.samsung.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ef7c7cce4 ]
The devfreq passive governor registers and unregisters devfreq
transition notifiers on DEVFREQ_GOV_START/GOV_STOP using devm wrappers.
If devfreq itself is registered with devm then a warning is triggered on
rmmod from devm_devfreq_unregister_notifier. Call stack looks like this:
devm_devfreq_unregister_notifier+0x30/0x40
devfreq_passive_event_handler+0x4c/0x88
devfreq_remove_device.part.8+0x6c/0x9c
devm_devfreq_dev_release+0x18/0x20
release_nodes+0x1b0/0x220
devres_release_all+0x78/0x84
device_release_driver_internal+0x100/0x1c0
driver_detach+0x4c/0x90
bus_remove_driver+0x7c/0xd0
driver_unregister+0x2c/0x58
platform_driver_unregister+0x10/0x18
imx_devfreq_platdrv_exit+0x14/0xd40 [imx_devfreq]
This happens because devres_release_all will first remove all the nodes
into a separate todo list so the nested devres_release from
devm_devfreq_unregister_notifier won't find anything.
Fix the warning by calling the non-devm APIS for frequency notification.
Using devm wrappers is not actually useful for a governor anyway: it
relies on the devfreq core to correctly match the GOV_START/GOV_STOP
notifications.
Fixes: 996133119f ("PM / devfreq: Add new passive governor")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccf4975dca ]
<generated/ti-pm-asm-offsets.h> is only generated and included by
arch/arm/mach-omap2/, so it does not need to reside in the globally
visible include/generated/.
I renamed it to arch/arm/mach-omap2/pm-asm-offsets.h since the prefix
'ti-' is just redundant in mach-omap2/.
My main motivation of this change is to avoid the race condition for
the parallel build (-j) when CONFIG_IKHEADERS is enabled.
When it is enabled, all the headers under include/ are archived into
kernel/kheaders_data.tar.xz and exposed in the sysfs.
In the parallel build, we have no idea in which order files are built.
- If ti-pm-asm-offsets.h is built before kheaders_data.tar.xz,
the header will be included in the archive. Probably nobody will
use it, but it is harmless except that it will increase the archive
size needlessly.
- If kheaders_data.tar.xz is built before ti-pm-asm-offsets.h,
the header will not be included in the archive. However, in the next
build, the archive will be re-generated to include the newly-found
ti-pm-asm-offsets.h. This is not nice from the build system point
of view.
- If ti-pm-asm-offsets.h and kheaders_data.tar.xz are built at the
same time, the corrupted header might be included in the archive,
which does not look nice either.
This commit fixes the race.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7544fd7f38 ]
A bit unexpectedly (but still documented), request_module may
return a positive value, in case of a modprobe error.
This is currently causing issues in the devfreq framework.
When a request_module exits with a positive value, we currently
return that via ERR_PTR. However, because the value is positive,
it's not a ERR_VALUE proper, and is therefore treated as a
valid struct devfreq_governor pointer, leading to a kernel oops.
Fix this by returning -EINVAL if request_module returns a positive
value.
Fixes: b53b012805 ("PM / devfreq: Fix static checker warning in try_then_request_governor")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2eced4607a ]
ARM Erratum 754322 affects Cortex-A9 revisions r2p* and r3p*.
Automatically enable support code to mitigate the erratum when compiling
a kernel for any of the affected Renesas SoCs:
- RZ/A1: r3p0,
- R-Mobile A1: r2p4,
- R-Car M1A: r2p2-00rel0,
- R-Car H1: r3p0,
- SH-Mobile AG5: r2p2.
EMMA Mobile EV2 (r1p3) and RZ/A2 (r4p1) are not affected.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af0bc63472 ]
Currently the R-Mobile "always-on" PM Domain is implemented by returning
-EBUSY from the generic_pm_domain.power_off() callback, and doing
nothing in the generic_pm_domain.power_on() callback. However, this
means the PM Domain core code is not aware of the semantics of this
special domain, leading to boot warnings like the following on
SH/R-Mobile SoCs:
sh_cmt e6130000.timer: PM domain c5 will not be powered off
Fix this by making the always-on nature of the domain explicit instead,
by setting the GENPD_FLAG_ALWAYS_ON flag. This removes the need for the
domain to provide power control callbacks.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fac85a6db ]
<generated/at91_pm_data-offsets.h> is only generated and included by
arch/arm/mach-at91/, so it does not need to reside in the globally
visible include/generated/.
I renamed it to arch/arm/mach-at91/pm_data-offsets.h since the prefix
'at91_' is just redundant in mach-at91/.
My main motivation of this change is to avoid the race condition for
the parallel build (-j) when CONFIG_IKHEADERS is enabled.
When it is enabled, all the headers under include/ are archived into
kernel/kheaders_data.tar.xz and exposed in the sysfs.
In the parallel build, we have no idea in which order files are built.
- If at91_pm_data-offsets.h is built before kheaders_data.tar.xz,
the header will be included in the archive. Probably nobody will
use it, but it is harmless except that it will increase the archive
size needlessly.
- If kheaders_data.tar.xz is built before at91_pm_data-offsets.h,
the header will not be included in the archive. However, in the next
build, the archive will be re-generated to include the newly-found
at91_pm_data-offsets.h. This is not nice from the build system point
of view.
- If at91_pm_data-offsets.h and kheaders_data.tar.xz are built at the
same time, the corrupted header might be included in the archive,
which does not look nice either.
This commit fixes the race.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20190823024346.591-1-yamada.masahiro@socionext.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a2eaab7da ]
AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.
Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.
Fixes: 713ad54675 ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d971e28e2c ]
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.
Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.
Fix number of DIMMs and Chip Select bases/masks on Family17h, because
AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.
Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fcd5ce4b39 ]
In dvb_create_media_entity(), 'dvbdev->entity' is allocated through
kzalloc(). Then, 'dvbdev->pads' is allocated through kcalloc(). However, if
kcalloc() fails, the allocated 'dvbdev->entity' is not deallocated, leading
to a memory leak bug. To fix this issue, free 'dvbdev->entity' before
returning -ENOMEM.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 692117c1f7 ]
Warning when p == NULL and then proceeding and dereferencing p does not
make any sense as the kernel will crash with a NULL pointer dereference
right away.
Bailing out when p == NULL and returning an error code does not cure the
underlying problem which caused p to be NULL. Though it might allow to
do proper debugging.
Same applies to the clock id check in set_process_cpu_timer().
Clean them up and make them return without trying to do further damage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190819143801.846497772@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c268e7adea ]
KASAN: global-out-of-bounds Read in dvb_pll_attach
Syzbot reported global-out-of-bounds Read in dvb_pll_attach, while
accessing id[dvb_pll_devcount], because dvb_pll_devcount was 65,
that is more than size of 'id' which is DVB_PLL_MAX(64).
Rather than increasing dvb_pll_devcount every time, use ida so that
numbers are allocated correctly. This does mean that no more than
64 devices can be attached at the same time, but this is more than
sufficient.
usb 1-1: dvb_usb_v2: will pass the complete MPEG2 transport stream to the
software demuxer
dvbdev: DVB: registering new adapter (774 Friio White ISDB-T USB2.0)
usb 1-1: media controller created
dvbdev: dvb_create_media_entity: media entity 'dvb-demux' registered.
tc90522 0-0018: Toshiba TC90522 attached.
usb 1-1: DVB: registering adapter 0 frontend 0 (Toshiba TC90522 ISDB-T
module)...
dvbdev: dvb_create_media_entity: media entity 'Toshiba TC90522 ISDB-T
module' registered.
==================================================================
BUG: KASAN: global-out-of-bounds in dvb_pll_attach+0x6c5/0x830
drivers/media/dvb-frontends/dvb-pll.c:798
Read of size 4 at addr ffffffff89c9e5e0 by task kworker/0:1/12
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.2.0-rc6+ #13
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xca/0x13e lib/dump_stack.c:113
print_address_description+0x67/0x231 mm/kasan/report.c:188
__kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
kasan_report+0xe/0x20 mm/kasan/common.c:614
dvb_pll_attach+0x6c5/0x830 drivers/media/dvb-frontends/dvb-pll.c:798
dvb_pll_probe+0xfe/0x174 drivers/media/dvb-frontends/dvb-pll.c:877
i2c_device_probe+0x790/0xaa0 drivers/i2c/i2c-core-base.c:389
really_probe+0x281/0x660 drivers/base/dd.c:509
driver_probe_device+0x104/0x210 drivers/base/dd.c:670
__device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
__device_attach+0x217/0x360 drivers/base/dd.c:843
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
device_add+0xae6/0x16f0 drivers/base/core.c:2111
i2c_new_client_device+0x5b3/0xc40 drivers/i2c/i2c-core-base.c:778
i2c_new_device+0x19/0x50 drivers/i2c/i2c-core-base.c:821
dvb_module_probe+0xf9/0x220 drivers/media/dvb-core/dvbdev.c:985
friio_tuner_attach+0x125/0x1d0 drivers/media/usb/dvb-usb-v2/gl861.c:536
dvb_usbv2_adapter_frontend_init
drivers/media/usb/dvb-usb-v2/dvb_usb_core.c:675 [inline]
dvb_usbv2_adapter_init drivers/media/usb/dvb-usb-v2/dvb_usb_core.c:804
[inline]
dvb_usbv2_init drivers/media/usb/dvb-usb-v2/dvb_usb_core.c:865 [inline]
dvb_usbv2_probe.cold+0x24dc/0x255d
drivers/media/usb/dvb-usb-v2/dvb_usb_core.c:980
usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
really_probe+0x281/0x660 drivers/base/dd.c:509
driver_probe_device+0x104/0x210 drivers/base/dd.c:670
__device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
__device_attach+0x217/0x360 drivers/base/dd.c:843
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
device_add+0xae6/0x16f0 drivers/base/core.c:2111
usb_set_configuration+0xdf6/0x1670 drivers/usb/core/message.c:2023
generic_probe+0x9d/0xd5 drivers/usb/core/generic.c:210
usb_probe_device+0x99/0x100 drivers/usb/core/driver.c:266
really_probe+0x281/0x660 drivers/base/dd.c:509
driver_probe_device+0x104/0x210 drivers/base/dd.c:670
__device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
__device_attach+0x217/0x360 drivers/base/dd.c:843
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
device_add+0xae6/0x16f0 drivers/base/core.c:2111
usb_new_device.cold+0x8c1/0x1016 drivers/usb/core/hub.c:2534
hub_port_connect drivers/usb/core/hub.c:5089 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
port_event drivers/usb/core/hub.c:5350 [inline]
hub_event+0x1ada/0x3590 drivers/usb/core/hub.c:5432
process_one_work+0x905/0x1570 kernel/workqueue.c:2269
process_scheduled_works kernel/workqueue.c:2331 [inline]
worker_thread+0x7ab/0xe20 kernel/workqueue.c:2417
kthread+0x30b/0x410 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
The buggy address belongs to the variable:
id+0x100/0x120
Memory state around the buggy address:
ffffffff89c9e480: fa fa fa fa 00 00 fa fa fa fa fa fa 00 00 00 00
ffffffff89c9e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffffffff89c9e580: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
^
ffffffff89c9e600: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
ffffffff89c9e680: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
==================================================================
Reported-by: syzbot+8a8f48672560c8ca59dd@syzkaller.appspotmail.com
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f45f7b5bda ]
s390 kasan code uses sclp_early_printk to report initialization
failures. The code doing that should not be instrumented, because kasan
shadow memory has not been set up yet. Even though sclp_early_core.c is
compiled with instrumentation disabled it uses strlen function, which
is instrumented and would produce shadow memory access if used. To
avoid that, introduce uninstrumented __strlen function to be used
instead.
Before commit 7e0d92f002 ("s390/kasan: improve string/memory functions
checks") few string functions (including strlen) were escaping kasan
instrumentation due to usage of platform specific versions which are
implemented in inline assembly.
Fixes: 7e0d92f002 ("s390/kasan: improve string/memory functions checks")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2671828c3f ]
When taking an SError or Debug exception from EL0, we run the C
handler for these exceptions before updating the context tracking
code and unmasking lower priority interrupts.
When booting with nohz_full lockdep tells us we got this wrong:
| =============================
| WARNING: suspicious RCU usage
| 5.3.0-rc2-00010-gb4b5e9dcb11b-dirty #11271 Not tainted
| -----------------------------
| include/linux/rcupdate.h:643 rcu_read_unlock() used illegally wh!
|
| other info that might help us debug this:
|
|
| RCU used illegally from idle CPU!
| rcu_scheduler_active = 2, debug_locks = 1
| RCU used illegally from extended quiescent state!
| 1 lock held by a.out/432:
| #0: 00000000c7a79515 (rcu_read_lock){....}, at: brk_handler+0x00
|
| stack backtrace:
| CPU: 1 PID: 432 Comm: a.out Not tainted 5.3.0-rc2-00010-gb4b5e9d1
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno De8
| Call trace:
| dump_backtrace+0x0/0x140
| show_stack+0x14/0x20
| dump_stack+0xbc/0x104
| lockdep_rcu_suspicious+0xf8/0x108
| brk_handler+0x164/0x1b0
| do_debug_exception+0x11c/0x278
| el0_dbg+0x14/0x20
Moving the ct_user_exit calls to be before do_debug_exception() means
they are also before trace_hardirqs_off() has been updated. Add a new
ct_user_exit_irqoff macro to avoid the context-tracking code using
irqsave/restore before we've updated trace_hardirqs_off(). To be
consistent, do this everywhere.
The C helper is called enter_from_user_mode() to match x86 in the hope
we can merge them into kernel/context_tracking.c later.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6c81fe7925 ("arm64: enable context tracking")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 887e975c41 ]
Fix bug added with the patch:
commit 8f3ea35929
Author: Josef Bacik <josef@toxicpanda.com>
Date: Mon Jul 16 12:11:35 2018 -0400
nbd: handle unexpected replies better
where if the timeout handler runs when the completion path is and we fail
to grab the mutex in the timeout handler we will leave a config reference
and cannot free the config later.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60e2dde1e9 ]
In led_trigger_set(), 'event' is allocated in kasprintf(). However, it is
not deallocated in the following execution if the label 'err_activate' or
'err_add_groups' is entered, leading to memory leaks. To fix this issue,
free 'event' before returning the error.
Fixes: 52c47742f7 ("leds: triggers: send uevent when changing triggers")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f6fc97501 ]
Since hw_free() can be called multiple times and not just after a stop
trigger command, we should check whether the RX or TX ready interrupt was
truly enabled previously. For this, we assure that the condition of the
wait event is always true, except when RX/TX interrupts are enabled.
Fixes: 7e0cdf545a55 ("ASoC: mchp-i2s-mcc: add driver for I2SC Multi-Channel Controller")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Link: https://lore.kernel.org/r/20190820162411.24836-3-codrin.ciubotariu@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7df8f9a201 ]
The BCLK divider should be calculated using the parameters that actually
make the BCLK rate: the number of channels, the sampling rate and the
sample width.
We've been using the oversample_rate previously because in the former SoCs,
the BCLK's parent is MCLK, which in turn is being used to generate the
oversample rate, so we end up with something like this:
oversample = mclk_rate / sampling_rate
bclk_div = oversample / word_size / channels
So, bclk_div = mclk_rate / sampling_rate / word_size / channels.
And this is actually better, since the oversampling ratio only plays a role
because the MCLK is its parent, not because of what BCLK is supposed to be.
Furthermore, that assumption of MCLK being the parent has been broken on
newer SoCs, so let's use the proper formula, and have the parent rate as an
argument.
Fixes: 7d2993811a ("ASoC: sun4i-i2s: Add support for H3")
Fixes: 21faaea134 ("ASoC: sun4i-i2s: Add support for A83T")
Fixes: 66ecce3325 ("ASoC: sun4i-i2s: Add compatibility with A64 codec I2S")
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://lore.kernel.org/r/c3595e3a9788c2ef2dcc30aa3c8c4953bb5cc249.1566242458.git-series.maxime.ripard@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42fc2e9ef9 ]
We were getting the file by luck, from one of the paths in -I, fix it to
get it from the proper place:
$ cd tools/include/uapi/asm/
[acme@quaco asm]$ grep include bitsperlong.h
#include "../../arch/x86/include/uapi/asm/bitsperlong.h"
#include "../../arch/arm64/include/uapi/asm/bitsperlong.h"
#include "../../arch/powerpc/include/uapi/asm/bitsperlong.h"
#include "../../arch/s390/include/uapi/asm/bitsperlong.h"
#include "../../arch/sparc/include/uapi/asm/bitsperlong.h"
#include "../../arch/mips/include/uapi/asm/bitsperlong.h"
#include "../../arch/ia64/include/uapi/asm/bitsperlong.h"
#include "../../arch/riscv/include/uapi/asm/bitsperlong.h"
#include "../../arch/alpha/include/uapi/asm/bitsperlong.h"
#include <asm-generic/bitsperlong.h>
$ ls -la ../../arch/x86/include/uapi/asm/bitsperlong.h
ls: cannot access '../../arch/x86/include/uapi/asm/bitsperlong.h': No such file or directory
$ ls -la ../../../arch/*/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 237 ../../../arch/alpha/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 841 ../../../arch/arm64/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 966 ../../../arch/hexagon/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 234 ../../../arch/ia64/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 100 ../../../arch/microblaze/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 244 ../../../arch/mips/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 352 ../../../arch/parisc/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 312 ../../../arch/powerpc/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 353 ../../../arch/riscv/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 292 ../../../arch/s390/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 323 ../../../arch/sparc/include/uapi/asm/bitsperlong.h
-rw-rw-r--. 1 320 ../../../arch/x86/include/uapi/asm/bitsperlong.h
$
Found while fixing some other problem, before it was escaping the
tools/ chroot and using stuff in the kernel sources:
CC /tmp/build/perf/util/find_bit.o
In file included from /git/linux/tools/include/../../arch/x86/include/uapi/asm/bitsperlong.h:11,
from /git/linux/tools/include/uapi/asm/bitsperlong.h:3,
from /git/linux/tools/include/linux/bits.h:6,
from /git/linux/tools/include/linux/bitops.h:13,
from ../lib/find_bit.c:17:
# cd /git/linux/tools/include/../../arch/x86/include/uapi/asm/
# pwd
/git/linux/arch/x86/include/uapi/asm
#
Now it is getting the one we want it to, i.e. the one inside tools/:
CC /tmp/build/perf/util/find_bit.o
In file included from /git/linux/tools/arch/x86/include/uapi/asm/bitsperlong.h:11,
from /git/linux/tools/include/linux/bits.h:6,
from /git/linux/tools/include/linux/bitops.h:13,
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-8f8cfqywmf6jk8a3ucr0ixhu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 117acf5c29 ]
Back in 2004 we added logic to arch/ppc64/Makefile to pass
the --synthetic option to nm, if it was supported by nm.
Then in 2005 when arch/ppc64 and arch/ppc were merged, the logic to
add --synthetic was moved inside an #ifdef CONFIG_PPC64 block within
arch/powerpc/Makefile, and has remained there since.
That was fine, though crufty, until recently when a change to
init/Kconfig added a config time check that uses $(NM). On powerpc
that leads to an infinite loop because Kconfig uses $(NM) to calculate
some values, then the powerpc Makefile changes $(NM), which Kconfig
notices and restarts.
The original commit that added --synthetic simply said:
On new toolchains we need to use nm --synthetic or we miss code
symbols.
And the nm man page says that the --synthetic option causes nm to:
Include synthetic symbols in the output. These are special symbols
created by the linker for various purposes.
So it seems safe to always pass --synthetic if nm supports it, ie. on
32-bit and 64-bit, it just means 32-bit kernels might have more
symbols reported (and in practice I see no extra symbols). Making it
unconditional avoids the #ifdef CONFIG_PPC64, which in turn avoids the
infinite loop.
Debugged-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c372a35550 ]
When transitioning to supend state, uniphier_aio_dai_suspend() is called
and asserts reset lines and disables clocks.
However, if there are two or more DAIs, uniphier_aio_dai_suspend() are
called multiple times, and double reset assersion will cause.
This patch defines the counter that has the number of DAIs at first, and
whenever uniphier_aio_dai_suspend() are called, it decrements the
counter. And only if the counter is zero, it asserts reset lines and
disables clocks.
In the same way, uniphier_aio_dai_resume() are called, it increments the
counter after deasserting reset lines and enabling clocks.
Fixes: 139a342002 ("ASoC: uniphier: add support for UniPhier AIO CPU DAI driver")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/1566281764-14059-1-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00452ba9fd ]
There are 2 problems with the old iosf PMIC I2C bus arbritration code which
need to be addressed:
1. The lockdep code complains about a possible deadlock in the
iosf_mbi_[un]block_punit_i2c_access code:
[ 6.712662] ======================================================
[ 6.712673] WARNING: possible circular locking dependency detected
[ 6.712685] 5.3.0-rc2+ #79 Not tainted
[ 6.712692] ------------------------------------------------------
[ 6.712702] kworker/0:1/7 is trying to acquire lock:
[ 6.712712] 00000000df1c5681 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}, at: iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.712739]
but task is already holding lock:
[ 6.712749] 0000000067cb23e7 (iosf_mbi_punit_mutex){+.+.}, at: iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712768]
which lock already depends on the new lock.
[ 6.712780]
the existing dependency chain (in reverse order) is:
[ 6.712792]
-> #1 (iosf_mbi_punit_mutex){+.+.}:
[ 6.712808] __mutex_lock+0xa8/0x9a0
[ 6.712818] iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712831] i2c_dw_acquire_lock+0x20/0x30
[ 6.712841] i2c_dw_set_reg_access+0x15/0xb0
[ 6.712851] i2c_dw_probe+0x57/0x473
[ 6.712861] dw_i2c_plat_probe+0x33e/0x640
[ 6.712874] platform_drv_probe+0x38/0x80
[ 6.712884] really_probe+0xf3/0x380
[ 6.712894] driver_probe_device+0x59/0xd0
[ 6.712905] bus_for_each_drv+0x84/0xd0
[ 6.712915] __device_attach+0xe4/0x170
[ 6.712925] bus_probe_device+0x9f/0xb0
[ 6.712935] deferred_probe_work_func+0x79/0xd0
[ 6.712946] process_one_work+0x234/0x560
[ 6.712957] worker_thread+0x50/0x3b0
[ 6.712967] kthread+0x10a/0x140
[ 6.712977] ret_from_fork+0x3a/0x50
[ 6.712986]
-> #0 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}:
[ 6.713004] __lock_acquire+0xe07/0x1930
[ 6.713015] lock_acquire+0x9d/0x1a0
[ 6.713025] __mutex_lock+0xa8/0x9a0
[ 6.713035] iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.713047] i2c_dw_set_reg_access+0x4d/0xb0
[ 6.713058] i2c_dw_probe+0x57/0x473
[ 6.713068] dw_i2c_plat_probe+0x33e/0x640
[ 6.713079] platform_drv_probe+0x38/0x80
[ 6.713089] really_probe+0xf3/0x380
[ 6.713099] driver_probe_device+0x59/0xd0
[ 6.713109] bus_for_each_drv+0x84/0xd0
[ 6.713119] __device_attach+0xe4/0x170
[ 6.713129] bus_probe_device+0x9f/0xb0
[ 6.713140] deferred_probe_work_func+0x79/0xd0
[ 6.713150] process_one_work+0x234/0x560
[ 6.713160] worker_thread+0x50/0x3b0
[ 6.713170] kthread+0x10a/0x140
[ 6.713180] ret_from_fork+0x3a/0x50
[ 6.713189]
other info that might help us debug this:
[ 6.713202] Possible unsafe locking scenario:
[ 6.713212] CPU0 CPU1
[ 6.713221] ---- ----
[ 6.713229] lock(iosf_mbi_punit_mutex);
[ 6.713239] lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713253] lock(iosf_mbi_punit_mutex);
[ 6.713265] lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713276]
*** DEADLOCK ***
In practice can never happen because only the first caller which
increments iosf_mbi_block_punit_i2c_access_count will also take
iosf_mbi_punit_mutex, that is the whole purpose of the counter, which
itself is protected by iosf_mbi_block_punit_i2c_access_count_mutex.
But there is no way to tell the lockdep code about this and we really
want to be able to run a kernel with lockdep enabled without these
warnings being triggered.
2. The lockdep warning also points out another real problem, if 2 threads
both are in a block of code protected by iosf_mbi_block_punit_i2c_access
and the first thread to acquire the block exits before the second thread
then the second thread will call mutex_unlock on iosf_mbi_punit_mutex,
but it is not the thread which took the mutex and unlocking by another
thread is not allowed.
Fix this by getting rid of the notion of holding a mutex for the entire
duration of the PMIC accesses, be it either from the PUnit side, or from an
in kernel I2C driver. In general holding a mutex after exiting a function
is a bad idea and the above problems show this case is no different.
Instead 2 counters are now used, one for PMIC accesses from the PUnit
and one for accesses from in kernel I2C code. When access is requested
now the code will wait (using a waitqueue) for the counter of the other
type of access to reach 0 and on release, if the counter reaches 0 the
wakequeue is woken.
Note that the counter approach is necessary to allow nested calls.
The main reason for this is so that a series of i2c transfers can be done
with the punit blocked from accessing the bus the whole time. This is
necessary to be able to safely read/modify/write a PMIC register without
racing with the PUNIT doing the same thing.
Allowing nested iosf_mbi_block_punit_i2c_access() calls also is desirable
from a performance pov since the whole dance necessary to block the PUnit
from accessing the PMIC I2C bus is somewhat expensive.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lkml.kernel.org/r/20190812102113.95794-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a95fbda08e ]
Force HS200 by masking bit 63 of the SDHCI capability register.
The i.MX ESDHC driver uses SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400. With
that the stack checks bit 63 to descide whether HS400 is available.
Using sdhci-caps-mask allows to mask bit 63. The stack then selects
HS200 as operating mode.
This prevents rare communication errors with minimal effect on
performance:
sdhci-esdhc-imx 30b60000.usdhc: warning! HS400 strobe DLL
status REF not lock!
Signed-off-by: Stefan Agner <stefan.agner@toradex.com>
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54d895bea4 ]
'7d0c76bdf227 ("clk: qcom: Add WCSS gcc clock control for QCS404")'
introduces two new clocks to gcc. These are not used before
clk_disable_unused() and as such the clock framework tries to disable
them.
But on the EVB these registers are only accessible through TrustZone, so
these clocks must be marked as "protected" to prevent the clock code
from touching them.
Numerical values are used as the constants are not yet available in a
common tree.
Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9846a4524a ]
Recent changes to the atheros at803x driver caused
ethernet to stop working on this board.
In particular commit 6d4cd041f0
("net: phy: at803x: disable delay only for RGMII mode")
and commit cd28d1d6e5
("net: phy: at803x: Disable phy delay for RGMII mode")
fix the AR8031 driver to configure the phy's (RX/TX)
delays as per the 'phy-mode' in the device tree.
This now prevents ethernet from working on this board.
It used to work before those commits, because the
AR8031 comes out of reset with RX delay enabled, and
the at803x driver didn't touch the delay configuration
at all when "rgmii" mode was selected, and because
arch/arm/mach-imx/mach-imx7d.c:ar8031_phy_fixup()
unconditionally enables TX delay.
Since above commits ar8031_phy_fixup() also has no
effect anymore, and the end-result is that all delays
are disabled in the phy, no ethernet.
Update the device tree to restore functionality.
Signed-off-by: André Draszik <git@andred.net>
CC: Ilya Ledvich <ilya@compulab.co.il>
CC: Igor Grinberg <grinberg@compulab.co.il>
CC: Rob Herring <robh+dt@kernel.org>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Shawn Guo <shawnguo@kernel.org>
CC: Sascha Hauer <s.hauer@pengutronix.de>
CC: Pengutronix Kernel Team <kernel@pengutronix.de>
CC: Fabio Estevam <festevam@gmail.com>
CC: NXP Linux Team <linux-imx@nxp.com>
CC: devicetree@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94c0439022 ]
Since commit d3b41b6bb4 ("m68k: Dispatch nvram_ops calls to Atari or
Mac functions"), Coldfire builds generate compiler warnings due to the
unconditional inclusion of asm/atarihw.h and asm/macintosh.h.
The inclusion of asm/atarihw.h causes warnings like this:
In file included from ./arch/m68k/include/asm/atarihw.h:25:0,
from arch/m68k/kernel/setup_mm.c:41,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/raw_io.h:39:0: warning: "__raw_readb" redefined
#define __raw_readb in_8
In file included from ./arch/m68k/include/asm/io.h:6:0,
from arch/m68k/kernel/setup_mm.c:36,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/io_no.h:16:0: note: this is the location of the previous definition
#define __raw_readb(addr) \
...
This issue is resolved by dropping the asm/raw_io.h include. It turns out
that asm/io_mm.h already includes that header file.
Moving the relevant macro definitions helps to clarify this dependency
and make it safe to include asm/atarihw.h.
The other warnings look like this:
In file included from arch/m68k/kernel/setup_mm.c:48:0,
from arch/m68k/kernel/setup.c:3:
./arch/m68k/include/asm/macintosh.h:19:35: warning: 'struct irq_data' declared inside parameter list will not be visible outside of this definition or declaration
extern void mac_irq_enable(struct irq_data *data);
^~~~~~~~
...
This issue is resolved by adding the missing linux/irq.h include.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de6f97b2ba ]
compile-testing this driver on other architectures showed
multiple warnings:
drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_eth_drv_probe':
drivers/net/ethernet/nxp/lpc_eth.c:1337:19: warning: format '%d' expects argument of type 'int', but argument 4 has type 'resource_size_t {aka long long unsigned int}' [-Wformat=]
drivers/net/ethernet/nxp/lpc_eth.c:1342:19: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'dma_addr_t {aka long long unsigned int}' [-Wformat=]
Use format strings that work on all architectures.
Link: https://lore.kernel.org/r/20190809144043.476786-10-arnd@arndb.de
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34b5560db4 ]
The generic Makefile.kasan propagates CONFIG_KASAN_SHADOW_OFFSET into
KASAN_SHADOW_OFFSET, but only does so for CONFIG_KASAN_GENERIC.
Since commit:
6bd1d0be0e ("arm64: kasan: Switch to using KASAN_SHADOW_OFFSET")
... arm64 defines CONFIG_KASAN_SHADOW_OFFSET in Kconfig rather than
defining KASAN_SHADOW_OFFSET in a Makefile. Thus, if
CONFIG_KASAN_SW_TAGS && KASAN_INLINE are selected, we get build time
splats due to KASAN_SHADOW_OFFSET not being set:
| [mark@lakrids:~/src/linux]% usellvm 8.0.1 usekorg 8.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- CC=clang
| scripts/kconfig/conf --syncconfig Kconfig
| CC scripts/mod/empty.o
| clang (LLVM option parsing): for the -hwasan-mapping-offset option: '' value invalid for uint argument!
| scripts/Makefile.build:273: recipe for target 'scripts/mod/empty.o' failed
| make[1]: *** [scripts/mod/empty.o] Error 1
| Makefile:1123: recipe for target 'prepare0' failed
| make: *** [prepare0] Error 2
Let's fix this by always propagating CONFIG_KASAN_SHADOW_OFFSET into
KASAN_SHADOW_OFFSET if CONFIG_KASAN is selected, moving the existing
common definition of +CFLAGS_KASAN_NOSANITIZE to the top of
Makefile.kasan.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d5078c717 ]
Not all sensors will be able to guarantee a proper initial state.
This may be either because the driver is not properly written,
or (probably unlikely) because the hardware won't support it.
While the right solution in the former case is to fix the sensor
driver, the real world not always allows right solutions, due to lack
of available documentation and support on these sensors.
Let's relax this requirement, and allow the driver to support stream start,
even if the sensor initial sequence wasn't the expected.
Also improve the warning message to better explain the problem and provide
a hint that the sensor driver needs to be fixed.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Steve Longerbeam <slongerbeam@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ef57be07a ]
The streaming state should be set to the first upstream sub-device only,
not everywhere, for a sub-device driver itself knows how to best control
the streaming state of its own upstream sub-devices.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 092e8eb90a ]
This is mostly a port of Jacopo's fix:
commit aa4bb8b883
Author: Jacopo Mondi <jacopo@jmondi.org>
Date: Fri Jul 6 05:51:52 2018 -0400
media: ov5640: Re-work MIPI startup sequence
In the OV5645 case, the changes are:
- At set_power(1) time power up MIPI Tx/Rx and set data and clock lanes in
LP11 during 'sleep' and 'idle' with MIPI clock in non-continuous mode.
- At set_power(0) time power down MIPI Tx/Rx (in addition to the current
power down of regulators and clock gating).
- At s_stream time enable/disable the MIPI interface output.
With this commit the sensor is able to enter LP-11 mode during power up,
as expected by some CSI-2 controllers.
Many thanks to Fabio Estevam for his help debugging this issue.
Tested-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a4d8fb229 ]
Same as in the commit 0176622953 ("perf record: Support s390 random
socket_id assignment"), aarch64 also have this problem.
Without this fix:
[root@localhost perf]# ./perf report --header -I -v
...
socket_id number is too big.You may need to upgrade the perf tool.
# ========
# captured on : Thu Aug 1 22:58:38 2019
# header version : 1
...
# Core ID and Socket ID information is not available
...
With this fix:
[root@localhost perf]# ./perf report --header -I -v
...
cpumask list: 0-31
cpumask list: 32-63
cpumask list: 64-95
cpumask list: 96-127
# ========
# captured on : Thu Aug 1 22:58:38 2019
# header version : 1
...
# CPU 0: Core ID 0, Socket ID 36
# CPU 1: Core ID 1, Socket ID 36
...
# CPU 126: Core ID 126, Socket ID 8442
# CPU 127: Core ID 127, Socket ID 8442
...
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Link: http://lkml.kernel.org/r/1564717737-21602-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7b6804994 ]
Building a combined ARMv4+XScale kernel produces these
and other build failures:
/tmp/copypage-xscale-3aa821.s: Assembler messages:
/tmp/copypage-xscale-3aa821.s:167: Error: selected processor does not support `pld [r7,#0]' in ARM mode
/tmp/copypage-xscale-3aa821.s:168: Error: selected processor does not support `pld [r7,#32]' in ARM mode
/tmp/copypage-xscale-3aa821.s:169: Error: selected processor does not support `pld [r1,#0]' in ARM mode
/tmp/copypage-xscale-3aa821.s:170: Error: selected processor does not support `pld [r1,#32]' in ARM mode
/tmp/copypage-xscale-3aa821.s:171: Error: selected processor does not support `pld [r7,#64]' in ARM mode
/tmp/copypage-xscale-3aa821.s:176: Error: selected processor does not support `ldrd r4,r5,[r7],#8' in ARM mode
/tmp/copypage-xscale-3aa821.s:180: Error: selected processor does not support `strd r4,r5,[r1],#8' in ARM mode
Add an explict .arch armv5 in the inline assembly to allow the ARMv5
specific instructions regardless of the compiler -march= target.
Link: https://lore.kernel.org/r/20190809163334.489360-5-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00c9755524 ]
When compile-testing on other architectures, we get lots of warnings
about incorrect format strings, like:
drivers/dma/iop-adma.c: In function 'iop_adma_alloc_slots':
drivers/dma/iop-adma.c:307:6: warning: format '%x' expects argument of type 'unsigned int', but argument 6 has type 'dma_addr_t {aka long long unsigned int}' [-Wformat=]
drivers/dma/iop-adma.c: In function 'iop_adma_prep_dma_memcpy':
>> drivers/dma/iop-adma.c:518:40: warning: format '%u' expects argument of type 'unsigned int', but argument 5 has type 'size_t {aka long unsigned int}' [-Wformat=]
Use %zu for printing size_t as required, and cast the dma_addr_t
arguments to 'u64' for printing with %llx. Ideally this should use
the %pad format string, but that requires an lvalue argument that
doesn't work here.
Link: https://lore.kernel.org/r/20190809163334.489360-3-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b20a6e298b ]
Allow selecting the IR protocol, MCE or iMON, for a device that
identifies as follows (with config id 0x7e):
15c2:ffdc SoundGraph Inc. iMON PAD Remote Controller
As the driver is structured to default to iMON when both RC
protocols are supported, existing users of this device (using MCE
protocol) will need to manually switch to MCE (RC-6) protocol from
userspace (with ir-keytable, sysfs).
Signed-off-by: Darius Rad <alpha@area49.net>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8ba2906f6 ]
Commit e5adfc3e7e ("perf map: Synthesize maps only for thread group
leader") changed the recording side so that we no longer get mmap events
for threads other than the thread group leader (when synthesising these
events for threads which exist before perf is started).
When a file recorded after this change is loaded, the lack of mmap
records mean that unwinding is not set up for any other threads.
This can be seen in a simple record/report scenario:
perf record --call-graph=dwarf -t $TID
perf report
If $TID is a process ID then the report will show call graphs, but if
$TID is a secondary thread the output is as if --call-graph=none was
specified.
Following the rationale in that commit, move the libunwind fields into
struct map_groups and update the libunwind functions to take this
instead of the struct thread. This is only required for
unwind__finish_access which must now be called from map_groups__delete
and the others are changed for symmetry.
Note that unwind__get_entries keeps the thread argument since it is
required for symbol lookup and the libdw unwind provider uses the thread
ID.
Signed-off-by: John Keeping <john@metanate.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e5adfc3e7e ("perf map: Synthesize maps only for thread group leader")
Link: http://lkml.kernel.org/r/20190815100146.28842-2-john@metanate.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90776dd1c4 ]
It seems that LLVM's linker does not correctly handle variable assignments
involving section positions that are updated during the SECTIONS
parsing. Commit aa69fb62be ("arm64/efi: Mark __efistub_stext_offset as
an absolute symbol explicitly") ran into this too, but found a different
workaround.
However, this was not enough, as other variables were also miscalculated
which manifested as boot failures under UEFI where __efistub__end was
not taking the correct _end value (they should be the same):
$ ld.lld -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.lld -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.lld | egrep '\b(__efistub_|)_end\b'
368272: ffff000002218000 0 NOTYPE LOCAL HIDDEN 38 __efistub__end
368322: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 38 _end
$ aarch64-linux-gnu-ld.bfd -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.bfd -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.bfd | egrep '\b(__efistub_|)_end\b'
338124: ffff000012318000 0 NOTYPE LOCAL DEFAULT ABS __efistub__end
383812: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 15325 _end
To work around this, all of the __efistub_-prefixed variable assignments
need to be moved after the linker script's SECTIONS entry. As it turns
out, this also solves the problem fixed in commit aa69fb62be, so those
changes are reverted here.
Link: https://github.com/ClangBuiltLinux/linux/issues/634
Link: https://bugs.llvm.org/show_bug.cgi?id=42990
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fd2293856 ]
When support for the IPMMU is not enabled, the FDP driver may be
probe-deferred multiple times, causing several messages to be printed
like:
rcar_fdp1 fe940000.fdp1: FCP not found (-517)
rcar_fdp1 fe944000.fdp1: FCP not found (-517)
Fix this by reducing the message level to debug level, as is done in the
VSP1 driver.
Fixes: 4710b752e0 ("[media] v4l: Add Renesas R-Car FDP1 Driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f822f1da0 ]
i2c_new_dummy() can fail returning a NULL pointer. This is not checked
and the returned pointer is blindly used. Convert to
devm_i2c_new_dummy_client() which returns an ERR_PTR and also add a
validity check. Using devm_* here also fixes a leak because the dummy
client was not released in the probe error path.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 864919ea03 ]
of_get_next_child() increments the reference count of the returning
device_node. Decrement it in the check if we are using the old or the
new DTB.
Fixes: ba1f1f70c2 ("[media] media: mtk-mdp: Fix mdp device tree")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Houlong Wei <houlong.wei@mediatek.com>
[hverkuil-cisco@xs4all.nl: use node instead of parent as temp variable]
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fe94ce1c6 ]
To get the expected output we have to ignore whatever changes the user
has in its ~/.perfconfig file, so set PERF_CONFIG to /dev/null to
achieve that.
Before:
# egrep 'trace|show_' ~/.perfconfig
[trace]
show_zeros = yes
show_duration = no
show_timestamp = no
show_arg_names = no
show_prefix = yes
# echo $PERF_CONFIG
# perf test "trace + vfs_getname"
70: Check open filename arg using perf trace + vfs_getname: FAILED!
# export PERF_CONFIG=/dev/null
# perf test "trace + vfs_getname"
70: Check open filename arg using perf trace + vfs_getname: Ok
#
After:
# egrep 'trace|show_' ~/.perfconfig
[trace]
show_zeros = yes
show_duration = no
show_timestamp = no
show_arg_names = no
show_prefix = yes
# echo $PERF_CONFIG
# perf test "trace + vfs_getname"
70: Check open filename arg using perf trace + vfs_getname: Ok
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: https://lkml.kernel.org/n/tip-3up27pexg5i3exuzqrvt4m8u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61a461fcbd ]
We had this comment in Documentation/perf_counter/config.c, i.e. since
when we got this from the git sources, but never really did that
getenv("PERF_CONFIG"), do it now as I need to disable whatever
~/.perfconfig root has so that tests parsing tool output are done for
the expected default output or that we specify an alternate config file
that when read will make the tools produce expected output.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Fixes: 0780060124 ("perf_counter tools: add in basic glue from Git")
Link: https://lkml.kernel.org/n/tip-jo209zac9rut0dz1rqvbdlgm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e78a7614f3 ]
Scheduling-clock interrupts can arrive late in the CPU-offline process,
after idle entry and the subsequent call to cpuhp_report_idle_dead().
Once execution passes the call to rcu_report_dead(), RCU is ignoring
the CPU, which results in lockdep complaints when the interrupt handler
uses RCU:
------------------------------------------------------------------------
=============================
WARNING: suspicious RCU usage
5.2.0-rc1+ #681 Not tainted
-----------------------------
kernel/sched/fair.c:9542 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/5/0.
stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.2.0-rc1+ #681
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
dump_stack+0x5e/0x8b
trigger_load_balance+0xa8/0x390
? tick_sched_do_timer+0x60/0x60
update_process_times+0x3b/0x50
tick_sched_handle+0x2f/0x40
tick_sched_timer+0x32/0x70
__hrtimer_run_queues+0xd3/0x3b0
hrtimer_interrupt+0x11d/0x270
? sched_clock_local+0xc/0x74
smp_apic_timer_interrupt+0x79/0x200
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:delay_tsc+0x22/0x50
Code: ff 0f 1f 80 00 00 00 00 65 44 8b 05 18 a7 11 48 0f ae e8 0f 31 48 89 d6 48 c1 e6 20 48 09 c6 eb 0e f3 90 65 8b 05 fe a6 11 48 <41> 39 c0 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
RSP: 0000:ffff8f92c0157ed0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000005 RBX: ffff8c861f356400 RCX: ffff8f92c0157e64
RDX: 000000321214c8cc RSI: 00000032120daa7f RDI: 0000000000260f15
RBP: 0000000000000005 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8c861ee18000 R15: ffff8c861ee18000
cpuhp_report_idle_dead+0x31/0x60
do_idle+0x1d5/0x200
? _raw_spin_unlock_irqrestore+0x2d/0x40
cpu_startup_entry+0x14/0x20
start_secondary+0x151/0x170
secondary_startup_64+0xa4/0xb0
------------------------------------------------------------------------
This happens rarely, but can be forced by happen more often by
placing delays in cpuhp_report_idle_dead() following the call to
rcu_report_dead(). With this in place, the following rcutorture
scenario reproduces the problem within a few minutes:
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TREE04"
This commit uses the crude but effective expedient of moving the disabling
of interrupts within the idle loop to precede the cpu_is_offline()
check. It also invokes tick_nohz_idle_stop_tick() instead of
tick_nohz_idle_stop_tick_protected() to shut off the scheduling-clock
interrupt.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
[ paulmck: Revert tick_nohz_idle_stop_tick_protected() removal, new callers. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a46d14eca7 ]
Enabling WARN_DOUBLE_CLOCK in /sys/kernel/debug/sched_features causes
warning to fire in update_rq_clock. This seems to be caused by onlining
a new fair sched group not using the rq lock wrappers.
[] rq->clock_update_flags & RQCF_UPDATED
[] WARNING: CPU: 5 PID: 54385 at kernel/sched/core.c:210 update_rq_clock+0xec/0x150
[] Call Trace:
[] online_fair_sched_group+0x53/0x100
[] cpu_cgroup_css_online+0x16/0x20
[] online_css+0x1c/0x60
[] cgroup_apply_control_enable+0x231/0x3b0
[] cgroup_mkdir+0x41b/0x530
[] kernfs_iop_mkdir+0x61/0xa0
[] vfs_mkdir+0x108/0x1a0
[] do_mkdirat+0x77/0xe0
[] do_syscall_64+0x55/0x1d0
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Using the wrappers in online_fair_sched_group instead of the raw locking
removes this warning.
[ tglx: Use rq_*lock_irq() ]
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20190801133749.11033-1-pauld@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9dc34d635c ]
Sometimes platfom may take too long to respond to the command and OS
might timeout before platform transfer the ownership of the shared
memory region to the OS with the response.
Since the mailbox channel associated with the channel is freed and new
commands are dispatch on the same channel, OS needs to wait until it
gets back the ownership. If not, either OS may end up overwriting the
platform response for the last command(which is fine as OS timed out
that command) or platform might overwrite the payload for the next
command with the response for the old.
The latter is problematic as platform may end up interpretting the
response as the payload. In order to avoid such race, let's wait until
the OS gets back the ownership before we prepare the shared memory with
the payload for the next command.
Reported-by: Jim Quinlan <james.quinlan@broadcom.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29a3388bfc ]
Depending on how BIOS has marked the reserved region containing the 32KB
MCHBAR you can get warnings like:
resource sanity check: requesting [mem 0xfed10000-0xfed1ffff], which spans more than reserved [mem 0xfed10000-0xfed17fff]
caller dnv_rd_reg+0xc8/0x240 [pnd2_edac] mapping multiple BARs
Not all of the mmio regions used in dnv_rd_reg() are the same size. The
MCHBAR window is 32KB and the sideband ports are 64KB. Pass the correct
size to ioremap() depending on which resource we're reading from.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdbe4eeeb1 ]
Enabling Direct I/O with loop devices helps reducing memory usage by
avoiding double caching. 32 bit applications running on 64 bits systems
are currently not able to request direct I/O because is missing from the
lo_compat_ioctl.
This patch fixes the compatibility issue mentioned above by exporting
LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry.
The input argument for this ioctl is a single long converted to a 1-bit
boolean, so compatibility is preserved.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c2b005f54 ]
Some platforms define their processors in this manner:
Device (SCK0)
{
Name (_HID, "ACPI0004" /* Module Device */) // _HID: Hardware ID
Name (_UID, "CPUSCK0") // _UID: Unique ID
Processor (CP00, 0x00, 0x00000410, 0x06){}
Processor (CP01, 0x02, 0x00000410, 0x06){}
Processor (CP02, 0x04, 0x00000410, 0x06){}
Processor (CP03, 0x06, 0x00000410, 0x06){}
Processor (CP04, 0x01, 0x00000410, 0x06){}
Processor (CP05, 0x03, 0x00000410, 0x06){}
Processor (CP06, 0x05, 0x00000410, 0x06){}
Processor (CP07, 0x07, 0x00000410, 0x06){}
Processor (CP08, 0xFF, 0x00000410, 0x06){}
Processor (CP09, 0xFF, 0x00000410, 0x06){}
Processor (CP0A, 0xFF, 0x00000410, 0x06){}
Processor (CP0B, 0xFF, 0x00000410, 0x06){}
...
The processors marked as 0xff are invalid, there are only 8 of them in
this case.
So do not print an error on ids == 0xff, just print an info message.
Actually, we could return ENODEV even on the first CPU with ID 0xff, but
ACPI spec does not forbid the 0xff value to be a processor ID. Given
0xff could be a correct one, we would break working systems if we
returned ENODEV.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6ff24f7b5 ]
In addition, the 0day bot reported this build error:
>> drivers/ras/debugfs.c:10:5: error: redefinition of 'ras_userspace_consumers'
int ras_userspace_consumers(void)
^~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/ras/debugfs.c:3:0:
include/linux/ras.h:14:19: note: previous definition of 'ras_userspace_consumers' was here
static inline int ras_userspace_consumers(void) { return 0; }
^~~~~~~~~~~~~~~~~~~~~~~
for a riscv-specific .config where CONFIG_DEBUG_FS is not set. Fix all
that by making debugfs.o depend on that define.
[ bp: Rewrite commit message. ]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/7053.1565218556@turing-police
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b22659752 ]
If IOMMU_SUPPORT is not set, and COMPILE_TEST is y,
IOMMU_IOVA may be set to m. So building will fails:
drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'
Select IOMMU_IOVA while COMPILE_TEST is set to fix this.
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: b301f8de19 ("media: staging: media: tegra-vde: Add IOMMU support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6898dd580a ]
arch/microblaze/ defines out_be32() and in_be32(), so don't do that
again in the driver source.
Fixes these build warnings:
../drivers/media/platform/fsl-viu.c:36: warning: "out_be32" redefined
../arch/microblaze/include/asm/io.h:50: note: this is the location of the previous definition
../drivers/media/platform/fsl-viu.c:37: warning: "in_be32" redefined
../arch/microblaze/include/asm/io.h:53: note: this is the location of the previous definition
Fixes: 29d7506863 ("media: fsl-viu: allow building it with COMPILE_TEST")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 062f5b2ae1 ]
When a disk is added to array, the following path is called in mdadm.
Manage_subdevs -> sysfs_freeze_array
-> Manage_add
-> sysfs_set_str(&info, NULL, "sync_action","idle")
Then from kernel side, Manage_add invokes the path (add_new_disk ->
validate_super = super_1_validate) to set In_sync flag.
Since In_sync means "device is in_sync with rest of array", and the new
added disk need to resync thread to help the synchronization of data.
And md_reap_sync_thread would call spare_active to set In_sync for the
new added disk finally. So don't set In_sync if array is in frozen.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d8ed0e9bf ]
When add one disk to array, the md_reap_sync_thread is responsible
to activate the spare and set In_sync flag for the new member in
spare_active().
But if raid1 has one member disk A, and disk B is added to the array.
Then we offline A before all the datas are synchronized from A to B,
obviously B doesn't have the latest data as A, but B is still marked
with In_sync flag.
So let's not call spare_active under the condition, otherwise B is
still showed with 'U' state which is not correct.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eeba6809d8 ]
When write bio return error, it would be added to conf->retry_list
and wait for raid1d thread to retry write and acknowledge badblocks.
In narrow_write_error(), the error bio will be split in the unit of
badblock shift (such as one sector) and raid1d thread issues them
one by one. Until all of the splited bio has finished, raid1d thread
can go on processing other things, which is time consuming.
But, there is a scene for error handling that is not necessary.
When the device has been set faulty, flush_bio_list() may end
bios in pending_bio_list with error status. Since these bios
has not been issued to the device actually, error handlding to
retry write and acknowledge badblocks make no sense.
Even without that scene, when the device is faulty, badblocks info
can not be written out to the device. Thus, we also no need to
handle the error IO.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b99286b088 ]
The commit d5370f7548 ("arm64: prefetch: add alternative pattern for
CPUs without a prefetcher") introduced MIDR_IS_CPU_MODEL_RANGE() to be
used in has_no_hw_prefetch() with rv_min=0 which generates a compilation
warning from GCC,
In file included from ./arch/arm64/include/asm/cache.h:8,
from ./include/linux/cache.h:6,
from ./include/linux/printk.h:9,
from ./include/linux/kernel.h:15,
from ./include/linux/cpumask.h:10,
from arch/arm64/kernel/cpufeature.c:11:
arch/arm64/kernel/cpufeature.c: In function 'has_no_hw_prefetch':
./arch/arm64/include/asm/cputype.h:59:26: warning: comparison of
unsigned expression >= 0 is always true [-Wtype-limits]
_model == (model) && rv >= (rv_min) && rv <= (rv_max); \
^~
arch/arm64/kernel/cpufeature.c:889:9: note: in expansion of macro
'MIDR_IS_CPU_MODEL_RANGE'
return MIDR_IS_CPU_MODEL_RANGE(midr, MIDR_THUNDERX,
^~~~~~~~~~~~~~~~~~~~~~~
Fix it by converting MIDR_IS_CPU_MODEL_RANGE to a static inline
function.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06e8f5c842 ]
ADG is using clk_get_rate() under atomic context, thus, we might
have scheduling issue.
To avoid this issue, we need to get/keep clk rate under
non atomic context.
We need to handle ADG as special device at Renesas Sound driver.
From SW point of view, we want to impletent it as
rsnd_mod_ops :: prepare, but it makes code just complicate.
To avoid complicated code/patch, this patch adds new clk_rate[] array,
and keep clk IN rate when rsnd_adg_clk_enable() was called.
Reported-by: Leon Kong <Leon.KONG@cn.bosch.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Leon Kong <Leon.KONG@cn.bosch.com>
Link: https://lore.kernel.org/r/87v9vb0xkp.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5e5c48c16 ]
The function free_module in file kernel/module.c as follow:
void free_module(struct module *mod) {
......
module_arch_cleanup(mod);
......
module_arch_freeing_init(mod);
......
}
Both module_arch_cleanup and module_arch_freeing_init function
would free the mod->arch.init_unw_table, which cause double free.
Here, set mod->arch.init_unw_table = NULL after remove the unwind
table to avoid double free.
Signed-off-by: chenzefeng <chenzefeng2@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b34121d9f ]
The Linux kernel assumes that get_endpoint(alts,0) and
get_endpoint(alts,1) are eachothers feedback endpoints.
To reassure that validity it will test bsynchaddress to comply with that
assumption. But if the bsyncaddress is 0 (invalid), it will flag that as
a wrong assumption and return an error.
Fix: Skip the test if bSynchAddress is 0.
Note: those with a valid bSynchAddress should have a code quirck added.
Signed-off-by: Ard van Breemen <ard@kwaak.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13776f9d40 ]
We should free the initrd reserved memblock in an aligned manner,
because the initrd reserves the memblock in an aligned manner
in arm64_memblock_init().
Otherwise there are some fragments in memblock_reserved regions
after free_initrd_mem(). e.g.:
/sys/kernel/debug/memblock # cat reserved
0: 0x0000000080080000..0x00000000817fafff
1: 0x0000000083400000..0x0000000083ffffff
2: 0x0000000090000000..0x000000009000407f
3: 0x00000000b0000000..0x00000000b000003f
4: 0x00000000b26184ea..0x00000000b2618fff
The fragments like the ranges from b0000000 to b000003f and
from b26184ea to b2618fff should be freed.
And we can do free_reserved_area() after memblock_free(),
as free_reserved_area() calls __free_pages(), once we've done
that it could be allocated somewhere else,
but memblock and iomem still say this is reserved memory.
Fixes: 05c58752f9 ("arm64: To remove initrd reserved area entry from memblock")
Signed-off-by: Junhua Huang <huang.junhua@zte.com.cn>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cab09f3d2d ]
The TEO goveror prevents the scheduler tick from being stopped (unless
stopped already) if there is a PM QoS latency constraint for the given
CPU and the target residency of the deepest idle state matching that
constraint is below the tick boundary.
However, that is problematic if CPUs with PM QoS latency constraints
are idle for long times, because it effectively causes the tick to
run on them all the time which is wasteful. [It is also confusing
and questionable if they are full dynticks CPUs.]
To address that issue, modify the TEO governor to carry out the
entire search for the most suitable idle state (from the target
residency perspective) even if a latency constraint is present,
to allow it to determine the expected idle duration in all cases.
Also, when using the last several measured idle duration values
to refine the idle state selection, make it compare those values
with the current expected idle duration value (instead of
comparing them with the target residency of the idle state
selected so far) which should prevent the tick from being
retained when it makes sense to stop it sometimes (especially
in the presence of PM QoS latency constraints).
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9eced3a2f2 ]
According to latest datasheet (Rev.1, 10/2018) from below links,
in the consumer datasheet, 1.5GHz is mentioned as highest opp but
depends on speed grading fuse, and in the industrial datasheet,
1.3GHz is mentioned as highest opp but depends on speed grading
fuse. 1.5GHz and 1.3GHz opp use same voltage, so no need for
consumer part to support 1.3GHz opp, with same voltage, CPU should
run at highest frequency in order to go into idle as quick as
possible, this can save power.
That means for consumer part, 1GHz/1.5GHz are supported, for
industrial part, 800MHz/1.3GHz are supported, and then check the
speed grading fuse to limit the highest CPU frequency further.
Correct the market segment bits in opp table to make them work
according to datasheets.
https://www.nxp.com/docs/en/data-sheet/IMX8MDQLQIEC.pdfhttps://www.nxp.com/docs/en/data-sheet/IMX8MDQLQCEC.pdf
Fixes: 12629c5c37 ("arm64: dts: imx8mq: Add cpu speed grading and all OPPs")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3724ace582 ]
The grain in EDAC is defined as "minimum granularity for an error
report, in bytes". The following calculation of the grain_bits in
edac_mc is wrong:
grain_bits = fls_long(e->grain) + 1;
Where grain_bits is defined as:
grain = 1 << grain_bits
Example:
grain = 8 # 64 bit (8 bytes)
grain_bits = fls_long(8) + 1
grain_bits = 4 + 1 = 5
grain = 1 << grain_bits
grain = 1 << 5 = 32
Replace it with the correct calculation:
grain_bits = fls_long(e->grain - 1);
The example gives now:
grain_bits = fls_long(8 - 1)
grain_bits = fls_long(7)
grain_bits = 3
grain = 1 << 3 = 8
Also, check if the hardware reports a reasonable grain != 0 and fallback
with a warning to 1 byte granularity otherwise.
[ bp: massage a bit. ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fbad01af8c ]
The synchronize_rcu_expedited() function has an INIT_WORK_ONSTACK(),
but lacks the corresponding destroy_work_on_stack(). This commit
therefore adds destroy_work_on_stack().
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2127c01b7f ]
In build_adc_controls(), there is an if statement on line 773 to check
whether ak->adc_info is NULL:
if (! ak->adc_info ||
! ak->adc_info[mixer_ch].switch_name)
When ak->adc_info is NULL, it is used on line 792:
knew.name = ak->adc_info[mixer_ch].selector_name;
Thus, a possible null-pointer dereference may occur.
To fix this bug, referring to lines 773 and 774, ak->adc_info
and ak->adc_info[mixer_ch].selector_name are checked before being used.
This bug is found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd65f7e19c ]
The last fallback of CORB/RIRB communication error recovery is to turn
on the single command mode, and this last resort usually means that
something is really screwed up. Instead of a normal dev_err(), show
the error more clearly with dev_WARN() with the caller stack trace.
Also, show the bus-reset fallback also as an error, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2640da4ccc ]
If the APIC was already enabled on entry of setup_local_APIC() then
disabling it soft via the SPIV register makes a lot of sense.
That masks all LVT entries and brings it into a well defined state.
Otherwise previously enabled LVTs which are not touched in the setup
function stay unmasked and might surprise the just booting kernel.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.068290579@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 747d5a1bf2 ]
A reboot request sends an IPI via the reboot vector and waits for all other
CPUs to stop. If one or more CPUs are in critical regions with interrupts
disabled then the IPI is not handled on those CPUs and the shutdown hangs
if native_stop_other_cpus() is called with the wait argument set.
Such a situation can happen when one CPU was stopped within a lock held
section and another CPU is trying to acquire that lock with interrupts
disabled. There are other scenarios which can cause such a lockup as well.
In theory the shutdown should be attempted by an NMI IPI after the timeout
period elapsed. Though the wait loop after sending the reboot vector IPI
prevents this. It checks the wait request argument and the timeout. If wait
is set, which is true for sys_reboot() then it won't fall through to the
NMI shutdown method after the timeout period has finished.
This was an oversight when the NMI shutdown mechanism was added to handle
the 'reboot IPI is not working' situation. The mechanism was added to deal
with stuck panic shutdowns, which do not have the wait request set, so the
'wait request' case was probably not considered.
Remove the wait check from the post reboot vector IPI wait loop and enforce
that the wait loop in the NMI fallback path is invoked even if NMI IPIs are
disabled or the registration of the NMI handler fails. That second wait
loop will then hang if not all CPUs shutdown and the wait argument is set.
[ tglx: Avoid the hard to parse line break in the NMI fallback path,
add comments and massage the changelog ]
Fixes: 7d007d21e5 ("x86/reboot: Use NMI to assist in shutting down if IRQ fails")
Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/20190628122813.15500-1-ghalat@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc8bf19137 ]
In course of developing shorthand based IPI support issues with the
function which tries to clear eventually pending ISR bits in the local APIC
were observed.
1) O-day testing triggered the WARN_ON() in apic_pending_intr_clear().
This warning is emitted when the function fails to clear pending ISR
bits or observes pending IRR bits which are not delivered to the CPU
after the stale ISR bit(s) are ACK'ed.
Unfortunately the function only emits a WARN_ON() and fails to dump
the IRR/ISR content. That's useless for debugging.
Feng added spot on debug printk's which revealed that the stale IRR
bit belonged to the APIC timer interrupt vector, but adding ad hoc
debug code does not help with sporadic failures in the field.
Rework the loop so the full IRR/ISR contents are saved and on failure
dumped.
2) The loop termination logic is interesting at best.
If the machine has no TSC or cpu_khz is not known yet it tries 1
million times to ack stale IRR/ISR bits. What?
With TSC it uses the TSC to calculate the loop termination. It takes a
timestamp at entry and terminates the loop when:
(rdtsc() - start_timestamp) >= (cpu_hkz << 10)
That's roughly one second.
Both methods are problematic. The APIC has 256 vectors, which means
that in theory max. 256 IRR/ISR bits can be set. In practice this is
impossible and the chance that more than a few bits are set is close
to zero.
With the pure loop based approach the 1 million retries are complete
overkill.
With TSC this can terminate too early in a guest which is running on a
heavily loaded host even with only a couple of IRR/ISR bits set. The
reason is that after acknowledging the highest priority ISR bit,
pending IRRs must get serviced first before the next round of
acknowledge can take place as the APIC (real and virtualized) does not
honour EOI without a preceeding interrupt on the CPU. And every APIC
read/write takes a VMEXIT if the APIC is virtualized. While trying to
reproduce the issue 0-day reported it was observed that the guest was
scheduled out long enough under heavy load that it terminated after 8
iterations.
Make the loop terminate after 512 iterations. That's plenty enough
in any case and does not take endless time to complete.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.158847694@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a07db5c086 ]
On !CONFIG_RT_GROUP_SCHED configurations it is currently not possible to
move RT tasks between cgroups to which CPU controller has been attached;
but it is oddly possible to first move tasks around and then make them
RT (setschedule to FIFO/RR).
E.g.:
# mkdir /sys/fs/cgroup/cpu,cpuacct/group1
# chrt -fp 10 $$
# echo $$ > /sys/fs/cgroup/cpu,cpuacct/group1/tasks
bash: echo: write error: Invalid argument
# chrt -op 0 $$
# echo $$ > /sys/fs/cgroup/cpu,cpuacct/group1/tasks
# chrt -fp 10 $$
# cat /sys/fs/cgroup/cpu,cpuacct/group1/tasks
2345
2598
# chrt -p 2345
pid 2345's current scheduling policy: SCHED_FIFO
pid 2345's current scheduling priority: 10
Also, as Michal noted, it is currently not possible to enable CPU
controller on unified hierarchy with !CONFIG_RT_GROUP_SCHED (if there
are any kernel RT threads in root cgroup, they can't be migrated to the
newly created CPU controller's root in cgroup_update_dfl_csses()).
Existing code comes with a comment saying the "we don't support RT-tasks
being in separate groups". Such comment is however stale and belongs to
pre-RT_GROUP_SCHED times. Also, it doesn't make much sense for
!RT_GROUP_ SCHED configurations, since checks related to RT bandwidth
are not performed at all in these cases.
Make moving RT tasks between CPU controller groups viable by removing
special case check for RT (and DEADLINE) tasks.
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: rostedt@goodmis.org
Link: https://lkml.kernel.org/r/20190719063455.27328-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6cad8df6b ]
The load_balance() has a dedicated mecanism to detect when an imbalance
is due to CPU affinity and must be handled at parent level. In this case,
the imbalance field of the parent's sched_group is set.
The description of sg_imbalanced() gives a typical example of two groups
of 4 CPUs each and 4 tasks each with a cpumask covering 1 CPU of the first
group and 3 CPUs of the second group. Something like:
{ 0 1 2 3 } { 4 5 6 7 }
* * * *
But the load_balance fails to fix this UC on my octo cores system
made of 2 clusters of quad cores.
Whereas the load_balance is able to detect that the imbalanced is due to
CPU affinity, it fails to fix it because the imbalance field is cleared
before letting parent level a chance to run. In fact, when the imbalance is
detected, the load_balance reruns without the CPU with pinned tasks. But
there is no other running tasks in the situation described above and
everything looks balanced this time so the imbalance field is immediately
cleared.
The imbalance field should not be cleared if there is no other task to move
when the imbalance is detected.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1561996022-28829-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84ec3a0787 ]
time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
The TASKS03 and TREE04 rcutorture scenarios produce the following
lockdep complaint:
WARNING: inconsistent lock state
5.2.0-rc1+ #513 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
migration/1/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
(____ptrval____) (tick_broadcast_lock){?...}, at: tick_broadcast_offline+0xf/0x70
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0xb0/0x1c0
_raw_spin_lock_irqsave+0x3c/0x50
tick_broadcast_switch_to_oneshot+0xd/0x40
tick_switch_to_oneshot+0x4f/0xd0
hrtimer_run_queues+0xf3/0x130
run_local_timers+0x1c/0x50
update_process_times+0x1c/0x50
tick_periodic+0x26/0xc0
tick_handle_periodic+0x1a/0x60
smp_apic_timer_interrupt+0x80/0x2a0
apic_timer_interrupt+0xf/0x20
_raw_spin_unlock_irqrestore+0x4e/0x60
rcu_nocb_gp_kthread+0x15d/0x590
kthread+0xf3/0x130
ret_from_fork+0x3a/0x50
irq event stamp: 171
hardirqs last enabled at (171): [<ffffffff8a201a37>] trace_hardirqs_on_thunk+0x1a/0x1c
hardirqs last disabled at (170): [<ffffffff8a201a53>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (0): [<ffffffff8a264ee0>] copy_process.part.56+0x650/0x1cb0
softirqs last disabled at (0): [<0000000000000000>] 0x0
[...]
To reproduce, run the following rcutorture test:
$ tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TASKS03 TREE04"
It turns out that tick_broadcast_offline() was an innocent bystander.
After all, interrupts are supposed to be disabled throughout
take_cpu_down(), and therefore should have been disabled upon entry to
tick_offline_cpu() and thus to tick_broadcast_offline(). This suggests
that one of the CPU-hotplug notifiers was incorrectly enabling interrupts,
and leaving them enabled on return.
Some debugging code showed that the culprit was sched_cpu_dying().
It had irqs enabled after return from sched_tick_stop(). Which in turn
had irqs enabled after return from cancel_delayed_work_sync(). Which is a
wrapper around __cancel_work_timer(). Which can sleep in the case where
something else is concurrently trying to cancel the same delayed work,
and as Thomas Gleixner pointed out on IRC, sleeping is a decidedly bad
idea when you are invoked from take_cpu_down(), regardless of the state
you leave interrupts in upon return.
Code inspection located no reason why the delayed work absolutely
needed to be canceled from sched_tick_stop(): The work is not
bound to the outgoing CPU by design, given that the whole point is
to collect statistics without disturbing the outgoing CPU.
This commit therefore simply drops the cancel_delayed_work_sync() from
sched_tick_stop(). Instead, a new ->state field is added to the tick_work
structure so that the delayed-work handler function sched_tick_remote()
can avoid reposting itself. A cpu_is_offline() check is also added to
sched_tick_remote() to avoid mucking with the state of an offlined CPU
(though it does appear safe to do so). The sched_tick_start() and
sched_tick_stop() functions also update ->state, and sched_tick_start()
also schedules the delayed work if ->state indicates that it is not
already in flight.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Apply Peter Zijlstra and Frederic Weisbecker atomics feedback. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190625165238.GJ26519@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8791a102ce ]
The power down and reset GPIO are optional, but the return value
from devm_gpiod_get_optional() needs to be checked and propagated
in the case of error, so that probe deferral can work.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a03f91c2c ]
Building a KASAN-enabled kernel with clang ends up in a case where too
much is inlined into vivid_thread_vid_cap() and the stack usage grows
a lot, possibly when the register allocation fails to produce efficient
code and spills a lot of temporaries to the stack. This uses more
than twice the amount of stack than the sum of the individual functions
when they are not inlined:
drivers/media/platform/vivid/vivid-kthread-cap.c:766:12: error: stack frame size of 2208 bytes in function 'vivid_thread_vid_cap' [-Werror,-Wframe-larger-than=]
Marking two of the key functions in here as 'noinline_for_stack' avoids
the pathological case in clang without any apparent downside for gcc.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d86a15649 ]
When reaching the end of stream, V4L2 clients may expect the
V4L2_EOS_EVENT before being able to dequeue the last buffer, which has
the V4L2_BUF_FLAG_LAST flag set.
If the vb2_poll() function first checks for events and afterwards if
buffers are available, a driver can queue the V4L2_EOS_EVENT event and
return the buffer after the check for events but before the check for
buffers. This causes vb2_poll() to signal that the buffer with
V4L2_BUF_FLAG_LAST can be read without the V4L2_EOS_EVENT being
available.
First, check for available buffers and afterwards for events to ensure
that if vb2_poll() signals POLLIN | POLLRDNORM for the
V4L2_BUF_FLAG_LAST buffer, it also signals POLLPRI for the
V4L2_EOS_EVENT.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da79bf41a4 ]
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
drivers/media/platform/exynos4-is/fimc-is.c:813:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 807, but without a corresponding object release within this function.
drivers/media/platform/exynos4-is/fimc-is.c:870:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 807, but without a corresponding object release within this function.
drivers/media/platform/exynos4-is/fimc-is.c:885:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 807, but without a corresponding object release within this function.
drivers/media/platform/exynos4-is/media-dev.c:545:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 541, but without a corresponding object release within this function.
drivers/media/platform/exynos4-is/media-dev.c:528:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 499, but without a corresponding object release within this function.
drivers/media/platform/exynos4-is/media-dev.c:534:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 499, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dd4b89dc0 ]
The rc-mm protocol can't be decoded by the mtk-cir since the de-glitch
filter removes pulses/spaces shorter than 294 microseconds.
Tested on a BananaPi R2.
Signed-off-by: Sean Young <sean@mess.org>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 765bb8610d ]
When CONFIG_DVB_DIB9000 is disabled, we can still compile code that
now fails to link against dibx000_i2c_set_speed:
drivers/media/usb/dvb-usb/dib0700_devices.o: In function `dib01x0_pmu_update.constprop.7':
dib0700_devices.c:(.text.unlikely+0x1c9c): undefined reference to `dibx000_i2c_set_speed'
The call sites are both through dib01x0_pmu_update(), which gets passed
an 'i2c' pointer from dib9000_get_i2c_master(), which has returned
NULL. Checking this pointer seems to be a good idea anyway, and it avoids
the link failure in most cases.
Sean Young found another case that is not fixed by that, where certain
gcc versions leave an unused function in place that causes the link error,
but adding an explict IS_ENABLED() check also solves this.
Fixes: b7f54910ce ("V4L/DVB (4647): Added module for DiB0700 based devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 04c8027764 ]
When application goes through SUSPEND/STOP->PREPARE->START
cycle, we should always reprogram the SOF device to start
DMA from a known state so that hw_ptr/appl_ptrs remain valid.
This is expected by ALSA core as it resets the buffer
state as part of prepare (see snd_pcm_do_prepare()).
Fix the issue by forcing reconfiguration of the FW with
STREAM_PCM_PARAMS in prepare(). Use combined logic to handle
prepare and the existing flow to reprogram hw-params after
system suspend.
Without the fix, first call to pcm pointer() will return
an invalid hw_ptr and application may immediately observe XRUN
status, unless "start_threshold" SW parameter is set to maximum
value by the application.
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20190722141402.7194-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed2abfebb0 ]
Firmware files are in ASCII, using 2 hex characters per byte. The
maximum length of a firmware string is therefore
16 (commands) * 2 (bytes per command) * 2 (characters per byte) = 64
Fixes: ff45262a85 ("leds: add new LP5562 LED driver")
Signed-off-by: Nick Stoughton <nstoughton@logitech.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e37ccf78a ]
We need to use the proper types and convert between physical addresses
and dma addresses here to avoid mismatch warnings. This is especially
important on systems with a different size for dma addresses and
physical addresses. Otherwise, we get the following warning:
drivers/firmware/qcom_scm.c: In function "qcom_scm_assign_mem":
drivers/firmware/qcom_scm.c:469:47: error: passing argument 3 of "dma_alloc_coherent" from incompatible pointer type [-Werror=incompatible-pointer-types]
We also fix the size argument to dma_free_coherent() because that size
doesn't need to be aligned after it's already aligned on the allocation
size. In fact, dma debugging expects the same arguments to be passed to
both the allocation and freeing sides of the functions so changing the
size is incorrect regardless.
Reported-by: Ian Jackson <ian.jackson@citrix.com>
Cc: Ian Jackson <ian.jackson@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Avaneesh Kumar Dwivedi <akdwived@codeaurora.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 962f170d93 ]
According to the datasheet http://www.ti.com/lit/ds/symlink/lm36274.pdf:
Table 23. VPOS Bias Register Field Descriptions VPOS[5:0]:
VPOS voltage (50-mV steps): VPOS = 4 V + (Code × 50 mV), 6.5 V max
000000 = 4 V
000001 = 4.05 V
:
011110 = 5.5 V (Default)
:
110010 = 6.5 V
110011 to 111111 map to 6.5 V
So the LM36274_LDO_VSEL_MAX should be 0b110010 (0x32).
The valid selectors are 0 ... LM36274_LDO_VSEL_MAX, n_voltages should be
LM36274_LDO_VSEL_MAX + 1. Similarly, the n_voltages should be
LM36274_BOOST_VSEL_MAX + 1 for LM36274_BOOST.
Fixes: bff5e80715 ("regulator: lm363x: Add support for LM36274")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20190626132632.32629-2-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e2cc8c5e0 ]
According to the datasheet https://www.ti.com/lit/ds/symlink/lm3632a.pdf
Table 20. VPOS Bias Register Field Descriptions VPOS[5:0]
Sets the Positive Display Bias (LDO) Voltage (50 mV per step)
000000: 4 V
000001: 4.05 V
000010: 4.1 V
....................
011101: 5.45 V
011110: 5.5 V (Default)
011111: 5.55 V
....................
100111: 5.95 V
101000: 6 V
Note: Codes 101001 to 111111 map to 6 V
The LM3632_LDO_VSEL_MAX should be 0b101000 (0x28), so the maximum voltage
can match the datasheet.
Fixes: 3a8d1a73a0 ("regulator: add LM363X driver")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20190626132632.32629-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 551626ec0a ]
The HDMI jack handling reports the state change always via
snd_jack_report() whenever hdmi_present_sense() is called, even if the
state itself doesn't change from the previous time. This is mostly
harmless but still a bit confusing to user-space.
This patch reduces such spurious jack state changes and reports only
when the state really changed. Also, as a minor optimization, avoid
overwriting the pin ELD data when the state is identical.
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3355c91b79 ]
Add NULL check after kcalloc.
Fix below issue reported by coccicheck
./drivers/cpufreq/armada-8k-cpufreq.c:138:1-12: alloc with no test,
possible model on line 151
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef9bec2748 ]
snd_hdac_ext_bus_device_exit() has been recently modified
to no longer free the hdac device. SOF allocates memory for
hdac_device and hda_hda_priv with kzalloc. Make them
device-managed instead so that they will be freed when the
SOF driver is unloaded.
Because of the above change, hda_codec is device-managed and
it will be freed when the ASoC device is removed. Freeing
the codec in snd_hda_codec_dev_release() leads to kernel
panic while unloading and reloading the ASoC driver. So,
avoid freeing the hda_codec for ASoC driver. This is done in
the same patch to avoid bisect failure.
Signed-off-by: Libin Yang <libin.yang@intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20190626070450.7229-1-ranjani.sridharan@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d19a79ee38 ]
Add the device ID of upcoming BlueField-2 integrated ConnectX-6 Dx
network controller. Its VFs will be using the generic VF device ID:
0x101e "ConnectX Family mlx5Gen Virtual Function".
Fixes: 2e9d3e83ab ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a66b10c05ee2d744189e9a2130394b070883d289 ]
Yuchung Cheng and Marek Majkowski independently reported a weird
behavior of TCP_USER_TIMEOUT option when used at connect() time.
When the TCP_USER_TIMEOUT is reached, tcp_write_timeout()
believes the flow should live, and the following condition
in tcp_clamp_rto_to_user_timeout() programs one jiffie timers :
remaining = icsk->icsk_user_timeout - elapsed;
if (remaining <= 0)
return 1; /* user timeout has passed; fire ASAP */
This silly situation ends when the max syn rtx count is reached.
This patch makes sure we honor both TCP_SYNCNT and TCP_USER_TIMEOUT,
avoiding these spurious SYN packets.
Fixes: b701a99e43 ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Reported-by: Marek Majkowski <marek@cloudflare.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Link: https://marc.info/?l=linux-netdev&m=156940118307949&w=2
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Marek Majkowski <marek@cloudflare.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d22fcc806b ]
Before this patch, when adding multiple ethtool steering rules with
identical classification, the driver used to append the new destination
to the already existing hw rule, which caused the hw to forward the
traffic to all destinations (rx queues).
Here we avoid this by setting the "no append" mlx5 fs core flag when
adding a new ethtool rule.
Fixes: 6dc6071cfc ("net/mlx5e: Add ethtool flow steering support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 77d5bc7e6a ]
Julian noted that rt_uses_gateway has a more subtle use than 'is gateway
set':
https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/
Revert that part of the commit referenced in the Fixes tag.
Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole
in the second cacheline which contains the gateway declaration. So move
rt_gw_family down to the gateway declarations since they are always used
together, and then re-use that u8 for rt_uses_gateway. End result is that
rtable size is unchanged.
Fixes: 1550c17193 ("ipv4: Prepare rtable for IPv6 gateway")
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 407d8098cb ]
The Micrel KSZ9031 PHY may fail to establish a link when the Asymmetric
Pause capability is set. This issue is described in a Silicon Errata
(DS80000691D or DS80000692D), which advises to always disable the
capability.
Micrel KSZ9021 has no errata, but has the same issue with Asymmetric Pause.
This patch apply the same workaround as the one for KSZ9031.
Fixes: 3aed3e2a14 ("net: phy: micrel: add Asym Pause workaround")
Signed-off-by: Hans Andersson <hans.andersson@cellavision.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e84622ce24 ]
Some distributions (e.g., debian buster) do not install ping6. Re-use
the hook in pmtu.sh to detect this and fallback to ping.
Fixes: 735ab2f65d ("selftests: Add test with multiple prefixes using single nexthop")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b09c2d052 ]
Yi Ren reported an issue discovered by syzkaller, and bisected
to the cited commit.
Many thanks to Yi, this trivial patch does not reflect the patient
work that has been done.
Fixes: d64a1f574a ("ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Bisected-and-reported-by: Yi Ren <c4tren@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fe1587a7de ]
In mlx5 parse_tunnel_attr() function dispatch on encap IP address type
is performed by directly checking flow_rule_match_key() on
FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, and then on
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS. However, since those are stored in
union, first check is always true if any type of encap address is set,
which leads to IPv6 tunnel encap address being parsed as IPv4 by mlx5.
Determine correct IP address type by checking control key first and if
it set, take address type from match.key->addr_type.
Fixes: d1bda7eecd ("net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8d3d7c2029 ]
Endpoints with zero wMaxPacketSize are not usable for transferring
data. Ignore such endpoints when looking for valid in, out and
status pipes, to make the drivers more robust against invalid and
meaningless descriptors.
The wMaxPacketSize of these endpoints are used for memory allocations
and as divisors in many usbnet minidrivers. Avoiding zero is therefore
critical.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b3656a60f ]
There was a bug in the previous logic that attempted to ensure gain cycling
gets inflight above BDP even for small BDPs. This code correctly raised and
lowered target inflight values during the gain cycle. And this code
correctly ensured that cwnd was raised when probing bandwidth. However, it
did not correspondingly ensure that cwnd was *not* raised in this way when
*not* probing for bandwidth. The result was that small-BDP flows that were
always cwnd-bound could go for many cycles with a fixed cwnd, and not probe
or yield bandwidth at all. This meant that multiple small-BDP flows could
fail to converge in their bandwidth allocations.
Fixes: 3c346b233c68 ("tcp_bbr: fix bw probing to raise in-flight data for very small BDPs")
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0360894a05 ]
Some distributions (e.g., debian buster) do not install ping6. Re-use
the hook in pmtu.sh to detect this and fallback to ping.
Fixes: a0e11da78f ("fib_tests: Add tests for metrics on routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ea8564c865 ]
userspace openvswitch patch "(dpif-linux: Implement the API
functions to allow multiple handler threads read upcall)"
changes its type from U32 to UNSPEC, but leave the kernel
unchanged
and after kernel 6e237d099f "(netlink: Relax attr validation
for fixed length types)", this bug is exposed by the below
warning
[ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Fixes: 5cd667b0a4 ("openvswitch: Allow each vport to have an array of 'port_id's")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8572cea146 ]
In nfp_flower_spawn_phy_reprs, in the for loop over eth_tbl if any of
intermediate allocations or initializations fail memory is leaked.
requiered releases are added.
Fixes: b945245297 ("nfp: flower: add per repr private data for LAG offload")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8ce39eb5a6 ]
In nfp_flower_spawn_vnic_reprs in the loop if initialization or the
allocations fail memory is leaked. Appropriate releases are added.
Fixes: b945245297 ("nfp: flower: add per repr private data for LAG offload")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f28bd956e ]
The size of individual pages in the page pool in given by an order. The
order is the binary logarithm of the number of pages that make up one of
the pages in the pool. However, the driver currently passes the number
of pages rather than the order, so it ends up wasting quite a bit of
memory.
Fix this by taking the binary logarithm and passing that in the order
field.
Fixes: 2af6106ae9 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 92974a1d00 ]
current 'sample' action doesn't push the mac header of ingress packets if
they are received by a layer 3 tunnel (like gre or sit); but it forgot to
check for gre over ipv6, so the following script:
# tc q a dev $d clsact
# tc f a dev $d ingress protocol ip flower ip_proto icmp action sample \
> group 100 rate 1
# psample -v -g 100
dumps everything, including outer header and mac, when $d is a gre tunnel
over ipv6. Fix this adding a missing label for ARPHRD_IP6GRE devices.
Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 73f0c11d11 ]
As the endpoint is unregistered there might still be work pending to
handle incoming messages, which will result in a use after free
scenario. The plan is to remove the rx_worker, but until then (and for
stable@) ensure that the work is stopped before the node is freed.
Fixes: bdabad3e36 ("net: Add Qualcomm IPC router")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e47488b2df ]
According to the DP83865 datasheet "the 10 Mbps HDX loopback can be
disabled in the expanded memory register 0x1C0.1". The driver erroneously
used bit 0 instead of bit 1.
Fixes: 4621bf1298 ("phy: Add file missed in previous commit.")
Signed-off-by: Peter Mamonov <pmamonov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ba56d8ce38 ]
Fei Liu reported a crash when doing netperf on a topo of macsec
dev over veth:
[ 448.919128] refcount_t: underflow; use-after-free.
[ 449.090460] Call trace:
[ 449.092895] refcount_sub_and_test+0xb4/0xc0
[ 449.097155] tcp_wfree+0x2c/0x150
[ 449.100460] ip_rcv+0x1d4/0x3a8
[ 449.103591] __netif_receive_skb_core+0x554/0xae0
[ 449.108282] __netif_receive_skb+0x28/0x78
[ 449.112366] netif_receive_skb_internal+0x54/0x100
[ 449.117144] napi_gro_complete+0x70/0xc0
[ 449.121054] napi_gro_flush+0x6c/0x90
[ 449.124703] napi_complete_done+0x50/0x130
[ 449.128788] gro_cell_poll+0x8c/0xa8
[ 449.132351] net_rx_action+0x16c/0x3f8
[ 449.136088] __do_softirq+0x128/0x320
The issue was caused by skb's true_size changed without its sk's
sk_wmem_alloc increased in tcp/skb_gro_receive(). Later when the
skb is being freed and the skb's truesize is subtracted from its
sk's sk_wmem_alloc in tcp_wfree(), underflow occurs.
macsec is calling gro_cells_receive() to receive a packet, which
actually requires skb->sk to be NULL. However when macsec dev is
over veth, it's possible the skb->sk is still set if the skb was
not unshared or expanded from the peer veth.
ip_rcv() is calling skb_orphan() to drop the skb's sk for tproxy,
but it is too late for macsec's calling gro_cells_receive(). So
fix it by dropping the skb's sk earlier on rx path of macsec.
Fixes: 5491e7c6b1 ("macsec: enable GRO and RPS on macsec devices")
Reported-by: Xiumei Mu <xmu@redhat.com>
Reported-by: Fei Liu <feliu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ca7a03c417 ]
Commit 7d9e5f4221 removed references from certain dsts, but accounting
for this never translated down into the fib6 suppression code. This bug
was triggered by WireGuard users who use wg-quick(8), which uses the
"suppress-prefix" directive to ip-rule(8) for routing all of their
internet traffic without routing loops. The test case added here
causes the reference underflow by causing packets to evaluate a suppress
rule.
Fixes: 7d9e5f4221 ("ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3fe4b33513 ]
Endpoints with zero wMaxPacketSize are not usable for transferring
data. Ignore such endpoints when looking for valid in, out and
status pipes, to make the driver more robust against invalid and
meaningless descriptors.
The wMaxPacketSize of the out pipe is used as divisor. So this change
fixes a divide-by-zero bug.
Reported-by: syzbot+ce366e2b8296e25d84f5@syzkaller.appspotmail.com
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 108639aac35eb57f1d0e8333f5fc8c7ff68df938 ]
struct archdr is only big enough to hold the header of various types of
arcnet packets. So to provide enough space to hold the data read from
hardware provide a buffer large enough to hold a packet with maximal
size.
The problem was noticed by the stack protector which makes the kernel
oops.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 9c30694424
That wasn't a 5.3.2 release, it was a 5.3.3 release, as the Makefile
changed, but the text in the commit message didn't.
So revert it, so we can do an "empty" 5.3.3 release, just to keep things
working properly.
Sorry for the noise :(
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24a8d78a9a upstream.
When naming the new devices, instead of using the ACPI ID in
the name as base, using the parent device's name. That makes
it possible to support multiple multi-instance i2c devices
of the same type in the same system.
This fixes an issue seen on some Intel Kaby Lake based
boards:
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/i2c-INT3515-tps6598x.0'
Fixes: 2336dfadfb ("platform/x86: i2c-multi-instantiate: Allow to have same slaves")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2c63b7dfd upstream.
It's reported that the garbled sound on HP Envy x360 13z-ag000 (Ryzen
Laptop) is fixed by the same workaround applied to other AMD chips.
Update the driver_data entry for Raven (1022:15e3) to use the newly
introduced preset, AZX_DCAPS_PRESET_AMD_SB. Since it already contains
AZX_DCAPS_PM_RUNTIME, we can drop that bit, too.
Reported-and-tested-by: Dennis Padiernos <depadiernos@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190920073040.31764-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a9236e972 upstream.
At higher sampling rate (e.g. 192.0 kHz), Alesis iO26 transfers 4 data
channels per data block in CIP.
Both iO14 and iO26 have the same contents in their configuration ROM.
For this reason, ALSA Dice driver attempts to distinguish them according
to the value of TX0_AUDIO register at probe callback. Although the way is
valid at lower and middle sampling rate, it's lastly invalid at higher
sampling rate because because the two models returns the same value for
read transaction to the register.
In the most cases, users just plug-in the device and ALSA dice driver
detects it. In the case, the device runs at lower sampling rate and
the driver detects expectedly. For this reason, this commit leaves the
way to detect as is.
Fixes: 28b208f600 ("ALSA: dice: add parameters of stream formats for models produced by Alesis")
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20190916101851.30409-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 053a4ffe29 upstream.
The AUDIO PLL max support 650M, so the original clk settings violate
spec. This patch makes the output 786432000 -> 393216000,
and 722534400 -> 361267200 to aligned with NXP vendor kernel without any
impact on audio functionality and go within 650MHz PLL limit.
Cc: <stable@vger.kernel.org>
Fixes: ba5625c3e2 ("clk: imx: Add clock driver support for imx8mm")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Abel Vesa <abel.vesa@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37c673ade3 upstream.
As reported by the OpenWRT team, write requests sometimes fail on some
platforms.
Currently to check the state chip_ready() is used correctly as described by
the flash memory S29GL256P11TFI01 datasheet.
Also chip_good() is used to check if the write is succeeded and it was
implemented by the commit fb4a90bfcd ("[MTD] CFI-0002 - Improve error
checking").
But actually the write failure is caused on some platforms and also it can
be fixed by using chip_good() to check the state and retry instead.
Also it seems that it is caused after repeated about 1,000 times to retry
the write one word with the reset command.
By using chip_good() to check the state to be done it can be reduced the
retry with reset.
It is depended on the actual flash chip behavior so the root cause is
unknown.
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Cc: linux-mtd@lists.infradead.org
Cc: stable@vger.kernel.org
Reported-by: Fabio Bettoni <fbettoni@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
[vigneshr@ti.com: Fix a checkpatch warning]
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ccff2843f upstream.
Before this commit dj_probe would exit with an error if the initial
logi_dj_recv_query_paired_devices fails. The initial call may fail
when the receiver is connected through a kvm and the focus is away.
When the call fails this causes 2 problems:
1) dj_probe calls logi_dj_recv_query_paired_devices after calling
hid_device_io_start() so a HID report may have been received in between
and our delayedwork_callback may be running. It seems that the initial
logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
this exact scenario, causing the work-queue to run on free-ed memory,
leading to:
BUG: unable to handle page fault for address: 0000000000001e88
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G OE 5.3.0-rc5+ #100
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
Workqueue: events 0xffffffffc02ba200
RIP: 0010:0xffffffffc02ba1bd
Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
FS: 0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
0xffffffffc02ba2f7
? process_one_work+0x1b1/0x560
process_one_work+0x234/0x560
worker_thread+0x50/0x3b0
kthread+0x10a/0x140
? process_one_work+0x560/0x560
? kthread_park+0x80/0x80
ret_from_fork+0x3a/0x50
Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
CR2: 0000000000001e88
---[ end trace 1d3f8afdcfcbd842 ]---
2) Even if we were to fix 1. by making sure the work is stopped before
failing probe, failing probe is the wrong thing to do, we have
logi_dj_recv_queue_unknown_work to deal with the initial
logi_dj_recv_query_paired_devices failure.
Rather then error-ing out of the probe, causing the receiver to not work at
all we should rely on this, so that the attached devices will get properly
enumerated once the KVM focus is switched back.
Cc: stable@vger.kernel.org
Fixes: 74808f9115 ("HID: logitech-dj: add support for non unifying receivers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bcdacb703 upstream.
The sony driver is not properly cleaning up from potential failures in
sony_input_configured. Currently it calls hid_hw_stop, while hid_connect
is still running. This is not a good idea, instead hid_hw_stop should
be moved to sony_probe. Similar changes were recently made to Logitech
drivers, which were also doing improper cleanup.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
CC: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98375b86c7 upstream.
The syzbot fuzzer provoked a general protection fault in the
hid-prodikeys driver:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.3.0-rc5+ #28
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: usb_hub_wq hub_event
RIP: 0010:pcmidi_submit_output_report drivers/hid/hid-prodikeys.c:300 [inline]
RIP: 0010:pcmidi_set_operational drivers/hid/hid-prodikeys.c:558 [inline]
RIP: 0010:pcmidi_snd_initialise drivers/hid/hid-prodikeys.c:686 [inline]
RIP: 0010:pk_probe+0xb51/0xfd0 drivers/hid/hid-prodikeys.c:836
Code: 0f 85 50 04 00 00 48 8b 04 24 4c 89 7d 10 48 8b 58 08 e8 b2 53 e4 fc
48 8b 54 24 20 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f
85 13 04 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b
The problem is caused by the fact that pcmidi_get_output_report() will
return an error if the HID device doesn't provide the right sort of
output report, but pcmidi_set_operational() doesn't bother to check
the return code and assumes the function call always succeeds.
This patch adds the missing check and aborts the probe operation if
necessary.
Reported-and-tested-by: syzbot+1088533649dafa1c9004@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ccb4ac2bf upstream.
There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call
to return the 32-bit value 0xffffffff when OPAL has run out of IRQs.
Unfortunatelty, OPAL return values are signed 64-bit entities and
errors are supposed to be negative. If that happens, the linux code
confusingly treats 0xffffffff as a valid IRQ number and panics at some
point.
A fix was recently merged in skiboot:
e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")
but we need a workaround anyway to support older skiboots already
in the field.
Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error
returned upon resource exhaustion.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f0727d971 upstream.
arch/x86/Makefile disables SSE and SSE2 for the whole kernel. The
AMDGPU drivers modified in this patch re-enable SSE but not SSE2. Turn
on SSE2 to support emitting double precision floating point instructions
rather than calls to non-existent (usually available from gcc_s or
compiler_rt) floating point helper routines for Clang.
This was originally landed in:
commit 1011745073 ("drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")
but reverted in:
commit 193392ed9f ("Revert "drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"")
due to bugreports from GCC builds. Add guards to only do so for Clang.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=109487
Link: https://github.com/ClangBuiltLinux/linux/issues/327
Suggested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd200d190f upstream.
[Why]
DRM private objects have no hw_done/flip_done fencing mechanism on their
own and cannot be used to sequence commits accordingly.
When issuing commits that don't touch the same set of hardware resources
like page-flips on different CRTCs we can run into the issue below
because of this:
1. Client requests non-blocking Commit #1, has a new dc_state #1,
state is swapped, commit tail is deferred to work queue
2. Client requests non-blocking Commit #2, has a new dc_state #2,
state is swapped, commit tail is deferred to work queue
3. Commit #2 work starts, commit tail finishes,
atomic state is cleared, dc_state #1 is freed
4. Commit #1 work starts,
commit tail encounters null pointer deref on dc_state #1
In order to change the DC state as in the private object we need to
ensure that we wait for all outstanding commits to finish and that
any other pending commits must wait for the current one to finish as
well.
We do this for MEDIUM and FULL updates. But not for FAST updates, nor
would we want to since it would cause stuttering from the delays.
FAST updates that go through dm_determine_update_type_for_commit always
create a new dc_state and lock the DRM private object if there are
any changed planes.
We need the old state to validate, but we don't actually need the new
state here.
[How]
If the commit isn't a full update then the use after free can be
resolved by simply discarding the new state entirely and retaining
the existing one instead.
With this change the sequence above can be reexamined. Commit #2 will
still free Commit #1's reference, but before this happens we actually
added an additional reference as part of Commit #2.
If an update comes in during this that needs to change the dc_state
it will need to wait on Commit #1 and Commit #2 to finish. Then it'll
swap the state, finish the work in commit tail and drop the last
reference on Commit #2's dc_state.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204181
Fixes: 004b3938e6 ("drm/amd/display: Check scaling info when determing update type")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43d10d30df upstream.
[Why]
By passing through the dm_determine_update_type_for_commit for atomic
commits that can be done asynchronously we are incurring a
performance penalty by locking access to the global private object
and holding that access until the end of the programming sequence.
This is also allocating a new large dc_state on every access in addition
to retaining all the references on each stream and plane until the end
of the programming sequence.
[How]
Shift the determination for async update before validation. Return early
if it's going to be an async update.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e16e37efb4 upstream.
[Why]
We previously allowed framebuffer swaps as async updates for cursor
planes but had to disable them due to a bug in DRM with async update
handling and incorrect ref counting. The check to block framebuffer
swaps has been added to DRM for a while now, so this check is redundant.
The real fix that allows this to properly in DRM has also finally been
merged and is getting backported into stable branches, so dropping
this now seems to be the right time to do so.
[How]
Drop the redundant check for old_fb != new_fb.
With the proper fix in DRM, this should also fix some cursor stuttering
issues with xf86-video-amdgpu since it double buffers the cursor.
IGT tests that swap framebuffers (-varying-size for example) should
also pass again.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14e019df1e upstream.
Deferred probe is an expected return value on many platforms and so
there's no need to output a warning that may potentially confuse users.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 763719771e upstream.
Deferred probe is an expected return value for clk_get() on many
platforms. The driver deals with it properly, so there's no need
to output a warning that may potentially confuse users.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14ced7e3a1 upstream.
Despite extensive testing of commit 885bd76596 ("phy: qcom-qmp: Correct
READY_STATUS poll break condition") I failed to conclude that the
PHYSTATUS bit of the PCS_STATUS register used in PCIe and USB3 falls as
the PHY gets ready. Similar to the prior bug with UFS the code will
generally get past the check before the transition and thereby
"succeed".
Correct the name of the register used PCIe and USB3 PHYs, replace
mask_pcs_ready with a constant expression depending on the type of the
PHY and check for the appropriate ready state.
Cc: stable@vger.kernel.org
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Reported-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Fixes: 885bd76596 ("phy: qcom-qmp: Correct READY_STATUS poll break condition")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0be0bfd2de upstream.
Once upon a time, commit 2cac0c00a6 ("ovl: get exclusive ownership on
upper/work dirs") in v4.13 added some sanity checks on overlayfs layers.
This change caused a docker regression. The root cause was mount leaks
by docker, which as far as I know, still exist.
To mitigate the regression, commit 85fdee1eef ("ovl: fix regression
caused by exclusive upper/work dir protection") in v4.14 turned the
mount errors into warnings for the default index=off configuration.
Recently, commit 146d62e5a5 ("ovl: detect overlapping layers") in
v5.2, re-introduced exclusive upper/work dir checks regardless of
index=off configuration.
This changes the status quo and mount leak related bug reports have
started to re-surface. Restore the status quo to fix the regressions.
To clarify, index=off does NOT relax overlapping layers check for this
ovelayfs mount. index=off only relaxes exclusive upper/work dir checks
with another overlayfs mount.
To cover the part of overlapping layers detection that used the
exclusive upper/work dir checks to detect overlap with self upper/work
dir, add a trap also on the work base dir.
Link: https://github.com/moby/moby/issues/34672
Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/
Link: https://github.com/containers/libpod/issues/3540
Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Colin Walters <walters@verbum.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0b7a302d5 upstream.
This reverts commit 24fe1b0efa.
Commit 24fe1b0efa ("arm64: Remove unnecessary ISBs from
set_{pte,pmd,pud}") removed ISB instructions immediately following updates
to the page table, on the grounds that they are not required by the
architecture and a DSB alone is sufficient to ensure that subsequent data
accesses use the new translation:
DDI0487E_a, B2-128:
| ... no instruction that appears in program order after the DSB
| instruction can alter any state of the system or perform any part of
| its functionality until the DSB completes other than:
|
| * Being fetched from memory and decoded
| * Reading the general-purpose, SIMD and floating-point,
| Special-purpose, or System registers that are directly or indirectly
| read without causing side-effects.
However, the same document also states the following:
DDI0487E_a, B2-125:
| DMB and DSB instructions affect reads and writes to the memory system
| generated by Load/Store instructions and data or unified cache
| maintenance instructions being executed by the PE. Instruction fetches
| or accesses caused by a hardware translation table access are not
| explicit accesses.
which appears to claim that the DSB alone is insufficient. Unfortunately,
some CPU designers have followed the second clause above, whereas in Linux
we've been relying on the first. This means that our mapping sequence:
MOV X0, <valid pte>
STR X0, [Xptep] // Store new PTE to page table
DSB ISHST
LDR X1, [X2] // Translates using the new PTE
can actually raise a translation fault on the load instruction because the
translation can be performed speculatively before the page table update and
then marked as "faulting" by the CPU. For user PTEs, this is ok because we
can handle the spurious fault, but for kernel PTEs and intermediate table
entries this results in a panic().
Revert the offending commit to reintroduce the missing barriers.
Cc: <stable@vger.kernel.org>
Fixes: 24fe1b0efa ("arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b708b7b1a upstream.
The VPD implementation from Chromium Vital Product Data project used to
parse data from untrusted input without checking if the meta data is
invalid or corrupted. For example, the size from decoded content may
be negative value, or larger than whole input buffer. Such invalid data
may cause buffer overflow.
To fix that, the size parameters passed to vpd_decode functions should
be changed to unsigned integer (u32) type, and the parsing of entry
header should be refactored so every size field is correctly verified
before starting to decode.
Fixes: ad2ac9d5c5 ("firmware: Google VPD: import lib_vpd source files")
Signed-off-by: Hung-Te Lin <hungte@chromium.org>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20190830022402.214442-1-hungte@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 19e13cb27b ]
We need to hold rnl lock in suspend and resume callbacks because phylink
requires it. Otherwise we will get a WARN() in suspend and resume.
Also, move phylink start and stop callbacks to inside device's internal
lock so that we prevent concurrent HW accesses.
Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Christophe ROULLIER <christophe.roullier@st.com>
Tested-by: Christophe ROULLIER <christophe.roullier@st.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 23426a25e5 ]
The DSA core, DSA taggers and DSA drivers all make use of
module_init(). Hence they get initialised at device_initcall() time.
The ordering is non-deterministic. It can be a DSA driver is bound to
a device before the needed tag driver has been initialised, resulting
in the message:
No tagger for this switch
Rather than have this be fatal, return -EPROBE_DEFER so that it is
tried again later once all the needed drivers have been loaded.
Fixes: d3b8c04988 ("dsa: Add boilerplate helper to register DSA tag driver modules")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 00b368502d ]
When skb_shinfo(skb) is not able to cache extra fragment (that is,
skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS), xennet_fill_frags() assumes
the sk_buff_head list is already empty. As a result, cons is increased only
by 1 and returns to error handling path in xennet_poll().
However, if the sk_buff_head list is not empty, queue->rx.rsp_cons may be
set incorrectly. That is, queue->rx.rsp_cons would point to the rx ring
buffer entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] are
already cleared to NULL. This leads to NULL pointer access in the next
iteration to process rx ring buffer entries.
Below is how xennet_poll() does error handling. All remaining entries in
tmpq are accounted to queue->rx.rsp_cons without assuming how many
outstanding skbs are remained in the list.
985 static int xennet_poll(struct napi_struct *napi, int budget)
... ...
1032 if (unlikely(xennet_set_skb_gso(skb, gso))) {
1033 __skb_queue_head(&tmpq, skb);
1034 queue->rx.rsp_cons += skb_queue_len(&tmpq);
1035 goto err;
1036 }
It is better to always have the error handling in the same way.
Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acdcecc612 ]
UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.
Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.
Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.
New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.
Fixes: e32ea7e747 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d518d2ed86 ]
The test implemented by some_qdisc_is_busy() is somewhat loosy for
NOLOCK qdisc, as we may hit the following scenario:
CPU1 CPU2
// in net_tx_action()
clear_bit(__QDISC_STATE_SCHED...);
// in some_qdisc_is_busy()
val = (qdisc_is_running(q) ||
test_bit(__QDISC_STATE_SCHED,
&q->state));
// here val is 0 but...
qdisc_run(q)
// ... CPU1 is going to run the qdisc next
As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset()
in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as
both the above bit operations are under the root qdisc lock().
After commit 021a17ed79 ("pfifo_fast: drop unneeded additional lock on dequeue")
the race can cause use after free and/or null ptr dereference, but the root
cause is likely older.
This patch addresses the issue explicitly checking for deactivation under
the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical
scenario becomes a no-op.
Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
but that is safe due to the skb_array() locking, and we can't avoid that
for NOLOCK qdiscs.
Fixes: 021a17ed79 ("pfifo_fast: drop unneeded additional lock on dequeue")
Reported-by: Li Shuang <shuali@redhat.com>
Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 28e4860377 ]
In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to
be freed on the tx_err path. Otherwise when deleting a netns, it would
cause dst/dev to leak, and dmesg shows:
unregister_netdevice: waiting for lo to become free. Usage count = 1
Fixes: ef7baf5e08 ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6839c31a6 upstream.
The hardware manual should be revised, but the initial value of
VBCTRL.OCCLREN is set to 1 actually. If the bit is set, the hardware
clears VBCTRL.VBOUT and ADPCTRL.DRVVBUS registers automatically
when the hardware detects over-current signal from a USB power switch.
However, since the hardware doesn't have any registers which
indicates over-current, the driver cannot handle it at all. So, if
"is_otg_channel" hardware detects over-current, since ADPCTRL.DRVVBUS
register is cleared automatically, the channel cannot be used after
that.
To resolve this behavior, this patch sets the VBCTRL.OCCLREN to 0
to keep ADPCTRL.DRVVBUS even if the "is_otg_channel" hardware
detects over-current. (We assume a USB power switch itself protects
over-current and turns the VBUS off.)
This patch is inspired by a BSP patch from Kazuya Mizuguchi.
Fixes: 1114e2d317 ("phy: rcar-gen3-usb2: change the mode to OTG on the combined channel")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3dd550a2d3 upstream.
The syzbot fuzzer provoked a slab-out-of-bounds error in the USB core:
BUG: KASAN: slab-out-of-bounds in memcmp+0xa6/0xb0 lib/string.c:904
Read of size 1 at addr ffff8881d175bed6 by task kworker/0:3/2746
CPU: 0 PID: 2746 Comm: kworker/0:3 Not tainted 5.3.0-rc5+ #28
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xca/0x13e lib/dump_stack.c:113
print_address_description+0x6a/0x32c mm/kasan/report.c:351
__kasan_report.cold+0x1a/0x33 mm/kasan/report.c:482
kasan_report+0xe/0x12 mm/kasan/common.c:612
memcmp+0xa6/0xb0 lib/string.c:904
memcmp include/linux/string.h:400 [inline]
descriptors_changed drivers/usb/core/hub.c:5579 [inline]
usb_reset_and_verify_device+0x564/0x1300 drivers/usb/core/hub.c:5729
usb_reset_device+0x4c1/0x920 drivers/usb/core/hub.c:5898
rt2x00usb_probe+0x53/0x7af
drivers/net/wireless/ralink/rt2x00/rt2x00usb.c:806
The error occurs when the descriptors_changed() routine (called during
a device reset) attempts to compare the old and new BOS and capability
descriptors. The length it uses for the comparison is the
wTotalLength value stored in BOS descriptor, but this value is not
necessarily the same as the length actually allocated for the
descriptors. If it is larger the routine will call memcmp() with a
length that is too big, thus reading beyond the end of the allocated
region and leading to this fault.
The kernel reads the BOS descriptor twice: first to get the total
length of all the capability descriptors, and second to read it along
with all those other descriptors. A malicious (or very faulty) device
may send different values for the BOS descriptor fields each time.
The memory area will be allocated using the wTotalLength value read
the first time, but stored within it will be the value read the second
time.
To prevent this possibility from causing any errors, this patch
modifies the BOS descriptor after it has been read the second time:
It sets the wTotalLength field to the actual length of the descriptors
that were read in and validated. Then the memcpy() call, or any other
code using these descriptors, will be able to rely on wTotalLength
being valid.
Reported-and-tested-by: syzbot+35f4d916c623118d576e@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1909041154260.1722-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b03755ad6f.
This is sad, and done for all the wrong reasons. Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.
However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f388 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.
But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom). And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.
It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration. Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due bogus dependencies on
cryptsetup for encrypted swapspace).
The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion. Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?
So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kvm fixes from Paolo Bonzini:
"The main change here is a revert of reverts. We recently simplified
some code that was thought unnecessary; however, since then KVM has
grown quite a few cond_resched()s and for that reason the simplified
code is prone to livelocks---one CPUs tries to empty a list of guest
page tables while the others keep adding to them. This adds back the
generation-based zapping of guest page tables, which was not
unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple
of s390 fixlets as well"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
KVM: x86: work around leak of uninitialized stack contents
KVM: nVMX: handle page fault in vmread
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
Pull virtio fix from Michael Tsirkin:
"A last minute revert
The 32-bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "vhost: block speculation of translated descriptors"
Pull RISC-V fix from Paul Walmsley:
"Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit 0f327f2aaa ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header"
* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: modify the Image header to improve compatibility with the ARM64 header
This reverts commit a89db445fb.
I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull networking fixes from David Miller:
1) Don't corrupt xfrm_interface parms before validation, from Nicolas
Dichtel.
2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
Leonardo Bras.
4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
5) Missing ULP check in sock_map, from John Fastabend.
6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
7) Fix length of SKB allocated in 6pack driver, from Christophe
JAILLET.
8) ip6_route_info_create() returns an error pointer, not NULL. From
Maciej Żenczykowski.
9) Only add RDS sock to the hashes after rs_transport is set, from
Ka-Cheong Poon.
10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
11) Presence of transmit IPSEC offload in an SKB is not tested for
correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
Kirsher.
12) Need rcu_barrier() when register_netdevice() takes one of the
notifier based failure paths, from Subash Abhinov Kasiviswanathan.
13) Fix leak in sctp_do_bind(), from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
cdc_ether: fix rndis support for Mediatek based smartphones
sctp: destroy bucket if failed to bind addr
sctp: remove redundant assignment when call sctp_get_port_local
sctp: change return type of sctp_get_port_local
ixgbevf: Fix secpath usage for IPsec Tx offload
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
ixgbe: Fix secpath usage for IPsec TX offload.
net: qrtr: fix memort leak in qrtr_tun_write_iter
net: Fix null de-reference of device refcount
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
tun: fix use-after-free when register netdev failed
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
ixgbe: fix double clean of Tx descriptors with xdp
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
mlx4: fix spelling mistake "veify" -> "verify"
net: hns3: fix spelling mistake "undeflow" -> "underflow"
net: lmc: fix spelling mistake "runnin" -> "running"
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
net/rds: An rds_sock is added too early to the hash table
mac80211: Do not send Layer 2 Update frame before authorization
...
Pull drm fixes from Dave Airlie:
"From the maintainer summit, just some last minute fixes for final:
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads"
* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
drm/lima: fix lima_gem_wait() return value
drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
drm/i915: Limit MST to <= 8bpc once again
drm/modes: Make the whitelist more const
James Harvey reported a livelock that was introduced by commit
d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").
The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule. The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.
There are three ways to fix the livelock:
- Reverting the revert (commit d012a06ab1) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices. It's unlikely we'll root cause the device
assignment regression soon enough to fix the regression timely.
- Remove the conditional reschedule from kvm_mmu_zap_all(). However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit 6ca18b6950 ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.
- Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved, which is what this
patch does.
For all intents and purposes, this is a revert of commit ea145aacf4
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950 ("KVM: x86:
use the fast way to invalidate all pages") respectively.
Fixes: d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header. One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field. If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0. This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format. Another problem was that the original
"res3" field was not being initialized correctly to zero.
Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field. RISC-V binaries will
store "RSC\x05" in this field. The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time. Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly. Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
A Mediatek based smartphone owner reports problems with USB
tethering in Linux. The verbose USB listing shows a rndis_host
interface pair (e0/01/03 + 10/00/00), but the driver fails to
bind with
[ 355.960428] usb 1-4: bad CDC descriptors
The problem is a failsafe test intended to filter out ACM serial
functions using the same 02/02/ff class/subclass/protocol as RNDIS.
The serial functions are recognized by their non-zero bmCapabilities.
No RNDIS function with non-zero bmCapabilities were known at the time
this failsafe was added. But it turns out that some Wireless class
RNDIS functions are using the bmCapabilities field. These functions
are uniquely identified as RNDIS by their class/subclass/protocol, so
the failing test can safely be disabled. The same applies to the two
types of Misc class RNDIS functions.
Applying the failsafe to Communication class functions only retains
the original functionality, and fixes the problem for the Mediatek based
smartphone.
Tow examples of CDC functional descriptors with non-zero bmCapabilities
from Wireless class RNDIS functions are:
0e8d:000a Mediatek Crosscall Spider X5 3G Phone
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x0f
connection notifications
sends break
line coding and serial state
get/set/clear comm features
CDC Union:
bMasterInterface 0
bSlaveInterface 1
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
and
19d2:1023 ZTE K4201-z
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x02
line coding and serial state
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
CDC Union:
bMasterInterface 0
bSlaveInterface 1
The Mediatek example is believed to apply to most smartphones with
Mediatek firmware. The ZTE example is most likely also part of a larger
family of devices/firmwares.
Suggested-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan says:
====================
fix memory leak for sctp_do_bind
First two patches are to do cleanup, remove redundant assignment,
and change return type of sctp_get_port_local.
Third patch is to fix memory leak for sctp_do_bind if failed
to bind address.
v2: add one patch to change return type of sctp_get_port_local.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There are more parentheses in if clause when call sctp_get_port_local
in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
do cleanup.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently sctp_get_port_local() returns a long
which is either 0,1 or a pointer casted to long.
It's neither of the callers use the return value since
commit 62208f1245 ("net: sctp: simplify sctp_get_port").
Now two callers are sctp_get_port and sctp_do_bind,
they actually assumend a casted to an int was the same as
a pointer casted to a long, and they don't save the return
value just check whether it is zero or non-zero, so
it would better change return type from long to int for
sctp_get_port_local.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port the same fix for ixgbe to ixgbevf.
The ixgbevf driver currently does IPsec Tx offloading
based on an existing secpath. However, the secpath
can also come from the Rx side, in this case it is
misinterpreted for Tx offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for Tx offload.
CC: Shannon Nelson <snelson@pensando.io>
Fixes: 7f68d43067 ("ixgbevf: enable VF IPsec offload operations")
Reported-by: Jonathan Tooker <jonathan@reliablehosting.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accessing the device when it may be runtime suspended is a bug, which is
the case in tmio_mmc_host_remove(). Let's fix the behaviour.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
runtime PM status of the device, as to make it reflect the current status
of the HW. This works fine for most cases, but unfortunate not for all.
Especially, there is a generic problem when the device has a genpd attached
and that genpd have the ->start|stop() callbacks assigned.
More precisely, if the driver calls pm_runtime_set_active() during
->probe(), genpd does not get to invoke the ->start() callback for it,
which means the HW isn't really fully powered on. Furthermore, in the next
phase, when the device becomes runtime suspended, genpd will invoke the
->stop() callback for it, potentially leading to usage count imbalance
problems, depending on what's implemented behind the callbacks of course.
To fix this problem, convert to call pm_runtime_get_sync() from
tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
avoid bumping usage counters and unnecessary re-initializing the HW the
first time the tmio driver's ->runtime_resume() callback is called,
introduce a state flag to keeping track of this.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
This reverts commit 7ff2131933.
It turns out that the above commit introduces other problems. For example,
calling pm_runtime_set_active() must not be done prior calling
pm_runtime_enable() as that makes it fail. This leads to additional
problems, such as clock enables being wrongly balanced.
Rather than fixing the problem on top, let's start over by doing a revert.
Fixes: 7ff2131933 ("mmc: tmio: move runtime PM enablement to the driver implementations")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull cgroup fix from Tejun Heo:
"Roman found and fixed a bug in the cgroup2 freezer which allows new
child cgroup to escape frozen state"
* 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: freezer: fix frozen state inheritance
kselftests: cgroup: add freezer mkdir test
Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
and reported by many users. It took time to identify the root cause,
catching the 5.3 release is higly desired also to push the fix to 5.2
stable tree.
The bug is a mess up of return values after adding proper error
handling and honestly the kind of bug that can cause sleeping
disorders until it's caught. My appologies to everybody who was
affected.
Summary of what could happen:
1) either a hang when committing a transaction, if this happens
there's no risk of corruption, still the hang is very inconvenient
and can't be resolved without a reboot
2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that, this is really
serious and that will lead to the "parent transid verify failed"
messages"
* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
Btrfs: fix assertion failure during fsync and use of stale transaction
If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.
The problem can be reproduced with the test_cgfreezer_mkdir test.
This is the output before this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
not ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
And with this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
Reported-by: Mark Crossen <mcrossen@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 76f969e894 ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Add a new cgroup freezer selftest, which checks that if a cgroup is
frozen, their new child cgroups will properly inherit the frozen
state.
It creates a parent cgroup, freezes it, creates a child cgroup
and populates it with a dummy process. Then it checks that both
parent and child cgroup are frozen.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull clone3 fix from Christian Brauner:
"This is a last-minute bugfix for clone3() that should go in before we
release 5.3 with clone3().
clone3() did not verify that the exit_signal argument was set to a
valid signal. This can be used to cause a crash by specifying a signal
greater than NSIG. e.g. -1.
The commit from Eugene adds a check to copy_clone_args_from_user() to
verify that the exit signal is limited by CSIGNAL as with legacy
clone() and that the signal is valid. With this we don't get the
legacy clone behavior were an invalid signal could be handed down and
would only be detected and then ignored in do_notify_parent(). Users
of clone3() will now get a proper error right when they pass an
invalid exit signal. Note, that this is not a change in user-visible
behavior since no kernel with clone3() has been released yet"
* tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
fork: block invalid exit signals with clone3()
Pull x86 fixes from Ingo Molnar:
"A KVM guest fix, and a kdump kernel relocation errors fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
Previously, higher 32 bits of exit_signal fields were lost when copied
to the kernel args structure (that uses int as a type for the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use; for the legacy syscalls,
applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that
can break at least thread_group_leader.
This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
were an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note, that this is
not user-visible behavior since no kernel with clone3() has been
released yet.
The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
pid_t pid = -1;
struct clone_args args = {0};
args.exit_signal = -1;
pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
if (pid < 0)
exit(EXIT_FAILURE);
if (pid == 0)
exit(EXIT_SUCCESS);
wait(NULL);
exit(EXIT_SUCCESS);
}
Fixes: 7f192e3cd3 ("fork: add clone3")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[christian.brauner@ubuntu.com: simplify check and rework commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert them from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.
Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.
Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: 8e2d61e0ae ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side, in this case it is
misinterpreted for TX offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.
Fixes: 5925947047 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <michael@michaelmarley.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.
When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).
Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:
[49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
[49887.347059] Not tainted 5.2.13-gentoo #2
[49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[49887.347062] btrfs-transacti D 0 1752 2 0x80004000
[49887.347064] Call Trace:
[49887.347069] ? __schedule+0x265/0x830
[49887.347071] ? bit_wait+0x50/0x50
[49887.347072] ? bit_wait+0x50/0x50
[49887.347074] schedule+0x24/0x90
[49887.347075] io_schedule+0x3c/0x60
[49887.347077] bit_wait_io+0x8/0x50
[49887.347079] __wait_on_bit+0x6c/0x80
[49887.347081] ? __lock_release.isra.29+0x155/0x2d0
[49887.347083] out_of_line_wait_on_bit+0x7b/0x80
[49887.347084] ? var_wake_function+0x20/0x20
[49887.347087] lock_extent_buffer_for_io+0x28c/0x390
[49887.347089] btree_write_cache_pages+0x18e/0x340
[49887.347091] do_writepages+0x29/0xb0
[49887.347093] ? kmem_cache_free+0x132/0x160
[49887.347095] ? convert_extent_bit+0x544/0x680
[49887.347097] filemap_fdatawrite_range+0x70/0x90
[49887.347099] btrfs_write_marked_extents+0x53/0x120
[49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
[49887.347102] btrfs_commit_transaction+0x6bb/0x990
[49887.347103] ? start_transaction+0x33e/0x500
[49887.347105] transaction_kthread+0x139/0x15c
So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).
This is a regression introduced in the 5.2 kernel.
Fixes: 2e3c25136a ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
kvm_s390_vm_start_migration(), kernel will oops with:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
Oops: 0004 ilc:2 [#1] SMP
...
Call Trace:
([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
[<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
[<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
[<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
[<00000000008bb284>] ksys_ioctl+0x8c/0xb8
[<00000000008bb2e2>] sys_ioctl+0x32/0x40
[<000000000175552c>] system_call+0x2b8/0x2d8
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<0000000000dbaf60>] __memset+0xc/0xa0
due to ms->dirty_bitmap being NULL, which might crash the host.
Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.
Cc: <stable@vger.kernel.org>
Fixes: afdad61615 ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In qrtr_tun_write_iter the allocated kbuf should be release in case of
error or success return.
v2 Update: Thanks to David Miller for pointing out the release on success
path as well.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.
However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-
44837.761523: <6> Unable to handle kernel paging request at
virtual address 0000004a88287000
44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393: <2> Call trace:
44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8
44837.762404: <2> in_dev_rcu_put+0x24/0x30
44837.762412: <2> rcu_nocb_kthread+0x43c/0x468
44837.762418: <2> kthread+0x118/0x128
44837.762424: <2> ret_from_fork+0x10/0x1c
Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.
Fixes: 93ee31f14f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: d862e54614 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull virtio fixes from Michael Tsirkin:
"Last minute bugfixes.
A couple of security things.
And an error handling bugfix that is never encountered by most people,
but that also makes it kind of safe to push at the last minute, and it
helps push the fix to stable a bit sooner"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: make sure log_num < in_num
vhost: block speculation of translated descriptors
virtio_ring: fix unmap of indirect descriptors
Pull perf fix from Ingo Molnar:
"Fix an initialization bug in the hw-breakpoints, which triggered on
the ARM platform"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
Pull irq fix from Ingo Molnar:
"Fix a race in the IRQ resend mechanism, which can result in a NULL
dereference crash"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent NULL pointer dereference in resend_irqs()
Pull pin control fix from Linus Walleij:
"Hopefully last pin control fix: a single patch for some Aspeed
problems. The BMCs are much happier now"
* tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed: Fix spurious mux failures on the AST2500
Pull GPIO fixes from Linus Walleij:
"I don't really like to send so many fixes at the very last minute, but
the bug-sport activity is unpredictable.
Four fixes, three are -stable material that will go everywhere, one is
for the current cycle:
- An ACPI DSDT error fixup of the type we always see and Hans
invariably gets to fix.
- A OF quirk fix for the current release (v5.3)
- Some consistency checks on the userspace ABI.
- A memory leak"
* tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
gpiolib: of: fix fallback quirks handling
gpio: fix line flag validation in lineevent_create
gpio: fix line flag validation in linehandle_create
gpio: mockup: add missing single_release()
Commit 674fa8daa8 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
was determined to be a partial fix to the problem of acquiring the LPC
Host Controller and GFX regmaps: The AST2500 pin controller may need to
fetch syscon regmaps during expression evaluation as well as when
setting mux state. For example, this case is hit by attempting to export
pins exposing the LPC Host Controller as GPIOs.
An optional eval() hook is added to the Aspeed pinmux operation struct
and called from aspeed_sig_expr_eval() if the pointer is set by the
SoC-specific driver. This enables the AST2500 to perform the custom
action of acquiring its regmap dependencies as required.
John Wang tested the fix on an Inspur FP5280G2 machine (AST2500-based)
where the issue was found, and I've booted the fix on Witherspoon
(AST2500) and Palmetto (AST2400) machines, and poked at relevant pins
under QEMU by forcing mux configurations via devmem before exporting
GPIOs to exercise the driver.
Fixes: 7d29ed88ac ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers")
Fixes: 674fa8daa8 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
Reported-by: John Wang <wangzqbj@inspur.com>
Tested-by: John Wang <wangzqbj@inspur.com>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20190829071738.2523-1-andrew@aj.id.au
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-09-11
This series contains fixes to ixgbe.
Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.
Ilya Maximets fixes the ixgbe driver to limit the number of transmit
descriptors to clean by the number of transmit descriptors used in the
transmit ring, so that the driver does not try to "double" clean the
same descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.
Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.
The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone. After this fix, the
sender properly reflects the CE bit and does not set the W bit.
Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).
Fixes: bdf1ee5d3b ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.
As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.
There's no need to log when desc.len = 0, so just don't increment log_num
in this case.
Fixes: 3a4d5c94e9 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.
Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.
Following the defence in depth principle, make sure
the address is not validated out of node range.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the completion queue ring.
Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the Tx ring.
'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
'next_to_clean' and 'next_to_use' indexes.
CC: stable@vger.kernel.org
Fixes: 8221c5eba8 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There were a couple cases where the ITR value generated via the adaptive
ITR scheme could exceed 126. This resulted in the value becoming either 0
or something less than 10. Switching back and forth between a value less
than 10 and a value greater than 10 can cause issues as certain hardware
features such as RSC to not function well when the ITR value has dropped
that low.
CC: stable@vger.kernel.org
Fixes: b4ded8327f ("ixgbe: Update adaptive ITR algorithm")
Reported-by: Gregg Leventhal <gleventhal@janestreet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is a spelling mistake in a mlx4_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_bind(), an rds_sock is added to the RDS bind hash table before
rs_transport is set. This means that the socket can be found by the
receive code path when rs_transport is NULL. And the receive code
path de-references rs_transport for congestion update check. This can
cause a panic. An rds_sock should not be added to the bind hash table
before all the needed fields are set.
Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Layer 2 Update frame is used to update bridges when a station roams
to another AP even if that STA does not transmit any frames after the
reassociation. This behavior was described in IEEE Std 802.11F-2003 as
something that would happen based on MLME-ASSOCIATE.indication, i.e.,
before completing 4-way handshake. However, this IEEE trial-use
recommended practice document was published before RSN (IEEE Std
802.11i-2004) and as such, did not consider RSN use cases. Furthermore,
IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been
maintained amd should not be used anymore.
Sending out the Layer 2 Update frame immediately after association is
fine for open networks (and also when using SAE, FT protocol, or FILS
authentication when the station is actually authenticated by the time
association completes). However, it is not appropriate for cases where
RSN is used with PSK or EAP authentication since the station is actually
fully authenticated only once the 4-way handshake completes after
authentication and attackers might be able to use the unauthenticated
triggering of Layer 2 Update frame transmission to disrupt bridge
behavior.
Fix this by postponing transmission of the Layer 2 Update frame from
station entry addition to the point when the station entry is marked
authorized. Similarly, send out the VLAN binding update only if the STA
entry has already been authorized.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 414126f9e5.
This commit broke eMMC storage access on a new consumer MiniPC based on
AMD SoC, which has eMMC connected to:
02:00.0 SD Host controller: O2 Micro, Inc. Device 8620 (rev 01) (prog-if 01)
Subsystem: O2 Micro, Inc. Device 0002
During probe, several errors are seen including:
mmc1: Got data interrupt 0x02000000 even though no data operation was in progress.
mmc1: Timeout waiting for hardware interrupt.
mmc1: error -110 whilst initialising MMC card
Reverting this commit allows the eMMC storage to be detected & usable
again.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: 414126f9e5 ("mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host
controller")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The commit 37fefadee8 ("mmc: bcm2835: Terminate timeout work
synchronously") causes lockups in case of hardware timeouts due the
timeout work also calling cancel_delayed_work_sync() on its own.
So revert it.
Fixes: 37fefadee8 ("mmc: bcm2835: Terminate timeout work synchronously")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Another day; another DSDT bug we need to workaround...
Since commit ca876c7483 ("gpiolib-acpi: make sure we trigger edge events
at least once on boot") we call _AEI edge handlers at boot.
In some rare cases this causes problems. One example of this is the Minix
Neo Z83-4 mini PC, this device has a clear DSDT bug where it has some copy
and pasted code for dealing with Micro USB-B connector host/device role
switching, while the mini PC does not even have a micro-USB connector.
This code, which should not be there, messes with the DDC data pin from
the HDMI connector (switching it to GPIO mode) breaking HDMI support.
To avoid problems like this, this commit adds a new
gpiolib_acpi.run_edge_events_on_boot kernel commandline option, which
allows disabling the running of _AEI edge event handlers at boot.
The default value is -1/auto which uses a DMI based blacklist, the initial
version of this blacklist contains the Neo Z83-4 fixing the HDMI breakage.
Cc: stable@vger.kernel.org
Cc: Daniel Drake <drake@endlessm.com>
Cc: Ian W MORRISON <ianwmorrison@gmail.com>
Reported-by: Ian W MORRISON <ianwmorrison@gmail.com>
Suggested-by: Ian W MORRISON <ianwmorrison@gmail.com>
Fixes: ca876c7483 ("gpiolib-acpi: make sure we trigger edge events at least once on boot")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20190827202835.213456-1-hdegoede@redhat.com
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Keep the "Library routines" menu intact by moving OBJAGG into it.
Otherwise OBJAGG is displayed/presented as an orphan in the
various config menus.
Fixes: 0a020d416d ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sonic_send_packet will be processed in irq or non-irq
context, so it would better use dev_kfree_skb_any
instead of dev_kfree_skb.
Fixes: d9fb9f3842 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with
skb response.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
a transport thresholds info.
Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.
Fixes: 8add543e36 ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least sch_red and sch_tbf don't implement ->tcf_block()
while still have a non-zero tc "class".
Instead of adding nop implementations to each of such qdisc's,
we can just relax the check of cops->tcf_block() in
tc_bind_tclass(). They don't support TC filter anyway.
Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ipc regression fixes from Arnd Bergmann:
"Fix ipc regressions from y2038 patches
These are two regression fixes for bugs that got introduced during the
system call rework that went into linux-5.1 but only bisected and
fixed now:
- One patch affects semtimedop() on many of the less common 32-bit
architectures, this just needs a single-line bugfix.
- The other affects only sparc64 and has a slightly more invasive
workaround to apply the same change to sparc64 that was done to the
generic code used everywhere else"
* tag 'ipc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
ipc: fix sparc64 ipc() wrapper
ipc: fix semtimedop for generic 32-bit architectures
We should only try to execute fallback quirks handling when previous
call returned -ENOENT, and not when we did not get -EPROBE_DEFER.
The other errors should be treated as hard errors: we did find the GPIO
description, but for some reason we failed to handle it properly.
The fallbacks should only be executed when previous handlers returned
-ENOENT, which means the mapping/description was not found.
Also let's remove the explicit deferral handling when iterating through
GPIO suffixes: it is not needed anymore as we will not be calling
fallbacks for anything but -ENOENT.
Fixes: df451f83e1 ("gpio: of: fix Freescale SPI CS quirk handling")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/20190903231856.GA165165@dtor-ws
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end.
In fact, NLMSG_DONE is sent only at the end of a dump.
Libraries like libnl will wait forever for NLMSG_DONE.
Fixes: 949f1e39a6 ("bridge: mdb: notify on router port add and del")
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1c2977c094 ("net/ibmvnic: free reset work of removed device from queue")
adds a } without corresponding { causing build break.
Fixes: 1c2977c094 ("net/ibmvnic: free reset work of removed device from queue")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull regulator fixes from Mark Brown:
"This is obviouly very late, containing three small and simple driver
specific fixes.
The main one is the TWL fix, this fixes issues with cpufreq on the
PMICs used with BeagleBoard generation OMAP SoCs which had been broken
due to changes in the generic OPP code exposing a bug in the regulator
driver for these devices causing them to think that OPPs weren't
supported on the system.
Sorry about sending this so late, I hadn't registered that the TWL
issue manifested in cpufreq"
* tag 'regulator-fix-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: twl: voltage lists for vdd1/2 on twl4030
regulator: act8945a-regulator: fix ldo register addresses in set_mode hook
regulator: slg51000: Fix a couple NULL vs IS_ERR() checks
The function virtqueue_add_split() DMA-maps the scatterlist buffers. In
case a mapping error occurs the already mapped buffers must be unmapped.
This happens by jumping to the 'unmap_release' label.
In case of indirect descriptors the release is wrong and may leak kernel
memory. Because the implementation assumes that the head descriptor is
already mapped it starts iterating over the descriptor list starting
from the head descriptor. However for indirect descriptors the head
descriptor is never mapped in case of an error.
The fix is to initialize the start index with zero in case of indirect
descriptors and use the 'desc' pointer directly for iterating over the
descriptor chain.
Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
lineevent_create should not allow any of GPIOHANDLE_REQUEST_OUTPUT,
GPIOHANDLE_REQUEST_OPEN_DRAIN or GPIOHANDLE_REQUEST_OPEN_SOURCE to be set.
Fixes: d7c51b47ac ("gpio: userspace ABI for reading/writing GPIO lines")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
linehandle_create should not allow both GPIOHANDLE_REQUEST_INPUT
and GPIOHANDLE_REQUEST_OUTPUT to be set.
Fixes: d7c51b47ac ("gpio: userspace ABI for reading/writing GPIO lines")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
When using single_open() for opening, single_release() should be
used instead of seq_release(), otherwise there is a memory leak.
Fixes: 2a9e27408e ("gpio: mockup: rework debugfs interface")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Pull section attribute fix from Miguel Ojeda:
"Fix Oops in Clang-compiled kernels (Nick Desaulniers)"
* tag 'compiler-attributes-for-linus-v5.3-rc8' of git://github.com/ojeda/linux:
include/linux/compiler.h: fix Oops for Clang-compiled kernels
Pull GPIO fixes from Linus Walleij:
"All related to the PCA953x driver when handling chips with more than 8
ports, now that works again"
* tag 'gpio-v5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: use pca953x_read_regs instead of regmap_bulk_read
gpio: pca953x: correct type of reg_direction
KVM guests with commit c8c4076723 ("x86/timer: Skip PIT initialization on
modern chipsets") applied to guest kernel have been observed to have
unusually higher CPU usage with symptoms of increase in vm exits for HLT
and MSW_WRITE (MSR_IA32_TSCDEADLINE).
This is caused by older QEMUs lacking support for X86_FEATURE_ARAT. lapic
clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive. There's no
usable broadcast device either.
Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT. On real
hardware it shouldn't matter as ARAT and DEADLINE come together.
Fixes: c8c4076723 ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit 558682b529.
Chris Wilson reports that it breaks his CPU hotplug test scripts. In
particular, it breaks offlining and then re-onlining the boot CPU, which
we treat specially (and the BIOS does too).
The symptoms are that we can offline the CPU, but it then does not come
back online again:
smpboot: CPU 0 is now offline
smpboot: Booting Node 0 Processor 0 APIC 0x0
smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
Thomas says he knows why it's broken (my personal suspicion: our magic
handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
is to just revert it, since we've never touched the LDR bits before, and
it's not worth the risk to do anything else at this stage.
[ Hotpluging of the boot CPU is special anyway, and should be off by
default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
cpu0_hotplug kernel parameter.
In general you should not do it, and it has various known limitations
(hibernate and suspend require the boot CPU, for example).
But it should work, even if the boot CPU is special and needs careful
treatment - Linus ]
Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
to a commit from my y2038 series in linux-5.1, as I missed the custom
sys_ipc() wrapper that sparc64 uses in place of the generic version that
I patched.
The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
now do not allow being called with the IPC_64 flag any more, resulting
in a -EINVAL error when they don't recognize the command.
Instead, the correct way to do this now is to call the internal
ksys_old_{sem,shm,msg}ctl() functions to select the API version.
As we generally move towards these functions anyway, change all of
sparc_ipc() to consistently use those in place of the sys_*() versions,
and move the required ksys_*() declarations into linux/syscalls.h
The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
errors when ipc is disabled.
Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 275f22148e ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
Cc: stable@vger.kernel.org
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull Documentation updates from Greg KH:
"A few small patches for the documenation file that came in through the
char-misc tree in -rc7 for your tree.
They fix the mistake in the .rst format that kept the table of
companies from showing up in the html output, and most importantly,
add people's names to the list showing support for our process"
* tag 'char-misc-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Add Qualcomm process ambassador for hardware security issues
Documentation/process/embargoed-hardware-issues: Microsoft ambassador
Documentation/process: Add Google contact for embargoed hardware issues
Documentation/process: Volunteer as the ambassador for Xen
Pull dmaengine fixes from Vinod Koul:
"Some late fixes for drivers:
- memory leak in ti crossbar dma driver
- cleanup of omap dma probe
- Fix for link list configuration in sprd dma driver
- Handling fixed for DMACHCLR if iommu is mapped in rcar dma"
* tag 'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: rcar-dmac: Fix DMACHCLR handling if iommu is mapped
dmaengine: sprd: Fix the DMA link-list configuration
dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe()
dmaengine: ti: dma-crossbar: Fix a memory leak bug
Flower control message replies are handled in different locations. The truly
high priority replies are handled in the BH (tasklet) context, while the
remaining replies are handled in a predefined Linux work queue. The work
queue handler orders replies into high and low priority groups, and always
start servicing the high priority replies within the received batch first.
Reply Type: Rtnl Lock: Handler:
CMSG_TYPE_PORT_MOD no BH tasklet (mtu)
CMSG_TYPE_TUN_NEIGH no BH tasklet
CMSG_TYPE_FLOW_STATS no BH tasklet
CMSG_TYPE_PORT_REIFY no WQ high
CMSG_TYPE_PORT_MOD yes WQ high (link/mtu)
CMSG_TYPE_MERGE_HINT yes WQ low
CMSG_TYPE_NO_NEIGH no WQ low
CMSG_TYPE_ACTIVE_TUNS no WQ low
CMSG_TYPE_QOS_STATS no WQ low
CMSG_TYPE_LAG_CONFIG no WQ low
A subset of control messages can block waiting for an rtnl lock (from both
work queue priority groups). The rtnl lock is heavily contended for by
external processes such as systemd-udevd, systemd-network and libvirtd,
especially during netdev creation, such as when flower VFs and representors
are instantiated.
Kernel netlink instrumentation shows that external processes (such as
systemd-udevd) often use successive rtnl_trylock() sequences, which can result
in an rtnl_lock() blocked control message to starve for longer periods of time
during rtnl lock contention, i.e. netdev creation.
In the current design a single blocked control message will block the entire
work queue (both priorities), and introduce a latency which is
nondeterministic and dependent on system wide rtnl lock usage.
In some extreme cases, one blocked control message at exactly the wrong time,
just before the maximum number of VFs are instantiated, can block the work
queue for long enough to prevent VF representor REIFY replies from getting
handled in time for the 40ms timeout.
The firmware will deliver the total maximum number of REIFY message replies in
around 300us.
Only REIFY and MTU update messages require replies within a timeout period (of
40ms). The MTU-only updates are already done directly in the BH (tasklet)
handler.
Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
caused by a blocked work queue waiting on rtnl locks.
Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801
("net: Add frag_list support to skb_segment"), quote:
As such we require all frag_list members terminate on exact MSS
boundaries. This is checked using BUG_ON.
As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.
However, since commit 6578171a7f ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Example of such programs are bpf-based NAT46 or
NAT64.
This lead to a kernel BUG_ON for flows involving:
- GRO generating a frag_list skb
- bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
- skb_segment() of the skb
See example BUG_ON reports in [0].
In commit 13acc94eff ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).
Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.
This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].
Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
- untrusted gso_size, by testing SKB_GSO_DODGY is set
(SKB_GSO_DODGY is set by any gso_size mangling functions in
net/core/filter.c)
- the frag_list is non empty, its item is a non head_frag, *and* the
headlen of the given 'head_skb' does not match the gso_size.
[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported:
BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
do_loop_readv_writev fs/read_write.c:703 [inline]
do_iter_write+0x83e/0xd80 fs/read_write.c:961
vfs_writev fs/read_write.c:1004 [inline]
do_writev+0x397/0x840 fs/read_write.c:1039
__do_sys_writev fs/read_write.c:1112 [inline]
__se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
__x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
[...]
The problem is that capi_write() is reading past the end of the message.
Fix it by checking the message's length in the needed places.
Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 36f1031c51 ("ibmvnic: Do not process reset during or after
device removal") made the change to exit reset if the driver has been
removed, but does not free reset work items of the adapter from queue.
Ensure all reset work items are freed when breaking out of the loop early.
Fixes: 36f1031c51 ("ibmnvic: Do not process reset during or after device removal”)
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Regarding to IEEE 802.3-2015 standard section 2
28B.3 Priority resolution - Table 28-3 - Pause resolution
In case of Local device Pause=1 AsymDir=0, Link partner
Pause=1 AsymDir=1, Local device resolution should be enable PAUSE
transmit, disable PAUSE receive.
And in case of Local device Pause=1 AsymDir=1, Link partner
Pause=1 AsymDir=0, Local device resolution should be enable PAUSE
receive, disable PAUSE transmit.
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reported-by: Shaul Ben-Mayor <shaulb@marvell.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We 'allocate' 'count' bytes here. In fact, 'dev_alloc_skb' already add some
extra space for padding, so a bit more is allocated.
However, we use 1 byte for the KISS command, then copy 'count' bytes, so
count+1 bytes.
Explicitly allocate and use 1 more byte to be safe.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-09-06
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) verifier precision tracking fix, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI fix from James Bottomley:
"Just a single lpfc fix adjusting the number of available queues for
high CPU count systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variable
Pull libnvdimm fix from Dan Williams:
"Restore support for 1GB alignment namespaces, truncate the end of
misaligned namespaces"
* tag 'libnvdimm-fix-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/pfn: Fix namespace creation on misaligned addresses
Pull input fix from Dmitry Torokhov:
"A tiny update from Benjamin removing a mistakenly added Elan PNP ID so
that the device is again handled by hid-multitouch"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - remove Lenovo Legion Y7000 PnpID
Looks like the Bios of the Lenovo Legion Y7000 is using ELAN061B
when the actual device is supposed to be used with hid-multitouch.
Remove it from the list of the supported device, hoping that
no one will complain about the loss in functionality.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203467
Fixes: 738c06d0e4 ("Input: elan_i2c - add hardware ID for multiple Lenovo laptops")
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull ARM SoC fixes from Arnd Bergmann:
"There are three more fixes for this week:
- The Windows-on-ARM laptops require a workaround to prevent crashing
at boot from ACPI
- The Renesas 'draak' board needs one bugfix for the backlight
regulator
- Also for Renesas, the 'hihope' board accidentally had its eMMC
turned off in the 5.3 merge window"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: qcom: geni: Provide parameter error checking
arm64: dts: renesas: hihope-common: Fix eMMC status
arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name
Pull configfs fixes from Christoph Hellwig:
"Late configfs fixes from Al that fix pretty nasty removal vs attribute
access races"
* tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfs:
configfs: provide exclusion between IO and removals
configfs: new object reprsenting tree fragments
configfs_register_group() shouldn't be (and isn't) called in rmdirable parts
configfs: stash the data we need into configfs_buffer at open time
Pull IOMMU fixes from Joerg Roedel:
- Revert an Intel VT-d patch that caused problems for some users.
- Removal of a feature in the Intel VT-d driver that was never
supported in hardware. This qualifies as a fix because the code for
this feature sets reserved bits in the invalidation queue descriptor,
causing failed invalidations on real hardware.
- Two fixes for AMD IOMMU driver to fix a race condition and to add a
missing IOTLB flush when kernel is booted in kdump mode.
* tag 'iommu-fixes-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix race in increase_address_space()
iommu/amd: Flush old domains in kdump kernel
iommu/vt-d: Remove global page flush support
Revert "iommu/vt-d: Avoid duplicated pci dma alias consideration"
Pull MMC fix from Ulf Hansson:
"Revert in order to fix card init for some eMMCs that need retries for
CMD6"
* tag 'mmc-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Revert "mmc: core: do not retry CMD6 in __mmc_switch()"
Pull drm fixes from Dave Airlie:
"Live from my friend's couch in Barcelona, latest round of drm fixes.
The command line parser regression fixes look a bit larger because
they come with selftests included for the bugs they fix. Otherwise a
single nouveau, single ingenic and single vmwgfx fix:
nouveau:
- add missing MODULE_FIRMWARE definitions
igenic:
- hardcode panel type DPI
vmwgfx:
- double free fix
core:
- command line mode parser fixes"
* tag 'drm-fixes-2019-09-06' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Fix double free in vmw_recv_msg()
drm/nouveau/sec2/gp102: add missing MODULE_FIRMWAREs
drm/selftests: modes: Add more unit tests for the cmdline parser
drm/modes: Introduce a whitelist for the named modes
drm/modes: Fix the command line parser to take force options into account
drm/modes: Add a switch to differentiate free standing options
drm/ingenic: Hardcode panel type to DPI
Pull virtio fixes from Michael Tsirkin:
"virtio, vhost, and balloon bugfixes.
A couple of last minute bugfixes. And a revert of a failed attempt at
metadata access optimization - we'll try again in the next cycle"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
mm/balloon_compaction: suppress allocation warnings
Revert "vhost: access vq metadata through kernel virtual address"
vhost: Remove unnecessary variable
virtio-net: lower min ring num_free for efficiency
vhost/test: fix build for vhost test
vhost/test: fix build for vhost test
Pull powerpc fixes from Michael Ellerman:
"One fix for a boot hang on some Freescale machines when PREEMPT is
enabled.
Two CVE fixes for bugs in our handling of FP registers and
transactional memory, both of which can result in corrupted FP state,
or FP state leaking between processes.
Thanks to: Chris Packham, Christophe Leroy, Gustavo Romero, Michael
Neuling"
* tag 'powerpc-5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
powerpc/64e: Drop stale call to smp_processor_id() which hangs SMP startup
Kalle Valo says:
====================
wireless-drivers fixes for 5.3
Fourth set of fixes for 5.3, and hopefully really the last one. Quite
a few CVE fixes this time but at least to my knowledge none of them
have a known exploit.
mt76
* workaround firmware hang by disabling hardware encryption on MT7630E
* disable 5GHz band for MT7630E as it's not working properly
mwifiex
* fix IE parsing to avoid a heap buffer overflow
iwlwifi
* fix for QuZ device initialisation
rt2x00
* another fix for rekeying
* revert a commit causing degradation in rx signal levels
rsi
* fix a double free
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I am maintaining xilinx axiethernet driver in xilinx tree and would like
to maintain it in the mainline kernel as well. Hence adding myself as a
maintainer. Also Anirudha and John has moved to new roles, so based on
request removing them from the maintainer list.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Acked-by: John Linn <john.linn@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection found the cited
commit.
The issue can be described this way :
- A single qdisc hierarchy is shared by all transmit queues.
(eg : tc qdisc replace dev eth0 root fq_codel)
- When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting
a different transmit queue than the one used to build a packet train,
we stop building the current list and save the 'bad' skb (P1) in a
special queue. (bad_txq)
- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
skb (P1), it checks if the associated transmit queues is still in frozen
state. If the queue is still blocked (by BQL or NIC tx ring full),
we leave the skb in bad_txq and return NULL.
- dequeue_skb() calls q->dequeue() to get another packet (P2)
The other packet can target the problematic queue (that we found
in frozen state for the bad_txq packet), but another cpu just ran
TX completion and made room in the txq that is now ready to accept
new packets.
- Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
at next round. In practice P2 is the lead of a big packet train
(P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq can not be sent. Reordering issues
disappear and no side effects have been seen.
Fixes: a53851e2c3 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-09-05
1) Several xfrm interface fixes from Nicolas Dichtel:
- Avoid an interface ID corruption on changelink.
- Fix wrong intterface names in the logs.
- Fix a list corruption when changing network namespaces.
- Fix unregistation of the underying phydev.
2) Fix a potential warning when merging xfrm_plocy nodes.
From Florian Westphal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing with a background iperf pushing 1Gbit/sec traffic and running
both ifconfig and netstat to collect statistics, some deadlocks occurred.
Ifconfig and netstat will call nv_get_stats64 to get software xmit/recv
statistics. In the commit f5d827aece ("forcedeth: implement
ndo_get_stats64() API"), the normal tx/rx variables is to collect tx/rx
statistics. The fix is to replace normal tx/rx variables with per
cpu 64-bit variable to collect xmit/recv statistics. The per cpu variable
will avoid deadlocks and provide fast efficient statistics updates.
In nv_probe, the per cpu variable is initialized. In nv_remove, this
per cpu variable is freed.
In xmit/recv process, this per cpu variable will be updated.
In nv_get_stats64, this per cpu variable on each cpu is added up. Then
the driver can get xmit/recv packets statistics.
A test runs for several days with this commit, the deadlocks disappear
and the performance is better.
Tested:
- iperf SMP x86_64 ->
Client connecting to 1.1.1.108, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.105 port 38888 connected with 1.1.1.108 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec
ifconfig results:
enp0s9 Link encap:Ethernet HWaddr 00:21:28:6f:de:0f
inet addr:1.1.1.105 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5774764531 errors:0 dropped:0 overruns:0 frame:0
TX packets:633534193 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7646159340904 (7.6 TB) TX bytes:11425340407722 (11.4 TB)
netstat results:
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
...
enp0s9 1500 0 5774764531 0 0 0 633534193 0 0 0 BMRU
...
Fixes: f5d827aece ("forcedeth: implement ndo_get_stats64() API")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Reported-and-tested-by: Nan san <nan.1986san@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETDEV_TX_BUSY really should only be used by drivers that call
netif_tx_stop_queue() at the wrong moment. If dma_map_single() is
failed to map tx DMA buffer, it might trigger an infinite loop.
This patch use NETDEV_TX_OK instead of NETDEV_TX_BUSY, and change
printk to pr_err_ratelimited.
Fixes: d9fb9f3842 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the conversion to lock-less dma-api call the
increase_address_space() function can be called without any
locking. Multiple CPUs could potentially race for increasing
the address space, leading to invalid domain->mode settings
and invalid page-tables. This has been happening in the wild
under high IO load and memory pressure.
Fix the race by locking this operation. The function is
called infrequently so that this does not introduce
a performance regression in the dma-api path again.
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 256e4621c2 ('iommu/amd: Make use of the generic IOVA allocator')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When devices are attached to the amd_iommu in a kdump kernel, the old device
table entries (DTEs), which were copied from the crashed kernel, will be
overwritten with a new domain number. When the new DTE is written, the IOMMU
is told to flush the DTE from its internal cache--but it is not told to flush
the translation cache entries for the old domain number.
Without this patch, AMD systems using the tg3 network driver fail when kdump
tries to save the vmcore to a network system, showing network timeouts and
(sometimes) IOMMU errors in the kernel log.
This patch will flush IOMMU translation cache entries for the old domain when
a DTE gets overwritten with a new domain number.
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Fixes: 3ac3e5ee5e ('iommu/amd: Copy old trans table from old kernel')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The last change to this Makefile caused relocation errors when loading
a kdump kernel. Restore -mcmodel=large (not -mcmodel=kernel),
-ffreestanding, and -fno-zero-initialized-bsss, without reverting to
the former practice of resetting KBUILD_CFLAGS.
Purgatory.ro is a standalone binary that is not linked against the
rest of the kernel. Its image is copied into an array that is linked
to the kernel, and from there kexec relocates it wherever it desires.
With the previous change to compiler flags, the error "kexec: Overflow
in relocation type 11 value 0x11fffd000" was encountered when trying
to load the crash kernel. This is from kexec code trying to relocate
the purgatory.ro object.
From the error message, relocation type 11 is R_X86_64_32S. The
x86_64 ABI says:
"The R_X86_64_32 and R_X86_64_32S relocations truncate the
computed value to 32-bits. The linker must verify that the
generated value for the R_X86_64_32 (R_X86_64_32S) relocation
zero-extends (sign-extends) to the original 64-bit value."
This type of relocation doesn't work when kexec chooses to place the
purgatory binary in memory that is not reachable with 32 bit
addresses.
The compiler flag -mcmodel=kernel allows those type of relocations to
be emitted, so revert to using -mcmodel=large as was done before.
Also restore the -ffreestanding and -fno-zero-initialized-bss flags
because they are appropriate for a stand alone piece of object code
which doesn't explicitly zero the bss, and one other report has said
undefined symbols are encountered without -ffreestanding.
These identical compiler flag changes need to happen for every object
that becomes part of the purgatory.ro object, so gather them together
first into PURGATORY_CFLAGS_REMOVE and PURGATORY_CFLAGS, and then
apply them to each of the objects that have C source. Do not apply
any of these flags to kexec-purgatory.o, which is not part of the
standalone object but part of the kernel proper.
Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Tested-by: Andreas Smas <andreas@lonelycoder.com>
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: None
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: clang-built-linux@googlegroups.com
Cc: dimitri.sivanich@hpe.com
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Fixes: b059f801a9 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
Link: https://lkml.kernel.org/r/20190905202346.GA26595@swahl-linux
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a request_key authentication token key gets revoked, there's a window in
which request_key_auth_describe() can see it with a NULL payload - but it
makes no check for this and something like the following oops may occur:
BUG: Kernel NULL pointer dereference at 0x00000038
Faulting instruction address: 0xc0000000004ddf30
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [...] request_key_auth_describe+0x90/0xd0
LR [...] request_key_auth_describe+0x54/0xd0
Call Trace:
[...] request_key_auth_describe+0x54/0xd0 (unreliable)
[...] proc_keys_show+0x308/0x4c0
[...] seq_read+0x3d0/0x540
[...] proc_reg_read+0x90/0x110
[...] __vfs_read+0x3c/0x70
[...] vfs_read+0xb4/0x1b0
[...] ksys_read+0x7c/0x130
[...] system_call+0x5c/0x70
Fix this by checking for a NULL pointer when describing such a key.
Also make the read routine check for a NULL pointer to be on the safe side.
[DH: Modified to not take already-held rcu lock and modified to also check
in the read routine]
Fixes: 04c567d931 ("[PATCH] Keys: Fix race between two instantiators of a key")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following crash was observed:
Unable to handle kernel NULL pointer dereference at 0000000000000158
Internal error: Oops: 96000004 [#1] SMP
pc : resend_irqs+0x68/0xb0
lr : resend_irqs+0x64/0xb0
...
Call trace:
resend_irqs+0x68/0xb0
tasklet_action_common.isra.6+0x84/0x138
tasklet_action+0x2c/0x38
__do_softirq+0x120/0x324
run_ksoftirqd+0x44/0x60
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
The reason for this is that the interrupt resend mechanism happens in soft
interrupt context, which is a asynchronous mechanism versus other
operations on interrupts. free_irq() does not take resend handling into
account. Thus, the irq descriptor might be already freed before the resend
tasklet is executed. resend_irqs() does not check the return value of the
interrupt descriptor lookup and derefences the return value
unconditionally.
1):
__setup_irq
irq_startup
check_irq_resend // activate softirq to handle resend irq
2):
irq_domain_free_irqs
irq_free_descs
free_desc
call_rcu(&desc->rcu, delayed_free_desc)
3):
__do_softirq
tasklet_action
resend_irqs
desc = irq_to_desc(irq)
desc->handle_irq(desc) // desc is NULL --> Ooops
Fix this by adding a NULL pointer check in resend_irqs() before derefencing
the irq descriptor.
Fixes: a4633adcdb ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1630ae13-5c8e-901e-de09-e740b6a426a7@huawei.com
Pull sound fixes from Takashi Iwai:
"A collection of small HD-audio fixes:
- A regression fix for Realtek codecs due to the recent
initialization procedure change
- A fix for potential endless loop at the quirk table lookup
- Quirks for Lenovo, ASUS and HP machines"
* tag 'sound-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix the problem of two front mics on a ThinkCentre
ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL
ALSA: hda/realtek - Add quirk for HP Pavilion 15
ALSA: hda/realtek - Fix overridden device-specific initialization
ALSA: hda - Fix potential endless loop at applying quirks
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- EFI boot fix for signed kernels
- an AC flags fix related to UBSAN
- Hyper-V infinite loop fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyper-v: Fix overflow bug in fill_gva_list()
x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
x86/boot: Preserve boot_params.secure_boot from sanitizing
Pull scheduler fixes from Ingo Molnar:
"This fixes an ABI bug introduced this cycle, plus fixes a throttling
bug"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
sched/fair: Don't assign runtime for throttled cfs_rq
Pull clang-format update from Miguel Ojeda:
"Update with the latest for_each macro list"
* tag 'clang-format-for-linus-v5.3-rc8' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
Second Round of Renesas ARM Based SoC Fixes for v5.3
* RZ/G2M based HiHope main board
- Re-enabled accidently disabled SDHI3 (eMMC) support
* tag 'renesas-fixes2-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
arm64: dts: renesas: hihope-common: Fix eMMC status
Link: https://lore.kernel.org/r/cover.1567675986.git.horms+renesas@verge.net.au
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We recently added a kfree() after the end of the loop:
if (retries == RETRIES) {
kfree(reply);
return -EINVAL;
}
There are two problems. First the test is wrong and because retries
equals RETRIES if we succeed on the last iteration through the loop.
Second if we fail on the last iteration through the loop then the kfree
is a double free.
When you're reading this code, please note the break statement at the
end of the while loop. This patch changes the loop so that if it's not
successful then "reply" is NULL and we can test for that afterward.
Cc: <stable@vger.kernel.org>
Fixes: 6b7c3b86f0 ("drm/vmwgfx: fix memory leak when too many retries have occurred")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The problem can be seen in the following two tests:
0: (bf) r3 = r10
1: (55) if r3 != 0x7b goto pc+0
2: (7a) *(u64 *)(r3 -8) = 0
3: (79) r4 = *(u64 *)(r10 -8)
..
0: (85) call bpf_get_prandom_u32#7
1: (bf) r3 = r10
2: (55) if r3 != 0x7b goto pc+0
3: (7b) *(u64 *)(r3 -8) = r0
4: (79) r4 = *(u64 *)(r10 -8)
When backtracking need to mark R4 it will mark slot fp-8.
But ST or STX into fp-8 could belong to the same block of instructions.
When backtracing is done the parent state may have fp-8 slot
as "unallocated stack". Which will cause verifier to warn
and incorrectly reject such programs.
Writes into stack via non-R10 register are rare. llvm always
generates canonical stack spill/fill.
For such pathological case fall back to conservative precision
tracking instead of rejecting.
Reported-by: syzbot+c8d66267fd2b5955287e@syzkaller.appspotmail.com
Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When creating a v4 route that uses a v6 nexthop from a nexthop group.
Allow the kernel to properly send the nexthop as v6 via the RTA_VIA
attribute.
Broken behavior:
$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via 254.128.0.0 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$
Fixed behavior:
$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$
v2, v3: Addresses code review comments from David Ahern
Fixes: dcb1ecb50e (“ipv4: Prepare for fib6_nh from a nexthop object”)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
nexthops: Fix multipath notifications for IPv6 and selftests
A couple of bug fixes noticed while testing Donald's patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanups of the tests in fib_nexthops.sh
1. Several tests noted unexpected route output, but the
discrepancy was not showing in the summary output and
overlooked in the verbose output. Add a WARNING message
to the summary output to make it clear a test is not showing
expected output.
2. Several check_* calls are missing extra data like scope and metric
causing mismatches when the nexthops or routes are correct - some of
them are a side effect of the evolving iproute2 command. Update the
data to the expected output.
3. Several check_routes are checking for the wrong nexthop data,
most likely a copy-paste-update error.
4. A couple of tests were re-using a nexthop id that already existed.
Fix those to use a new id.
Fixes: 6345266a99 ("selftests: Add test cases for nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A change to the core nla helpers was missed during the push of
the nexthop changes. rt6_fill_node_nexthop should be calling
nla_nest_start_noflag not nla_nest_start. Currently, iproute2
does not print multipath data because of parsing issues with
the attribute.
Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_map and ULP only work together when ULP is loaded after the sock
map is loaded. In the sock_map case we added a check for this to fail
the load if ULP is already set. However, we missed the check on the
sock_hash side.
Add a ULP check to the sock_hash update path.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add forward declaration for struct gpio_desc in order to address
the following:
./include/linux/phy_fixed.h:48:17: error: 'struct gpio_desc' declared inside parameter list [-Werror]
./include/linux/phy_fixed.h:48:17: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
Fixes: 71bd106d25 ("net: fixed-phy: Add fixed_phy_register_with_gpiod() API")
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2019-09-05
Here are a few more Bluetooth fixes for 5.3. I hope they can still make
it. There's one USB ID addition for btusb, two reverts due to discovered
regressions, and two other important fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c49a8682fc.
There are devices which require low connection intervals for usable operation
including keyboards and mice. Forcing a static connection interval for
these types of devices has an impact in latency and causes a regression.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There is a subtle change in behaviour introduced by:
commit c7a1ce397a
'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
Before that patch /proc/net/ipv6_route includes:
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
Afterwards /proc/net/ipv6_route includes:
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
AFAICT, this is incorrect since these routes are *not* coming from RA's.
As such, this patch restores the old behaviour.
Fixes: c7a1ce397a ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
Cc: David Ahern <dsahern@gmail.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transport should use its own pf_retrans to do the error_count
check, instead of asoc's. Otherwise, it's meaningless to make
pf_retrans per transport.
Fixes: 5aa93bcf66 ("sctp: Implement quick failover draft from tsvwg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a misplaced traceline in rxrpc_input_packet() which is looking at a
packet that just got released rather than the replacement packet.
Fix this by moving the traceline after the assignment that moves the new
packet pointer to the actual packet pointer.
Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) br_netfilter drops IPv6 packets if ipv6 is disabled, from Leonardo Bras.
2) nft_socket hits BUG() due to illegal skb->sk caching, patch from
Fernando Fernandez Mancera.
3) nft_fib_netdev could be called with ipv6 disabled, leading to crash
in the fib lookup, also from Leonardo.
4) ctnetlink honors IPS_OFFLOAD flag, just like nf_conntrack sysctl does.
5) Properly set up flowtable entry timeout, otherwise immediate
removal by garbage collector might occur.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that attribute methods are not called after the item
has been removed from the tree. To do so, we
* at the point of no return in removals, grab ->frag_sem
exclusive and mark the fragment dead.
* call the methods of attributes with ->frag_sem taken
shared and only after having verified that the fragment is still
alive.
The main benefit is for method instances - they are
guaranteed that the objects they are accessing *and* all ancestors
are still there. Another win is that we don't need to bother
with extra refcount on config_item when opening a file -
the item will be alive for as long as it stays in the tree, and
we won't touch it/attributes/any associated data after it's
been removed from the tree.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:
$ chrt -p $$
chrt: failed to get pid 26306's policy: Argument list too long
and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:
a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.
Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:
- if user-attributes have the same size as kernel attributes then the
logic is unchanged.
- if user-attributes are larger than the kernel knows about then simply
skip the extra bits, but set attr->size to the (smaller) kernel size
so that tooling can (in principle) handle older kernel as well.
- if user-attributes are smaller than the kernel knows about then just
copy whatever user-space can accept.
Also clean up the whole logic:
- Simplify the code flow - there's no need for 'ret' for example.
- Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
always know which side we are dealing with.
- Why is it called 'read' when what it does is to copy to user? This
code is so far away from VFS read() semantics that the naming is
actively confusing. Name it sched_attr_copy_to_user() instead, which
mirrors other copy_to_user() functionality.
- Move the attr->size assignment from the head of sched_getattr() to the
sched_attr_copy_to_user() function. Nothing else within the kernel
should care about the size of the structure.
With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.
As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When returning from bpa10x_send_frame, it is necessary to propagate any
potential errno returned from usb_submit_urb.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Looks like Deadlock is observed in hci_qca while performing
stress and stability tests. Since same lock is getting
acquired from qca_wq_awake_rx and hci_ibs_tx_idle_timeout
seeing spinlock recursion, irqs should be disable while
acquiring the spinlock always.
Signed-off-by: Harish Bandi <c-hbandi@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Renesas ARM Based SoC Fixes for v5.3
* R-Car D3 (r8a77995) based Draak Board
- Correct backlight regulator name in device tree
* tag 'renesas-fixes-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When in userspace and MSR FP=0 the hardware FP state is unrelated to
the current process. This is extended for transactions where if tbegin
is run with FP=0, the hardware checkpoint FP state will also be
unrelated to the current process. Due to this, we need to ensure this
hardware checkpoint is updated with the correct state before we enable
FP for this process.
Unfortunately we get this wrong when returning to a process from a
hardware interrupt. A process that starts a transaction with FP=0 can
take an interrupt. When the kernel returns back to that process, we
change to FP=1 but with hardware checkpoint FP state not updated. If
this transaction is then rolled back, the FP registers now contain the
wrong state.
The process looks like this:
Userspace: Kernel
Start userspace
with MSR FP=0 TM=1
< -----
...
tbegin
bne
Hardware interrupt
---- >
<do_IRQ...>
....
ret_from_except
restore_math()
/* sees FP=0 */
restore_fp()
tm_active_with_fp()
/* sees FP=1 (Incorrect) */
load_fp_state()
FP = 0 -> 1
< -----
Return to userspace
with MSR TM=1 FP=1
with junk in the FP TM checkpoint
TM rollback
reads FP junk
When returning from the hardware exception, tm_active_with_fp() is
incorrectly making restore_fp() call load_fp_state() which is setting
FP=1.
The fix is to remove tm_active_with_fp().
tm_active_with_fp() is attempting to handle the case where FP state
has been changed inside a transaction. In this case the checkpointed
and transactional FP state is different and hence we must restore the
FP state (ie. we can't do lazy FP restore inside a transaction that's
used FP). It's safe to remove tm_active_with_fp() as this case is
handled by restore_tm_state(). restore_tm_state() detects if FP has
been using inside a transaction and will set load_fp and call
restore_math() to ensure the FP state (checkpoint and transaction) is
restored.
This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.
Similarly for VMX.
A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c
This fixes CVE-2019-15031.
Fixes: a7771176b4 ("powerpc: Don't enable FP/Altivec if not checkpointed")
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
When we take an FP unavailable exception in a transaction we have to
account for the hardware FP TM checkpointed registers being
incorrect. In this case for this process we know the current and
checkpointed FP registers must be the same (since FP wasn't used
inside the transaction) hence in the thread_struct we copy the current
FP registers to the checkpointed ones.
This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
to determine if FP was on when in userspace. thread->ckpt_regs.msr
represents the state of the MSR when exiting userspace. This is setup
by check_if_tm_restore_required().
Unfortunatley there is an optimisation in giveup_all() which returns
early if tsk->thread.regs->msr (via local variable `usermsr`) has
FP=VEC=VSX=SPE=0. This optimisation means that
check_if_tm_restore_required() is not called and hence
thread->ckpt_regs.msr is not updated and will contain an old value.
This can happen if due to load_fp=255 we start a userspace process
with MSR FP=1 and then we are context switched out. In this case
thread->ckpt_regs.msr will contain FP=1. If that same process is then
context switched in and load_fp overflows, MSR will have FP=0. If that
process now enters a transaction and does an FP instruction, the FP
unavailable will not update thread->ckpt_regs.msr (the bug) and MSR
FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread()
will then not perform the required memcpy and the checkpointed FP regs
in the thread struct will contain the wrong values.
The code path for this happening is:
Userspace: Kernel
Start userspace
with MSR FP/VEC/VSX/SPE=0 TM=1
< -----
...
tbegin
bne
fp instruction
FP unavailable
---- >
fp_unavailable_tm()
tm_reclaim_current()
tm_reclaim_thread()
giveup_all()
return early since FP/VMX/VSX=0
/* ckpt MSR not updated (Incorrect) */
tm_reclaim()
/* thread_struct ckpt FP regs contain junk (OK) */
/* Sees ckpt MSR FP=1 (Incorrect) */
no memcpy() performed
/* thread_struct ckpt FP regs not fixed (Incorrect) */
tm_recheckpoint()
/* Put junk in hardware checkpoint FP regs */
....
< -----
Return to userspace
with MSR TM=1 FP=1
with junk in the FP TM checkpoint
TM rollback
reads FP junk
This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.
This patch moves up check_if_tm_restore_required() in giveup_all() to
ensure thread->ckpt_regs.msr is updated correctly.
A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c
Similarly for VMX.
This fixes CVE-2019-15030.
Fixes: f48e91e87e ("powerpc/tm: Fix FP and VMX register corruption")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
There is no reason to print warnings when balloon page allocation fails,
as they are expected and can be handled gracefully. Since VMware
balloon now uses balloon-compaction infrastructure, and suppressed these
warnings before, it is also beneficial to suppress these warnings to
keep the same behavior that the balloon had before.
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
This reverts commit 7f466032dc ("vhost: access vq metadata through
kernel virtual address"). The commit caused a bunch of issues, and
while commit 73f628ec9e ("vhost: disable metadata prefetch
optimization") disabled the optimization it's not nice to keep lots of
dead code around.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is unnecessary to use ret variable to return the error
code, just return the error code directly.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This change lowers ring buffer reclaim threshold from 1/2*queue to budget
for better performance. According to our test with qemu + dpdk, packet
dropping happens when the guest is not able to provide free buffer in
avail ring timely with default 1/2*queue. The value in the patch has been
tested and does show better performance.
Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP)
avg packets drop before: 2842
avg packets drop after: 360(-87.3%)
Further, current code suffers from a starvation problem: the amount of
work done by try_fill_recv is not bounded by the budget parameter, thus
(with large queues) once in a while userspace gets blocked for a long
time while queue is being refilled. Trigger refills earlier to make sure
the amount of work to do is limited.
Signed-off-by: jiangkidd <jiangkidd@hotmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since vhost_exceeds_weight() was introduced, callers need to specify
the packet weight and byte weight in vhost_dev_init(). Note that, the
packet weight isn't counted in this patch to keep the original behavior
unchanged.
Fixes: e82b9b0727 ("vhost: introduce vhost_exceeds_weight()")
Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This ThinkCentre machine has a new realtek codec alc222, it is not
in the support list, we add it in the realtek.c then this machine
can apply FIXUPs for the realtek codec.
And this machine has two front mics which can't be handled
by PA so far, it uses the pin 0x18 and 0x19 as the front mics, as
a result the existing FIXUP ALC294_FIXUP_LENOVO_MIC_LOCATION doesn't
work on this machine. Fortunately another FIXUP
ALC283_FIXUP_HEADSET_MIC also can change the location for one of the
two mics on this machine.
Link: https://lore.kernel.org/r/20190904055327.9883-1-hui.wang@canonical.com
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The commit 20c169aceb ("dmaengine: rcar-dmac: clear pertinence
number of channels") forgets to clear the last channel by
DMACHCLR in rcar_dmac_init() (and doesn't need to clear the first
channel) if iommu is mapped to the device. So, this patch fixes it
by using "channels_mask" bitfield.
Note that the hardware and driver don't support more than 32 bits
in DMACHCLR register anyway, so this patch should reject more than
32 channels in rcar_dmac_parse_of().
Fixes: 20c169aceb ("dmaengine: rcar-dmac: clear pertinence number of channels")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/1567424643-26629-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
For the Spreadtrum DMA link-list mode, when the DMA engine got a slave
hardware request, which will trigger the DMA engine to load the DMA
configuration from the link-list memory automatically. But before the
slave hardware request, the slave will get an incorrect residue due
to the first node used to trigger the link-list was configured as the
last source address and destination address.
Thus we should make sure the first node was configured the start source
address and destination address, which can fix this issue.
Fixes: 4ac6954647 ("dmaengine: sprd: Support DMA link-list mode")
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Link: https://lore.kernel.org/r/77868edb7aff9d5cb12ac3af8827ef2e244441a6.1567150471.git.baolin.wang@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Set up the default timeout for this new entry otherwise the garbage
collector might quickly remove it right after the flowtable insertion.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If this flag is set, timeout and state are irrelevant to userspace.
Fixes: 90964016e5 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If IPv6 is disabled on boot (ipv6.disable=1), but nft_fib_inet ends up
dealing with a IPv6 packet, it causes a kernel panic in
fib6_node_lookup_1(), crashing in bad_page_fault.
The panic is caused by trying to deference a very low address (0x38
in ppc64le), due to ipv6.fib6_main_tbl = NULL.
BUG: Kernel NULL pointer dereference at 0x00000038
The kernel panic was reproduced in a host that disabled IPv6 on boot and
have to process guest packets (coming from a bridge) using it's ip6tables.
Terminate rule evaluation when packet protocol is IPv6 but the ipv6 module
is not loaded.
Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Turns out the commit 3a0681c744 ("mmc: core: do not retry CMD6 in
__mmc_switch()") breaks initialization of a Toshiba THGBMNG5 eMMC card,
when using the meson-gx-mmc.c driver on a custom board based on Amlogic
A113D.
The CMD6 that switches the card into HS200 mode is then one that fails and
according to the below printed messages from the log:
[ 1.648951] mmc0: mmc_select_hs200 failed, error -84
[ 1.648988] mmc0: error -84 whilst initialising MMC card
After some analyze, it turns out that adding a delay of ~5ms inside
mmc_select_bus_width() but after mmc_compare_ext_csds() has been executed,
also fixes the problem. Adding yet some more debug code, trying to figure
out if potentially the card could be in a busy state, both by using CMD13
and ->card_busy() ops concluded that this was not the case.
Therefore, let's simply revert the commit that dropped support for retrying
of CMD6, as this also fixes the problem.
Fixes: 3a0681c744 ("mmc: core: do not retry CMD6 in __mmc_switch()")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kaisrlik <ja.kaisrlik@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
`dev` (struct rsi_91x_usbdev *) field of adapter
(struct rsi_91x_usbdev *) is allocated and initialized in
`rsi_init_usb_interface`. If any error is detected in information
read from the device side, `rsi_init_usb_interface` will be
freed. However, in the higher level error handling code in
`rsi_probe`, if error is detected, `rsi_91x_deinit` is called
again, in which `dev` will be freed again, resulting double free.
This patch fixes the double free by removing the free operation on
`dev` in `rsi_init_usb_interface`, because `rsi_91x_deinit` is also
used in `rsi_disconnect`, in that code path, the `dev` field is not
(and thus needs to be) freed.
This bug was found in v4.19, but is also present in the latest version
of kernel. Fixes CVE-2019-15504.
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This reverts commit 9ad3b55654.
As reported by Sergey:
"I got some problem after upgrade kernel to 5.2 version (debian testing
linux-image-5.2.0-2-amd64). 5Ghz client stopped to see AP.
Some tests with 1metre distance between client-AP: 2.4Ghz -22dBm, for
5Ghz - 53dBm !, for longer distance (8m + walls) 2.4 - 61dBm, 5Ghz not
visible."
It was identified that rx signal level degradation was caused by
9ad3b55654 ("rt2800: enable TX_PIN_CFG_LNA_PE_ bits per band").
So revert this commit.
Cc: <stable@vger.kernel.org> # v5.1+
Reported-and-tested-by: Sergey Maranchuk <slav0nic0@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
After looking at code I realized that my previous fix
9584412438 ("rt2x00: clear IV's on start to fix AP mode regression")
was incomplete. We can still have wrong IV's after re-keyring.
To fix that, clear up IV's also on key removal.
Fixes: 710e6cc159 ("rt2800: do not nullify initialization vector data")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
tested-by: Emil Karlson <jekarl@iki.fi>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We were erroneously assigning the new configuration to a local
variable cfg, but that was not being assigned to anything, so the
change was getting lost. Assign directly to iwl_trans->cfg instead.
Fixes: 5a8c31aa63 ("iwlwifi: pcie: fix recognition of QuZ devices")
Cc: stable@vger.kernel.org # 5.2
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
mwifiex_update_vs_ie(),mwifiex_set_uap_rates() and
mwifiex_set_wmm_params() call memcpy() without checking
the destination size.Since the source is given from
user-space, this may trigger a heap buffer overflow.
Fix them by putting the length check before performing memcpy().
This fix addresses CVE-2019-14814,CVE-2019-14815,CVE-2019-14816.
Signed-off-by: Wen Huang <huangwenabc@gmail.com>
Acked-by: Ganapathi Bhat <gbhat@marvell.comg>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
MT7630E hardware does support 5GHz, but we do not properly configure phy
for 5GHz channels. Scanning at this band not only do not show any APs
but also can hang the firmware.
Since vendor reference driver do not support 5GHz we don't know how
properly configure 5GHz channels. So disable this band for MT7630E .
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Since 41634aa8d6 ("mt76: only schedule txqs from the tx tasklet")
I can observe firmware hangs on MT7630E on station mode: tx stop
functioning after minor activity (rx keep working) and on module
unload device fail to stop with messages:
[ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop
[ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop
Loading module again results in failure to associate with AP.
Only machine power off / power on cycle can make device work again.
It's unclear why commit 41634aa8d6 causes the problem, but it is
related to HW encryption. Since issue is a firmware hang, that is super
hard to debug, just disable HW encryption as fix for the issue.
Fixes: 41634aa8d6 ("mt76: only schedule txqs from the tx tasklet")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Global pages support is removed from VT-d spec 3.0. Since global pages G
flag only affects first-level paging structures and because DMA request
with PASID are only supported by VT-d spec. 3.0 and onward, we can
safely remove global pages support.
For kernel shared virtual address IOTLB invalidation, PASID
granularity and page selective within PASID will be used. There is
no global granularity supported. Without this fix, IOTLB invalidation
will cause invalid descriptor error in the queued invalidation (QI)
interface.
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Reported-by: Sanjay K Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
do_sched_cfs_period_timer() will refill cfs_b runtime and call
distribute_cfs_runtime to unthrottle cfs_rq, sometimes cfs_b->runtime
will allocate all quota to one cfs_rq incorrectly, then other cfs_rqs
attached to this cfs_b can't get runtime and will be throttled.
We find that one throttled cfs_rq has non-negative
cfs_rq->runtime_remaining and cause an unexpetced cast from s64 to u64
in snippet:
distribute_cfs_runtime() {
runtime = -cfs_rq->runtime_remaining + 1;
}
The runtime here will change to a large number and consume all
cfs_b->runtime in this cfs_b period.
According to Ben Segall, the throttled cfs_rq can have
account_cfs_rq_runtime called on it because it is throttled before
idle_balance, and the idle_balance calls update_rq_clock to add time
that is accounted to the task.
This commit prevents cfs_rq to be assgined new runtime if it has been
throttled until that distribute_cfs_runtime is called.
Signed-off-by: Liangyan <liangyan.peng@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: shanpeic@linux.alibaba.com
Cc: stable@vger.kernel.org
Cc: xlpang@linux.alibaba.com
Fixes: d3d9dc3302 ("sched: Throttle entities exceeding their allowed bandwidth")
Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Original pin node values of ASUS UX431FL with ALC294:
0x12 0xb7a60140
0x13 0x40000000
0x14 0x90170110
0x15 0x411111f0
0x16 0x411111f0
0x17 0x90170111
0x18 0x411111f0
0x19 0x411111f0
0x1a 0x411111f0
0x1b 0x411111f0
0x1d 0x4066852d
0x1e 0x411111f0
0x1f 0x411111f0
0x21 0x04211020
1. Has duplicated internal speakers (0x14 & 0x17) which makes the output
route become confused. So, the output volume cannot be changed by
setting.
2. Misses the headset mic pin node.
This patch disables the confusing speaker (NID 0x14) and enables the
headset mic (NID 0x19).
Link: https://lore.kernel.org/r/20190902100054.6941-1-jian-hong@endlessm.com
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A kernel panic can happen if a host has disabled IPv6 on boot and have to
process guest packets (coming from a bridge) using it's ip6tables.
IPv6 packets need to be dropped if the IPv6 module is not loaded, and the
host ip6tables will be used.
Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Refcounted, hangs of configfs_dirent, created by operations that add
fragments to configfs tree (mkdir and configfs_register_{subsystem,group}).
Will be used in the next commit to provide exclusion between fragment
removal and ->show/->store calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
revert cc57c07343 "configfs: fix registered group removal"
It was an attempt to handle something that fundamentally doesn't
work - configfs_register_group() should never be done in a part
of tree that can be rmdir'ed. And in mainline it never had been,
so let's not borrow trouble; the fix was racy anyway, it would take
a lot more to make that work and desired semantics is not clear.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
simplifies the ->read()/->write()/->release() instances nicely
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In function sun8i_dwmac_set_syscon(), local variable "val" could
be uninitialized if function regmap_field_read() returns -EINVAL.
However, it will be used directly in the if statement, which
is potentially unsafe.
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the 'start' parameter is >= 0xFF000000 on 32-bit
systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
fill_gva_list() gets into an infinite loop.
With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
and always compares as less than end. Memory is filled with
guest virtual addresses until the system crashes.
Fix this by never incrementing 'cur' to be larger than 'end'.
Reported-by: Jong Hyun Park <park.jonghyun@yonsei.ac.kr>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2ffd9e33ce ("x86/hyper-v: Use hypercall for remote TLB flush")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We want to throw out the attrbute if it refers to the mounted on fileid,
and not the real fileid. However we do not want to block cache consistency
updates from NFSv4 writes.
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Fixes: 7e10cc25bf ("NFS: Don't refresh attributes with mounted-on-file...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for reported issues for
5.3-rc7
Also included in here is the documentation for how we are handling
hardware issues under embargo that everyone has finally agreed on, as
well as a MAINTAINERS update for the suckers who agreed to handle the
LICENSES/ files.
All of these have been in linux-next last week with no reported
issues"
* tag 'char-misc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fsi: scom: Don't abort operations for minor errors
vmw_balloon: Fix offline page marking with compaction
VMCI: Release resource if the work is already queued
Documentation/process: Embargoed hardware security issues
lkdtm/bugs: fix build error in lkdtm_EXHAUST_STACK
mei: me: add Tiger Lake point LP device ID
intel_th: pci: Add Tiger Lake support
intel_th: pci: Add support for another Lewisburg PCH
stm class: Fix a double free of stm_source_device
MAINTAINERS: add entry for LICENSES and SPDX stuff
fpga: altera-ps-spi: Fix getting of optional confd gpio
Pull USB fixes from Greg KH:
"Here are some small USB fixes that have been in linux-next this past
week for 5.3-rc7
They fix the usual xhci, syzbot reports, and other small issues that
have come up last week.
All have been in linux-next with no reported issues"
* tag 'usb-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: cdc-wdm: fix race between write and disconnect due to flag abuse
usb: host: xhci: rcar: Fix typo in compatible string matching
usb: host: xhci-tegra: Set DMA mask correctly
USB: storage: ums-realtek: Whitelist auto-delink support
USB: storage: ums-realtek: Update module parameter description for auto_delink_en
usb: host: ohci: fix a race condition between shutdown and irq
usb: hcd: use managed device resources
typec: tcpm: fix a typo in the comparison of pdo_max_voltage
usb-storage: Add new JMS567 revision to unusual_devs
usb: chipidea: udc: don't do hardware access if gadget has stopped
usbtmc: more sanity checking for packet size
usb: udc: lpc32xx: silence fall-through warning
HP Pavilion 15 (AMD Ryzen-based model) with 103c:84e7 needs the same
quirk like HP Envy/Spectre x360 for enabling the mute LED over Mic3 pin.
[ rearranged in the SSID number order by tiwai ]
Signed-off-by: Sam Bazley <sambazley@fastmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit
a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")
now zeroes the secure boot setting information (enabled/disabled/...)
passed by the boot loader or by the kernel's EFI handover mechanism.
The problem manifests itself with signed kernels using the EFI handoff
protocol with grub and the kernel loses the information whether secure
boot is enabled in the firmware, i.e., the log message "Secure boot
enabled" becomes "Secure boot could not be determined".
efi_main() arch/x86/boot/compressed/eboot.c sets this field early but it
is subsequently zeroed by the above referenced commit.
Include boot_params.secure_boot in the preserve field list.
[ bp: restructure commit message and massage. ]
Fixes: a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")
Signed-off-by: John S. Gruber <JohnSGruber@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/CAPotdmSPExAuQcy9iAHqX3js_fc4mMLQOTr5RBGvizyCOPcTQQ@mail.gmail.com
Pull networking fixes from David Miller:
1) Fix some length checks during OGM processing in batman-adv, from
Sven Eckelmann.
2) Fix regression that caused netfilter conntrack sysctls to not be
per-netns any more. From Florian Westphal.
3) Use after free in netpoll, from Feng Sun.
4) Guard destruction of pfifo_fast per-cpu qdisc stats with
qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed
in pfifo_fast_enqueue().
5) Fix memory leak in mld_del_delrec(), from Eric Dumazet.
6) Handle neigh events on internal ports correctly in nfp, from John
Hurley.
7) Clear SKB timestamp in NF flow table code so that it does not
confuse fq scheduler. From Florian Westphal.
8) taprio destroy can crash if it is invoked in a failure path of
taprio_init(), because the list head isn't setup properly yet and
the list del is unconditional. Perform the list add earlier to
address this. From Vladimir Oltean.
9) Make sure to reapply vlan filters on device up, in aquantia driver.
From Dmitry Bogdanov.
10) sgiseeq driver releases DMA memory using free_page() instead of
dma_free_attrs(). From Christophe JAILLET.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: seeq: Fix the function used to release some memory in an error handling path
enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions
net: bcmgenet: use ethtool_op_get_ts_info()
tc-testing: don't hardcode 'ip' in nsPlugin.py
net: dsa: microchip: add KSZ8563 compatibility string
dt-bindings: net: dsa: document additional Microchip KSZ8563 switch
net: aquantia: fix out of memory condition on rx side
net: aquantia: linkstate irq should be oneshot
net: aquantia: reapply vlan filters on up
net: aquantia: fix limit of vlan filters
net: aquantia: fix removal of vlan 0
net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
taprio: Fix kernel panic in taprio_destroy
net: dsa: microchip: fill regmap_config name
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
amd-xgbe: Fix error path in xgbe_mod_init()
netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder
mac80211: Correctly set noencrypt for PAE frames
...
In commit 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv"),
a call to 'get_zeroed_page()' has been turned into a call to
'dma_alloc_coherent()'. Only the remove function has been updated to turn
the corresponding 'free_page()' into 'dma_free_attrs()'.
The error hndling path of the probe function has not been updated.
Fix it now.
Rename the corresponding label to something more in line.
Fixes: 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Fix the bogus detection of 32bit user mode for uretprobes which
caused corruption of the user return address resulting in
application crashes. In the uprobes handler in_ia32_syscall() is
obviously always returning false on a 64bit kernel. Use
user_64bit_mode() instead which works correctly.
- Prevent large page splitting when ftrace flips RW/RO on the kernel
text which caused iTLB performance issues. Ftrace wants to be
converted to text_poke() which avoids the problem, but for now
allow large page preservation in the static protections check when
the change request spawns a full large page.
- Prevent arch_dynirq_lower_bound() from returning 0 when the IOAPIC
is configured via device tree. In the device tree case the GSI 1:1
mapping is meaningless therefore the lower bound which protects the
GSI range on ACPI machines is irrelevant. Return the lower bound
which the core hands to the function instead of blindly returning 0
which causes the core to allocate the invalid virtual interupt
number 0 which in turn prevents all drivers from allocating and
requesting an interrupt.
- Remove the bogus initialization of LDR and DFR in the 32bit bigsmp
APIC driver. That uses physical destination mode where LDR/DFR are
ignored, but the initialization and the missing clear of LDR caused
the APIC to be left in a inconsistent state on kexec/reboot.
- Clear LDR when clearing the APIC registers so the APIC is in a well
defined state.
- Initialize variables proper in the find_trampoline_placement()
code.
- Silence GCC( build warning for the real mode part of the build"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
x86/apic: Include the LDR when clearing out APIC registers
x86/apic: Do not initialize LDR and DFR for bigsmp
uprobes/x86: Fix detection of 32-bit user mode
x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
Pull perf fixes from Thomas Gleixner:
"Two fixes for perf x86 hardware implementations:
- Restrict the period on Nehalem machines to prevent perf from
hogging the CPU
- Prevent the AMD IBS driver from overwriting the hardwre controlled
and pre-seeded reserved bits (0-6) in the count register which
caused a sample bias for dispatched micro-ops"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
perf/x86/intel: Restrict period on Nehalem
Pull turbostat updates from Len Brown:
"User-space turbostat (and x86_energy_perf_policy) patches.
They are primarily bug fixes from users"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL
tools/power turbostat: Fix caller parameter of get_tdp_amd()
tools/power turbostat: Fix CPU%C1 display value
tools/power turbostat: do not enforce 1ms
tools/power turbostat: read from pipes too
tools/power turbostat: Add Ice Lake NNPI support
tools/power turbostat: rename has_hsw_msrs()
tools/power turbostat: Fix Haswell Core systems
tools/power turbostat: add Jacobsville support
tools/power turbostat: fix buffer overrun
tools/power turbostat: fix file descriptor leaks
tools/power turbostat: fix leak of file descriptor on error return path
tools/power turbostat: Make interval calculation per thread to reduce jitter
tools/power turbostat: remove duplicate pc10 column
tools/power x86_energy_perf_policy: Fix argument parsing
tools/power: Fix typo in man page
tools/power/x86: Enable compiler optimisations and Fortify by default
tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2
Call to 'pci_free_irq_vectors()' are missing both in the error handling
path of the probe function, and in the remove function.
Add them.
Fixes: 19971f5ea0 ("enetc: add PTP clock driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change enables the use of SW timestamping on the Raspberry Pi 4.
bcmgenet's transmit function bcmgenet_xmit() implements software
timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE capability was
missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced. By using
ethtool_ops bcmgenet_ethtool_ops() as get_ts_info(), the
SOF_TIMESTAMPING_TX_SOFTWARE capability is announced.
Similar to commit a8f5cb9e79 ("smsc95xx: use ethtool_op_get_ts_info()")
Signed-off-by: Ryan M. Collins <rmc032@bucknell.edu>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following tdc test fails on Fedora:
# ./tdc.py -e 2638
-- ns/SubPlugin.__init__
Test 2638: Add matchall and try to get it
-----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 clsact"
-----> prepare stage *** Error message: "/bin/sh: ip: command not found"
returncode 127; expected [0]
-----> prepare stage *** Aborting test run.
Let nsPlugin.py use the 'IP' variable introduced with commit 92c1a19e2f
("tc-tests: added path to ip command in tdc"), so that the path to 'ip' is
correctly resolved to the value we have in tdc_config.py.
# ./tdc.py -e 2638
-- ns/SubPlugin.__init__
Test 2638: Add matchall and try to get it
All test results:
1..1
ok 1 2638 - Add matchall and try to get it
Fixes: 489ce2f425 ("tc-testing: Restore original behaviour for namespaces in tdc")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Razvan Stefanescu says:
====================
net: dsa: microchip: add KSZ8563 support
This patchset adds compatibility string for the KSZ8563 switch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh says:
====================
net: aquantia: fixes on vlan filters and other conditions
Here is a set of various bug fixes related to vlan filter offload and
two other rare cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On embedded environments with hard memory limits it is a normal although
rare case when skb can't be allocated on rx part under high traffic.
In such OOM cases napi_complete_done() was not called.
So the napi object became in an invalid state like it is "scheduled".
Kernel do not re-schedules the poll of that napi object.
Consequently, kernel can not remove that object the system hangs on
`ifconfig down` waiting for a poll.
We are fixing this by gracefully closing napi poll routine with correct
invocation of napi_complete_done.
This was reproduced with artificially failing the allocation of skb to
simulate an "out of memory" error case and check that traffic does
not get stuck.
Fixes: 970a2e9864 ("net: ethernet: aquantia: Vector operations")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declaring threaded irq handler should also indicate the irq is
oneshot. It is oneshot indeed, because HW implements irq automasking
on trigger.
Not declaring this causes some kernel configurations to fail
on interface up, because request_threaded_irq returned an err code.
The issue was originally hidden on normal x86_64 configuration with
latest kernel, because depending on interrupt controller, irq driver
added ONESHOT flag on its own.
Issue was observed on older kernels (4.14) where no such logic exists.
Fixes: 4c83f170b3 ("net: aquantia: link status irq handling")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reported-by: Michael Symolkin <Michael.Symolkin@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of device reconfiguration the driver may reset the device invisible
for other modules, vlan module in particular. So vlans will not be
removed&created and vlan filters will not be configured in the device.
The patch reapplies the vlan filters at device start.
Fixes: 7975d2aff5 ("net: aquantia: add support of rx-vlan-filter offload")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a limit condition of vlans on the interface before setting vlan
promiscuous mode
Fixes: 48dd73d08d ("net: aquantia: fix vlans not working over bridged network")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to absence of checking against the rx flow rule when vlan 0 is being
removed, the other rule could be removed instead of the rule with vlan 0
Fixes: 7975d2aff5 ("net: aquantia: add support of rx-vlan-filter offload")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Fix issues in tc-taprio and tc-cbs
This series fixes one panic and one WARN_ON found in the tc-taprio
qdisc, while trying to apply it:
- On an interface which is not multi-queue
- On an interface which has no carrier
The tc-cbs was also visually found to suffer of the same issue as
tc-taprio, and the fix was only compile-tested in that case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The discussion to be made is absolutely the same as in the case of
previous patch ("taprio: Set default link speed to 10 Mbps in
taprio_set_picos_per_byte"). Nothing is lost when setting a default.
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: e0a7683d30 ("net/sched: cbs: fix port_rate miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The taprio budget needs to be adapted at runtime according to interface
link speed. But that handling is problematic.
For one thing, installing a qdisc on an interface that doesn't have
carrier is not illegal. But taprio prints the following stack trace:
[ 31.851373] ------------[ cut here ]------------
[ 31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4
[ 31.864566] taprio: dequeue() called with unknown picos per byte.
[ 31.864570] Modules linked in:
[ 31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689
[ 31.881398] Hardware name: Freescale LS1021A
[ 31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
[ 31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8)
[ 31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
[ 31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c)
[ 31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4)
[ 31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c)
[ 31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc)
[ 31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8)
[ 31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8)
[ 31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4)
[ 31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c)
[ 31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[ 31.977076] Exception stack(0xe8167b20 to 0xe8167b68)
[ 31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960
[ 31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94
[ 31.998363] 7b60: 60070013 ffffffff
[ 32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8)
[ 32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414)
[ 32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28)
[ 32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88)
[ 32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44)
[ 32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470)
[ 32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724)
[ 32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec)
[ 32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110)
[ 32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c)
[ 32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380)
[ 32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24)
[ 32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228)
[ 32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c)
[ 32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[ 32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0)
[ 32.128960] 7fa0: b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000
[ 32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c
[ 32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64
[ 32.150659] ---[ end trace 2139c9827c3e5177 ]---
This happens because the qdisc ->dequeue callback gets called. Which
again is not illegal, the qdisc will dequeue even when the interface is
up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames
will be dropped further down the stack in dev_direct_xmit().
And, at the end of the day, for what? For calculating the initial budget
of an interface which is non-operational at the moment and where frames
will get dropped anyway.
So if we can't figure out the link speed, default to SPEED_10 and move
along. We can also remove the runtime check now.
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0 ("net/sched: taprio: fix picos_per_byte miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
taprio_init may fail earlier than this line:
list_add(&q->taprio_list, &taprio_list);
i.e. due to the net device not being multi queue.
Attempting to remove q from the global taprio_list when it is not part
of it will result in a kernel panic.
Fix it by matching list_add and list_del better to one another in the
order of operations. This way we can keep the deletion unconditional
and with lower complexity - O(1).
Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the register value width as the regmap_config name to prevent the
following error when the second and third regmap_configs are
initialized.
"debugfs: Directory '${bus-id}' with parent 'regmap' already present!"
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here are two batman-adv bugfixes:
- Fix OGM and OGMv2 header read boundary check,
by Sven Eckelmann (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9392bd98bb ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") and the commit 3316f99a9f ("tools/power
turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h
RAPL support.
Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX,
and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon
Dhyana Family 18h support for RAPL.
Already tested on Hygon multi-node systems and it shows correct per-core
energy usage and the total package power.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit 9392bd98bb ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU
family. But the rapl_probe_amd() function use wrong model parameter.
Fix the wrong caller parameter of get_tdp_amd() to use family.
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
In some case C1% will be wrong value, when platform doesn't have MSR for
C1 residency.
For example:
Core CPU CPU%c1
- - 100.00
0 0 100.00
0 2 100.00
1 1 100.00
1 3 100.00
But adding Busy% will fix this
Core CPU Busy% CPU%c1
- - 99.77 0.23
0 0 99.77 0.23
0 2 99.77 0.23
1 1 99.77 0.23
1 3 99.77 0.23
This issue can be reproduced on most of the recent systems including
Broadwell, Skylake and later.
This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then
mperf value will not be read from MSR, so it will be 0. But this
is required for C1% calculation when MSR for C1 residency is not present.
Same is true for C3, C6 and C7 column selection.
So add another define DO_BIC_READ(), which doesn't depend on user
column selection and use for mperf, C3, C6 and C7 related counters.
So when there is no platform support for C1 residency counters,
we still read these counters, if the CPU has support and user selected
display of CPU%c1.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat works by taking a snapshot of counters, sleeping, taking another
snapshot, calculating deltas, and printing out the table.
The sleep time is controlled via -i option or by user sending a signal or a
character to stdin. In the latter case, turbostat always adds 1 ms
sleep before it reads the counters, in order to avoid larger imprecisions
in the results in prints.
While the 1 ms delay may be a good idea for a "dumb" user, it is a
problem for an "aware" user. I do thousands and thousands of measurements
over a short period of time (like 2ms), and turbostat unconditionally adds
a 1ms to my interval, so I cannot get what I really need.
This patch removes the unconditional 1ms sleep. This is an expert user
tool, after all, and non-experts will unlikely ever use it in the non-fixed
interval mode anyway, so I think it is OK to remove the 1ms delay.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit '47936f944e78 tools/power turbostat: fix printing on input' make
a valid fix, but it completely disabled piped stdin support, which is
a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read
anything from it, so it becomes impossible to get turbostat output at
user-defined moments, instead of the regular intervals.
There is no reason why this should works for terminals, but not for
pipes. This patch improves the situation. Instead of ignoring pipes, we
read data from them but gracefully handle the EOF case.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Perhaps if this more descriptive name had been used,
then we wouldn't have had the HSW ULT vs HSW CORE bug,
fixed by the previous commit.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat: cpu0: msr offset 0x630 read failed: Input/output error
because Haswell Core does not have C8-C10.
Output C8-C10 only on Haswell ULT.
Fixes: f5a4c76ad7 ("tools/power turbostat: consolidate duplicate model numbers")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat could be terminated by general protection fault on some latest
hardwares which (for example) support 9 levels of C-states and show 18
"tADDED" lines. That bloats the total output and finally causes buffer
overrun. So let's extend the buffer to avoid this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently the error return path does not close the file fp and leaks
a file descriptor. Fix this by closing the file.
Fixes: 5ea7647b33 ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat currently normalizes TSC and other values by dividing by an
interval. This interval is the delta between the start of one global
(all counters on all CPUs) sampling and the start of another. However,
this introduces a lot of jitter into the data.
In order to reduce jitter, the interval calculation should be based on
timestamps taken per thread and close to the start of the thread's
sampling.
Define a per thread time value to hold the delta between samples taken
on the thread.
Use the timestamp taken at the beginning of sampling to calculate the
delta.
Move the thread's beginning timestamp to after the CPU migration to
avoid jitter due to the migration.
Use the global time delta for the average time delta.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The -w argument in x86_energy_perf_policy currently triggers an
unconditional segfault.
This is because the argument string reads: "+a:c:dD:E:e:f:m:M:rt:u:vw" and
yet the argument handler expects an argument.
When parse_optarg_string is called with a null argument, we then proceed to
crash in strncmp, not horribly friendly.
The man page describes -w as taking an argument, the long form
(--hwp-window) is correctly marked as taking a required argument, and the
code expects it.
As such, this patch simply marks the short form (-w) as requiring an
argument.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Compiling without optimisations is silly, especially since some
warnings depend on the optimiser. Use -O2.
Fortify adds warnings for unchecked I/O (among other things), which
seems to be a good idea for user-space code. Enable that too.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
x86_energy_perf_policy first uses __get_cpuid() to check the maximum
CPUID level and exits if it is too low. It then assumes that later
calls will succeed (which I think is architecturally guaranteed). It
also assumes that CPUID works at all (which is not guaranteed on
x86_32).
If optimisations are enabled, gcc warns about potentially
uninitialized variables. Fix this by adding an exit-on-error after
every call to __get_cpuid() instead of just checking the maximum
level.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull i2c fixes from Wolfram Sang:
"I2C has a bunch of driver fixes and a core improvement to make the
on-going API transition more robust"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: disable zero-length transfers for mt8183
i2c: iproc: Stop advertising support of SMBUS quick cmd
MAINTAINERS: i2c mv64xxx: Update documentation path
i2c: piix4: Fix port selection for AMD Family 16h Model 30h
i2c: designware: Synchronize IRQs when unregistering slave client
i2c: i801: Avoid memory leak in check_acpi_smo88xx_device()
i2c: make i2c_unregister_device() ERR_PTR safe
Pull tracing fixes from Steven Rostedt:
"Small fixes and minor cleanups for tracing:
- Make exported ftrace function not static
- Fix NULL pointer dereference in reading probes as they are created
- Fix NULL pointer dereference in k/uprobe clean up path
- Various documentation fixes"
* tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Correct kdoc formats
ftrace/x86: Remove mcount() declaration
tracing/probe: Fix null pointer dereference
tracing: Make exported ftrace_set_clr_event non-static
ftrace: Check for successful allocation of hash
ftrace: Check for empty hash and comment the race with registering probes
ftrace: Fix NULL pointer dereference in t_probe_next()
Pull RISC-V fix from Paul Walmsley:
"One significant fix for 32-bit RISC-V systems:
Fix the RV32 memory map to prevent userspace from corrupting the
FIXMAP area. Without this patch, the system can crash very early
during the boot"
* tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix FIXMAP area corruption on RV32 systems
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix bug which could leave locks held in the host on return to a
guest.
x86:
- Prevent infinitely looping emulation of a failing syscall while
single stepping.
- Do not crash the host when nesting is disabled"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Don't update RIP or do single-step on faulting emulation
KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
Merge misc mm fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: memcontrol: fix percpu vmstats and vmevents flush
mm, memcg: do not set reclaim_state on soft limit reclaim
mailmap: add aliases for Dmitry Safonov
mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
mm: memcontrol: flush percpu slab vmstats on kmem offlining
Fix the following kdoc warnings:
kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single'
kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer'
kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk'
Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The function ftrace_set_clr_event is declared static and marked
EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
function was decided to be a part of API, this commit removes the static
attribute and adds the declaration to the header.
Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com
Fixes: f45d1225ad ("tracing: Kernel access to Ftrace instances")
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull crypto fix from Herbert Xu:
"Fix a potential crash in the ccp driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Ignore unconfigured CCP device on suspend/resume
Commit dfe2a77fd2 ("kfifo: fix kfifo_alloc() and kfifo_init()") made
the kfifo code round the number of elements up. That was good for
__kfifo_alloc(), but it's actually wrong for __kfifo_init().
The difference? __kfifo_alloc() will allocate the rounded-up number of
elements, but __kfifo_init() uses an allocation done by the caller. We
can't just say "use more elements than the caller allocated", and have
to round down.
The good news? All the normal cases will be using power-of-two arrays
anyway, and most users of kfifo's don't use kfifo_init() at all, but one
of the helper macros to declare a KFIFO that enforce the proper
power-of-two behavior. But it looks like at least ibmvscsis might be
affected.
The bad news? Will Deacon refers to an old thread and points points out
that the memory ordering in kfifo's is questionable. See
https://lore.kernel.org/lkml/20181211034032.32338-1-yuleixzhang@tencent.com/
for more.
Fixes: dfe2a77fd2 ("kfifo: fix kfifo_alloc() and kfifo_init()")
Reported-by: laokz <laokz@foxmail.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adric Blake has noticed[1] the following warning:
WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 set_task_reclaim_state+0x1e/0x40
[...]
Call Trace:
mem_cgroup_shrink_node+0x9b/0x1d0
mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
balance_pgdat+0x276/0x540
kswapd+0x200/0x3f0
? wait_woken+0x80/0x80
kthread+0xfd/0x130
? balance_pgdat+0x540/0x540
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
---[ end trace 727343df67b2398a ]---
which tells us that soft limit reclaim is about to overwrite the
reclaim_state configured up in the call chain (kswapd in this case but
the direct reclaim is equally possible). This means that reclaim stats
would get misleading once the soft reclaim returns and another reclaim
is done.
Fix the warning by dropping set_task_reclaim_state from the soft reclaim
which is always called with reclaim_state set up.
[1] http://lkml.kernel.org/r/CAE1jjeePxYPvw1mw2B3v803xHVR_BNnz0hQUY_JDMN8ny29M6w@mail.gmail.com
Link: http://lkml.kernel.org/r/20190828071808.20410-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Adric Blake <promarbler14@gmail.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 766a4c19d8 ("mm/memcontrol.c: keep local VM counters in sync
with the hierarchical ones") effectively decreased the precision of
per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
That's good for displaying in memory.stat, but brings a serious
regression into the reclaim process.
One issue I've discovered and debugged is the following:
lruvec_lru_size() can return 0 instead of the actual number of pages in
the lru list, preventing the kernel to reclaim last remaining pages.
Result is yet another dying memory cgroups flooding. The opposite is
also happening: scanning an empty lru list is the waste of cpu time.
Also, inactive_list_is_low() can return incorrect values, preventing the
active lru from being scanned and freed. It can fail both because the
size of active and inactive lists are inaccurate, and because the number
of workingset refaults isn't precise. In other words, the result is
pretty random.
I'm not sure, if using the approximate number of slab pages in
count_shadow_number() is acceptable, but issues described above are
enough to partially revert the patch.
Let's keep per-memcg vmstat_local batched (they are only used for
displaying stats to the userspace), but keep lruvec stats precise. This
change fixes the dead memcg flooding on my setup.
Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com
Fixes: 766a4c19d8 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I've noticed that the "slab" value in memory.stat is sometimes 0, even
if some children memory cgroups have a non-zero "slab" value. The
following investigation showed that this is the result of the kmem_cache
reparenting in combination with the per-cpu batching of slab vmstats.
At the offlining some vmstat value may leave in the percpu cache, not
being propagated upwards by the cgroup hierarchy. It means that stats
on ancestor levels are lower than actual. Later when slab pages are
released, the precise number of pages is substracted on the parent
level, making the value negative. We don't show negative values, 0 is
printed instead.
To fix this issue, let's flush percpu slab memcg and lruvec stats on
memcg offlining. This guarantees that numbers on all ancestor levels
are accurate and match the actual number of outstanding slab pages.
Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com
Fixes: fb2f2b0adb ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Spurious warning when loading rules using the physdev match,
from Todd Seidelmann.
2) Fix FTP conntrack helper debugging output, from Thomas Jarosch.
3) Restore per-netns nf_conntrack_{acct,helper,timeout} sysctl knobs,
from Florian Westphal.
4) Clear skbuff timestamp from the flowtable datapath, also from Florian.
5) Fix incorrect byteorder of NFT_META_BRI_IIFVPROTO, from wenxu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.
2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a local endpoint is ceases to be in use, such as when the kafs module
is unloaded, the kernel will emit an assertion failure if there are any
outstanding client connections:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:433!
and even beyond that, will evince other oopses if there are service
connections still present.
Fix this by:
(1) Removing the triggering of connection reaping when an rxrpc socket is
released. These don't actually clean up the connections anyway - and
further, the local endpoint may still be in use through another
socket.
(2) Mark the local endpoint as dead when we start the process of tearing
it down.
(3) When destroying a local endpoint, strip all of its client connections
from the idle list and discard the ref on each that the list was
holding.
(4) When destroying a local endpoint, call the service connection reaper
directly (rather than through a workqueue) to immediately kill off all
outstanding service connections.
(5) Make the service connection reaper reap connections for which the
local endpoint is marked dead.
Only after destroying the connections can we close the socket lest we get
an oops in a workqueue that's looking at a connection or a peer.
Fixes: 3d18cbb7fd ("rxrpc: Fix conn expiry timers")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Fix use of skb_cow_data()
Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
with skb_unshare() early on in the input process. The problem that is
being seen is that skb_cow_data() indirectly requires that the maximum
usage count on an sk_buff be 1, and it may generate an assertion failure in
pskb_expand_head() if not.
This can occur because rxrpc_input_data() may be still holding a ref when
it has just attached the sk_buff to the rx ring and given that attachment
its own ref. If recvmsg happens fast enough, skb_cow_data() can see the
ref still held by the softirq handler.
Further, a packet may contain multiple subpackets, each of which gets its
own attachment to the ring and its own ref - also making skb_cow_data() go
bang.
Fix this by:
(1) The DATA packet is currently parsed for subpackets twice by the input
routines. Parse it just once instead and make notes in the sk_buff
private data.
(2) Use the notes from (1) when attaching the packet to the ring multiple
times. Once the packet is attached to the ring, recvmsg can see it
and start modifying it, so the softirq handler is not permitted to
look inside it from that point.
(3) Pass the ref from the input code to the ring rather than getting an
extra ref. rxrpc_input_data() uses a ref on the second refcount to
prevent the packet from evaporating under it.
(4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
before we take call->input_lock. Other sorts of packets don't get
modified and so can be left.
A trace is emitted if skb_unshare() eats the skb. Note that
skb_share() for our accounting in this regard as we can't see the
parameters in the packet to log in a trace line if it releases it.
(5) Remove the calls to skb_cow_data(). These are then no longer
necessary.
There are also patches to improve the rxrpc_skb tracepoint to make sure
that Tx-derived buffers are identified separately from Rx-derived buffers
in the trace.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The devicetree binding lists the phy phy as optional. As such, the
driver should not bail out if it can't find a regulator. Instead it
should just skip the remaining regulator related code and continue
on normally.
Skip the remainder of phy_power_on() if a regulator supply isn't
available. This also gets rid of the bogus return code.
Fixes: 2e12f53663 ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xgbe_mod_init(), we should do cleanup if some error occurs
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: efbaa82833 ("amd-xgbe: Add support to handle device renaming")
Fixes: 47f164deab ("amd-xgbe: Add PCI device support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The race between adding a function probe and reading the probes that exist
is very subtle. It needs a comment. Also, the issue can also happen if the
probe has has the EMPTY_HASH as its func_hash.
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
LTP testsuite on powerpc results in the below crash:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70
The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.
The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;
This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.
Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.
Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull ARM fixes from Russell King:
"Three fixes for ARM this time around:
- A fix for update_sections_early() to cope with NULL ->mm pointers.
- A correction to the backtrace code to allow proper backtraces.
- Reinforcement of pfn_valid() with PFNs >= 4GiB"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8901/1: add a criteria for pfn_valid of arm
ARM: 8897/1: check stmfd instruction using right shift
ARM: 8874/1: mm: only adjust sections of valid mm structures
If check_cached_key() returns a non-NULL value, we still need to call
key_type::match_free() to undo key_type::match_preparse().
Fixes: 7743c48e54 ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM SoC fixes from Arnd Bergmann:
"The majority of the fixes this time are for OMAP hardware, here is a
breakdown of the significant changes:
Various device tree bug fixes:
- TI am57xx boards need a voltage level fix to avoid damaging SD
cards
- vf610-bk4 fails to detect its flash due to an incorrect description
- meson-g12a USB phy configuration fails
- meson-g12b reboot should not power off the SD card
- Some corrections for apparently harmless differences from the
documentation.
Regression fixes:
- ams-delta FIQ interrupts broke in 5.3
- TI am3/am4 mmc controllers broke in 5.2
The logic_pio driver (used on some Huawei ARM servers) got a few bug
fixes for reliability.
And a couple of compile-time warning fixes"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
soc: ti: pm33xx: Make two symbols static
soc: ti: pm33xx: Fix static checker warnings
ARM: OMAP: dma: Mark expected switch fall-throughs
ARM: dts: Fix incomplete dts data for am3 and am4 mmc
bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
ARM: dts: dra74x: Fix iodelay configuration for mmc3
ARM: dts: am335x: Fix UARTs length
ARM: OMAP2+: Fix omap4 errata warning on other SoCs
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
lib: logic_pio: Add logic_pio_unregister_range()
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Fix RCU usage
arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
arm64: dts: meson-g12a-sei510: enable IR controller
arm64: dts: meson-g12a: add missing dwc2 phy-names
ARM: dts: vf610-bk4: Fix qspi node description
ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
...
Pull rdma fix from Doug Ledford:
"Much calmer week this week. Just one patch queued up:
The way the siw driver was locking around the traversal of the list of
ipv6 addresses on a device was causing a scheduling while atomic
issue. Bernard straightened it out by using the rtnl_lock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Fix IPv6 addr_list locking
Pull two ceph fixes from Ilya Dryomov:
"A fix for a -rc1 regression in rbd and a trivial static checker fix"
* tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-client:
rbd: restore zeroing past the overlap when reading from parent
libceph: don't call crypto_free_sync_skcipher() on a NULL tfm
Pull drm fixes from Dave Airlie:
"Nothing too crazy, there's probably more patches than I'd like at this
stage, but they are all pretty self contained:
amdgpu:
- Fix GFXOFF regression for PCO and RV2
- Fix missing fence reference
- Fix VG20 power readings on certain SMU firmware versions
- Fix dpm level setup for VG20
- Add an ATPX laptop quirk
i915:
- Fix DP MST max BPC property creation after DRM register
- Fix unused ggtt deballooning and NULL dereference in guest
- Fix DSC eDP transcoder identification
- Fix WARN from DMA API debug by setting DMA max segment size
qxl:
- Make qxl reservel the vga ports using vgaargb to prevent switching to vga compatibility mode.
omap:
- Fix omap port lookup for SDI output
virtio:
- Use virtio_max_dma_size to fix an issue with swiotlb.
komeda:
- Compiler fixes to komeda.
- Add missing of_node_get() call in komeda.
- Reorder the komeda de-init functions"
* tag 'drm-fixes-2019-08-30' of git://anongit.freedesktop.org/drm/drm:
drm/komeda: Reordered the komeda's de-init functions
drm/amdgpu: fix GFXOFF on Picasso and Raven2
drm/amdgpu: Add APTX quirk for Dell Latitude 5495
drm/amd/powerplay: correct Vega20 dpm level related settings
drm/i915: Call dma_set_max_seg_size() in i915_driver_hw_probe()
drm/i915/dp: Fix DSC enable code to use cpu_transcoder instead of encoder->type
drm/i915: Don't deballoon unused ggtt drm_mm_node in linux guest
drm/i915: Do not create a new max_bpc prop for MST connectors
drm/powerplay: Fix Vega20 power reading again
drm/powerplay: Fix Vega20 Average Power value v4
drm/amdgpu: fix dma_fence_wait without reference
drm/komeda: Add missing of_node_get() call
drm/komeda: Clean warning 'komeda_component_add' might be a candidate for 'gnu_printf'
drm/komeda: Fix warning -Wunused-but-set-variable
drm/komeda: Fix error: not allocating enough data 1592 vs 1584
drm/virtio: use virtio_max_dma_size
drm/omap: Fix port lookup for SDI output
drm/qxl: get vga ioports
This reverts commit 557529494d.
Commit 557529494d ("iommu/vt-d: Avoid duplicated pci dma alias
consideration") aimed to address a NULL pointer deference issue
happened when a thunderbolt device driver returned unexpectedly.
Unfortunately, this change breaks a previous pci quirk added by
commit cc346a4714 ("PCI: Add function 1 DMA alias quirk for
Marvell devices"), as the result, devices like Marvell 88SE9128
SATA controller doesn't work anymore.
We will continue to try to find the real culprit mentioned in
557529494d, but for now we should revert it to fix current
breakage.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204627
Cc: Stijn Tintel <stijn@linux-ipv6.be>
Cc: Petr Vandrovec <petr@vandrovec.name>
Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Reported-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Quoting from mt8183 datasheet, the number of transfers to be
transferred in one transaction should be set to bigger than 1,
so we should forbid zero-length transfer and update functionality.
Reported-by: Alexandru M Stan <amstan@chromium.org>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Qii Wang <qii.wang@mediatek.com>
[wsa: shortened commit message a little]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The driver does not support the SMBUS Quick command so remove the
flag that indicates that level of support.
By default the i2c_detect tool uses the quick command to try and
detect devices at some bus addresses. If the quick command is used
then we will not detect the device, even though it is present.
Fixes: e6e5dd3566 (i2c: iproc: Add Broadcom iProc I2C Driver)
Signed-off-by: Lori Hikichi <lori.hikichi@broadcom.com>
Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
sample bias, IBS hardware preloads the least significant 7 bits of
current count (IbsOpCurCnt) with random values, such that, after the
interrupt is handled and counting resumes, the next sample taken
will be slightly perturbed.
The current count bitfield is in the IBS execution control h/w register,
alongside the maximum count field.
Currently, the IBS driver writes that register with the maximum count,
leaving zeroes to fill the current count field, thereby overwriting
the random bits the hardware preloaded for itself.
Fix the driver to actually retain and carry those random bits from the
read of the IBS control register, through to its write, instead of
overwriting the lower current count bits with zeroes.
Tested with:
perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
'perf annotate' output before:
15.70 65: addsd %xmm0,%xmm1
17.30 add $0x1,%rax
15.88 cmp %rdx,%rax
je 82
17.32 72: test $0x1,%al
jne 7c
7.52 movapd %xmm1,%xmm0
5.90 jmp 65
8.23 7c: sqrtsd %xmm1,%xmm0
12.15 jmp 65
'perf annotate' output after:
16.63 65: addsd %xmm0,%xmm1
16.82 add $0x1,%rax
16.81 cmp %rdx,%rax
je 82
16.69 72: test $0x1,%al
jne 7c
8.30 movapd %xmm1,%xmm0
8.13 jmp 65
8.24 7c: sqrtsd %xmm1,%xmm0
8.39 jmp 65
Tested on Family 15h and 17h machines.
Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
affect their operation.
It is unknown why commit db98c5faf8 ("perf/x86: Implement 64-bit
counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
field; the number of preloaded random bits has always been 7, AFAICT.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: "Namhyung Kim" <namhyung@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
The recent change to shuffle the codec initialization procedure for
Realtek via commit 607ca3bd22 ("ALSA: hda/realtek - EAPD turn on
later") caused the silent output on some machines. This change was
supposed to be safe, but it isn't actually; some devices have quirk
setups to override the EAPD via COEF or BTL in the additional verb
table, which is applied at the beginning of snd_hda_gen_init(). And
this EAPD setup is again overridden in alc_auto_init_amp().
For recovering from the regression, tell snd_hda_gen_init() not to
apply the verbs there by a new flag, then apply the verbs in
alc_init().
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204727
Fixes: 607ca3bd22 ("ALSA: hda/realtek - EAPD turn on later")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The IP datasheet says this controller is compatible with SD Host
Specification Version v4.00.
As it turned out, the ADMA of this IP does not work with 64-bit mode
when it is in the Version 3.00 compatible mode; it understands the
old 64-bit descriptor table (as defined in SDHCI v2), but the ADMA
System Address Register (SDHCI_ADMA_ADDRESS) cannot point to the
64-bit address.
I noticed this issue only after commit bd2e75633c ("dma-contiguous:
use fallback alloc_pages for single pages"). Prior to that commit,
dma_set_mask_and_coherent() returned the dma address that fits in
32-bit range, at least for the default arm64 configuration
(arch/arm64/configs/defconfig). Now the host->adma_addr exceeds the
32-bit limit, causing the real problem for the Socionext SoCs.
(As a side-note, I was also able to reproduce the issue for older
kernels by turning off CONFIG_DMA_CMA.)
Call sdhci_enable_v4_mode() to fix this.
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The OCR register defines the supported range of VDD voltages for SD cards.
However, it has turned out that some SD cards reports an invalid voltage
range, for example having bit7 set.
When a host supports MMC_CAP2_FULL_PWR_CYCLE and some of the voltages from
the invalid VDD range, this triggers the core to run a power cycle of the
card to try to initialize it at the lowest common supported voltage.
Obviously this fails, since the card can't support it.
Let's fix this problem, by clearing invalid bits from the read OCR register
for SD cards, before proceeding with the VDD voltage negotiation.
Cc: stable@vger.kernel.org
Reported-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Philip Langdale <philipl@overt.org>
Tested-by: Philip Langdale <philipl@overt.org>
Tested-by: Manuel Presnitz <mail@mpy.de>
Pull cifs fixes from Steve French:
"A few small SMB3 fixes, and a larger one to fix various older string
handling functions"
* tag '5.3-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: replace various strncpy with strscpy and similar
cifs: Use kzfree() to zero out the password
cifs: set domainName when a domain-key is used in multiuser
Get the vlan_proto of ingress bridge in network byteorder as userspace
expects. Otherwise this is inconsistent with NFT_META_PROTOCOL.
Fixes: 2a3a93ef0b ("netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Johannes Berg says:
====================
We have
* one fix for a driver as I'm covering for Kalle while he's on vacation
* two fixes for eapol-over-nl80211 work
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Family 16h Model 30h SMBus controller needs the same port selection fix
as described and fixed in commit 0fe16195f8 ("i2c: piix4: Fix SMBus port
selection for AMD Family 17h chips")
commit 6befa3fde6 ("i2c: piix4: Support alternative port selection
register") also fixed the port selection for Hudson2, but unfortunately
this is not the exact same device and the AMD naming and PCI Device IDs
aren't particularly helpful here.
The SMBus port selection register is common to the following Families
and models, as documented in AMD's publicly available BIOS and Kernel
Developer Guides:
50742 - Family 15h Model 60h-6Fh (PCI_DEVICE_ID_AMD_KERNCZ_SMBUS)
55072 - Family 15h Model 70h-7Fh (PCI_DEVICE_ID_AMD_KERNCZ_SMBUS)
52740 - Family 16h Model 30h-3Fh (PCI_DEVICE_ID_AMD_HUDSON2_SMBUS)
The Hudson2 PCI Device ID (PCI_DEVICE_ID_AMD_HUDSON2_SMBUS) is shared
between Bolton FCH and Family 16h Model 30h, but the location of the
SmBus0Sel port selection bits are different:
51192 - Bolton Register Reference Guide
We distinguish between Bolton and Family 16h Model 30h using the PCI
Revision ID:
Bolton is device 0x780b, revision 0x15
Family 16h Model 30h is device 0x780b, revision 0x1F
Family 15h Model 60h and 70h are both device 0x790b, revision 0x4A.
The following additional public AMD BKDG documents were checked and do
not share the same port selection register:
42301 - Family 15h Model 00h-0Fh doesn't mention any
42300 - Family 15h Model 10h-1Fh doesn't mention any
49125 - Family 15h Model 30h-3Fh doesn't mention any
48751 - Family 16h Model 00h-0Fh uses the previously supported
index register SB800_PIIX4_PORT_IDX_ALT at 0x2e
Signed-off-by: Andrew Cooks <andrew.cooks@opengear.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org [v4.6+]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
ftrace does not use text_poke() for enabling trace functionality. It uses
its own mechanism and flips the whole kernel text to RW and back to RO.
The CPA rework removed a loop based check of 4k pages which tried to
preserve a large page by checking each 4k page whether the change would
actually cover all pages in the large page.
This resulted in endless loops for nothing as in testing it turned out that
it actually never preserved anything. Of course testing missed to include
ftrace, which is the one and only case which benefitted from the 4k loop.
As a consequence enabling function tracing or ftrace based kprobes results
in a full 4k split of the kernel text, which affects iTLB performance.
The kernel RO protection is the only valid case where this can actually
preserve large pages.
All other static protections (RO data, data NX, PCI, BIOS) are truly
static. So a conflict with those protections which results in a split
should only ever happen when a change of memory next to a protected region
is attempted. But these conflicts are rightfully splitting the large page
to preserve the protected regions. In fact a change to the protected
regions itself is a bug and is warned about.
Add an exception for the static protection check for kernel text RO when
the to be changed region spawns a full large page which allows to preserve
the large mappings. This also prevents the syslog to be spammed about CPA
violations when ftrace is used.
The exception needs to be removed once ftrace switched over to text_poke()
which avoids the whole issue.
Fixes: 585948f4f6 ("x86/mm/cpa: Avoid the 4k pages check completely")
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
Make sure interrupt handler i2c_dw_irq_handler_slave() has finished
before clearing the the dev->slave pointer in i2c_dw_unreg_slave().
There is possibility for a race if i2c_dw_irq_handler_slave() is running
on another CPU while clearing the dev->slave pointer.
Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Reported-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
check_acpi_smo88xx_device() utilizes acpi_get_object_info() which in its turn
allocates a buffer. User is responsible to clean allocated resources. The last
has been missed in the original code. Fix it here.
While here, replace !ACPI_SUCCESS() with ACPI_FAILURE().
Fixes: 19b07cb4a1 ("i2c: i801: Register optional lis3lv02d I2C device on Dell machines")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
We are moving towards returning ERR_PTRs when i2c_new_*_device() calls
fail. Make sure its counterpart for unregistering handles ERR_PTRs as
well.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pull fallthrough fixes from Gustavo A. R. Silva:
"Fix fall-through warnings on arc and nds32 for multiple
configurations"
* tag 'Wimplicit-fallthrough-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
nds32: Mark expected switch fall-throughs
ARC: unwind: Mark expected switch fall-through
Pull mtd fix from Miquel Raynal:
"Add a 'depends on' in the core Hyperbus Kconfig entry to avoid build
errors"
* tag 'mtd/fixes-for-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: hyperbus: fix dependency and build error
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: allmodconfig nds32):
include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: haps_hs_defconfig arc):
arch/arc/kernel/unwind.c: In function ‘read_pointer’:
./include/linux/compiler.h:328:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
do { \
^
./include/linux/compiler.h:338:2: note: in expansion of macro ‘__compiletime_assert’
__compiletime_assert(condition, msg, prefix, suffix)
^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
arch/arc/kernel/unwind.c:573:3: note: in expansion of macro ‘BUILD_BUG_ON’
BUILD_BUG_ON(sizeof(u32) != sizeof(value));
^~~~~~~~~~~~
arch/arc/kernel/unwind.c:575:2: note: here
case DW_EH_PE_native:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Hisilicon fixes for v5.3-rc
- Fixed RCU usage in logical PIO
- Added a function to unregister a logical PIO range in logical PIO
to support the fixes in the hisi-lpc driver
- Fixed and optimized hisi-lpc driver to avoid potential use-after-free
and driver unbind crash
* tag 'hisi-fixes-for-5.3' of git://github.com/hisilicon/linux-hisi:
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
lib: logic_pio: Add logic_pio_unregister_range()
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Fix RCU usage
Link: https://lore.kernel.org/r/5D562335.7000902@hisilicon.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In ieee80211_deliver_skb_to_local_stack intercepts EAPoL frames if
mac80211 is configured to do so and forwards the contents over nl80211.
During this process some additional data is also forwarded, including
whether the frame was received encrypted or not. Unfortunately just
prior to the call to ieee80211_deliver_skb_to_local_stack, skb->cb is
cleared, resulting in incorrect data being exposed over nl80211.
Fixes: 018f6fbf54 ("mac80211: Send control port frames over nl80211")
Cc: stable@vger.kernel.org
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190827224120.14545-2-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to use a different firmware for C0 versions of killer Qu NICs.
Add structures for them and handle them in the if block that detects
C0 revisions.
Additionally, instead of having an inclusive check for QnJ devices,
make the selection exclusive, so that switching to QnJ is the
exception, not the default. This prevents us from having to add all
the non-QnJ cards to an exclusion list. To do so, only go into the
QnJ block if the device has an RF ID type HR and HW revision QnJ.
Cc: stable@vger.kernel.org # 5.2
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20190821171732.2266-1-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If 'fq' qdisc is used and a program has requested timestamps,
skb->tstamp needs to be cleared, else fq will treat these as
'transmit time'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since the chained quirks via chained_before flag is applied before the
depth check, it may lead to the endless recursive calls, when the
chain were set up incorrectly. Fix it by moving the depth check at
the beginning of the loop.
Fixes: 1f57825077 ("ALSA: hda - Add chained_before flag to the fixup entry")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
lib/devres.c, which implements devm_ioremap_resource(), is only built
when CONFIG_HAS_IOMEM is set/enabled, so MTD_HYPERBUS should depend
on HAS_IOMEM. Fixes a build error and a Kconfig warning (as seen on
UML builds):
WARNING: unmet direct dependencies detected for MTD_COMPLEX_MAPPINGS
Depends on [n]: MTD [=m] && HAS_IOMEM [=n]
Selected by [m]:
- MTD_HYPERBUS [=m] && MTD [=m]
ERROR: "devm_ioremap_resource" [drivers/mtd/hyperbus/hyperbus-core.ko] undefined!
Fixes: dcc7d3446a ("mtd: Add support for HyperBus memory devices")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-mtd@lists.infradead.org
Acked-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
A similar workaround for the suspend/resume problem is needed for yet
another ASUS machines, P6X models. Like the previous fix, the BIOS
doesn't provide the standard DMI_SYS_* entry, so again DMI_BOARD_*
entries are used instead.
Reported-and-tested-by: SteveM <swm@swm1.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: flower: fix bugs in merge tunnel encap code
John says:
There are few bugs in the merge encap code that have come to light with
recent driver changes. Effectively, flow bind callbacks were being
registered twice when using internal ports (new 'busy' code triggers
this). There was also an issue with neighbour notifier messages being
ignored for internal ports.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent code changes to NFP allowed the offload of neighbour entries to FW
when the next hop device was an internal port. This allows for offload of
tunnel encap when the end-point IP address is applied to such a port.
Unfortunately, the neighbour event handler still rejects events that are
not associated with a repr dev and so the firmware neighbour table may get
out of sync for internal ports.
Fix this by allowing internal port neighbour events to be correctly
processed.
Fixes: 45756dfeda ("nfp: flower: allow tunnels to output to internal port")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Internal port TC offload is implemented through user-space applications
(such as OvS) by adding filters at egress via TC clsact qdiscs. Indirect
block offload support in the NFP driver accepts both ingress qdisc binds
and egress binds if the device is an internal port. However, clsact sends
bind notification for both ingress and egress block binds which can lead
to the driver registering multiple callbacks and receiving multiple
notifications of new filters.
Fix this by rejecting ingress block bind callbacks when the port is
internal and only adding filter callbacks for egress binds.
Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: fix side effect
v3:
Update the commit message for patch #1.
v2:
Replace patch #2 with "r8152: remove calling netif_napi_del".
v1:
The commit 0ee1f47349 ("r8152: napi hangup fix after disconnect")
add a check to avoid using napi_disable after netif_napi_del. However,
the commit ffa9fec30c ("r8152: set RTL8152_UNPLUG only for real
disconnection") let the check useless.
Therefore, I revert commit 0ee1f47349 ("r8152: napi hangup fix
after disconnect") first, and add another patch to fix it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary use of netif_napi_del. This also avoids to call
napi_disable() after netif_napi_del().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 0ee1f47349.
The commit 0ee1f47349 ("r8152: napi hangup fix after
disconnect") adds a check about RTL8152_UNPLUG to determine
if calling napi_disable() is invalid in rtl8152_close(),
when rtl8152_disconnect() is called. This avoids to use
napi_disable() after calling netif_napi_del().
Howver, commit ffa9fec30c ("r8152: set RTL8152_UNPLUG
only for real disconnection") causes that RTL8152_UNPLUG
is not always set when calling rtl8152_disconnect().
Therefore, I have to revert commit 0ee1f47349 ("r8152:
napi hangup fix after disconnect"), first. And submit
another patch to fix it.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Action sample doesn't properly handle psample_group pointer in overwrite
case. Following issues need to be fixed:
- In tcf_sample_init() function RCU_INIT_POINTER() is used to set
s->psample_group, even though we neither setting the pointer to NULL, nor
preventing concurrent readers from accessing the pointer in some way.
Use rcu_swap_protected() instead to safely reset the pointer.
- Old value of s->psample_group is not released or deallocated in any way,
which results resource leak. Use psample_group_put() on non-NULL value
obtained with rcu_swap_protected().
- The function psample_group_put() that released reference to struct
psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu
grace period when deallocating it. Extend struct psample_group with rcu
head and use kfree_rcu when freeing it.
Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the ibmvnic driver will not schedule device resets
if the device is being removed, but does not check the device
state before the reset is actually processed. This leads to a race
where a reset is scheduled with a valid device state but is
processed after the driver has been removed, resulting in an oops.
Fix this by checking the device state before processing a queued
reset event.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pfn_valid can be wrong when parsing a invalid pfn whose phys address
exceeds BITS_PER_LONG as the MSB will be trimed when shifted.
The issue originally arise from bellowing call stack, which corresponding to
an access of the /proc/kpageflags from userspace with a invalid pfn parameter
and leads to kernel panic.
[46886.723249] c7 [<c031ff98>] (stable_page_flags) from [<c03203f8>]
[46886.723264] c7 [<c0320368>] (kpageflags_read) from [<c0312030>]
[46886.723280] c7 [<c0311fb0>] (proc_reg_read) from [<c02a6e6c>]
[46886.723290] c7 [<c02a6e24>] (__vfs_read) from [<c02a7018>]
[46886.723301] c7 [<c02a6f74>] (vfs_read) from [<c02a778c>]
[46886.723315] c7 [<c02a770c>] (SyS_pread64) from [<c0108620>]
(ret_fast_syscall+0x0/0x28)
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Currently, various virtual memory areas of Linux RISC-V are organized
in increasing order of their virtual addresses is as follows:
1. User space area (This is lowest area and starts at 0x0)
2. FIXMAP area
3. VMALLOC area
4. Kernel area (This is highest area and starts at PAGE_OFFSET)
The maximum size of user space aread is represented by TASK_SIZE.
On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
user space area to overlap the FIXMAP area. This allows user space apps
to potentially corrupt the FIXMAP area and kernel OF APIs will crash
whenever they access corrupted FDT in the FIXMAP area.
On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
happen to overlap so we don't see any FIXMAP area corruptions.
This patch fixes FIXMAP area corruption on RV32 systems by setting
TASK_SIZE to FIXADDR_START. We also move FIXADDR_TOP, FIXADDR_SIZE,
and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
header includes.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Only the first fragment in a datagram contains the L4 headers. When the
Open vSwitch module parses a packet, it always sets the IP protocol
field in the key, but can only set the L4 fields on the first fragment.
The original behavior would not clear the L4 portion of the key, so
garbage values would be sent in the key for "later" fragments. This
patch clears the L4 fields in that circumstance to prevent sending those
garbage values as part of the upcall.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used. Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them. This patch updates
the key once we have a reassembled datagram.
The handle_fragments() function works on L3 headers so we pull the L3/L4
flow key update code from key_extract into a new function
'key_extract_l3l4'. Then we add a another new function
ovs_flow_key_update_l3l4() and export it so that it is accessible by
handle_fragments() for conntrack packet reassembly.
Co-authored-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu
counters are present when 'reset()' is called for pfifo_fast qdiscs.
Otherwise, the following script:
# tc q a dev lo handle 1: root htb default 100
# tc c a dev lo parent 1: classid 1:100 htb \
> rate 95Mbit ceil 100Mbit burst 64k
[...]
# tc f a dev lo parent 1: protocol arp basic classid 1:100
[...]
# tc q a dev lo parent 1:100 handle 100: pfifo_fast
[...]
# tc q d dev lo root
can generate the following splat:
Unable to handle kernel paging request at virtual address dfff2c01bd148000
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[dfff2c01bd148000] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] SMP
[...]
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : pfifo_fast_reset+0x280/0x4d8
lr : pfifo_fast_reset+0x21c/0x4d8
sp : ffff800d09676fa0
x29: ffff800d09676fa0 x28: ffff200012ee22e4
x27: dfff200000000000 x26: 0000000000000000
x25: ffff800ca0799958 x24: ffff1001940f332b
x23: 0000000000000007 x22: ffff200012ee1ab8
x21: 0000600de8a40000 x20: 0000000000000000
x19: ffff800ca0799900 x18: 0000000000000000
x17: 0000000000000002 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: ffff1001b922e6e2
x11: 1ffff001b922e6e1 x10: 0000000000000000
x9 : 1ffff001b922e6e1 x8 : dfff200000000000
x7 : 0000000000000000 x6 : 0000000000000000
x5 : 1fffe400025dc45c x4 : 1fffe400025dc357
x3 : 00000c01bd148000 x2 : 0000600de8a40000
x1 : 0000000000000007 x0 : 0000600de8a40004
Call trace:
pfifo_fast_reset+0x280/0x4d8
qdisc_reset+0x6c/0x370
htb_reset+0x150/0x3b8 [sch_htb]
qdisc_reset+0x6c/0x370
dev_deactivate_queue.constprop.5+0xe0/0x1a8
dev_deactivate_many+0xd8/0x908
dev_deactivate+0xe4/0x190
qdisc_graft+0x88c/0xbd0
tc_get_qdisc+0x418/0x8a8
rtnetlink_rcv_msg+0x3a8/0xa78
netlink_rcv_skb+0x18c/0x328
rtnetlink_rcv+0x28/0x38
netlink_unicast+0x3c4/0x538
netlink_sendmsg+0x538/0x9a0
sock_sendmsg+0xac/0xf8
___sys_sendmsg+0x53c/0x658
__sys_sendmsg+0xc8/0x140
__arm64_sys_sendmsg+0x74/0xa8
el0_svc_handler+0x164/0x468
el0_svc+0x10/0x14
Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863)
Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
before dereferencing 'qdisc->cpu_qstats'.
Changes since v1:
- coding style improvements, thanks to Stefano Brivio
Fixes: 8a53e616de ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
CC: Paolo Abeni <pabeni@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yash Shah says:
====================
macb: Update ethernet compatible string for SiFive FU540
This patch series renames the compatible property to a more appropriate
string. The patchset is based on Linux-5.3-rc6 and tested on SiFive
Unleashed board
Change history:
Since v1:
- Dropped PATCH3 because it's already merged
- Change the reference url in the patch descriptions to point to a
'lore.kernel.org' link instead of 'lkml.org'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The scom driver currently fails out of operations if certain system
errors are flagged in the status register; system checkstop, special
attention, or recoverable error. These errors won't impact the ability
of the scom engine to perform operations, so the driver should continue
under these conditions.
Also, don't do a PIB reset for these conditions, since it won't help.
Fixes: 6b293258cd ("fsi: scom: Major overhaul")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20190827041249.13381-1-jk@ozlabs.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Francois reported that VMware balloon gets stuck after a balloon reset,
when the VMCI doorbell is removed. A similar error can occur when the
balloon driver is removed with the following splat:
[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
[ 1088.622035] Tainted: G W 5.2.0 #4
[ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.622205] modprobe D 0 3565 1450 0x00000000
[ 1088.622210] Call Trace:
[ 1088.622246] __schedule+0x2a8/0x690
[ 1088.622248] schedule+0x2d/0x90
[ 1088.622250] schedule_timeout+0x1d3/0x2f0
[ 1088.622252] wait_for_completion+0xba/0x140
[ 1088.622320] ? wake_up_q+0x80/0x80
[ 1088.622370] vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
[ 1088.622373] vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
[ 1088.622379] vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
[ 1088.622381] vmballoon_exit+0x18/0xcc8 [vmw_balloon]
[ 1088.622394] __x64_sys_delete_module+0x146/0x280
[ 1088.622408] do_syscall_64+0x5a/0x130
[ 1088.622410] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1088.622415] RIP: 0033:0x7f54f62791b7
[ 1088.622421] Code: Bad RIP value.
[ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
[ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
[ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
[ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
[ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0
The cause for the bug is that when the "delayed" doorbell is invoked, it
takes a reference on the doorbell entry and schedules work that is
supposed to run the appropriate code and drop the doorbell entry
reference. The code ignores the fact that if the work is already queued,
it will not be scheduled to run one more time. As a result one of the
references would not be dropped. When the code waits for the reference
to get to zero, during balloon reset or module removal, it gets stuck.
Fix it. Drop the reference if schedule_work() indicates that the work is
already queued.
Note that this bug got more apparent (or apparent at all) due to
commit ce664331b2 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").
Fixes: 83e2ec765b ("VMCI: doorbell implementation.")
Reported-by: Francois Rigault <rigault.francois@gmail.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Adit Ranadive <aditr@vmware.com>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Vishnu DASA <vdasa@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Link: https://lore.kernel.org/r/20190820202638.49003-1-namit@vmware.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Falcon microcontroller that runs the XUSB firmware and which is
responsible for exposing the XHCI interface can address only 40 bits of
memory. Typically that's not a problem because Tegra devices don't have
enough system memory to exceed those 40 bits.
However, if the ARM SMMU is enable on Tegra186 and later, the addresses
passed to the XUSB controller can be anywhere in the 48-bit IOV address
space of the ARM SMMU. Since the DMA/IOMMU API starts allocating from
the top of the IOVA space, the Falcon microcontroller is not able to
load the firmware successfully.
Fix this by setting the DMA mask to 40 bits, which will force the DMA
API to map the buffer for the firmware to an IOVA that is addressable by
the Falcon.
Signed-off-by: Nagarjuna Kristam <nkristam@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/1566989697-13049-1-git-send-email-nkristam@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes an issue that the following error is
possible to happen when ohci hardware causes an interruption
and the system is shutting down at the same time.
[ 34.851754] usb 2-1: USB disconnect, device number 2
[ 35.166658] irq 156: nobody cared (try booting with the "irqpoll" option)
[ 35.173445] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 5.3.0-rc5 #85
[ 35.179964] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[ 35.187886] Workqueue: usb_hub_wq hub_event
[ 35.192063] Call trace:
[ 35.194509] dump_backtrace+0x0/0x150
[ 35.198165] show_stack+0x14/0x20
[ 35.201475] dump_stack+0xa0/0xc4
[ 35.204785] __report_bad_irq+0x34/0xe8
[ 35.208614] note_interrupt+0x2cc/0x318
[ 35.212446] handle_irq_event_percpu+0x5c/0x88
[ 35.216883] handle_irq_event+0x48/0x78
[ 35.220712] handle_fasteoi_irq+0xb4/0x188
[ 35.224802] generic_handle_irq+0x24/0x38
[ 35.228804] __handle_domain_irq+0x5c/0xb0
[ 35.232893] gic_handle_irq+0x58/0xa8
[ 35.236548] el1_irq+0xb8/0x180
[ 35.239681] __do_softirq+0x94/0x23c
[ 35.243253] irq_exit+0xd0/0xd8
[ 35.246387] __handle_domain_irq+0x60/0xb0
[ 35.250475] gic_handle_irq+0x58/0xa8
[ 35.254130] el1_irq+0xb8/0x180
[ 35.257268] kernfs_find_ns+0x5c/0x120
[ 35.261010] kernfs_find_and_get_ns+0x3c/0x60
[ 35.265361] sysfs_unmerge_group+0x20/0x68
[ 35.269454] dpm_sysfs_remove+0x2c/0x68
[ 35.273284] device_del+0x80/0x370
[ 35.276683] hid_destroy_device+0x28/0x60
[ 35.280686] usbhid_disconnect+0x4c/0x80
[ 35.284602] usb_unbind_interface+0x6c/0x268
[ 35.288867] device_release_driver_internal+0xe4/0x1b0
[ 35.293998] device_release_driver+0x14/0x20
[ 35.298261] bus_remove_device+0x110/0x128
[ 35.302350] device_del+0x148/0x370
[ 35.305832] usb_disable_device+0x8c/0x1d0
[ 35.309921] usb_disconnect+0xc8/0x2d0
[ 35.313663] hub_event+0x6e0/0x1128
[ 35.317146] process_one_work+0x1e0/0x320
[ 35.321148] worker_thread+0x40/0x450
[ 35.324805] kthread+0x124/0x128
[ 35.328027] ret_from_fork+0x10/0x18
[ 35.331594] handlers:
[ 35.333862] [<0000000079300c1d>] usb_hcd_irq
[ 35.338126] [<0000000079300c1d>] usb_hcd_irq
[ 35.342389] Disabling IRQ #156
ohci_shutdown() disables all the interrupt and rh_state is set to
OHCI_RH_HALTED. In other hand, ohci_irq() is possible to enable
OHCI_INTR_SF and OHCI_INTR_MIE on ohci_irq(). Note that OHCI_INTR_SF
is possible to be set by start_ed_unlink() which is called:
ohci_irq()
-> process_done_list()
-> takeback_td()
-> start_ed_unlink()
So, ohci_irq() has the following condition, the issue happens by
&ohci->regs->intrenable = OHCI_INTR_MIE | OHCI_INTR_SF and
ohci->rh_state = OHCI_RH_HALTED:
/* interrupt for some other device? */
if (ints == 0 || unlikely(ohci->rh_state == OHCI_RH_HALTED))
return IRQ_NOTMINE;
To fix the issue, ohci_shutdown() holds the spin lock while disabling
the interruption and changing the rh_state flag to prevent reenable
the OHCI_INTR_MIE unexpectedly. Note that io_watchdog_func() also
calls the ohci_shutdown() and it already held the spin lock, so that
the patch makes a new function as _ohci_shutdown().
This patch is inspired by a Renesas R-Car Gen3 BSP patch
from Tho Vu.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1566877910-6020-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using managed device resources in usb_hcd_pci_probe() allows devm usage for
resource subranges, such as the mmio resource for the platform device
created to control host/device mode mux, which is a xhci extended
capability, and sits inside the xhci mmio region.
If managed device resources are not used then "parent" resource
is released before subrange at driver removal as .remove callback is
called before the devres list of resources for this device is walked
and released.
This has been observed with the xhci extended capability driver causing a
use-after-free which is now fixed.
An additional nice benefit is that error handling on driver initialisation
is simplified much.
Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
Tested-by: Carsten Schmid <carsten_schmid@mentor.com>
Reviewed-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Fixes: fa31b3cb2a ("xhci: Add Intel extended cap / otg phy mux handling")
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lore.kernel.org/r/1566569488679.31808@mentor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To address the requirements of embargoed hardware issues, like Meltdown,
Spectre, L1TF etc. it is necessary to define and document a process for
handling embargoed hardware security issues.
Following the discussion at the maintainer summit 2018 in Edinburgh
(https://lwn.net/Articles/769417/) the volunteered people have worked
out a process and a Memorandum of Understanding. The latter addresses
the fact that the Linux kernel community cannot sign NDAs for various
reasons.
The initial contact point for hardware security issues is different from
the regular kernel security contact to provide a known and neutral
interface for hardware vendors and researchers. The initial primary
contact team is proposed to be staffed by Linux Foundation Fellows, who
are not associated to a vendor or a distribution and are well connected
in the industry as a whole.
The process is designed with the experience of the past incidents in
mind and tries to address the remaining gaps, so future (hopefully rare)
incidents can be handled more efficiently. It won't remove the fact,
that most of this has to be done behind closed doors, but it is set up
to avoid big bureaucratic hurdles for individual developers.
The process is solely for handling hardware security issues and cannot
be used for regular kernel (software only) security bugs.
This memo can help with hardware companies who, and I quote, "[my
manager] doesn't want to bet his job on the list keeping things secret."
This despite numerous leaks directly from that company over the years,
and none ever so far from the kernel security team. Cognitive
dissidence seems to be a requirement to be a good manager.
To accelerate the adoption of this process, we introduce the concept of
ambassadors in participating companies. The ambassadors are there to
guide people to comply with the process, but are not automatically
involved in the disclosure of a particular incident.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20190815212505.GC12041@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Moritz writes:
FPGA Manager fixes for 5.3
A single fix for the altera-ps-spi driver that fixes the behavior when
the driver receives -EPROBE_DEFER when trying to obtain a GPIO desc.
Signed-off-by: Moritz Fischer <mdf@kernel.org>
* tag 'fpga-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga:
fpga: altera-ps-spi: Fix getting of optional confd gpio
Pull arm64 fixes from Will Deacon:
"Hot on the heels of our last set of fixes are a few more for -rc7.
Two of them are fixing issues with our virtual interrupt controller
implementation in KVM/arm, while the other is a longstanding but
straightforward kallsyms fix which was been acked by Masami and
resolves an initialisation failure in kprobes observed on arm64.
- Fix GICv2 emulation bug (KVM)
- Fix deadlock in virtual GIC interrupt injection code (KVM)
- Fix kprobes blacklist init failure due to broken kallsyms lookup"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI
KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long
kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
Yi reported[1] that after commit a3619190d6 ("libnvdimm/pfn: stop
padding pmem namespaces to section alignment"), it was no longer
possible to create a device dax namespace with a 1G alignment. The
reason was that the pmem region was not itself 1G-aligned. The code
happily skips past the first 512M, but fails to account for a now
misaligned end offset (since space was allocated starting at that
misaligned address, and extending for size GBs). Reintroduce
end_trunc, so that the code correctly handles the misaligned end
address. This results in the same behavior as before the introduction
of the offending commit.
[1] https://lists.01.org/pipermail/linux-nvdimm/2019-July/022813.html
Fixes: a3619190d6 ("libnvdimm/pfn: stop padding pmem namespaces ...")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/x49ftll8f39.fsf@segfault.boston.devel.redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The de-init routine should be doing the following in order:-
1. Unregister the drm device
2. Shut down the crtcs - failing to do this might cause a connector leakage
See the 'commit 109c4d18e5 ("drm/arm/malidp: Ensure that the crtcs are
shutdown before removing any encoder/connector")'
3. Disable the interrupts
4. Unbind the components
5. Free up DRM mode_config info
Changes from v1:-
1. Re-ordered the header files inclusion
2. Rebased on top of the latest drm-misc-fixes
Signed-off-by:. Ayan Kumar Halder <Ayan.Halder@arm.com>
Reviewed-by: Mihail Atanassov <mihail.atanassov@arm.com>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Link: https://patchwork.freedesktop.org/patch/327606/
One of the very few warnings I have in the current build comes from
arch/x86/boot/edd.c, where I get the following with a gcc9 build:
arch/x86/boot/edd.c: In function ‘query_edd’:
arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
148 | mbrptr = boot_params.edd_mbr_sig_buffer;
| ^~~~~~~~~~~
This warning triggers because we throw away all the CFLAGS and then make
a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
added in the following commit is not present:
6f303d6053 ("gcc-9: silence 'address-of-packed-member' warning")
The simplest solution for now is to adjust the warning for this version
of CFLAGS as well, but it would definitely make sense to examine whether
REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
in the compiler flags environment automatically.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Walking the address list of an inet6_dev requires
appropriate locking. Since the called function
siw_listen_address() may sleep, we have to use
rtnl_lock() instead of read_lock_bh().
Also introduces sanity checks if we got a device
from in_dev_get() or in6_dev_get().
Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190828130355.22830-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
The register number needs to be translated for chips with more than 8
ports. This patch fixes a bug causing all chips with more than 8 GPIO pins
to not work correctly.
Fixes: 0f25fda840 ("gpio: pca953x: Zap ad-hoc reg_direction cache")
Cc: Cc: <stable@vger.kernel.org>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The parent image is read only up to the overlap point, the rest of
the buffer should be zeroed. This snuck in because as it turns out
the overlap test case has not been triggering this code path for
a while now.
Fixes: a9b67e6994 ("rbd: replace obj_req->tried_parent with obj_req->read_state")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
In set_secret(), key->tfm is assigned to NULL on line 55, and then
ceph_crypto_key_destroy(key) is executed.
ceph_crypto_key_destroy(key)
crypto_free_sync_skcipher(key->tfm)
crypto_free_skcipher(&tfm->base);
This happens to work because crypto_sync_skcipher is a trivial wrapper
around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher()
handles that. Let's not rely on the layout of crypto_sync_skcipher.
This bug is found by a static analysis tool STCheck written by us.
Fixes: 69d6302b65 ("libceph: Remove VLA usage of skcipher").
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
A guest is not allowed to inject a SGI (or clear its pending state)
by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are
defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8).
Make sure we correctly emulate the architecture.
Fixes: 96b298000d ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Cc: stable@vger.kernel.org # 4.7+
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Vladimir Rutsky reported stuck TCP sessions after memory pressure
events. Edge Trigger epoll() user would never receive an EPOLLOUT
notification allowing them to retry a sendmsg().
Jason tested the case of sk_stream_alloc_skb() returning NULL,
but there are other paths that could lead both sendmsg() and sendpage()
to return -1 (EAGAIN), with an empty skb queued on the write queue.
This patch makes sure we remove this empty skb so that
Jason code can detect that the queue is empty, and
call sk->sk_write_space(sk) accordingly.
Fixes: ce5ec44099 ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rds6_inc_info_copy() function has a couple struct members which
are leaking stack information. The ->tos field should hold actual
information and the ->flags field needs to be zeroed out.
Fixes: 3eb450367d ("rds: add type of service(tos) infrastructure")
Fixes: b7ff8b1036 ("rds: Extend RDS API for IPv6 support")
Reported-by: 黄ID蝴蝶 <butterflyhuangxx@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151
w.r.t. ioctl extensibility, it became clear that such an issue might
prevent that the 3 RSV bits inside the DSA 802.1Q tag might also suffer
the same fate and be useless for further extension.
So clearly specify that the reserved bits should currently be
transmitted as zero and ignored on receive. The DSA tagger already does
this (and has always did), and is the only known user so far (no
Wireshark dissection plugin, etc). So there should be no incompatibility
to speak of.
Fixes: 0471dd429c ("net: dsa: tag_8021q: Create a stable binary format")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 34786005ec ("net: phy: prevent PHYs w/o Clause 22 regs from calling
genphy_config_aneg") introduced a check that aborts phy_config_aneg()
if the phy is a C45 phy.
This causes phy_state_machine() to call phy_error() so that the phy
ends up in PHY_HALTED state.
Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
(analogous to the C22 case) so that the state machine can run
correctly.
genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
in drivers/net/phy/marvell10g.c, excluding vendor specific
configurations for 1000BaseT.
Fixes: 22b56e8270 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using strscpy is cleaner, and avoids some problems with
handling maximum length strings. Linus noticed the
original problem and Aurelien pointed out some additional
problems. Fortunately most of this is SMB1 code (and
in particular the ASCII string handling older, which
is less common).
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().
The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.
Fixes: 0c66dc1ea3 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's safer to zero out the password so that it can never be disclosed.
Fixes: 0c219f5799c7 ("cifs: set domainName when a domain-key is used in multiuser")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
RHBZ: 1710429
When we use a domain-key to authenticate using multiuser we must also set
the domainnmame for the new volume as it will be used and passed to the server
in the NTLMSSP Domain-name.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a page lock leak in nfs_pageio_resend()
- Ensure O_DIRECT reports an error if the bytes read/written is 0
- Don't handle errors if the bind/connect succeeded
- Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
invalidat ed"
Bugfixes:
- Don't refresh attributes with mounted-on-file information
- Fix return values for nfs4_file_open() and nfs_finish_open()
- Fix pnfs layoutstats reporting of I/O errors
- Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
for soft I/O errors when the user specifies a hard mount.
- Various fixes to the error handling in sunrpc
- Don't report writepage()/writepages() errors twice"
* tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: remove set but not used variable 'mapping'
NFSv2: Fix write regression
NFSv2: Fix eof handling
NFS: Fix writepage(s) error handling to not report errors twice
NFS: Fix spurious EIO read errors
pNFS/flexfiles: Don't time out requests on hard mounts
SUNRPC: Handle connection breakages correctly in call_status()
Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
pNFS/flexfiles: Turn off soft RPC calls
SUNRPC: Don't handle errors if the bind/connect succeeded
NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
NFSv4: Fix return value in nfs_finish_open()
NFSv4: Fix return values for nfs4_file_open()
NFS: Don't refresh attributes with mounted-on-file information
Don't advance RIP or inject a single-step #DB if emulation signals a
fault. This logic applies to all state updates that are conditional on
clean retirement of the emulation instruction, e.g. updating RFLAGS was
previously handled by commit 38827dbd3f ("KVM: x86: Do not update
EFLAGS on faulting emulation").
Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
ctxt->_eip until emulation "retires" anyways. Skipping #DB injection
fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation
overwriting the #UD with #DB and thus restarting the bad SYSCALL over
and over.
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Fixes: 663f4c61b8 ("KVM: x86: handle singlestep during emulation")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
If kvm_intel is loaded with nested=0 parameter an attempt to perform
KVM_GET_SUPPORTED_HV_CPUID results in OOPS as nested_get_evmcs_version hook
in kvm_x86_ops is NULL (we assign it in nested_vmx_hardware_setup() and
this only happens in case nested is enabled).
Check that kvm_x86_ops->nested_get_evmcs_version is not NULL before
calling it. With this, we can remove the stub from svm as it is no
longer needed.
Cc: <stable@vger.kernel.org>
Fixes: e2e871ab2f ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull ARC updates from Vineet Gupta:
- support for Edge Triggered IRQs in ARC IDU intc
- other fixes here and there
* tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: prefer __section from compiler_attributes.h
dt-bindings: IDU-intc: Add support for edge-triggered interrupts
dt-bindings: IDU-intc: Clean up documentation
ARCv2: IDU-intc: Add support for edge-triggered interrupts
ARC: unwind: Mark expected switch fall-throughs
ARC: [plat-hsdk]: allow to switch between AXI DMAC port configurations
ARC: fix typo in setup_dma_ops log message
ARCv2: entry: early return from exception need not clear U & DE bits
Pull MFD fix from Lee Jones:
"Identify potentially unused functions in rk808 driver when !PM"
* tag 'mfd-fixes-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: rk808: Make PM function declaration static
mfd: rk808: Mark pm functions __maybe_unused
Pull sound fixes from Takashi Iwai:
"A collection of small fixes as usual:
- More coverage of USB-audio descriptor sanity checks
- A fix for mute LED regression on Conexant HD-audio codecs
- A few device-specific fixes and quirks for USB-audio and HD-audio
- A fix for (die-hard remaining) possible race in sequencer core
- FireWire oxfw regression fix that was introduced in 5.3-rc1"
* tag 'sound-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: oxfw: fix to handle correct stream for PCM playback
ALSA: seq: Fix potential concurrent access to the deleted pool
ALSA: usb-audio: Check mixer unit bitmap yet more strictly
ALSA: line6: Fix memory leak at line6_init_pcm() error path
ALSA: usb-audio: Fix invalid NULL check in snd_emuusb_set_samplerate()
ALSA: hda/ca0132 - Add new SBZ quirk
ALSA: usb-audio: Add implicit fb quirk for Behringer UFX1604
ALSA: hda - Fixes inverted Conexant GPIO mic mute led
For picasso(adev->pdev->device == 0x15d8)&raven2(adev->rev_id >= 0x8),
firmware is sufficient to support gfxoff.
In commit 98f58ada2d, for picasso&raven2,
return directly and cause gfxoff disabled.
Fixes: 98f58ada2d ("drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possible")
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull networking fixes from David Miller:
1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya
Leoshkevich.
2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
SMC from Jason Baron.
3) ipv6_mc_may_pull() should return 0 for malformed packets, not
-EINVAL. From Stefano Brivio.
4) Don't forget to unpin umem xdp pages in error path of
xdp_umem_reg(). From Ivan Khoronzhuk.
5) Fix sta object leak in mac80211, from Johannes Berg.
6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
switches. From Florian Fainelli.
7) Revert DMA sync removal from r8169 which was causing regressions on
some MIPS Loongson platforms. From Heiner Kallweit.
8) Use after free in flow dissector, from Jakub Sitnicki.
9) Fix NULL derefs of net devices during ICMP processing across
collect_md tunnels, from Hangbin Liu.
10) proto_register() memory leaks, from Zhang Lin.
11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
from John Fastabend.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
r8152: Set memory to all 0xFFs on failed reg reads
openvswitch: Fix conntrack cache with timeout
ipv4: mpls: fix mpls_xmit for iptunnel
nexthop: Fix nexthop_num_path for blackhole nexthops
net: rds: add service level support in rds-info
net: route dump netlink NLM_F_MULTI flag missing
s390/qeth: reject oversized SNMP requests
sock: fix potential memory leak in proto_register()
MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
ipv4/icmp: fix rt dst dev null pointer dereference
openvswitch: Fix log message in ovs conntrack
bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
bpf: fix use after free in prog symbol exposure
bpf: fix precision tracking in presence of bpf2bpf calls
flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
Revert "r8169: remove not needed call to dma_sync_single_for_device"
ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
net/ncsi: Fix the payload copying for the request coming from Netlink
qed: Add cleanup in qed_slowpath_start()
...
When I merged the extension sysctl tables with the main one I forgot to
reset them on netns creation. They currently read/write init_net settings.
Fixes: d912dec124 ("netfilter: conntrack: merge acct and helper sysctl table with main one")
Fixes: cb2833ed00 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one")
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the ap_list is longer than 256 entries, merge_final() in list_sort()
will call the comparison callback with the same element twice, causing
a deadlock in vgic_irq_cmp().
Fix it by returning early when irqa == irqb.
Cc: stable@vger.kernel.org # 4.7+
Fixes: 8e44474579 ("KVM: arm/arm64: vgic-new: Add IRQ sorting")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Heyi Guo <guoheyi@huawei.com>
[maz: massaged commit log and patch, added Fixes and Cc-stable]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
An arm64 kernel configured with
CONFIG_KPROBES=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
reports the following kprobe failure:
[ 0.032677] kprobes: failed to populate blacklist: -22
[ 0.033376] Please take care of using kprobes.
It appears that kprobe fails to retrieve the symbol at address
0xffff000010081000, despite this symbol being in System.map:
ffff000010081000 T __exception_text_start
This symbol is part of the first group of aliases in the
kallsyms_offsets array (symbol names generated using ugly hacks in
scripts/kallsyms.c):
kallsyms_offsets:
.long 0x1000 // do_undefinstr
.long 0x1000 // efi_header_end
.long 0x1000 // _stext
.long 0x1000 // __exception_text_start
.long 0x12b0 // do_cp15instr
Looking at the implementation of get_symbol_pos(), it returns the
lowest index for aliasing symbols. In this case, it return 0.
But kallsyms_lookup_size_offset() considers 0 as a failure, which
is obviously wrong (there is definitely a valid symbol living there).
In turn, the kprobe blacklisting stops abruptly, hence the original
error.
A CONFIG_KALLSYMS_ALL kernel wouldn't fail as there is always
some random symbols at the beginning of this array, which are never
looked up via kallsyms_lookup_size_offset.
Fix it by considering that get_symbol_pos() is always successful
(which is consistent with the other uses of this function).
Fixes: ffc5089196 ("[PATCH] Create kallsyms_lookup_size_offset()")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Fixes gcc '-Wunused-but-set-variable' warning:
fs/nfs/write.c: In function nfs_page_async_flush:
fs/nfs/write.c:609:24: warning: variable mapping set but not used [-Wunused-but-set-variable]
It is not use since commit aefb623c422e ("NFS: Fix
writepage(s) error handling to not report errors twice")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we received a reply from the server with a zero length read and
no error, then that implies we are at eof.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Avoids:
../drivers/mfd/rk808.c:771:1: warning: symbol 'rk8xx_pm_ops' \
was not declared. Should it be static?
Fixes: 5752bc4373 ("mfd: rk808: Mark pm functions __maybe_unused")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The find_pattern() debug output was printing the 'skip' character.
This can be a NULL-byte and messes up further pr_debug() output.
Output without the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to `<7>nf_conntrack_ftp: find_pattern `PORT': dlen = 8
kernel: nf_conntrack_ftp: find_pattern `EPRT': dlen = 8
Output with the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to 0x0 delimiter!
kernel: nf_conntrack_ftp: Match succeeded!
kernel: nf_conntrack_ftp: conntrack_ftp: match `172,17,0,100,200,207' (20 bytes at 4150681645)
kernel: nf_conntrack_ftp: find_pattern `PORT': dlen = 8
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simplify the check in physdev_mt_check() to emit an error message
only when passed an invalid chain (ie, NF_INET_LOCAL_OUT).
This avoids cluttering up the log with errors against valid rules.
For large/heavily modified rulesets, current behavior can quickly
overwhelm the ring buffer, because this function gets called on
every change, regardless of the rule that was changed.
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The in-place decryption routines in AF_RXRPC's rxkad security module
currently call skb_cow_data() to make sure the data isn't shared and that
the skb can be written over. This has a problem, however, as the softirq
handler may be still holding a ref or the Rx ring may be holding multiple
refs when skb_cow_data() is called in rxkad_verify_packet() - and so
skb_shared() returns true and __pskb_pull_tail() dislikes that. If this
occurs, something like the following report will be generated.
kernel BUG at net/core/skbuff.c:1463!
...
RIP: 0010:pskb_expand_head+0x253/0x2b0
...
Call Trace:
__pskb_pull_tail+0x49/0x460
skb_cow_data+0x6f/0x300
rxkad_verify_packet+0x18b/0xb10 [rxrpc]
rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
afs_extract_data+0x51/0x2d0 [kafs]
afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
afs_deliver_to_call+0xac/0x430 [kafs]
afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
afs_make_call+0x282/0x3f0 [kafs]
afs_fs_fetch_data+0x164/0x300 [kafs]
afs_fetch_data+0x54/0x130 [kafs]
afs_readpages+0x20d/0x340 [kafs]
read_pages+0x66/0x180
__do_page_cache_readahead+0x188/0x1a0
ondemand_readahead+0x17d/0x2e0
generic_file_read_iter+0x740/0xc10
__vfs_read+0x145/0x1a0
vfs_read+0x8c/0x140
ksys_read+0x4a/0xb0
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by using skb_unshare() instead in the input path for DATA packets
that have a security index != 0. Non-DATA packets don't need in-place
encryption and neither do unencrypted DATA packets.
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Use the previously-added transmit-phase skbuff private flag to simplify the
socket buffer tracing a bit. Which phase the skbuff comes from can now be
divined from the skb rather than having to be guessed from the call state.
We can also reduce the number of rxrpc_skb_trace values by eliminating the
difference between Tx and Rx in the symbols.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a flag in the private data on an skbuff to indicate that this is a
transmission-phase buffer rather than a receive-phase buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Abstract out rxtx ring cleanup into its own function from its two callers.
This makes it easier to apply the same changes to both.
Signed-off-by: David Howells <dhowells@redhat.com>
Pass the reference held on a DATA skb in the rxrpc input handler into the
Rx ring rather than getting an additional ref for this and then dropping
the original ref at the end.
Signed-off-by: David Howells <dhowells@redhat.com>
Use the information now cached in the skbuff private data to avoid the need
to reparse a jumbo packet. We can find all the subpackets by dead
reckoning, so it's only necessary to note how many there are, whether the
last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
flag set.
This is necessary as once recvmsg() can see the packet, it can start
modifying it, such as doing in-place decryption.
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
Improve the information stored about jumbo packets so that we don't need to
reparse them so much later.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Currently, we don't call dma_set_max_seg_size() for i915 because we
intentionally do not limit the segment length that the device supports.
However, this results in a warning being emitted if we try to map
anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG
enabled:
[ 7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer
than device claims to support [len=98304] [max=65536]
[ 7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220
debug_dma_map_sg+0x20f/0x340
This was originally brought up on
https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus
there was it wasn't really useful to set a limit (and that dma-debug
isn't really all that useful for i915 in the first place). Unfortunately
though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for
various distro kernels. Since a WARN_ON() will disable automatic problem
reporting (and cause any CI with said option enabled to start
complaining), we really should just fix the problem.
Note that as me and Chris Wilson discussed, the other solution for this
would be to make DMA-API not make such assumptions when a driver hasn't
explicitly set a maximum segment size. But, taking a look at the commit
which originally introduced this behavior, commit 78c47830a5
("dma-debug: check scatterlist segments"), there is an explicit mention
of this assumption and how it applies to devices with no segment size:
Conversely, devices which are less limited than the rather
conservative defaults, or indeed have no limitations at all
(e.g. GPUs with their own internal MMU), should be encouraged to
set appropriate dma_parms, as they may get more efficient DMA
mapping performance out of it.
So unless there's any concerns (I'm open to discussion!), let's just
follow suite and call dma_set_max_seg_size() with UINT_MAX as our limit
to silence any warnings.
Changes since v3:
* Drop patch for enabling CONFIG_DMA_API_DEBUG_SG in CI. It looks like
just turning it on causes the kernel to spit out bogus WARN_ONs()
during some igt tests which would otherwise require teaching igt to
disable the various DMA-API debugging options causing this. This is
too much work to be worth it, since DMA-API debugging is useless for
us. So, we'll just settle with this single patch to squelch WARN_ONs()
during driver load for users that have CONFIG_DMA_API_DEBUG_SG turned
on for some reason.
* Move dma_set_max_seg_size() call into i915_driver_hw_probe() - Chris
Wilson
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://patchwork.freedesktop.org/patch/msgid/20190823205251.14298-1-lyude@redhat.com
(cherry picked from commit acd674af95)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We're not allowed to create new properties after device registration
so for MST connectors we need to either create the max_bpc property
earlier, or we reuse one we already have. Let's do the latter apporach
since the corresponding SST connector already has the prop and its
min/max are correct also for the MST connector.
The problem was highlighted by commit 4f5368b554 ("drm/kms:
Catch mode_object lifetime errors") which results in the following
spew:
[ 1330.878941] WARNING: CPU: 2 PID: 1554 at drivers/gpu/drm/drm_mode_object.c:45 __drm_mode_object_add+0xa0/0xb0 [drm]
...
[ 1330.879008] Call Trace:
[ 1330.879023] drm_property_create+0xba/0x180 [drm]
[ 1330.879036] drm_property_create_range+0x15/0x30 [drm]
[ 1330.879048] drm_connector_attach_max_bpc_property+0x62/0x80 [drm]
[ 1330.879086] intel_dp_add_mst_connector+0x11f/0x140 [i915]
[ 1330.879094] drm_dp_add_port.isra.20+0x20b/0x440 [drm_kms_helper]
...
Cc: stable@vger.kernel.org
Cc: Lyude Paul <lyude@redhat.com>
Cc: sunpeng.li@amd.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Fixes: 5ca0ef8a56 ("drm/i915: Add max_bpc property for DP MST")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820161657.9658-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
(cherry picked from commit 1b9bd09630)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The newly added suspend/resume functions are only used if CONFIG_PM
is enabled:
drivers/mfd/rk808.c:752:12: error: 'rk8xx_resume' defined but not used [-Werror=unused-function]
drivers/mfd/rk808.c:732:12: error: 'rk8xx_suspend' defined but not used [-Werror=unused-function]
Mark them as __maybe_unused so the compiler can silently drop them
when they are not needed.
Fixes: 586c1b4125 ("mfd: rk808: Add RK817 and RK809 support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from
a guest. Although we verify correctness of TCEs before we do anything
with the existing tables, there is a small window when a check in
kvmppc_tce_validate might pass and right after that the guest alters
the page of TCEs, causing an early exit from the handler and leaving
srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap)
(real mode) locked.
This fixes the bug by jumping to the common exit code with an appropriate
unlock.
Cc: stable@vger.kernel.org # v4.11+
Fixes: 121f80ba68 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For the 40.46 SMU release, they changed CurrSocketPower to
AverageSocketPower, but this was changed back in 40.47 so just check if
it's 40.46 and make the appropriate change
Tested with 40.45, 40.46 and 40.47 successfully
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The SMU changed reading from CurrSocketPower to AverageSocketPower, so
reflect this accordingly. This fixes the issue where Average Power
Consumption was being reported as 0 from SMU 40.46-onward
v2: Fixed headline prefix
v3: Add check for SMU version for proper compatibility
v4: Style fix
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since BPF constant blinding is performed after the verifier pass, the
ALU32 instructions inserted for doubleword immediate loads don't have a
corresponding zext instruction. This is causing a kernel oops on powerpc
and can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.
Fix this by emitting BPF_ZEXT during constant blinding if
prog->aux->verifier_zext is set.
Fixes: a4b1d3c1dd ("bpf: verifier: insert zero extension according to analysis result")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP is using Local Memory to model stack. LM_addr could be used as base of
a 16 32-bit word region of Local Memory. Then, if the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, therefore the sequence normally
looks like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
nop
If the local index switch happens on a narrow loads, then the instruction
preparing value to zero high 32-bit of the destination register could be
counted as one cycle, the sequence then could be something like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
immed[gprB_5, 0]
However, we have zero extension optimization that zeroing high 32-bit could
be eliminated, therefore above IMMED insn won't be available for which case
the first sequence needs to be generated.
Fixes: 0b4de1ff19 ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If writepage()/writepages() saw an error, but handled it without
reporting it, we should not be re-reporting that error on exit.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the client attempts to read a page, but the read fails due to some
spurious error (e.g. an ACCESS error or a timeout, ...) then we need
to allow other processes to retry.
Also try to report errors correctly when doing a synchronous readpage.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the mount is hard, we should ignore the 'io_maxretrans' module
parameter so that we always keep retrying.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the connection breaks while we're waiting for a reply from the
server, then we want to immediately try to reconnect.
Fixes: ec6017d903 ("SUNRPC fix regression in umount of a secure mount")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This reverts commit a79f194aa4.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1
If a connect or bind attempt returns EADDRINUSE, that means we want to
retry with a different port. It is not a fatal connection error.
Similarly, ENOBUFS is not fatal, but just indicates a memory allocation
issue. Retry after a short delay.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The pNFS/flexfiles I/O requests are sent with the SOFTCONN flag set, so
they automatically time out if the connection breaks. It should
therefore not be necessary to have the soft flag set in addition.
Fixes: 5f01d95394 ("nfs41: create NFSv3 DS connection if specified")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Don't handle errors in call_bind_status()/call_connect_status()
if it turns out that a previous call caused it to succeed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1+
Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.
This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.
Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.
This lacks a Fixes tag because missing to clear LDR goes way back into pre
git history.
[ tglx: Made x2apic_enabled a function call as required ]
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bit being set.
This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.
The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.
Remove the bogus LDR/DFR initialization.
This is not intended to work around the KVM APIC bug. The LDR/DFR
ininitalization is wrong on its own.
The issue goes back into the pre git history. The fixes tag is the commit
in the bitkeeper import which introduced bigsmp support in 2003.
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
This updates the documentation for supporting an optional extra interrupt
cell to specify edge vs level triggered.
Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This adds support for an optional extra interrupt cell to specify edge
vs level triggered. It is backward compatible with dts files with only
one cell, and will default to level-triggered in such a case.
Note that I had to make a change to idu_irq_set_affinity as well, as
this function was setting the interrupt type to "level" unconditionally,
since this was the only type supported previously.
Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
Reviewed-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
When userspace application calls ioctl(2) to configure hardware for PCM
playback substream, ALSA OXFW driver handles incoming AMDTP stream.
In this case, outgoing AMDTP stream should be handled.
This commit fixes the bug for v5.3-rc kernel.
Fixes: 4f380d0070 ("ALSA: oxfw: configure packet format in pcm.hw_params callback")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
32-bit processes running on a 64-bit kernel are not always detected
correctly, causing the process to crash when uretprobes are installed.
The reason for the crash is that in_ia32_syscall() is used to determine the
process's mode, which only works correctly when called from a syscall.
In the case of uretprobes, however, the function is called from a exception
and always returns 'false' on a 64-bit kernel. In consequence this leads to
corruption of the process's return address.
Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
is correct in any situation.
[ tglx: Add a comment and the following historical info ]
This should have been detected by the rename which happened in commit
abfb9498ee ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")
which states in the changelog:
The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.
.....
and then it went and blindly renamed every call site.
Sadly enough this was already mentioned here:
8faaed1b9f ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and
arch_uretprobe_hijack_return_addr()")
where the changelog says:
TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().
and goes all the way back to:
0326f5a94d ("uprobes/core: Handle breakpoint and singlestep exceptions")
Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
process on a 64bit kernel....
Fixes: 0326f5a94d ("uprobes/core: Handle breakpoint and singlestep exceptions")
Signed-off-by: Sebastian Mayr <me@sam.st>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
get_registers() blindly copies the memory written to by the
usb_control_msg() call even if the underlying urb failed.
This could lead to junk register values being read by the driver, since
some indirect callers of get_registers() ignore the return values. One
example is:
ocp_read_dword() ignores the return value of generic_ocp_read(), which
calls get_registers().
So, emulate PCI "Master Abort" behavior by setting the buffer to all
0xFFs when usb_control_msg() fails.
This patch is copied from the r8152 driver (v2.12.0) published by
Realtek (www.realtek.com).
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a conntrack cache issue with timeout policy.
Currently, we do not check if the timeout extension is set properly in the
cached conntrack entry. Thus, after packet recirculate from conntrack
action, the timeout policy is not applied properly. This patch fixes the
aforementioned issue.
Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked
in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
doesn't work anymore.
This issue was found with LTP mpls03 tests.
Fixes: 1550c17193 ("ipv4: Prepare rtable for IPv6 gateway")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald reported this sequence:
ip next add id 1 blackhole
ip next add id 2 blackhole
ip ro add 1.1.1.1/32 nhid 1
ip ro add 1.1.1.2/32 nhid 2
would cause a crash. Backtrace is:
[ 151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
[ 151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[ 151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
[ 151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
[ 151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
[ 151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
[ 151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
[ 151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
[ 151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
[ 151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
[ 151.317707] FS: 00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[ 151.319113] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
[ 151.321367] Call Trace:
[ 151.321820] ? fib_nexthop_info+0x635/0x635
[ 151.322572] fib_dump_info+0xaa4/0xde0
[ 151.323247] ? fib_create_info+0x2431/0x2431
[ 151.324008] ? napi_alloc_frag+0x2a/0x2a
[ 151.324711] rtmsg_fib+0x2c4/0x3be
[ 151.325339] fib_table_insert+0xe2f/0xeee
...
fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
believes the nexthop object is a multipath group (nhs != 1) and ends
up down the nexthop_mpath_fill_node() path which is wrong for a
blackhole.
The blackhole check in nexthop_num_path is leftover from early days
of the blackhole implementation which did not initialize the device.
In the end the design was simpler (fewer special case checks) to set
the device to loopback in nh_info, so the check in nexthop_num_path
should have been removed.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull auxdisplay cleanup from Miguel Ojeda:
"Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)"
* tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux:
auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant
Pull UML fix from Richard Weinberger:
"Fix time travel mode"
* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix time travel mode
Pull UBIFS and JFFS2 fixes from Richard Weinberger:
"UBIFS:
- Don't block too long in writeback_inodes_sb()
- Fix for a possible overrun of the log head
- Fix double unlock in orphan_delete()
JFFS2:
- Remove C++ style from UAPI header and unbreak picky toolchains"
* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Limit the number of pages in shrink_liability
ubifs: Correctly initialize c->min_log_bytes
ubifs: Fix double unlock around orphan_delete()
jffs2: Remove C++ style comments from uapi header
Pull x86 fixes from Thomas Gleixner:
"A few fixes for x86:
- Fix a boot regression caused by the recent bootparam sanitizing
change, which escaped the attention of all people who reviewed that
code.
- Address a boot problem on machines with broken E820 tables caused
by an underflow which ended up placing the trampoline start at
physical address 0.
- Handle machines which do not advertise a legacy timer of any form,
but need calibration of the local APIC timer gracefully by making
the calibration routine independent from the tick interrupt. Marked
for stable as well as there seems to be quite some new laptops
rolled out which expose this.
- Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are
affected by broken firmware which does not initialize RDRAND
correctly after resume. Add a command line parameter to override
this for machine which either do not use suspend/resume or have a
fixed BIOS. Unfortunately there is no way to detect this on boot,
so the only safe decision is to turn it off by default.
- Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which
caused fast KVM instruction emulation to break.
- Explain the Intel CPU model naming convention so that the repeating
discussions come to an end"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
x86/boot: Fix boot regression caused by bootparam sanitizing
x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
x86/boot/compressed/64: Fix boot on machines with broken E820 table
x86/apic: Handle missing global clockevent gracefully
x86/cpu: Explain Intel model naming convention
Pull timekeeping fix from Thomas Gleixner:
"A single fix for a regression caused by the generic VDSO
implementation where a math overflow causes CLOCK_BOOTTIME to become a
random number generator"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
Pull scheduler fix from Thomas Gleixner:
"Handle the worker management in situations where a task is scheduled
out on a PI lock contention correctly and schedule a new worker if
possible"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Schedule new worker even if PI-blocked
Pull perf fixes from Thomas Gleixner:
"Two small fixes for kprobes and perf:
- Prevent a deadlock in kprobe_optimizer() causes by reverse lock
ordering
- Fix a comment typo"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix potential deadlock in kprobe_optimizer()
perf/x86: Fix typo in comment
Pull irq fix from Thomas Gleixner:
"A single fix for a imbalanced kobject operation in the irq decriptor
code which was unearthed by the new warnings in the kobject code"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Properly pair kobject_del() with kobject_add()
Mergr misc fixes from Andrew Morton:
"11 fixes"
Mostly VM fixes, one psi polling fix, and one parisc build fix.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
mm/zsmalloc.c: fix race condition in zs_destroy_pool
mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
mm, page_owner: handle THP splits correctly
userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
psi: get poll_work to run when calling poll syscall next time
mm: memcontrol: flush percpu vmevents before releasing memcg
mm: memcontrol: flush percpu vmstats before releasing memcg
parisc: fix compilation errrors
mm, page_alloc: move_freepages should not examine struct page of reserved memory
mm/z3fold.c: fix race between migration and destruction
The input pool of a client might be deleted via the resize ioctl, the
the access to it should be covered by the proper locks. Currently the
only missing place is the call in snd_seq_ioctl_get_client_pool(), and
this patch papers over it.
Reported-by: syzbot+4a75454b9ca2777f35c7@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull dma-mapping fixes from Christoph Hellwig:
"Two fixes for regressions in this merge window:
- select the Kconfig symbols for the noncoherent dma arch helpers on
arm if swiotlb is selected, not just for LPAE to not break then Xen
build, that uses swiotlb indirectly through swiotlb-xen
- fix the page allocator fallback in dma_alloc_contiguous if the CMA
allocation fails"
* tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: fix zone selection after an unaddressable CMA allocation
arm: select the dma-noncoherent symbols for all swiotlb builds
The code like this:
ptr = kmalloc(size, GFP_KERNEL);
page = virt_to_page(ptr);
offset = offset_in_page(ptr);
kfree(page_address(page) + offset);
may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.
In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF
tag is different from the tag in shadow hence print false report.
Instead of just comparing tags, do the following:
1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's
double-free and it doesn't matter what tag the pointer have.
2) If pointer tag is different from 0xFF, make sure that tag in the
shadow is the same as in the pointer.
Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu@mediatek.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at
that time.
Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages. But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work(). Which would mean
pages still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding. Keeping that state under the class spinlock
keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page. This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage. However, the return value is
ignored. If a zs_free() races in between zs_page_isolate() and
zs_page_migrate(), freeing the last object in the zspage,
putback_zspage() will leave the page in ZS_EMPTY for potentially an
unbounded amount of time.
To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur.
To avoid duplicated code, move the sequence to a new
putback_zspage_deferred() function which both zs_page_migrate() and
zs_page_putback() call.
Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to vmstats, percpu caching of local vmevents leads to an
accumulation of errors on non-leaf levels. This happens because some
leftovers may remain in percpu caches, so that they are never propagated
up by the cgroup tree and just disappear into nonexistence with on
releasing of the memory cgroup.
To fix this issue let's accumulate and propagate percpu vmevents values
before releasing the memory cgroup similar to what we're doing with
vmstats.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Percpu caching of local vmstats with the conditional propagation by the
cgroup tree leads to an accumulation of errors on non-leaf levels.
Let's imagine two nested memory cgroups A and A/B. Say, a process
belonging to A/B allocates 100 pagecache pages on the CPU 0. The percpu
cache will spill 3 times, so that 32*3=96 pages will be accounted to A/B
and A atomic vmstat counters, 4 pages will remain in the percpu cache.
Imagine A/B is nearby memory.max, so that every following allocation
triggers a direct reclaim on the local CPU. Say, each such attempt will
free 16 pages on a new cpu. That means every percpu cache will have -16
pages, except the first one, which will have 4 - 16 = -12. A/B and A
atomic counters will not be touched at all.
Now a user removes A/B. All percpu caches are freed and corresponding
vmstat numbers are forgotten. A has 96 pages more than expected.
As memory cgroups are created and destroyed, errors do accumulate. Even
1-2 pages differences can accumulate into large numbers.
To fix this issue let's accumulate and propagate percpu vmstat values
before releasing the memory cgroup. At this point these numbers are
stable and cannot be changed.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-2-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable
'p4d' set but not used") converted a few functions from macros to static
inline, which causes parisc to complain,
In file included from include/asm-generic/4level-fixup.h:38:0,
from arch/parisc/include/asm/pgtable.h:5,
from arch/parisc/include/asm/io.h:6,
from include/linux/io.h:13,
from sound/core/memory.c:9:
include/asm-generic/5level-fixup.h:14:18: error: unknown type name 'pgd_t'; did you mean 'pid_t'?
#define p4d_t pgd_t
^
include/asm-generic/5level-fixup.h:24:28: note: in expansion of macro 'p4d_t'
static inline int p4d_none(p4d_t p4d)
^~~~~
It is because "4level-fixup.h" is included before "asm/page.h" where
"pgd_t" is defined.
Link: http://lkml.kernel.org/r/20190815205305.1382-1-cai@lca.pw
Fixes: 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used")
Signed-off-by: Qian Cai <cai@lca.pw>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit 907ec5fca3 ("mm: zero remaining unavailable struct
pages"), struct page of reserved memory is zeroed. This causes
page->flags to be 0 and fixes issues related to reading
/proc/kpageflags, for example, of reserved memory.
The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory. That assumption is
no longer true after the aforementioned commit.
There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited only
to pages on the zone's freelist.
Note that pfn_valid() can be true for reserved memory: there is a
backing struct page. The check for page_to_nid(page) is also buggy but
reserved memory normally only appears on node 0 so the zeroing doesn't
affect this.
Move the debug checks to after verifying PageBuddy is true. This
isolates the scope of the checks to only be for buddy pages which are on
the zone's freelist which move_freepages_block() is operating on. In
this case, an incorrect node or zone is a bug worthy of being warned
about (and the examination of struct page is acceptable bcause this
memory is not reserved).
Why does move_freepages_block() gets called on reserved memory? It's
simply math after finding a valid free page from the per-zone free area
to use as fallback. We find the beginning and end of the pageblock of
the valid page and that can bring us into memory that was reserved per
the e820. pfn_valid() is still true (it's backed by a struct page), but
since it's zero'd we shouldn't make any inferences here about comparing
its node or zone. The current node check just happens to succeed most
of the time by luck because reserved memory typically appears on node 0.
The fix here is to validate that we actually have buddy pages before
testing if there's any type of zone or node strangeness going on.
We noticed it almost immediately after bringing 907ec5fca3 in on
CONFIG_DEBUG_VM builds. It depends on finding specific free pages in
the per-zone free area where the math in move_freepages() will bring the
start or end pfn into reserved memory and wanting to claim that entire
pageblock as a new migratetype. So the path will be rare, require
CONFIG_DEBUG_VM, and require fallback to a different migratetype.
Some struct pages were already zeroed from reserve pages before
907ec5fca3c so it theoretically could trigger before this commit. I
think it's rare enough under a config option that most people don't run
that others may not have noticed. I wouldn't argue against a stable tag
and the backport should be easy enough, but probably wouldn't single out
a commit that this is fixing.
Mel said:
: The overhead of the debugging check is higher with this patch although
: it'll only affect debug builds and the path is not particularly hot.
: If this was a concern, I think it would be reasonable to simply remove
: the debugging check as the zone boundaries are checked in
: move_freepages_block and we never expect a zone/node to be smaller than
: a pageblock and stuck in the middle of another zone.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.
Before this commit, run "rds-info -I". The outputs are as
below:
"
RDS IB Connections:
LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
"
The commit fe3475af3b ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
__u32 rdma_mr_max;
__u32 rdma_mr_size;
__u8 tos;
__u32 cache_allocs;
};
The peer struct in rds-tools of struct rds_info_rdma_connection is as
below:
struct rds_info_rdma_connection {
...
uint32_t rdma_mr_max;
uint32_t rdma_mr_size;
uint8_t tos;
uint8_t sl;
uint32_t cache_allocs;
};
The difference between userspace and kernel is the member variable sl.
In the kernel struct, the member variable sl is missing. This will
introduce risks. So it is necessary to use this commit to avoid this risk.
Fixes: fe3475af3b ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An excerpt from netlink(7) man page,
In multipart messages (multiple nlmsghdr headers with associated payload
in one byte stream) the first and all following headers have the
NLM_F_MULTI flag set, except for the last header which has the type
NLMSG_DONE.
but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a
FIB dump. The result is user space applications following above man page
excerpt may get confused and may stop parsing msg believing something went
wrong.
In the golang netlink lib [0] the library logic stops parsing believing the
message is not a multipart message. Found this running Cilium[1] against
net-next while adding a feature to auto-detect routes. I noticed with
multiple route tables we no longer could detect the default routes on net
tree kernels because the library logic was not returning them.
Fix this by handling the fib_dump_info_fnhe() case the same way the
fib_dump_info() handles it by passing the flags argument through the
call chain and adding a flags argument to rt_fill_info().
Tested with Cilium stack and auto-detection of routes works again. Also
annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
NLMSG_DONE flags look correct after this.
Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
as is done for fib_dump_info() so this looks correct to me.
[0] https://github.com/vishvananda/netlink/
[1] https://github.com/cilium/
Fixes: ee28906fd7 ("ipv4: Dump route exceptions if requested")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d4c08afafa ("s390/qeth: streamline SNMP cmd code") removed
the bounds checking for req_len, under the assumption that the check in
qeth_alloc_cmd() would suffice.
But that code path isn't sufficiently robust to handle a user-provided
data_length, which could overflow (when adding the cmd header overhead)
before being checked against QETH_BUFSIZE. We end up allocating just a
tiny iob, and the subsequent copy_from_user() writes past the end of
that iob.
Special-case this path and add a coarse bounds check, to protect against
maliciuous requests. This let's the subsequent code flow do its normal
job and precise checking, without risk of overflow.
Fixes: d4c08afafa ("s390/qeth: streamline SNMP cmd code")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If protocols registered exceeded PROTO_INUSE_NR, prot will be
added to proto_list, but no available bit left for prot in
proto_inuse_idx.
Changes since v2:
* Propagate the error code properly
Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-08-22
This series introduces some fixes to mlx5 driver.
1) Form Moshe, two fixes for firmware health reporter
2) From Eran, two ktls fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell king maintains phylink, as part of the SFP module support.
However, much of the review work is about drivers swapping from phylib
to phylink. Such changes don't make changes to the phylink core, and
so the F: rules in MAINTAINERS don't match. Add a K:, keywork rule,
which hopefully get_maintainers will match against for patches to MAC
drivers swapping to phylink.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:
- ip6gre_tunnel_xmit
- ip6gre_xmit_ipv6
- __gre6_xmit
- ip6_tnl_xmit
- if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
- icmpv6_send
- icmpv6_route_lookup
- xfrm_decode_session_reverse
- decode_session4
- oif = skb_dst(skb)->dev->ifindex; <-- here
- decode_session6
- oif = skb_dst(skb)->dev->ifindex; <-- here
The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the skb_dst(skb)->dev is actually not needed as we
called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
So make a dst dev check here should be clean and safe.
v4: No changes.
v3: No changes.
v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.
Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:
- ip6gre_tunnel_xmit
- ip6gre_xmit_ipv4
- __gre6_xmit
- ip6_tnl_xmit
- if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
- icmp_send
- net = dev_net(rt->dst.dev); <-- here
The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the reason we need rt->dst.dev is to get the net.
So we can just try get it from skb->dev when rt->dst.dev is NULL.
v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
still use dst.dev and do a check to avoid crash.
v3: No changes.
v2: fix the issue in __icmp_send() instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.
Fixes: c8b34e680a ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull GPIO fixes from Linus Walleij:
"Here is a (hopefully last) set of GPIO fixes for the v5.3 kernel
cycle. Two are pretty core:
- Fix not reporting open drain/source lines to userspace as "input"
- Fix a minor build error found in randconfigs
- Fix a chip select quirk on the Freescale SPI
- Fix the irqchip initialization semantic order to reflect what it
was using the old API"
* tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix irqchip initialization order
gpio: of: fix Freescale SPI CS quirk handling
gpio: Fix build error of function redefinition
gpiolib: never report open-drain/source lines as 'input' to user-space
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2019-08-24
An update from ieee802154 for your *net* tree.
Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Hyper-V fixes from Sasha Levin:
- Fix for panics and network failures on PAE guests by Dexuan Cui.
- Fix of a memory leak (and related cleanups) in the hyper-v keyboard
driver by Dexuan Cui.
- Code cleanups for hyper-v clocksource driver during the merge window
by Dexuan Cui.
- Fix for a false positive warning in the userspace hyper-v KVP store
by Vitaly Kuznetsov.
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
Tools: hv: kvp: eliminate 'may be used uninitialized' warning
Input: hyperv-keyboard: Use in-place iterator API in the channel callback
Drivers: hv: vmbus: Remove the unused "tsc_page" from struct hv_context
Pull arm64 fixes from Will Deacon:
"Two KVM/arm fixes for MMIO emulation and UBSAN.
Unusually, we're routing them via the arm64 tree as per Paolo's
request on the list:
https://lore.kernel.org/kvm/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com/
We don't actually have any other arm64 fixes pending at the moment
(touch wood), so I've pulled from Marc, written a merge commit, tagged
the result and run it through my build/boot/bisect scripts"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
KVM: arm/arm64: Only skip MMIO insn once
Pull SCSI fixes from James Bottomley:
"Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
scsi: target: tcmu: avoid use-after-free after command timeout
scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
Pull xfs fix from Darrick Wong:
"A single patch that fixes a xfs lockup problem when a chown/chgrp
operation fails due to running out of quota. It has survived the usual
xfstests runs and merges cleanly with this morning's master:
- Fix a forgotten inode unlock when chown/chgrp fail due to quota"
* tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
Pull more drm fixes from Dave Airlie:
"Although the tree built for me fine on arm here, it appears either
header cleanups in next or some kconfig combo it breaks, so this
contains a fix to mediatek to include dma-mapping.h explicitly.
There was also one nouveau fix that came in late that I was going to
leave until next week, but since I was sending this I thought it may
as well be in here:
mediatek:
- fix build in some cases
nouveau:
- fix hang with i2c and mst docks"
* tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm:
drm/mediatek: include dma-mapping header
drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
Pull KVM/arm fixes from Marc Zyngier as per Paulo's request at:
https://lkml.kernel.org/r/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com
"One (hopefully last) set of fixes for KVM/arm for 5.3: an embarassing
MMIO emulation regression, and a UBSAN splat. Oh well...
- Don't overskip instructions on MMIO emulation
- Fix UBSAN splat when initializing PPI priorities"
* tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
KVM: arm/arm64: Only skip MMIO insn once
Although it builds fine here in my arm cross compile, it seems
either via some other patches in -next or some Kconfig combination,
this fails to build for everyone.
Include linux/dma-mapping.h should fix it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
2) Fix a use-after-free in prog symbol exposure, from Daniel.
3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
5) Fix a potential use-after-free on flow dissector detach, from Jakub.
6) Fix bpftool to close prog fd after showing metadata, from Quentin.
7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_select_reuseport fails on s390 due to verifier rejecting
test_select_reuseport_kern.o with the following message:
; data_check.eth_protocol = reuse_md->eth_protocol;
18: (69) r1 = *(u16 *)(r6 +22)
invalid bpf_context access off=22 size=2
This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).
The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for
eth_protocol field. Enable them for eth_protocol field by using
bpf_ctx_range instead of offsetof.
Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.
Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
syzkaller managed to trigger the warning in bpf_jit_free() which checks via
bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
[...]
8021q: adding VLAN 0 to HW filter on device batadv0
8021q: adding VLAN 0 to HW filter on device batadv0
WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events bpf_prog_free_deferred
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x113/0x167 lib/dump_stack.c:113
panic+0x212/0x40b kernel/panic.c:214
__warn.cold.8+0x1b/0x38 kernel/panic.c:571
report_bug+0x1a4/0x200 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:bpf_jit_free+0x1e8/0x2a0
Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
BUG: unable to handle kernel paging request at fffffbfff400d000
#PF error: [normal kernel read fault]
PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events bpf_prog_free_deferred
RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
[...]
Upon further debugging, it turns out that whenever we trigger this
issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
but yet bpf_jit_free() reported that the entry is /in use/.
Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
fd is exposed to the public, a parallel close request came in right
before we attempted to do the bpf_prog_kallsyms_add().
Given at this time the prog reference count is one, we start to rip
everything underneath us via bpf_prog_release() -> bpf_prog_put().
The memory is eventually released via deferred free, so we're seeing
that bpf_jit_free() has a kallsym entry because we added it from
bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
Therefore, move both notifications /before/ we install the fd. The
issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
because upon bpf_prog_get_fd_by_id() we'll take another reference to
the BPF prog, so we're still holding the original reference from the
bpf_prog_load().
Fixes: 6ee52e2a3f ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
While adding extra tests for precision tracking and extra infra
to adjust verifier heuristics the existing test
"calls: cross frame pruning - liveness propagation" started to fail.
The root cause is the same as described in verifer.c comment:
* Also if parent's curframe > frame where backtracking started,
* the verifier need to mark registers in both frames, otherwise callees
* may incorrectly prune callers. This is similar to
* commit 7640ead939 ("bpf: verifier: make sure callees don't prune with caller differences")
* For now backtracking falls back into conservative marking.
Turned out though that returning -ENOTSUPP from backtrack_insn() and
doing mark_all_scalars_precise() in the current parentage chain is not enough.
Depending on how is_state_visited() heuristic is creating parentage chain
it's possible that callee will incorrectly prune caller.
Fix the issue by setting precise=true earlier and more aggressively.
Before this fix the precision tracking _within_ functions that don't do
bpf2bpf calls would still work. Whereas now precision tracking is completely
disabled when bpf2bpf calls are present anywhere in the program.
No difference in cilium tests (they don't have bpf2bpf calls).
No difference in test_progs though some of them have bpf2bpf calls,
but precision tracking wasn't effective there.
Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
free the program once a grace period has elapsed. The callback can run
together with new RCU readers that started after the last grace period.
New RCU readers can potentially see the "old" to-be-freed or already-freed
pointer to the program object before the RCU update-side NULLs it.
Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.
Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This reverts commit f072218cca.
As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.
Fixes: f072218cca ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull rdma fixes from Doug Ledford:
"No beating around the bush: this is a monster pull request for an -rc5
kernel. Intel hit me with a series of fixes for TID processing.
Mellanox hit me with a series for their UMR memory support.
And we had one fix for siw that fixes the 32bit build warnings and
because of the number of casts that had to be changed to properly
silence the warnings, that one patch alone is a full 40% of the LOC of
this entire pull request. Given that this is the initial release
kernel for siw, I'm trying to fix anything in it that we can, so that
adds to the impetus to take fixes for it like this one.
I had to do a rebase early in the week. Jason had thought he put a
patch on the rc queue that he needed to be there so he could base some
work off of it, and it had actually not been placed there. So he asked
me (on Tuesday) to fix that up before pushing my wip branch to the
official rc branch. I did, and that's why the early patches look like
they were all committed at the same time on Tuesday. That bunch had
been in my queue prior.
The various patches all pass my test for being legitimate fixes and
not attempts to slide new features or development into a late rc.
Well, they were all fixes with the exception of a couple clean up
patches people wrote for making the fixes they also wrote better (like
a cleanup patch to move UMR checking into a function so that the
remaining UMR fix patches can reference that function), so I left
those in place too.
My apologies for the LOC count and the number of patches here, it's
just how the cards fell this cycle.
Summary:
- Fix siw buffer mapping issue
- Fix siw 32/64 casting issues
- Fix a KASAN access issue in bnxt_re
- Fix several memory leaks (hfi1, mlx4)
- Fix a NULL deref in cma_cleanup
- Fixes for UMR memory support in mlx5 (4 patch series)
- Fix namespace check for restrack
- Fixes for counter support
- Fixes for hfi1 TID processing (5 patch series)
- Fix potential NULL deref in siw
- Fix memory page calculations in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits)
RDMA/siw: Fix 64/32bit pointer inconsistency
RDMA/siw: Fix SGL mapping issues
RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
infiniband: hfi1: fix memory leaks
infiniband: hfi1: fix a memory leak bug
IB/mlx4: Fix memory leaks
RDMA/cma: fix null-ptr-deref Read in cma_cleanup
IB/mlx5: Block MR WR if UMR is not possible
IB/mlx5: Fix MR re-registration flow to use UMR properly
IB/mlx5: Report and handle ODP support properly
IB/mlx5: Consolidate use_umr checks into single function
RDMA/restrack: Rewrite PID namespace check to be reliable
RDMA/counters: Properly implement PID checks
IB/core: Fix NULL pointer dereference when bind QP to counter
IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet
IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet
IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
IB/hfi1: Drop stale TID RDMA packets
RDMA/siw: Fix potential NULL de-ref
...
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
ignoring the specific error value. This results in addrconf_add_dev
returning ENOBUFS in all cases, which is unfortunate in cases such as:
# ip link add dummyX type dummy
# ip link set dummyX mtu 1200 up
# ip addr add 2000::/64 dev dummyX
RTNETLINK answers: No buffer space available
Commit a317a2f19d ("ipv6: fail early when creating netdev named all
or default") introduced error returns in ipv6_add_dev. Before that,
that function would simply return NULL for all failures.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
"Here's a set of fixes that should go into this release. This contains:
- Three minor fixes for NVMe.
- Three minor tweaks for the io_uring polling logic.
- Officially mark Song as the MD maintainer, after he's been filling
that role sucessfully for the last 6 months or so"
* tag 'for-linus-20190823' of git://git.kernel.dk/linux-block:
io_uring: add need_resched() check in inner poll loop
md: update MAINTAINERS info
io_uring: don't enter poll loop if we have CQEs pending
nvme: Add quirk for LiteON CL1 devices running FW 22301111
nvme: Fix cntlid validation when not using NVMEoF
nvme-multipath: fix possible I/O hang when paths are updated
io_uring: fix potential hang with polled IO
Pull device mapper fixes from Mike Snitzer:
- Revert a DM bufio change from during the 5.3 merge window now that a
proper fix has been made to the block loopback driver.
- Fix DM kcopyd to wakeup so failed subjobs get completed.
- Various fixes to DM zoned target to address error handling, and other
small tweaks (SPDX license identifiers and fix typos).
- Fix DM integrity range locking race by tracking whether journal has
changed.
- Fix DM dust target to detect reads of badblocks beyond the first 512b
sector (applicable if blocksize is larger than 512b).
- Fix DM persistent-data issue in both the DM btree and DM
space-map-metadata interfaces.
- Fix out of bounds memory access with certain DM table configurations.
* tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: fix invalid memory accesses with too high sector number
dm space map metadata: fix missing store of apply_bops() return value
dm btree: fix order of block initialization in btree_split_beneath
dm raid: add missing cleanup in raid_ctr()
dm zoned: fix potential NULL dereference in dmz_do_reclaim()
dm dust: use dust block size for badblocklist index
dm integrity: fix a crash due to BUG_ON in __journal_read_write()
dm zoned: fix a few typos
dm zoned: add SPDX license identifiers
dm zoned: properly handle backing device failure
dm zoned: improve error handling in i/o map code
dm zoned: improve error handling in reclaim
dm kcopyd: always complete failed jobs
Revert "dm bufio: fix deadlock with loop device"
Pull xfs fixes from Darrick Wong:
"Here are a few more bug fixes that trickled in since the last pull.
They've survived the usual xfstests runs and merge cleanly with this
morning's master.
I expect there to be one more pull request tomorrow for the fix to
that quota related inode unlock bug that we were reviewing last night,
but it will continue to soak in the testing machine for several more
hours.
- Fix missing compat ioctl handling for get/setlabel
- Fix missing ioctl pointer sanitization on s390
- Fix a page locking deadlock in the dedupe comparison code
- Fix inadequate locking in reflink code w.r.t. concurrent directio
- Fix broken error detection when breaking layouts"
* tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs/xfs: Fix return code of xfs_break_leased_layouts()
xfs: fix reflink source file racing with directio writes
vfs: fix page locking deadlocks when deduping files
xfs: compat_ioctl: use compat_ptr()
xfs: fall back to native ioctls for unhandled compat ones
At the moment we initialise the target *mask* of a virtual IRQ to the
VCPU it belongs to, even though this mask is only defined for GICv2 and
quickly runs out of bits for many GICv3 guests.
This behaviour triggers an UBSAN complaint for more than 32 VCPUs:
------
[ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21
[ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int'
------
Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs
dump is wrong, due to this very same problem.
Because there is no requirement to create the VGIC device before the
VCPUs (and QEMU actually does it the other way round), we can't safely
initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch
every private IRQ for each VCPU anyway later (in vgic_init()), we can
just move the initialisation of those fields into there, where we
definitely know the VGIC type.
On the way make sure we really have either a VGICv2 or a VGICv3 device,
since the existing code is just checking for "VGICv3 or not", silently
ignoring the uninitialised case.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reported-by: Dave Martin <dave.martin@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull modules fixes from Jessica Yu:
"Fix BUG_ON() being triggered in frob_text() due to non-page-aligned
module sections"
* tag 'modules-for-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
modules: page-align module section allocations only for arches supporting strict module rwx
modules: always page-align module section allocations
Multiple batadv_ogm2_packet can be stored in an skbuff. The functions
batadv_v_ogm_send_to_if() uses batadv_v_ogm_aggr_packet() to check if there
is another additional batadv_ogm2_packet in the skb or not before they
continue processing the packet.
The length for such an OGM2 is BATADV_OGM2_HLEN +
batadv_ogm2_packet->tvlv_len. The check must first check that at least
BATADV_OGM2_HLEN bytes are available before it accesses tvlv_len (which is
part of the header. Otherwise it might try read outside of the currently
available skbuff to get the content of tvlv_len.
Fixes: 9323158ef9 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Multiple batadv_ogm_packet can be stored in an skbuff. The functions
batadv_iv_ogm_send_to_if()/batadv_iv_ogm_receive() use
batadv_iv_ogm_aggr_packet() to check if there is another additional
batadv_ogm_packet in the skb or not before they continue processing the
packet.
The length for such an OGM is BATADV_OGM_HLEN +
batadv_ogm_packet->tvlv_len. The check must first check that at least
BATADV_OGM_HLEN bytes are available before it accesses tvlv_len (which is
part of the header. Otherwise it might try read outside of the currently
available skbuff to get the content of tvlv_len.
Fixes: ef26157747 ("batman-adv: tvlv - basic infrastructure")
Reported-by: syzbot+355cab184197dbbfa384@syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Pull ceph fixes from Ilya Dryomov:
"Three important fixes tagged for stable (an indefinite hang, a crash
on an assert and a NULL pointer dereference) plus a small series from
Luis fixing instances of vfree() under spinlock"
* tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
libceph: fix PG split vs OSD (re)connect race
ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
ceph: clear page dirty before invalidate page
ceph: fix buffer free while holding i_ceph_lock in fill_inode()
ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
Pull drm fixes from Dave Airlie:
"Live from the laundromat after my washing machine broke down, we have
the 5.3-rc6 fixes. Changelog is in the tag below, but nothing too
noteworthy in here:
rcar-du:
- LVDS dual-link mode fix
mediatek:
- of node refcount fix
- prime buffer import fix
- dma max seg fix
komeda:
- output polling fix
- abfc format fix
- memory-region DT fix
amdgpu:
- bpc display fix
- ioctl memory leak fix
- gfxoff fix
- smu warnings fix
i915:
- HDMI mode readout fix"
* tag 'drm-fixes-2019-08-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu/powerplay: silence a warning in smu_v11_0_setup_pptable
drm/amd/display: Calculate bpc based on max_requested_bpc
drm/amdgpu: prevent memory leaks in AMDGPU_CS ioctl
drm/amd/amdgpu: disable MMHUB PG for navi10
drm/amd/powerplay: remove duplicate macro smu_get_uclk_dpm_states in amdgpu_smu.h
drm/amd/powerplay: fix variable type errors in smu_v11_0_setup_pptable
drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possible
drm/i915: Fix HW readout for crtc_clock in HDMI mode
drm/mediatek: mtk_drm_drv.c: Add of_node_put() before goto
drm: rcar_lvds: Fix dual link mode operations
drm/mediatek: set DMA max segment size
drm/mediatek: use correct device to import PRIME buffers
drm/omap: ensure we have a valid dma_mask
drm/komeda: Add support for 'memory-region' DT node property
drm/komeda: Adds internal bpp computing for arm afbc only format YU08 YU10
drm/komeda: Initialize and enable output polling on Komeda
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
avoid clobbering flags.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of the CALL_NOSPEC is a small blob of code that performs
fast emulation by executing the target instruction with fixed operands.
adcb_al_dl:
0x000339f8 <+0>: adc %dl,%al
0x000339fa <+2>: ret
A major motiviation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of CALL_NOSPEC. Clobbering flags
results in all sorts of incorrect emulation, e.g. Jcc instructions often
take the wrong path. Sans the nops...
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
0x0003595a <+58>: mov 0xc0(%ebx),%eax
0x00035960 <+64>: mov 0x60(%ebx),%edx
0x00035963 <+67>: mov 0x90(%ebx),%ecx
0x00035969 <+73>: push %edi
0x0003596a <+74>: popf
0x0003596b <+75>: call *%esi
0x000359a0 <+128>: pushf
0x000359a1 <+129>: pop %edi
0x000359a2 <+130>: mov %eax,0xc0(%ebx)
0x000359b1 <+145>: mov %edx,0x60(%ebx)
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
0x000359a8 <+136>: mov -0x10(%ebp),%eax
0x000359ab <+139>: and $0x8d5,%edi
0x000359b4 <+148>: and $0xfffff72a,%eax
0x000359b9 <+153>: or %eax,%edi
0x000359bd <+157>: mov %edi,0x4(%ebx)
For the most part this has gone unnoticed as emulation of guest code
that can trigger fast emulation is effectively limited to MMIO when
running on modern hardware, and MMIO is rarely, if ever, accessed by
instructions that affect or consume flags.
Breakage is almost instantaneous when running with unrestricted guest
disabled, in which case KVM must emulate all instructions when the guest
has invalid state, e.g. when the guest is in Big Real Mode during early
BIOS.
Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
Fixes: 1a29b5b7f3 ("KVM: x86: Make indirect calls in emulator speculation safe")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
If the sector number is too high, dm_table_find_target() should return a
pointer to a zeroed dm_target structure (the caller should test it with
dm_target_is_valid).
However, for some table sizes, the code in dm_table_find_target() that
performs btree lookup will access out of bound memory structures.
Fix this bug by testing the sector number at the beginning of
dm_table_find_target(). Also, add an "inline" keyword to the function
dm_table_get_size() because this is a hot path.
Fixes: 512875bd96 ("dm: table detect io beyond device")
Cc: stable@vger.kernel.org
Reported-by: Zhang Tao <kontais@zoho.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixed two -Wunused-but-set-variable warnings:
/arm/linux/display/aosp-4.14-drm-next/drivers/gpu/drm/arm/display/komeda/komeda_kms.c: In function ‘komeda_crtc_normalize_zpos’:
/arm/linux/display/aosp-4.14-drm-next/drivers/gpu/drm/arm/display/komeda/komeda_kms.c:150:26: warning: variable ‘fb’ set but not used [-Wunused-but-set-variable]
struct drm_framebuffer *fb;
^~
/arm/linux/display/aosp-4.14-drm-next/drivers/gpu/drm/arm/display/komeda/komeda_kms.c: In function ‘komeda_kms_check’:
/arm/linux/display/aosp-4.14-drm-next/drivers/gpu/drm/arm/display/komeda/komeda_kms.c:209:25: warning: variable ‘old_crtc_st’ set but not used [-Wunused-but-set-variable]
struct drm_crtc_state *old_crtc_st, *new_crtc_st;
^~~~~~~~~~~
Signed-off-by: james qian wang (Arm Technology China) <james.qian.wang@arm.com>
Reviewed-by: Ayan Kumar Halder <ayan.halder@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812112322.15990-1-james.qian.wang@arm.com
In the commit ef41b5c924 ("ARM: make kernel oops easier to read"),
- .word 0xe92d0000 >> 10 @ stmfd sp!, {}
+ .word 0xe92d0000 >> 11 @ stmfd sp!, {}
then the shift need to change to 11.
Signed-off-by: Lvqiang Huang <Lvqiang.Huang@unisoc.com>
Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
A timing hazard exists when an early fork/exec thread begins
exiting and sets its mm pointer to NULL while a separate core
tries to update the section information.
This commit ensures that the mm pointer is not NULL before
setting its section parameters. The arguments provided by
commit 11ce4b33ae ("ARM: 8672/1: mm: remove tasklist locking
from update_sections_early()") are equally valid for not
requiring grabbing the task_lock around this check.
Fixes: 08925c2f12 ("ARM: 8464/1: Update all mm structures with section adjustments")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The new API for registering a gpio_irq_chip along with a
gpio_chip has a different semantic ordering than the old
API which added the irqchip explicitly after registering
the gpio_chip.
Move the calls to add the gpio_irq_chip *last* in the
function, so that the different hooks setting up OF and
ACPI and machine gpio_chips are called *before* we try
to register the interrupts, preserving the elder semantic
order.
This cropped up in the PL061 driver which used to work
fine with no special ACPI quirks, but started to misbehave
using the new API.
Fixes: e0d8972898 ("gpio: Implement tighter IRQ chip integration")
Cc: Thierry Reding <treding@nvidia.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Wei Xu <xuwei5@hisilicon.com>
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20190820080527.11796-1-linus.walleij@linaro.org
qxl has two modes: "native" (used by the drm driver) and "vga" (vga
compatibility mode, typically used for boot display and firmware
framebuffers).
Accessing any vga ioport will switch the qxl device into vga mode.
The qxl driver never does that, but other drivers accessing vga ports
can trigger that too and therefore disturb qxl operation. So aquire
the legacy vga ioports from vgaarb to avoid that.
Reproducer: Boot kvm guest with both qxl and i915 vgpu, with qxl being
first in pci scan order.
v2: Skip this for secondary qxl cards which don't have vga mode in the
first place (Frediano).
Cc: Frediano Ziglio <fziglio@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20190805105401.29874-1-kraxel@redhat.com
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
fails on account of being out of disk quota. I ran his reproducer
script:
# adduser dummy
# adduser dummy plugdev
# dd if=/dev/zero bs=1M count=100 of=test.img
# mkfs.xfs test.img
# mount -t xfs -o gquota test.img /mnt
# mkdir -p /mnt/dummy
# chown -c dummy /mnt/dummy
# xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
(and then as user dummy)
$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
$ chgrp plugdev /mnt/dummy/foo
and saw:
================================================
WARNING: lock held when returning to user space!
5.3.0-rc5 #rc5 Tainted: G W
------------------------------------------------
chgrp/47006 is leaving the kernel with locks still held!
1 lock held by chgrp/47006:
#0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
...which is clearly caused by xfs_setattr_nonsize failing to unlock the
ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
unlock.
Reported-by: benjamin.moody@gmail.com
Fixes: 253f4911f2 ("xfs: better xfs_trans_alloc interface")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
While I had thought I had fixed this issue in:
commit 342406e4fb ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
accidentally had missed one important detail that was mentioned on the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into it's dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler then it looks. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The request coming from Netlink should use the OEM generic handler.
The standard command handler expects payload in bytes/words/dwords
but the actual payload is stored in data if the request is coming from Netlink.
Signed-off-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the
nanoseconds based boot time offset left by the clocksource shift. That
overflows once the boot time offset becomes large enough. As a consequence
CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to
misbehave.
Fix it by storing a timespec64 representation of the offset when boot time
is adjusted and add that to the MONOTONIC base time value in the vdso data
page. Using the timespec64 representation avoids a 64bit division in the
update code.
Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
Kalle Valo says:
====================
wireless-drivers fixes for 5.3
Third set of fixes for 5.3, and most likely the last one. The rt2x00
regression has been reported multiple times, others are of lower
priority.
mt76
* fix hang on resume on certain machines
rt2x00
* fix AP mode regression related to encryption
iwlwifi
* avoid unnecessary error messages due to multicast frames when not
associated
* fix configuration for ax201 devices
* fix recognition of QuZ devices
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If qed_mcp_send_drv_version() fails, no cleanup is executed, leading to
memory leaks. To fix this issue, introduce the label 'err4' to perform the
cleanup work before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trap action should be copying the frame to CPU and
dropping it for forwarding, but current setting was just
copying frame to CPU.
Fixes: b596229448 ("net: mscc: ocelot: Add support for tcam")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately, my build fix for when time travel mode isn't
enabled broke time travel mode, because I forgot that we need
to use the timer time after the timer has been marked disabled,
and thus need to leave the time stored instead of zeroing it.
Fix that by splitting the inline into two, so we can call only
the _mode() one in the relevant code path.
Fixes: b482e48d29 ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The outer poll loop checks for whether we need to reschedule, and
returns to userspace if we do. However, it's possible to get stuck
in the inner loop as well, if the CPU we are running on needs to
reschedule to finish the IO work.
Add the need_resched() check in the inner loop as well. This fixes
a potential hang if the kernel is configured with
CONFIG_PREEMPT_VOLUNTARY=y.
Reported-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull PCI fixes from Bjorn Helgaas:
- Reset both NVIDIA GPU and HDA in ThinkPad P50 quirk, which was broken
by another quirk that enabled the HDA device (Lyude Paul)
- Fix pciebus-howto.rst documentation filename typo (Bjorn Helgaas)
* tag 'pci-v5.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Documentation PCI: Fix pciebus-howto.rst filename typo
PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
Dump WQE shall not include Ethernet segment. Define mlx5e_dump_wqe to be
used for "Dump WQEs" instead of sharing it with the general mlx5e_tx_wqe
layout.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For TLS WQEs, metadata info did not include num_bytes. Due to this issue,
tx_tls_dump_bytes counter did not increment.
Modify tx_fill_wi() to fill num bytes. When it is called for non-traffic
WQE, zero is expected.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When fw fatal error occurs, poll health() first detects and reports on a
fw error. Afterwards, it detects and reports on the fw fatal error
itself.
That can cause a long delay in fw fatal error handling which waits in a
queue for the fw error handling to be finished. The fw error handle will
try asking for fw core dump command while fw in fatal state may not
respond and driver will wait for command timeout.
Changing the flow to detect and handle first fw fatal errors and only if
no fatal error detected look for a fw error to handle.
Fixes: d1bf0e2cc4 ("net/mlx5: Report devlink health on FW issues")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Crdump repeats itself every chunk of 256bytes.
That is due to bug of missing progressing offset while copying the data
from buffer to devlink_fmsg.
Fixes: 9b1f298236 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In commit 6096d91af0 ("dm space map metadata: fix occasional leak
of a metadata block on resize"), we refactor the commit logic to a new
function 'apply_bops'. But when that logic was replaced in out() the
return value was not stored. This may lead out() returning a wrong
value to the caller.
Fixes: 6096d91af0 ("dm space map metadata: fix occasional leak of a metadata block on resize")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When btree_split_beneath() splits a node to two new children, it will
allocate two blocks: left and right. If right block's allocation
failed, the left block will be unlocked and marked dirty. If this
happened, the left block'ss content is zero, because it wasn't
initialized with the btree struct before the attempot to allocate the
right block. Upon return, when flushing the left block to disk, the
validator will fail when check this block. Then a BUG_ON is raised.
Fix this by completely initializing the left block before allocating and
initializing the right block.
Fixes: 4dcb8b57df ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull more fallthrough fixes from Gustavo A. R. Silva:
"Fix fall-through warnings on arm and mips for multiple configurations"
* tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
video: fbdev: acornfb: Mark expected switch fall-through
scsi: libsas: sas_discover: Mark expected switch fall-through
MIPS: Octeon: Mark expected switch fall-through
power: supply: ab8500_charger: Mark expected switch fall-through
watchdog: wdt285: Mark expected switch fall-through
mtd: sa1100: Mark expected switch fall-through
drm/sun4i: tcon: Mark expected switch fall-through
drm/sun4i: sun6i_mipi_dsi: Mark expected switch fall-through
ARM: riscpc: Mark expected switch fall-through
dmaengine: fsldma: Mark expected switch fall-through
Pull chrome platform fix from Benson Leung:
"Fix a kernel crash during suspend/resume of cros_ec_ishtp"
* tag 'tag-chrome-platform-fixes-for-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_ishtp: fix crash during suspend
Pull AFS fixes from David Howells:
- Fix a cell record leak due to the default error not being cleared.
- Fix an oops in tracepoint due to a pointer that may contain an error.
- Fix the ACL storage op for YFS where the wrong op definition is being
used. By luck, this only actually affects the information appearing
in traces.
* tag 'afs-fixes-20190822' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: use correct afs_call_type in yfs_fs_store_opaque_acl2
afs: Fix possible oops in afs_lookup trace event
afs: Fix leak in afs_lookup_cell_rcu()
If the number of dirty pages to be written back is large,
then writeback_inodes_sb will block waiting for a long time,
causing hung task detection alarm. Therefore, we should limit
the maximum number of pages written back this time, which let
the budget be completed faster. The remaining dirty pages
tend to rely on the writeback mechanism to complete the
synchronization.
Fixes: b6e51316da ("writeback: separate starting of sync vs opportunistic writeback")
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently on a freshly mounted UBIFS, c->min_log_bytes is 0.
This can lead to a log overrun and make commits fail.
Recent kernels will report the following assert:
UBIFS assert failed: c->lhead_lnum != c->ltail_lnum, in fs/ubifs/log.c:412
c->min_log_bytes can have two states, 0 and c->leb_size.
It controls how much bytes of the log area are reserved for non-bud
nodes such as commit nodes.
After a commit it has to be set to c->leb_size such that we have always
enough space for a commit. While a commit runs it can be 0 to make the
remaining bytes of the log available to writers.
Having it set to 0 right after mount is wrong since no space for commits
is reserved.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Reported-and-tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
We unlock after orphan_delete(), so no need to unlock
in the function too.
Reported-by: Han Xu <han.xu@nxp.com>
Fixes: 8009ce956c ("ubifs: Don't leak orphans on memory during commit")
Signed-off-by: Richard Weinberger <richard@nod.at>
Linux kernel tolerates C++ style comments these days. Actually, the
SPDX License tags for .c files start with //.
On the other hand, uapi headers are written in more strict C, where
the C++ comment style is forbidden.
I simply dropped these lines instead of fixing the comment style.
This code has been always commented out since it was added around
Linux 2.4.9 (i.e. commented out for more than 17 years).
'Maybe later...' will never happen.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
All user level and most in-kernel applications submit WQEs
where the SG list entries are all of a single type.
iSER in particular, however, will send us WQEs with mixed SG
types: sge[0] = kernel buffer, sge[1] = PBL region.
Check and set is_kva on each SG entry individually instead of
assuming the first SGE type carries through to the last.
This fixes iSER over siw.
Fixes: b9be6f18cf ("rdma/siw: transmit path")
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
It seems that 'yfs_RXYFSStoreOpaqueACL2' should be use in
yfs_fs_store_opaque_acl2().
Fixes: f5e4546347 ("afs: Implement YFS ACL setting")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
The afs_lookup trace event can cause the following:
[ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
[ 216.576803] #PF: supervisor read access in kernel mode
[ 216.576813] #PF: error_code(0x0000) - not-present page
...
[ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]
If the inode from afs_do_lookup() is an error other than ENOENT, or if it
is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
try to dereference the error pointer as a valid pointer.
Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.
Ideally the trace would include the error value, but for now just avoid
the oops.
Fixes: 80548b0399 ("afs: Add more tracepoints")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to
non-clearance of the default error in the case a NULL cell name is passed
and the workstation default cell is used.
Also put a bit at the end to make sure we don't leak a cell ref if we're
going to be returning an error.
This leak results in an assertion like the following when the kafs module is
unloaded:
AFS: Assertion failed
2 == 1 is false
0x2 == 0x1 is false
------------[ cut here ]------------
kernel BUG at fs/afs/cell.c:770!
...
RIP: 0010:afs_manage_cells+0x220/0x42f [kafs]
...
process_one_work+0x4c2/0x82c
? pool_mayday_timeout+0x1e1/0x1e1
? do_raw_spin_lock+0x134/0x175
worker_thread+0x336/0x4a6
? rescuer_thread+0x4af/0x4af
kthread+0x1de/0x1ee
? kthread_park+0xd4/0xd4
ret_from_fork+0x24/0x30
Fixes: 989782dcdc ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
If after an MMIO exit to userspace a VCPU is immediately run with an
immediate_exit request, such as when a signal is delivered or an MMIO
emulation completion is needed, then the VCPU completes the MMIO
emulation and immediately returns to userspace. As the exit_reason
does not get changed from KVM_EXIT_MMIO in these cases we have to
be careful not to complete the MMIO emulation again, when the VCPU is
eventually run again, because the emulation does an instruction skip
(and doing too many skips would be a waste of guest code :-) We need
to use additional VCPU state to track if the emulation is complete.
As luck would have it, we already have 'mmio_needed', which even
appears to be used in this way by other architectures already.
Fixes: 0d640732db ("arm64: KVM: Skip MMIO insn after emulation")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
HS200 is not implemented in the driver, but the controller claims it
through caps. Remove it via a quirk, to make sure the mmc core do not try
to enable HS200, as it causes the eMMC initialization to fail.
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: bb5f8ea4d5 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not. ->peer_features is not valid unless the OSD session is open. If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.
In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap. However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent. Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.
Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
When ceph_mdsc_do_request returns an error, we can't assume that the
filelock_reply pointer will be set. Only try to fetch fields out of
the r_reply_info when it returns success.
Cc: stable@vger.kernel.org
Reported-by: Hector Martin <hector@marcansoft.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in fill_inode() may result in freeing the
i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by
postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/070.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
6 locks held by kworker/0:4/3852:
#0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
#1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
#2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
#3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
#4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
#5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Workqueue: ceph-msgr ceph_con_workfn
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
fill_inode.isra.0+0xa9b/0xf70
ceph_fill_trace+0x13b/0xc70
? dispatch+0x2eb/0x1476
dispatch+0x320/0x1476
? __mutex_unlock_slowpath+0x4d/0x2a0
ceph_con_workfn+0xc97/0x2ec0
? process_one_work+0x1b8/0x5f0
process_one_work+0x244/0x5f0
worker_thread+0x4d/0x3e0
kthread+0x105/0x140
? process_one_work+0x5f0/0x5f0
? kthread_park+0x90/0x90
ret_from_fork+0x3a/0x50
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in
freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can
be fixed by having this function returning the old blob buffer and have
the callers of this function freeing it when the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress
4 locks held by fsstress/649:
#0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0
#1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60
#2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60
#3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60
CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
__ceph_build_xattrs_blob+0x12b/0x170
__send_cap+0x302/0x540
? __lock_acquire+0x23c/0x1e40
? __mark_caps_flushing+0x15c/0x280
? _raw_spin_unlock+0x24/0x30
ceph_check_caps+0x5f0/0xc60
ceph_flush_dirty_caps+0x7c/0x150
? __ia32_sys_fdatasync+0x20/0x20
ceph_sync_fs+0x5a/0x130
iterate_supers+0x8f/0xf0
ksys_sync+0x4f/0xb0
__ia32_sys_sync+0xa/0x10
do_syscall_64+0x50/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fc6409ab617
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the
i_xattrs.prealloc_blob buffer while holding the i_ceph_lock. This can be
fixed by postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress
3 locks held by fsstress/650:
#0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50
#1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0
#2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810
CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
__ceph_setxattr+0x2b4/0x810
__vfs_setxattr+0x66/0x80
__vfs_setxattr_noperm+0x59/0xf0
vfs_setxattr+0x81/0xa0
setxattr+0x115/0x230
? filename_lookup+0xc9/0x140
? rcu_read_lock_sched_held+0x74/0x80
? rcu_sync_lockdep_assert+0x2e/0x60
? __sb_start_write+0x142/0x1a0
? mnt_want_write+0x20/0x50
path_setxattr+0xba/0xd0
__x64_sys_lsetxattr+0x24/0x30
do_syscall_64+0x50/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff23514359a
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The bmControls (for UAC1) or bmMixerControls (for UAC2/3) bitmap has a
variable size depending on both input and output pins. Its size is to
fit with input * output bits. The problem is that the input size
can't be determined simply from the unit descriptor itself but it
needs to parse the whole connected sources. Although the
uac_mixer_unit_get_channels() tries to check some possible overflow of
this bitmap, it's incomplete due to the lack of the evaluation of
input pins.
For covering possible overflows, this patch adds the bitmap overflow
check in the loop of input pins in parse_audio_mixer_unit().
Fixes: 0bfe5e434e ("ALSA: usb-audio: Check mixer unit descriptors more strictly")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If a CCP is unconfigured (e.g. there are no available queues) then
there will be no data structures allocated for the device. Thus, we
must check for validity of a pointer before trying to access structure
members.
Fixes: 720419f018 ("crypto: ccp - Introduce the AMD Secure Processor device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I have been reviewing patches for md in the past few months. Mark me
as the MD maintainer, as I have effectively been filling that role.
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a copy and paste error so we have "rx" where "tx" was intended
in the priv->tx[] array.
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 93a714d6b5 ("multicast: Extend ip address command to enable
multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
to make user able to add multicast address on ethernet interface.
This works for IPv4, but not for IPv6. See the inet6_addr_add code.
static int inet6_addr_add()
{
...
if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
}
ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr
if (!IS_ERR(ifp)) {
...
} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
}
}
But in ipv6_add_addr() it will check the address type and reject multicast
address directly. So this feature is never worked for IPv6.
We should not remove the multicast address check totally in ipv6_add_addr(),
but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied.
v2: update commit description
Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SF2 binding does not specify that the CPU port should have
properties mandatory for successfully instantiating a PHYLINK object. As
such, there will be missing properties (including fixed-link) and when
attempting to validate and later configure link modes, we will have an
incorrect set of parameters (interface, speed, duplex).
Simply prevent the CPU port from being configured through PHYLINK since
bcm_sf2_imp_setup() takes care of that already.
Fixes: 0e27921816 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
The only place where state->max_bpc is updated on the connector is
at the start of atomic check during drm_atomic_connector_check. It
isn't updated when adding the connectors to the atomic state after
the fact. It also doesn't necessarily reflect the right value when
called in amdgpu during mode validation outside of atomic check.
This can cause the wrong bpc to be used even if the max_requested_bpc
is the correct value.
[How]
Don't rely on state->max_bpc reflecting the real bpc value and just
do the min(...) based on display info bpc and max_requested_bpc.
Fixes: 01933ba42d ("drm/amd/display: Use current connector state if NULL when checking bpc")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Error out if the AMDGPU_CS ioctl is called with multiple SYNCOBJ_OUT and/or
TIMELINE_SIGNAL chunks, since otherwise the last chunk wins while the
allocated array as well as the reference counts of sync objects are leaked.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix size type errors, from uint32_t to uint16_t.
it will cause only initializes the highest 16 bits in
smu_get_atom_data_table function.
bug report:
This fixes the following static checker warning.
drivers/gpu/drm/amd/amdgpu/../powerplay/smu_v11_0.c:390 smu_v11_0_setup_pptable()
warn: passing casted pointer '&size' to 'smu_get_atom_data_table()' 32 vs 16.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to set certain power gating flags after we determine
if the firmware version is sufficient to support gfxoff.
Previously we set the pg flags in early init, but we later
we might have disabled gfxoff if the firmware versions didn't
support it. Move adding the additional pg flags after we
determine whether or not to support gfxoff.
Fixes: 005440066f ("drm/amdgpu: enable gfxoff again on raven series (v2)")
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org
In certain cases when the probe function fails the error path calls
cpsw_remove_dt() before calling platform_set_drvdata(). This is an
issue as cpsw_remove_dt() uses platform_get_drvdata() to retrieve the
cpsw_common data and leds to a NULL pointer exception. This patches
fixes it by calling platform_set_drvdata() earlier in the probe.
Fixes: 83a8471ba2 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
Reported-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace 'decided' with 'decide' so that comment would be
/* To decide when the network namespace should be freed. */
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Just three fixes:
* extended key ID key installation
* regulatory processing
* possible memory leak in an error path
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull KVM fixes from Paolo Bonzini:
"A couple bugfixes, and mostly selftests changes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests/kvm: make platform_info_test pass on AMD
Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"
selftests: kvm: fix state save/load on processors without XSAVE
selftests: kvm: fix vmx_set_nested_state_test
selftests: kvm: provide common function to enable eVMCS
selftests: kvm: do not try running the VM in vmx_set_nested_state_test
KVM: x86: svm: remove redundant assignment of var new_entry
MAINTAINERS: add KVM x86 reviewers
MAINTAINERS: change list for KVM/s390
kvm: x86: skip populating logical dest map if apic is not sw enabled
I forgot to release the allocated object at the early error path in
line6_init_pcm(). For addressing it, slightly shuffle the code so
that the PCM destructor (pcm->private_free) is assigned properly
before all error paths.
Fixes: 3450121997 ("ALSA: line6: Fix write on zero-sized buffer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
test_msr_platform_info_disabled() generates EXIT_SHUTDOWN but VMCB state
is undefined after that so an attempt to launch this guest again from
test_msr_platform_info_enabled() fails. Reorder the tests to make test
pass.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull nfsd fixes from Bruce Fields:
"Fix nfsd bugs: three in the new nfsd/clients/ code, one in the reply
cache containerization"
* tag 'nfsd-5.3-1' of git://linux-nfs.org/~bfields/linux:
nfsd4: Fix kernel crash when reading proc file reply_cache_stats
nfsd: initialize i_private before d_add
nfsd: use i_wrlock instead of rcu for nfsdfs i_private
nfsd: fix dentry leak upon mkdir failure.
After _gadget_stop_activity is executed, we can consider the hardware
operation for gadget has finished, and the udc can be stopped and enter
low power mode. So, any later hardware operations (from usb_ep_ops APIs
or usb_gadget_ops APIs) should be considered invalid, any deinitializatons
has been covered at _gadget_stop_activity.
I meet this problem when I plug out usb cable from PC using mass_storage
gadget, my callstack like: vbus interrupt->.vbus_session->
composite_disconnect ->pm_runtime_put_sync(&_gadget->dev),
the composite_disconnect will call fsg_disable, but fsg_disable calls
usb_ep_disable using async way, there are register accesses for
usb_ep_disable. So sometimes, I get system hang due to visit register
without clock, sometimes not.
The Linux Kernel USB maintainer Alan Stern suggests this kinds of solution.
See: http://marc.info/?l=linux-usb&m=138541769810983&w=2.
Cc: <stable@vger.kernel.org> #v4.9+
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20190820020503.27080-2-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If rs_prepare_reshape() fails, no cleanup is executed, leading to
leak of the raid_set structure allocated at the beginning of
raid_ctr(). To fix this issue, go to the label 'bad' if the error
occurs.
Fixes: 11e4723206 ("dm raid: stop keeping raid set frozen altogether")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This function is supposed to return error pointers so it matches the
dmz_get_rnd_zone_for_reclaim() function. The current code could lead to
a NULL dereference in dmz_do_reclaim()
Fixes: b234c6d7a7 ("dm zoned: improve error handling in reclaim")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Change the "frontend" dust_remove_block, dust_add_block, and
dust_query_block functions to store the "dust block number", instead
of the sector number corresponding to the "dust block number".
For the "backend" functions dust_map_read and dust_map_write,
right-shift by sect_per_block_shift. This fixes the inability to
emulate failure beyond the first sector of each "dust block" (for
devices with a "dust block size" larger than 512 bytes).
Fixes: e4f3fabd67 ("dm: add dust target")
Cc: stable@vger.kernel.org
Signed-off-by: Bryan Gurney <bgurney@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When ./test_xdp_vlan_mode_generic.sh runs it complains that it can't
find file test_xdp_vlan.sh.
# selftests: bpf: test_xdp_vlan_mode_generic.sh
# ./test_xdp_vlan_mode_generic.sh: line 9: ./test_xdp_vlan.sh: No such
file or directory
Rework so that test_xdp_vlan.sh gets installed, added to the variable
TEST_PROGS_EXTENDED.
Fixes: d35661fcf9 ("selftests/bpf: add wrapper scripts for test_xdp_vlan.sh")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When running test_kmod.sh the following shows up
# sysctl cannot stat /proc/sys/net/core/bpf_jit_enable No such file or directory
cannot: stat_/proc/sys/net/core/bpf_jit_enable #
# sysctl cannot stat /proc/sys/net/core/bpf_jit_harden No such file or directory
cannot: stat_/proc/sys/net/core/bpf_jit_harden #
Rework to enable CONFIG_BPF_JIT to solve "No such file or directory"
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
test_btf_dump fails when run with O=, because it needs to access source
files and assumes they live in ./progs/, which is not the case in this
scenario.
Fix by instructing kselftest to copy btf_dump_test_case_*.c files to the
test directory. Since kselftest does not preserve directory structure,
adjust the test to look in ./progs/ and then in ./.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
test_cgroup_storage fails on s390 with an assertion failure: packets are
dropped when they shouldn't. The problem is that BPF_DW packet count is
accessed as BPF_W with an offset of 0, which is not correct on
big-endian machines.
Since the point of this test is not to verify narrow loads/stores,
simply use BPF_DW when working with packet counts.
Fixes: 68cfa3ac6b ("selftests/bpf: add a cgroup storage test")
Fixes: 919646d2a3 ("selftests/bpf: extend the storage test to test per-cpu cgroup storage")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The WRITE_PROTECT bit is always in a "protected mode" on Tegra and
WP-GPIO state need to be used instead. In a case of the GPIO absence,
write-enable should be assumed. External SD is writable once again as
a result of this patch because the offending commit changed behaviour for
the case of a missing WP-GPIO to fall back to WRITE_PROTECT bit-checking,
which is incorrect for Tegra.
Cc: stable@vger.kernel.org # v5.1+
Fixes: e8391453e2 ("mmc: sdhci-tegra: drop ->get_ro() implementation")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We should keep the case of "#define debug_align(X) (X)" for all arches
without CONFIG_HAS_STRICT_MODULE_RWX ability, which would save people, who
are sensitive to system size, a lot of memory when using modules,
especially for embedded systems. This is also the intention of the
original #ifdef... statement and still valid for now.
Note that this still keeps the effect of the fix of the following commit,
38f054d549 ("modules: always page-align module section allocations"),
since when CONFIG_ARCH_HAS_STRICT_MODULE_RWX is enabled, module pages are
aligned.
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
This reverts commit 96cce12ff6 ("cfg80211: fix processing world
regdomain when non modular").
Re-triggering a reg_process_hint with the last request on all events,
can make the regulatory domain fail in case of multiple WiFi modules. On
slower boards (espacially with mdev), enumeration of the WiFi modules
can end up in an intersected regulatory domain, and user cannot set it
with 'iw reg set' anymore.
This is happening, because:
- 1st module enumerates, queues up a regulatory request
- request gets processed by __reg_process_hint_driver():
- checks if previous was set by CORE -> yes
- checks if regulator domain changed -> yes, from '00' to e.g. 'US'
-> sends request to the 'crda'
- 2nd module enumerates, queues up a regulator request (which triggers
the reg_todo() work)
- reg_todo() -> reg_process_pending_hints() sees, that the last request
is not processed yet, so it tries to process it again.
__reg_process_hint driver() will run again, and:
- checks if the last request's initiator was the core -> no, it was
the driver (1st WiFi module)
- checks, if the previous initiator was the driver -> yes
- checks if the regulator domain changed -> yes, it was '00' (set by
core, and crda call did not return yet), and should be changed to 'US'
------> __reg_process_hint_driver calls an intersect
Besides, the reg_process_hint call with the last request is meaningless
since the crda call has a timeout work. If that timeout expires, the
first module's request will lost.
Cc: stable@vger.kernel.org
Fixes: 96cce12ff6 ("cfg80211: fix processing world regdomain when non modular")
Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 4e103134b8.
Alex Williamson reported regressions with device assignment with
this patch. Even though the bug is probably elsewhere and still
latent, this is needed to fix the regression.
Fixes: 4e103134b8 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot", 2019-02-05)
Reported-by: Alex Willamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
state_test and smm_test are failing on older processors that do not
have xcr0. This is because on those processor KVM does provide
support for KVM_GET/SET_XSAVE (to avoid having to rely on the older
KVM_GET/SET_FPU) but not for KVM_GET/SET_XCRS.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix two shortcomings in the Extended Key ID API:
1) Allow the userspace to install pairwise keys using keyid 1 without
NL80211_KEY_NO_TX set. This allows the userspace to install and
activate pairwise keys with keyid 1 in the same way as for keyid 0,
simplifying the API usage for e.g. FILS and FT key installs.
2) IEEE 802.11 - 2016 restricts Extended Key ID usage to CCMP/GCMP
ciphers in IEEE 802.11 - 2016 "9.4.2.25.4 RSN capabilities".
Enforce that when installing a key.
Cc: stable@vger.kernel.org # 5.2
Fixes: 6cdd3979a2 ("nl80211/cfg80211: Extended Key ID support")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The quirk function snd_emuusb_set_samplerate() has a NULL check for
the mixer element, but this is useless in the current code. It used
to be a check against mixer->id_elems[unitid] but it was changed later
to the value after mixer_eleme_list_to_info() which is always non-NULL
due to the container_of() usage.
This patch fixes the check before the conversion.
While we're at it, correct a typo in the comment in the function,
too.
Fixes: 8c558076c7 ("ALSA: usb-audio: Clean up mixer element list traverse")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: rpc_defconfig arm):
drivers/video/fbdev/acornfb.c: In function ‘acornfb_parse_dram’:
drivers/video/fbdev/acornfb.c:860:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
size *= 1024;
~~~~~^~~~~~~
drivers/video/fbdev/acornfb.c:861:3: note: here
case 'K':
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: mtx1_defconfig mips):
drivers/scsi/libsas/sas_discover.c: In function ‘sas_discover_domain’:
./include/linux/printk.h:309:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/libsas/sas_discover.c:459:3: note: in expansion of macro ‘pr_notice’
pr_notice("ATA device seen but CONFIG_SCSI_SAS_ATA=N so cannot attach\n");
^~~~~~~~~
drivers/scsi/libsas/sas_discover.c:462:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: cavium_octeon_defconfig mips):
arch/mips/include/asm/octeon/cvmx-sli-defs.h:47:6: warning: this statement
may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: allmodconfig arm):
drivers/power/supply/ab8500_charger.c: In function ‘ab8500_charger_max_usb_curr’:
drivers/power/supply/ab8500_charger.c:738:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (di->vbus_detected) {
^
drivers/power/supply/ab8500_charger.c:745:2: note: here
case USB_STAT_HM_IDGND:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: footbridge_defconfig arm):
drivers/watchdog/wdt285.c: In function ‘watchdog_ioctl’:
drivers/watchdog/wdt285.c:170:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
watchdog_ping();
^~~~~~~~~~~~~~~
drivers/watchdog/wdt285.c:172:2: note: here
case WDIOC_GETTIMEOUT:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: assabet_defconfig arm):
drivers/mtd/maps/sa1100-flash.c: In function ‘sa1100_probe_subdev’:
drivers/mtd/maps/sa1100-flash.c:82:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_WARNING "SA1100 flash: unknown base address "
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"0x%08lx, assuming CS0\n", phys);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/maps/sa1100-flash.c:85:2: note: here
case SA1100_CS0_PHYS:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: sunxi_defconfig arm):
drivers/gpu/drm/sun4i/sun4i_tcon.c: In function ‘sun4i_tcon0_mode_set_dithering’:
drivers/gpu/drm/sun4i/sun4i_tcon.c:318:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
val |= SUN4I_TCON0_FRM_CTL_MODE_B;
drivers/gpu/drm/sun4i/sun4i_tcon.c:319:2: note: here
case MEDIA_BUS_FMT_RGB666_1X18:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: multi_v7_defconfig arm):
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c: In function ‘sun6i_dsi_transfer’:
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:993:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (msg->rx_len == 1) {
^
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:998:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: rpc_defconfig arm):
arch/arm/mach-rpc/riscpc.c: In function ‘parse_tag_acorn’:
arch/arm/mach-rpc/riscpc.c:48:13: warning: this statement may fall through [-Wimplicit-fallthrough=]
vram_size += PAGE_SIZE * 256;
~~~~~~~~~~^~~~~~~~~~~~~~~~~~
arch/arm/mach-rpc/riscpc.c:49:2: note: here
case 256:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warnings (Building: powerpc-ppa8548_defconfig powerpc):
drivers/dma/fsldma.c: In function ‘fsl_dma_chan_probe’:
drivers/dma/fsldma.c:1165:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
chan->toggle_ext_pause = fsl_chan_toggle_ext_pause;
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/dma/fsldma.c:1166:2: note: here
case FSL_DMA_IP_83XX:
^~~~
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
The new dma_alloc_contiguous hides if we allocate CMA or regular
pages, and thus fails to retry a ZONE_NORMAL allocation if the CMA
allocation succeeds but isn't addressable. That means we either fail
outright or dip into a small zone that might not succeed either.
Thanks to Hillf Danton for debugging this issue.
Fixes: b1d2dc009d ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
in ip_mc_inc_group, memory allocation flag, not mcast mode, is expected
by __ip_mc_inc_group
similar issue in __ip_mc_join_group, both mcase mode and gfp_t are needed
here, so use ____ip_mc_inc_group(...)
Fixes: 9fb20801da ("net: Fix ip_mc_{dec,inc}_group allocation context")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NCSI spec indicates that if the data does not end on a 32 bit
boundary, one to three padding bytes equal to 0x00 shall be present to
align the checksum field to a 32-bit boundary.
Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can
use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior
that Eric Dumazet introduced for tcp sockets.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull HID fixes from Jiri Kosina:
- a few regression fixes for wacom driver (including fix for my earlier
mismerge) from Aaron Armstrong Skomra and Jason Gerecke
- revert of a few Logitech device ID additions which turn out to not
work perfectly with the hidpp driver at the moment; proper support is
now scheduled for 5.4. Fixes from Benjamin Tissoires
- scheduling-in-atomic fix for cp2112 driver, from Benjamin Tissoires
- new device ID to intel-ish, from Even Xu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: correct misreported EKR ring values
HID: cp2112: prevent sleeping function called from invalid context
HID: intel-ish-hid: ipc: add EHL device id
HID: wacom: Correct distance scale for 2nd-gen Intuos devices
HID: logitech-hidpp: remove support for the G700 over USB
Revert "HID: logitech-hidpp: add USB PID for a few more supported mice"
HID: wacom: add back changes dropped in merge commit
ODP depends on the several device capabilities, among them is the ability
to send UMR WQEs with that modify atomic and entity size of the MR.
Therefore, only if all conditions to send such a UMR WQE are met then
driver can report that ODP is supported. Use this check of conditions
in all places where driver needs to know about ODP support.
Also, implicit ODP support depends on ability of driver to send UMR WQEs
for an indirect mkey. Therefore, verify that all conditions to do so are
met when reporting support.
Fixes: c8d75a980f ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
task_active_pid_ns() is wrong API to check PID namespace because it
posses some restrictions and return PID namespace where the process
was allocated. It created mismatches with current namespace, which
can be different.
Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable
results without any relation to allocated PID namespace.
Fixes: 8be565e65f ("RDMA/nldev: Factor out the PID namespace check")
Fixes: 6a6c306a09 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API")
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
In a congested fabric with adaptive routing enabled, traces show that
packets could be delivered out of order. A stale TID RDMA data packet
could lead to TidErr if the TID entries have been released by duplicate
data packets generated from retries, and subsequently erroneously force
the qp into error state in the current implementation.
Since the payload has already been dropped by hardware, the packet can
be simply dropped and it is no longer necessary to put the qp into
error state.
Fixes: 9905bf06e8 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
In a congested fabric with adaptive routing enabled, traces show that
the sender could receive stale TID RDMA NAK packets that contain newer
KDETH PSNs and older Verbs PSNs. If not dropped, these packets could
cause the incorrect rewinding of the software flows and the incorrect
completion of TID RDMA WRITE requests, and eventually leading to memory
corruption and kernel crash.
The current code drops stale TID RDMA ACK/NAK packets solely based
on KDETH PSNs, which may lead to erroneous processing. This patch
fixes the issue by also checking the Verbs PSN. Addition checks are
added before rewinding the TID RDMA WRITE DATA packets.
Fixes: 9e93e967f7 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
We need to check if we have CQEs pending before starting a poll loop,
as those could be the events we will be spinning for (and hence we'll
find none). This can happen if a CQE triggers an error, or if it is
found by eg an IRQ before we get a chance to find it through polling.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
One of the components in LiteON CL1 device has limitations that
can be encountered based upon boundary race conditions using the
nvme bus specific suspend to idle flow.
When this situation occurs the drive doesn't resume properly from
suspend-to-idle.
LiteON has confirmed this problem and fixed in the next firmware
version. As this firmware is already in the field, avoid running
nvme specific suspend to idle flow.
Fixes: d916b1be94 ("nvme-pci: use host managed power state for suspend")
Link: http://lists.infradead.org/pipermail/linux-nvme/2019-July/thread.html
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Charles Hyde <charles.hyde@dellteam.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 1b1031ca63 ("nvme: validate cntlid during controller initialisation")
introduced a validation for controllers with duplicate cntlid that runs
on nvme_init_subsystem(). The problem is that the validation relies on
ctrl->cntlid, and this value is assigned (from id_ctrl value) after the
call for nvme_init_subsystem() in nvme_init_identify() for non-fabrics
scenario. That leads to ctrl->cntlid always being 0 in case we have a
physical set of controllers in the same subsystem.
This patch fixes that by loading the discovered cntlid id_ctrl value into
ctrl->cntlid before the subsystem initialization, only for the non-fabrics
case. The patch was tested with emulated nvme devices (qemu) having two
controllers in a single subsystem. Without the patch, we couldn't make
it work failing in the duplicate check; when running with the patch, we
could see the subsystem holding both controllers.
For the fabrics case we see ctrl->cntlid has a more intricate relation
with the admin connect, so we didn't change that.
Fixes: 1b1031ca63 ("nvme: validate cntlid during controller initialisation")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
nvme_state_set_live() making a path available triggers requeue_work
in order to resubmit requests that ended up on requeue_list when no
paths were available.
This requeue_work may race with concurrent nvme_ns_head_make_request()
that do not observe the live path yet.
Such concurrent requests may by made by either:
- New IO submission.
- Requeue_work triggered by nvme_failover_req() or another ana_work.
A race may cause requeue_work capture the state of requeue_list before
more requests get onto the list. These requests will stay on the list
forever unless requeue_work is triggered again.
In order to prevent such race, nvme_state_set_live() should
synchronize_srcu(&head->srcu) before triggering the requeue_work and
prevent nvme_ns_head_make_request referencing an old snapshot of the
path list.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a request issue ends up being punted to async context to avoid
blocking, we can get into a situation where the original application
enters the poll loop for that very request before it has been issued.
This should not be an issue, except that the polling will hold the
io_uring uring_ctx mutex for the duration of the poll. When the async
worker has actually issued the request, it needs to acquire this mutex
to add the request to the poll issued list. Since the application
polling is already holding this mutex, the workqueue sleeps on the
mutex forever, and the application thus never gets a chance to poll for
the very request it was interested in.
Fix this by ensuring that the polling drops the uring_ctx occasionally
if it's not making any progress.
Reported-by: Jeffrey M. Birnbaum <jmbnyc@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the case of X86_PAE, unsigned long is u32, but the physical address type
should be u64. Due to the bug here, the netvsc driver can not load
successfully, and sometimes the VM can panic due to memory corruption (the
hypervisor writes data to the wrong location).
Fixes: 6ba34171bc ("Drivers: hv: vmbus: Remove use of slow_virt_to_phys()")
Cc: stable@vger.kernel.org
Cc: Michael Kelley <mikelley@microsoft.com>
Reported-and-tested-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
When building hv_kvp_daemon GCC-8.3 complains:
hv_kvp_daemon.c: In function ‘kvp_get_ip_info.constprop’:
hv_kvp_daemon.c:812:30: warning: ‘ip_buffer’ may be used uninitialized in this function [-Wmaybe-uninitialized]
struct hv_kvp_ipaddr_value *ip_buffer;
this seems to be a false positive: we only use ip_buffer when
op == KVP_OP_GET_IP_INFO and it is only unset when op == KVP_OP_ENUMERATE.
Silence the warning by initializing ip_buffer to NULL.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Simplify the ring buffer handling with the in-place API.
Also avoid the dynamic allocation and the memory leak in the channel
callback function.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This field is no longer used after the commit
63ed4e0c67 ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code")
, because it's replaced by the global variable
"struct ms_hyperv_tsc_page *tsc_pg;" (now, the variable is in
drivers/clocksource/hyperv_timer.c).
Fixes: 63ed4e0c67 ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
If the HW revision of Qu devices we found is QuZ, then we need to
switch the configuration accordingly in order to use the correct FW.
Add a block of ifs in order do that.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We have a too generic condition that switches from Qu configurations
to QnJ configurations. We need to exclude some configurations so that
they are not erroneously switched. Add the ax201 configuration to the
list of exclusions.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The MAC context configuration always allowed multicast data frames
to pass to the driver for all MAC context types, and in the
case of station MAC context both when associated and when not
associated.
One of the outcomes of this configuration is having the FW forward
encrypted multicast frames to the driver with Rx status indicating
that the frame was not decrypted (as expected, since no keys were
configured yet) which in turn results with unnecessary error
messages.
Change this behavior to allow multicast data frames only when they
are actually expected, e.g., station MAC context is associated etc.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
To do not brake HW restart we should keep initialization vectors data.
I assumed that on start the data is already initialized to zeros, but
that not true on some scenarios and we should clear it. So add
additional flag to check if we are under HW restart and clear IV's
data if we are not.
Patch fixes AP mode regression.
Reported-and-tested-by: Emil Karlson <jekarl@iki.fi>
Fixes: 710e6cc159 ("rt2800: do not nullify initialization vector data")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
In ti_dra7_xbar_probe(), 'rsv_events' is allocated through kcalloc(). Then
of_property_read_u32_array() is invoked to search for the property.
However, if this process fails, 'rsv_events' is not deallocated, leading to
a memory leak bug. To fix this issue, free 'rsv_events' before returning
the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/1565938136-7249-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The static structures ht16k33_fb_fix and ht16k33_fb_var, of types
fb_fix_screeninfo and fb_var_screeninfo respectively, are not used
except to be copied into other variables. Hence make both of them
constant to prevent unintended modification.
Issue found with
Coccinelle.
Acked-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
The EKR ring claims a range of 0 to 71 but actually reports
values 1 to 72. The ring is used in relative mode so this
change should not affect users.
Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Fixes: 72b236d602 ("HID: wacom: Add support for Express Key Remote.")
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Reviewed-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
syzbot reported a splat:
xfrm_policy_inexact_list_reinsert+0x625/0x6e0 net/xfrm/xfrm_policy.c:877
CPU: 1 PID: 6756 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #57
Call Trace:
xfrm_policy_inexact_node_reinsert net/xfrm/xfrm_policy.c:922 [inline]
xfrm_policy_inexact_node_merge net/xfrm/xfrm_policy.c:958 [inline]
xfrm_policy_inexact_insert_node+0x537/0xb50 net/xfrm/xfrm_policy.c:1023
xfrm_policy_inexact_alloc_chain+0x62b/0xbd0 net/xfrm/xfrm_policy.c:1139
xfrm_policy_inexact_insert+0xe8/0x1540 net/xfrm/xfrm_policy.c:1182
xfrm_policy_insert+0xdf/0xce0 net/xfrm/xfrm_policy.c:1574
xfrm_add_policy+0x4cf/0x9b0 net/xfrm/xfrm_user.c:1670
xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2676
netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2477
xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2684
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x809/0x9a0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0xa70/0xd30 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
There is no reproducer, however, the warning can be reproduced
by adding rules with ever smaller prefixes.
The sanity check ("does the policy match the node") uses the prefix value
of the node before its updated to the smaller value.
To fix this, update the prefix earlier. The bug has no impact on tree
correctness, this is only to prevent a false warning.
Reported-by: syzbot+8cc27ace5f6972910b31@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We need to provide the arch hooks for non-coherent dma-direct
and swiotlb for all swiotlb builds, not just when LPAS is enabled.
Without that the Xen build that selects SWIOTLB indirectly through
SWIOTLB_XEN fails to build.
Fixes: ad3c7b18c5 ("arm: use swiotlb for bounce buffering on LPAE configs")
Reported-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ
resources based on shost values set by the driver. In newer cases of the
driver, which attempts to set nr_hw_queues to the cpu count, the
multipliers become excessive, with a single shost having SCSI-MQ
pre-allocation reaching into the multiple GBytes range. NPIV, which
creates additional shosts, only multiply this overhead. On lower-memory
systems, this can exhaust system memory very quickly, resulting in a system
crash or failures in the driver or elsewhere due to low memory conditions.
After testing several scenarios, the situation can be mitigated by limiting
the value set in shost->nr_hw_queues to 4. Although the shost values were
changed, the driver still had per-cpu hardware queues of its own that
allowed parallelization per-cpu. Testing revealed that even with the
smallish number for nr_hw_queues for SCSI-MQ, performance levels remained
near maximum with the within-driver affiinitization.
A module parameter was created to allow the value set for the nr_hw_queues
to be tunable.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The parens used in the while loop would result in error being assigned
the value 1 rather than the intended errno value.
This is required to return -ETXTBSY from follow on break_layout()
changes.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Commit ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and
ipv6_mc_check_mld() calls") replaces direct calls to pskb_may_pull()
in br_ipv6_multicast_mld2_report() with calls to ipv6_mc_may_pull(),
that returns -EINVAL on buffers too short to be valid IPv6 packets,
while maintaining the previous handling of the return code.
This leads to the direct opposite of the intended effect: if the
packet is malformed, -EINVAL evaluates as true, and we'll happily
proceed with the processing.
Return 0 if the packet is too short, in the same way as this was
fixed for IPv4 by commit 083b78a9ed ("ip: fix ip_mc_may_pull()
return value").
I don't have a reproducer for this, unlike the one referred to by
the IPv4 commit, but this is clearly broken.
Fixes: ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull clk fixes from Stephen Boyd:
"A couple fixes to the core framework logic that finds clk parents, a
handful of samsung clk driver fixes for audio and display clks, and a
small fix for the Stratix10 SoC driver that was checking the wrong
register for validity"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Fix potential NULL dereference in clk_fetch_parent_index()
clk: Fix falling back to legacy parent string matching
clk: socfpga: stratix10: fix rate caclulationg for cnt_clks
clk: samsung: exynos542x: Move MSCL subsystem clocks to its sub-CMU
clk: samsung: exynos5800: Move MAU subsystem clocks to MAU sub-CMU
clk: samsung: Change signature of exynos5_subcmus_init() function
Pull kernel thread signal handling fix from Eric Biederman:
"I overlooked the fact that kernel threads are created with all signals
set to SIG_IGN, and accidentally caused a regression in cifs and drbd
when replacing force_sig with send_sig.
This is my fix for that regression. I add a new function
allow_kernel_signal which allows kernel threads to receive signals
sent from the kernel, but continues to ignore all signals sent from
userspace. This ensures the user space interface for cifs and drbd
remain the same.
These kernel threads depend on blocking networking calls which block
until something is received or a signal is pending. Making receiving
of signals somewhat necessary for these kernel threads.
Perhaps someday we can cleanup those interfaces and remove
allow_kernel_signal. If not allow_kernel_signal is pretty trivial and
clearly documents what is going on so I don't think we will mind
carrying it"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Allow cifs and drbd to receive their terminating signals
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Remove IP MASQUERADING record in MAINTAINERS file,
from Denis Efremov.
2) Counter arguments are swapped in ebtables, from
Todd Seidelmann.
3) Missing netlink attribute validation in flow_offload
extension.
4) Incorrect alignment in xt_nfacct that breaks 32-bits
userspace / 64-bits kernels, from Juliana Rodrigueiro.
5) Missing include guard in nf_conntrack_h323_types.h,
from Masahiro Yamada.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As Jason Baron explained in commit 790ba4566c ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.
However, Jason patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by MSG_DONTWAIT flag passed at sendmsg() time :
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.
This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.
It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.
Fixes: 790ba4566c ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If alloc_descs() fails before irq_sysfs_init() has run, free_desc() in the
cleanup path will call kobject_del() even though the kobject has not been
added with kobject_add().
Fix this by making the call to kobject_del() conditional on whether
irq_sysfs_init() has run.
This problem surfaced because commit aa30f47cf6 ("kobject: Add support
for default attribute groups to kobj_type") makes kobject_del() stricter
about pairing with kobject_add(). If the pairing is incorrrect, a WARNING
and backtrace occur in sysfs_remove_group() because there is no parent.
[ tglx: Add a comment to the code and make it work with CONFIG_SYSFS=n ]
Fixes: ecb3f394c5 ("genirq: Expose interrupt information through sysfs")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1564703564-4116-1-git-send-email-mikelley@microsoft.com
"enabled" parameter historically referred to the device input or
output, not to the led indicator. After the changes added with the led
helper functions the mic mute led logic refers to the led and not to
the mic input which caused led indicator to be negated.
Fixing logic in cxt_update_gpio_led and updated
cxt_fixup_gpio_mute_hook
Also updated debug messages to ease further debugging if necessary.
Fixes: 184e302b46 ("ALSA: hda/conexant - Use the mic-mute LED helper")
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jeronimo Borque <jeronimo@borque.com.ar>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull networking fixes from David Miller:
1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.
2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan.
3) Fix severe performance regression due to lack of SKB coalescing of
fragments during local delivery, from Guillaume Nault.
4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.
5) Fix batched events in skbedit packet action, from Roman Mashak.
6) Propagate VLAN TX offload to hw_enc_features in bond and team
drivers, from Yue Haibing.
7) RXRPC local endpoint refcounting fix and read after free in
rxrpc_queue_local(), from David Howells.
8) Fix endian bug in ibmveth multicast list handling, from Thomas
Falcon.
9) Oops, make nlmsg_parse() wrap around the correct function,
__nlmsg_parse not __nla_parse(). Fix from David Ahern.
10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin.
11) Fix memory leak in cxgb4, from Wenwen Wang.
12) Yet another race in AF_PACKET, from Eric Dumazet.
13) Fix false detection of retransmit failures in tipc, from Tuong
Lien.
14) Use after free in ravb_tstamp_skb, from Tho Vu.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
ravb: Fix use-after-free ravb_tstamp_skb
netfilter: nf_tables: map basechain priority to hardware priority
net: sched: use major priority number as hardware priority
wimax/i2400m: fix a memory leak bug
net: cavium: fix driver name
ibmvnic: Unmap DMA address of TX descriptor buffers after use
bnxt_en: Fix to include flow direction in L2 key
bnxt_en: Use correct src_fid to determine direction of the flow
bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
bnxt_en: Improve RX doorbell sequence.
bnxt_en: Fix VNIC clearing logic for 57500 chips.
net: kalmia: fix memory leaks
cx82310_eth: fix a memory leak bug
bnx2x: Fix VF's VLAN reconfiguration in reload.
Bluetooth: Add debug setting for changing minimum encryption key size
tipc: fix false detection of retransmit failures
lan78xx: Fix memory leaks
MAINTAINERS: r8169: Update path to the driver
MAINTAINERS: PHY LIBRARY: Update files in the record
...
The maximum key description size is 4095. Commit f771fde820 ("keys:
Simplify key description management") inadvertantly reduced that to 255
and made sizes between 256 and 4095 work weirdly, and any size whereby
size & 255 == 0 would cause an assertion in __key_link_begin() at the
following line:
BUG_ON(index_key->desc_len == 0);
This can be fixed by simply increasing the size of desc_len in struct
keyring_index_key to a u16.
Note the argument length test in keyutils only checked empty
descriptions and descriptions with a size around the limit (ie. 4095)
and not for all the values in between, so it missed this. This has been
addressed and
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa
now exhaustively tests all possible lengths of type, description and
payload and then some.
The assertion failure looks something like:
kernel BUG at security/keys/keyring.c:1245!
...
RIP: 0010:__key_link_begin+0x88/0xa0
...
Call Trace:
key_create_or_update+0x211/0x4b0
__x64_sys_add_key+0x101/0x200
do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It can be triggered by:
keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s
Fixes: f771fde820 ("keys: Simplify key description management")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
BIOS on Samsung 500C Chromebook reports very rudimentary E820 table that
consists of 2 entries:
BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable
BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved
It breaks logic in find_trampoline_placement(): bios_start lands on the
end of the first 4k page and trampoline start gets placed below 0.
Detect underflow and don't touch bios_start for such cases. It makes
kernel ignore E820 table on machines that doesn't have two usable pages
below BIOS_START_MAX.
Fixes: 1b3a626436 ("x86/boot/compressed/64: Validate trampoline placement against E820")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463
Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
If the writeback error is fatal, we need to remove the tracking structures
(i.e. the nfs_page) from the inode.
Fixes: 6fbda89b25 ("NFS: Replace custom error reporting mechanism...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Initialise the result count to 0 rather than initialising it to the
argument count. The reason is that we want to ensure we record the
I/O stats correctly in the case where an error is returned (for
instance in the layoutstats).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the attempt to resend the I/O results in no bytes being read/written,
we must ensure that we report the error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: 0a00b77b33 ("nfs: mirroring support for direct io")
Cc: stable@vger.kernel.org # v3.20+
If the attempt to resend the pages fails, we need to ensure that we
clean up those pages that were not transmitted.
Fixes: d600ad1f2b ("NFS41: pop some layoutget errors to application")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.5+
If the file turns out to be of the wrong type after opening, we want
to revalidate the path and retry, so return EOPENSTALE rather than
ESTALE.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Currently, we are translating RPC level errors such as timeouts,
as well as interrupts etc into EOPENSTALE, which forces a single
replay of the open attempt. What we actually want to do is
force the replay only in the cases where the returned error
indicates that the file may have changed on the server.
So the fix is to spell out the exact set of errors where we want
to return EOPENSTALE.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we've been given the attributes of the mounted-on-file, then do not
use those to check or update the attributes on the application-visible
inode.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When calling request_threaded_irq() with a CP2112, the function
cp2112_gpio_irq_startup() is called in a IRQ context.
Therefore we can not sleep, and we can not call
cp2112_gpio_direction_input() there.
Move the call to cp2112_gpio_direction_input() earlier to have a working
driver.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Distance values reported by 2nd-gen Intuos tablets are on an inverted
scale (0 == far, 63 == near). We need to change them over to a normal
scale before reporting to userspace or else userspace drivers and
applications can get confused.
Ref: https://github.com/linuxwacom/input-wacom/issues/98
Fixes: eda01dab53 ("HID: wacom: Add four new Intuos devices")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd. I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN. So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.
Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig. As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL. Which after the first signal will allow userspace
to send signals to these kernel threads. At least for
wake_ack_receiver in drbd that does not appear actively wrong.
So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.
Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.
This change was partly inspired by similar cifs and drbd patches that
added allow_signal.
Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf0 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.
Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.
As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.
Switch the calibration loop to a non interrupt based one, which polls
either TSC (if frequency is known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.
Reported-by: Daniel Drake <drake@endlessm.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
lockdep reports the following deadlock scenario:
WARNING: possible circular locking dependency detected
kworker/1:1/48 is trying to acquire lock:
000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290
but task is already holding lock:
00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (module_mutex){+.+.}:
__mutex_lock+0xac/0x9f0
mutex_lock_nested+0x1b/0x20
set_all_modules_text_rw+0x22/0x90
ftrace_arch_code_modify_prepare+0x1c/0x20
ftrace_run_update_code+0xe/0x30
ftrace_startup_enable+0x2e/0x50
ftrace_startup+0xa7/0x100
register_ftrace_function+0x27/0x70
arm_kprobe+0xb3/0x130
enable_kprobe+0x83/0xa0
enable_trace_kprobe.part.0+0x2e/0x80
kprobe_register+0x6f/0xc0
perf_trace_event_init+0x16b/0x270
perf_kprobe_init+0xa7/0xe0
perf_kprobe_event_init+0x3e/0x70
perf_try_init_event+0x4a/0x140
perf_event_alloc+0x93a/0xde0
__do_sys_perf_event_open+0x19f/0xf30
__x64_sys_perf_event_open+0x20/0x30
do_syscall_64+0x65/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (text_mutex){+.+.}:
__lock_acquire+0xfcb/0x1b60
lock_acquire+0xca/0x1d0
__mutex_lock+0xac/0x9f0
mutex_lock_nested+0x1b/0x20
kprobe_optimizer+0x163/0x290
process_one_work+0x22b/0x560
worker_thread+0x50/0x3c0
kthread+0x112/0x150
ret_from_fork+0x3a/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(module_mutex);
lock(text_mutex);
lock(module_mutex);
lock(text_mutex);
*** DEADLOCK ***
As a reproducer I've been using bcc's funccount.py
(https://github.com/iovisor/bcc/blob/master/tools/funccount.py),
for example:
# ./funccount.py '*interrupt*'
That immediately triggers the lockdep splat.
Fix by acquiring text_mutex before module_mutex in kprobe_optimizer().
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d5b844a2cf ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to
schedule a new kworker if we schedule out due to lock contention because !RT
does not do that as well. A spinning spinlock disables preemption and a worker
does not schedule out on lock contention (but spin).
On RT the RW-semaphore implementation uses an rtmutex so
tsk_is_pi_blocked() will return true if a task blocks on it. In this case we
will now start a new worker which may deadlock if one worker is waiting on
progress from another worker. Since a RW-semaphore starts a new worker on !RT,
we should do the same on RT.
XFS is able to trigger this deadlock.
Allow to schedule new worker if the current worker is PI-blocked.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel: sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info)) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing.
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ordering of arguments to the x_tables ADD_COUNTER macro
appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c,
and arp_tables.c).
This causes data corruption in the ebtables userspace tools
because they get incorrect packet & byte counts from the kernel.
Fixes: d72133e628 ("netfilter: ebtables: use ADD_COUNTER macro")
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This entry is in MAINTAINERS for historical purpose.
It doesn't match current sources since the commit
adf82accc5 ("netfilter: x_tables: merge ip and
ipv6 masquerade modules") moved the module.
The net/netfilter/xt_MASQUERADE.c module is already under
the netfilter section. Thus, there is no purpose to keep this
separate entry in MAINTAINERS.
Cc: Florian Westphal <fw@strlen.de>
Cc: Juanjo Ciarlante <jjciarla@raiz.uncu.edu.ar>
Cc: netfilter-devel@vger.kernel.org
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While trawling through the dedupe file comparison code trying to fix
page deadlocking problems, Dave Chinner noticed that the reflink code
only takes shared IOLOCK/MMAPLOCKs on the source file. Because
page_mkwrite and directio writes do not take the EXCL versions of those
locks, this means that reflink can race with writer processes.
For pure remapping this can lead to undefined behavior and file
corruption; for dedupe this means that we cannot be sure that the
contents are identical when we decide to go ahead with the remapping.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Currently the driver does not handle EPROBE_DEFER for the confd gpio.
Use devm_gpiod_get_optional() instead of devm_gpiod_get() and return
error codes from altera_ps_probe().
Fixes: 5692fae074 ("fpga manager: Add altera-ps-spi driver for Altera FPGAs")
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Each iteration of for_each_child_of_node puts the previous
node, but in the case of a goto from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
goto in two places.
Issue found with Coccinelle.
Fixes: 119f517362 (drm/mediatek: Add DRM Driver for Mediatek SoC MT8173)
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
When a Tx timestamp is requested, a pointer to the skb is stored in the
ravb_tstamp_skb struct. This was done without an skb_get. There exists
the possibility that the skb could be freed by ravb_tx_free (when
ravb_tx_free is called from ravb_start_xmit) before the timestamp was
processed, leading to a use-after-free bug.
Use skb_get when filling a ravb_tstamp_skb struct, and add appropriate
frees/consumes when a ravb_tstamp_skb struct is freed.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Tho Vu <tho.vu.wh@rvc.renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
flow_offload hardware priority fixes
This patchset contains two updates for the flow_offload users:
1) Pass the major tc priority to drivers so they do not have to
lshift it. This is a preparation patch for the fix coming in
patch #2.
2) Set the hardware priority from the netfilter basechain priority,
some drivers break when using the existing hardware priority
number that is set to zero.
v5: fix patch 2/2 to address a clang warning and to simplify
the priority mapping.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds initial support for offloading basechains using the
priority range from 1 to 65535. This is restricting the netfilter
priority range to 16-bit integer since this is what most drivers assume
so far from tc. It should be possible to extend this range of supported
priorities later on once drivers are updated to support for 32-bit
integer priorities.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In i2400m_barker_db_init(), 'options_orig' is allocated through kstrdup()
to hold the original command line options. Then, the options are parsed.
However, if an error occurs during the parsing process, 'options_orig' is
not deallocated, leading to a memory leak bug. To fix this issue, free
'options_orig' before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver name gets exposed in sysfs under /sys/bus/pci/drivers
so it should look like other devices. Change it to be common
format (instead of "Cavium PTP").
This is a trivial fix that was observed by accident because
Debian kernels were building this driver into kernel (bug).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to wait until a completion is received to unmap
TX descriptor buffers that have been passed to the hypervisor.
Instead unmap it when the hypervisor call has completed. This patch
avoids the possibility that a buffer will not be unmapped because
a TX completion is lost or mishandled.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Devesh K. Singh <devesh_singh@in.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Bug fixes.
2 Bug fixes related to 57500 shutdown sequence and doorbell sequence,
2 TC Flower bug fixes related to the setting of the flow direction,
1 NVRAM update bug fix, and a minor fix to suppress an unnecessary
error message. Please queue for -stable as well. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
FW expects the driver to provide unique flow reference handles
for Tx or Rx flows. When a Tx flow and an Rx flow end up sharing
a reference handle, flow offload does not seem to work.
This could happen in the case of 2 flows having their L2 fields
wildcarded but in different direction.
Fix to incorporate the flow direction as part of the L2 key
v2: Move the dir field to the end of the bnxt_tc_l2_key struct to
fix the warning reported by kbuild test robot <lkp@intel.com>.
There is existing code that initializes the structure using
nested initializer and will warn with the new u8 field added to
the beginning. The structure also packs nicer when this new u8 is
added to the end of the structure [MChan].
Fixes: abd43a1352 ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Direction of the flow is determined using src_fid. For an RX flow,
src_fid is PF's fid and for TX flow, src_fid is VF's fid. Direction
of the flow must be specified, when getting statistics for that flow.
Currently, for DECAP flow, direction is determined incorrectly, i.e.,
direction is initialized as TX for DECAP flow, instead of RX. Because
of which, stats are not reported for this DECAP flow, though it is
offloaded and there is traffic for that flow, resulting in flow age out.
This patch fixes the problem by determining the DECAP flow's direction
using correct fid. Set the flow direction in all cases for consistency
even if 64-bit flow handle is not used.
Fixes: abd43a1352 ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For newly added NVM parameters, older firmware may not have the support.
Suppress the error message to avoid the unncessary error message which is
triggered when devlink calls the driver during initialization.
Fixes: 782a624d00 ("bnxt_en: Add bnxt_en initial params table and register it.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If FW returns FRAG_ERR in response error code, driver is resending the
command only when HWRM command returns success. Fix the code to resend
NVM_INSTALL_UPDATE command with DEFRAG install flags, if FW returns
FRAG_ERR in its response error code.
Fixes: cb4d1d6261 ("bnxt_en: Retry failed NVM_INSTALL_UPDATE with defragmentation flag enabled.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When both RX buffers and RX aggregation buffers have to be
replenished at the end of NAPI, post the RX aggregation buffers first
before RX buffers. Otherwise, we may run into a situation where
there are only RX buffers without RX aggregation buffers for a split
second. This will cause the hardware to abort the RX packet and
report buffer errors, which will cause unnecessary cleanup by the
driver.
Ringing the Aggregation ring doorbell first before the RX ring doorbell
will prevent some of these buffer errors. Use the same sequence during
ring initialization as well.
Fixes: 697197e5a1 ("bnxt_en: Re-structure doorbells.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During device shutdown, the VNIC clearing sequence needs to be modified
to free the VNIC first before freeing the RSS contexts. The current
code is doing the reverse and we can get mis-directed RX completions
to CP ring ID 0 when the RSS contexts are freed and zeroed. The clearing
of RSS contexts is not required with the new sequence.
Refactor the VNIC clearing logic into a new function bnxt_clear_vnic()
and do the chip specific VNIC clearing sequence.
Fixes: 7b3af4f75b ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In kalmia_init_and_get_ethernet_addr(), 'usb_buf' is allocated through
kmalloc(). In the following execution, if the 'status' returned by
kalmia_send_init_packet() is not 0, 'usb_buf' is not deallocated, leading
to memory leaks. To fix this issue, add the 'out' label to free 'usb_buf'.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cx82310_bind(), 'dev->partial_data' is allocated through kmalloc().
Then, the execution waits for the firmware to become ready. If the firmware
is not ready in time, the execution is terminated. However, the allocated
'dev->partial_data' is not deallocated on this path, leading to a memory
leak bug. To fix this issue, free 'dev->partial_data' before returning the
error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MTD fix from Richard Weinberger:
"A single fix for MTD to correctly set the spi-nor WP pin"
* tag 'fixes-for-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spi-nor: Fix the disabling of write protection at init
Commit 04f05230c5 ("bnx2x: Remove configured vlans as
part of unload sequence."), introduced a regression in driver
that as a part of VF's reload flow, VLANs created on the VF
doesn't get re-configured in hardware as vlan metadata/info
was not getting cleared for the VFs which causes vlan PING to stop.
This patch clears the vlan metadata/info so that VLANs gets
re-configured back in the hardware in VF's reload flow and
PING/traffic continues for VLANs created over the VFs.
Fixes: 04f05230c5 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull btrfs fixes from David Sterba:
"Two fixes that popped up during testing:
- fix for sysfs-related code that adds/removes block groups, warnings
appear during several fstests in connection with sysfs updates in
5.3, the fix essentially replaces a workaround with scope NOFS and
applies to 5.2-based branch too
- add sanity check of trim range"
* tag 'for-5.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: trim: Check the range passed into to prevent overflow
Btrfs: fix sysfs warning and missing raid sysfs directories
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Fix the inconsistent error handling in the umwait init code
- Rework the boot param zeroing so gcc9 stops complaining about out
of bound memset. The resulting source code is actually more sane to
read than the smart solution we had
- Maintainers update so Tony gets involved when Intel models are
added
- Some more fallthrough fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Save fields explicitly, zero out everything else
MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h
x86/fpu/math-emu: Address fallthrough warnings
x86/apic/32: Fix yet another implicit fallthrough warning
x86/umwait: Fix error handling in umwait_init()
Pull EFI fix from Thomas Gleixner:
"A single fix for a EFI mixed mode regression caused by recent rework
which did not take the firmware bitwidth into account"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi-stub: Fix get_efi_config_table on mixed-mode setups
Pull SPDX fixes from Greg KH:
"Here are four small SPDX fixes for 5.3-rc5.
A few style fixes for some SPDX comments, added an SPDX tag for one
file, and fix up some GPL boilerplate for another file.
All of these have been in linux-next for a few weeks with no reported
issues (they are comment changes only, so that's to be expected...)"
* tag 'spdx-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
i2c: stm32: Use the correct style for SPDX License Identifier
intel_th: Use the correct style for SPDX License Identifier
coccinelle: api/atomic_as_refcounter: add SPDX License Identifier
kernel/configs: Replace GPL boilerplate code with SPDX identifier
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.3-rc5.
These are two different subsystems needing some fixes, the habanalabs
driver which is has some more big endian fixes for problems found. The
other are some small soundwire fixes, including some Kconfig
dependencies needed to resolve reported build errors.
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: xilinx-sdfec: fix dependency and build error
habanalabs: fix device IRQ unmasking for BE host
habanalabs: fix endianness handling for internal QMAN submission
habanalabs: fix completion queue handling when host is BE
habanalabs: fix endianness handling for packets from user
habanalabs: fix DRAM usage accounting on context tear down
habanalabs: Avoid double free in error flow
soundwire: fix regmap dependencies and align with other serial links
soundwire: cadence_master: fix definitions for INTSTAT0/1
soundwire: cadence_master: fix register definition for SLAVE_STATE
Pull staging/IIO fixes from Greg KH:
"Here are four small staging and iio driver fixes for 5.3-rc5
Two are for the dt3000 comedi driver for some reported problems found
in that codebase, and two are some small iio fixes.
All of these have been in linux-next this week with no reported
issues"
* tag 'staging-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: dt3000: Fix rounding up of timer divisor
staging: comedi: dt3000: Fix signed integer overflow 'divider * base'
iio: adc: max9611: Fix temperature reading in probe
iio: frequency: adf4371: Fix output frequency setting
Pull USB fixes from Greg KH:
"Here are number of small USB fixes for 5.3-rc5.
Syzbot has been on a tear recently now that it has some good USB
debugging hooks integrated, so there's a number of fixes in here found
by those tools for some _very_ old bugs. Also a handful of gadget
driver fixes for reported issues, some hopefully-final dma fixes for
host controller drivers, and some new USB serial gadget driver ids.
All of these have been in linux-next this week with no reported issues
(the usb-serial ones were in linux-next in its own branch, but merged
into mine on Friday)"
* tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: add a hcd_uses_dma helper
usb: don't create dma pools for HCDs with a localmem_pool
usb: chipidea: imx: fix EPROBE_DEFER support during driver probe
usb: host: fotg2: restart hcd after port reset
USB: CDC: fix sanity checks in CDC union parser
usb: cdc-acm: make sure a refcount is taken early enough
USB: serial: option: add the BroadMobi BM818 card
USB: serial: option: Add Motorola modem UARTs
USB: core: Fix races in character device registration and deregistraion
usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt
usb: gadget: composite: Clear "suspended" on reset/disconnect
usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
USB: serial: option: add D-Link DWM-222 device ID
USB: serial: option: Add support for ZTE MF871A
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into this series. This contains:
- Revert of the REQ_NOWAIT_INLINE and associated dio changes. There
were still corner cases there, and even though I had a solution for
it, it's too involved for this stage. (me)
- Set of NVMe fixes (via Sagi)
- io_uring fix for fixed buffers (Anthony)
- io_uring defer issue fix (Jackie)
- Regression fix for queue sync at exit time (zhengbin)
- xen blk-back memory leak fix (Wenwen)"
* tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block:
io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list
block: remove REQ_NOWAIT_INLINE
io_uring: fix manual setup of iov_iter for fixed buffers
xen/blkback: fix memory leaks
blk-mq: move cancel of requeue_work to the front of blk_exit_queue
nvme-pci: Fix async probe remove race
nvme: fix controller removal race with scan work
nvme-rdma: fix possible use-after-free in connect error flow
nvme: fix a possible deadlock when passthru commands sent to a multipath device
nvme-core: Fix extra device_put() call on error path
nvmet-file: fix nvmet_file_flush() always returning an error
nvmet-loop: Flush nvme_delete_wq when removing the port
nvmet: Fix use-after-free bug when a port is removed
nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
Pull Hyper-V fixes from Sasha Levin:
- A few fixes for the userspace hyper-v tools from Adrian Vladu.
- A fix for the hyper-v MAINTAINERs entry from Lan Tianyu.
- Fix for SPDX license identifier in the userspace tools from Nishad
Kamdar.
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
MAINTAINERS: Fix Hyperv vIOMMU driver file name
tools: hv: Use the correct style for SPDX License Identifier
tools: hv: fix typos in toolchain
tools: hv: fix KVP and VSS daemons exit code
tools: hv: fixed Python pep8/flake8 warnings for lsvmbus
Johan Hedberg says:
====================
pull request: bluetooth 2019-08-17
Here's a set of Bluetooth fixes for the 5.3-rc series:
- Multiple fixes for Qualcomm (btqca & hci_qca) drivers
- Minimum encryption key size debugfs setting (this is required for
Bluetooth Qualification)
- Fix hidp_send_message() to have a meaningful return value
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Hyperv vIOMMU file name should be "hyperv-iommu.c" rather
than "hyperv_iommu.c". This patch is to fix it.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch corrects the SPDX License Identifier style
in the trace header file related to Microsoft Hyper-V
client drivers.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pull i2c fixes from Wolfram Sang:
"I2C has one revert because of a regression, two fixes for tiny race
windows (which we were not able to trigger), a MAINTAINERS addition,
and a SPDX fix"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: stm32: Use the correct style for SPDX License Identifier
i2c: emev2: avoid race when unregistering slave client
i2c: rcar: avoid race when unregistering slave client
MAINTAINERS: i2c-imx: take over maintainership
Revert "i2c: imx: improve the error handling in i2c_imx_dma_request()"
Pull RISC-V fixes from Paul Walmsley:
- Two patches to fix significant bugs in floating point register
context handling
- A minor fix in RISC-V flush_tlb_page(), to supply a valid end address
to flush_tlb_range()
- Two minor defconfig additions: to build the virtio hwrng driver by
default (for QEMU targets), and to partially synchronize the 32-bit
defconfig with the 64-bit defconfig
* tag 'riscv/for-v5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Make __fstate_clean() work correctly.
riscv: Correct the initialized flow of FP register
riscv: defconfig: Update the defconfig
riscv: rv32_defconfig: Update the defconfig
riscv: fix flush_tlb_range() end address for flush_tlb_page()
Johan writes:
USB-serial fixes for 5.3-rc5
Here are some new modem device ids.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.3-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add the BroadMobi BM818 card
USB: serial: option: Add Motorola modem UARTs
USB: serial: option: add D-Link DWM-222 device ID
USB: serial: option: Add support for ZTE MF871A
For testing and qualification purposes it is useful to allow changing
the minimum encryption key size value that the host stack is going to
enforce. This adds a new debugfs setting min_encrypt_key_size to achieve
this functionality.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When dedupe wants to use the page cache to compare parts of two files
for dedupe, we must be very careful to handle locking correctly. The
current code doesn't do this. It must lock and unlock the page only
once if the two pages are the same, since the overlapping range check
doesn't catch this when blocksize < pagesize. If the pages are distinct
but from the same file, we must observe page locking order and lock them
in order of increasing offset to avoid clashing with writeback locking.
Fixes: 876bec6f9b ("vfs: refactor clone/dedupe_file_range common functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
For 31-bit s390 user space, we have to pass pointer arguments through
compat_ptr() in the compat_ioctl handler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Always try the native ioctl if we don't have a compat handler. This
removes a lot of boilerplate code as 'modern' ioctls should generally
be compat clean, and fixes the missing entries for the recently added
FS_IOC_GETFSLABEL/FS_IOC_SETFSLABEL ioctls.
Fixes: f7664b3197 ("xfs: implement online get/set fs label")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull Xtensa fix from Max Filippov:
"Add missing isync into cpu_reset to make sure ITLB changes are
effective"
* tag 'xtensa-20190816' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: add missing isync to the cpu_reset TLB code
This commit eliminates the use of the link 'stale_limit' & 'prev_from'
(besides the already removed - 'stale_cnt') variables in the detection
of repeated retransmit failures as there is no proper way to initialize
them to avoid a false detection, i.e. it is not really a retransmission
failure but due to a garbage values in the variables.
Instead, a jiffies variable will be added to individual skbs (like the
way we restrict the skb retransmissions) in order to mark the first skb
retransmit time. Later on, at the next retransmissions, the timestamp
will be checked to see if the skb in the link transmq is "too stale",
that is, the link tolerance time has passed, so that a link reset will
be ordered. Note, just checking on the first skb in the queue is fine
enough since it must be the oldest one.
A counter is also added to keep track the actual skb retransmissions'
number for later checking when the failure happens.
The downside of this approach is that the skb->cb[] buffer is about to
be exhausted, however it is always able to allocate another memory area
and keep a reference to it when needed.
Fixes: 77cf8edbc0 ("tipc: simplify stale link failure criteria")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In lan78xx_probe(), a new urb is allocated through usb_alloc_urb() and
saved to 'dev->urb_intr'. However, in the following execution, if an error
occurs, 'dev->urb_intr' is not deallocated, leading to memory leaks. To fix
this issue, invoke usb_free_urb() to free the allocated urb before
returning from the function.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 fixes from Catalin Marinas:
- Don't taint the kernel if CPUs have different sets of page sizes
supported (other than the one in use).
- Issue I-cache maintenance for module ftrace trampoline.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
arm64: cpufeature: Don't treat granule sizes as strict
Don't compare the parent clock name with a NULL name in the
clk_parent_map. This prevents a kernel crash when passing NULL
core->parents[i].name to strcmp().
An example which triggered this is a mux clock with four parents when
each of them is referenced in the clock driver using
clk_parent_data.fw_name and then calling clk_set_parent(clk, 3rd_parent)
on this mux.
In this case the first parent is also the HW default so
core->parents[i].hw is populated when the clock is registered. Calling
clk_set_parent(clk, 3rd_parent) will then go through all parents and
skip the first parent because it's hw pointer doesn't match. For the
second parent no hw pointer is cached yet and clk_core_get(core, 1)
returns a non-matching pointer (which is correct because we are comparing
the second with the third parent). Comparing the result of
clk_core_get(core, 2) with the requested parent gives a match. However
we don't reach this point because right after the clk_core_get(core, 1)
mismatch the old code tried to !strcmp(parent->name, NULL) (where the
second argument is actually core->parents[i].name, but that was never
populated by the clock driver).
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lkml.kernel.org/r/20190815223155.21384-1-martin.blumenstingl@googlemail.com
Fixes: fc0c209c14 ("clk: Allow parents to be specified without string names")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Calls to clk_core_get() will return ERR_PTR(-EINVAL) if we've started
migrating a clk driver to use the DT based style of specifying parents
but we haven't made any DT updates yet. This happens when we pass a
non-NULL value as the 'name' argument of of_parse_clkspec(). That
function returns -EINVAL in such a situation, instead of -ENOENT like we
expected. The return value comes back up to clk_core_fill_parent_index()
which proceeds to skip calling clk_core_lookup() because the error
pointer isn't equal to -ENOENT, it's -EINVAL.
Furthermore, we blindly overwrite the error pointer returned by
clk_core_get() with NULL when there isn't a legacy .name member
specified in the parent map. This isn't too bad right now because we
don't really care to differentiate NULL from an error, but in the future
we should only try to do a legacy lookup if we know we might find
something. This way DT lookups that fail don't try to lookup based on
strings when there isn't any string to match, hiding the error from DT
parsing.
Fix both these problems so that clk provider drivers can use the new
style of parent mapping without having to also update their DT at the
same time. This patch is based on an earlier patch from Taniya Das which
checked for -EINVAL in addition to -ENOENT return values from
clk_core_get().
Fixes: 601b6e9330 ("clk: Allow parents to be specified via clkspec index")
Cc: Taniya Das <tdas@codeaurora.org>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Reported-by: Taniya Das <tdas@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190813214147.34394-1-sboyd@kernel.org
Tested-by: Taniya Das <tdas@codeaurora.org>
The initial support for dynamic ftrace trampolines in modules made use
of an indirect branch which loaded its target from the beginning of
a special section (e71a4e1beb ("arm64: ftrace: add support for far
branches to dynamic ftrace")). Since no instructions were being patched,
no cache maintenance was needed. However, later in be0f272bfc ("arm64:
ftrace: emit ftrace-mod.o contents through code") this code was reworked
to output the trampoline instructions directly into the PLT entry but,
unfortunately, the necessary cache maintenance was overlooked.
Add a call to __flush_icache_range() after writing the new trampoline
instructions but before patching in the branch to the trampoline.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: be0f272bfc ("arm64: ftrace: emit ftrace-mod.o contents through code")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull power management fixes from Rafael Wysocki:
"These add a check to avoid recent suspend-to-idle power regression on
systems with NVMe drives where the PCIe ASPM policy is "performance"
(or when the kernel is built without ASPM support), fix an issue
related to frequency limits in the schedutil cpufreq governor and fix
a mistake related to the PM QoS usage in the cpufreq core introduced
recently.
Specifics:
- Disable NVMe power optimization related to suspend-to-idle added
recently on systems where PCIe ASPM is not able to put PCIe links
into low-power states to prevent excess power from being drawn by
the system while suspended (Rafael Wysocki).
- Make the schedutil governor handle frequency limits changes
properly in all cases (Viresh Kumar).
- Prevent the cpufreq core from treating positive values returned by
dev_pm_qos_update_request() as errors (Viresh Kumar)"
* tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
PCI/ASPM: Add pcie_aspm_enabled()
cpufreq: schedutil: Don't skip freq update when limits change
cpufreq: dev_pm_qos_update_request() can return 1 on success
Pull sound fixes from Takashi Iwai:
"All small fixes targeted for stable:
- Two fixes for USB-audio with malformed descriptor, spotted by
fuzzers
- Two fixes Conexant HD-audio codec wrt power management
- Quirks for HD-audio AMD platform and HP laptop
- HD-audio memory leak fix"
* tag 'sound-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term
ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit
ALSA: hda - Add a generic reboot_notify
ALSA: hda - Let all conexant codec enter D3 when rebooting
ALSA: hda/realtek - Add quirk for HP Envy x360
ALSA: hda - Fix a memory leak bug
ALSA: hda - Apply workaround for another AMD chip 1022:1487
Pull drm fixes from Dave Airlie:
"Nothing too crazy this week, one amdgpu fix to use vmalloc for a
struct that grew in size, and another MST fix for nouveau, and some
other misc fixes:
i915:
- single GVT use after free fix
scheduler:
- entity destruction race fix
amdgpu:
- struct allocation fix
- gfx9 soft recovery fix
nouveau:
- followup MST fix
ast:
- vga register race fix"
* tag 'drm-fixes-2019-08-16' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes
drm/ast: Fixed reboot test may cause system hanged
drm/scheduler: use job count instead of peek
drm/amd/display: use kvmalloc for dc_state (v2)
drm/amdgpu: fix gfx9 soft recovery
drm/i915: Use after free in error path in intel_vgpu_create_workload()
The R-Car LVDS encoder units support dual-link operations by splitting
the pixel output between the primary encoder and the companion encoder.
Currently the companion encoder fails at probe time, causing the
registration of the primary to fail as well, preventing the whole DU unit
from being registered at all.
Fix this by not bailing out from probe with error if the
"renesas,companion" property is not specified.
Fixes: fa440d8703 ("drm: rcar-du: lvds: Add support for dual-link mode")
Reported-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
memset, if the memset goes accross several fields of a struct. This
generated a couple of warnings on x86_64 builds in sanitize_boot_params().
Fix this by explicitly saving the fields in struct boot_params
that are intended to be preserved, and zeroing all the rest.
[ tglx: Tagged for stable as it breaks the warning free build there as well ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com
Vinod writes:
soundwire fixes for v5.3-rc5
Pierre sent fixes which are queued now for v5.3-rc5 are:
- regmap dependecy
- cadence register definitions
* tag 'soundwire-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: fix regmap dependencies and align with other serial links
soundwire: cadence_master: fix definitions for INTSTAT0/1
soundwire: cadence_master: fix register definition for SLAVE_STATE
When showing metadata about a single program by invoking
"bpftool prog show PROG", the file descriptor referring to the program
is not closed before returning from the function. Let's close it.
Fixes: 71bb428fe2 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David Howells says:
====================
rxrpc: Fix local endpoint handling
Here's a pair of patches that fix two issues in the handling of local
endpoints (rxrpc_local structs):
(1) Use list_replace_init() rather than list_replace() if we're going to
unconditionally delete the replaced item later, lest the list get
corrupted.
(2) Don't access the rxrpc_local object after passing our ref to the
workqueue, not even to illuminate tracepoints, as the work function
may cause the object to be freed. We have to cache the information
beforehand.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This patchset contains Netfilter fixes for net:
1) Extend selftest to cover flowtable with ipsec, from Florian Westphal.
2) Fix interaction of ipsec with flowtable, also from Florian.
3) User-after-free with bound set to rule that fails to load.
4) Adjust state and timeout for flows that expire.
5) Timeout update race with flows in teardown state.
6) Ensure conntrack id hash calculation use invariants as input,
from Dirk Morris.
7) Do not push flows into flowtable for TCP fin/rst packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A process could race in an open and attempt to read one of these files
before i_private is initialized, and get a spurious error.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fix a crash that was introduced by the commit 724376a04d. The crash is
reported here: https://gitlab.com/cryptsetup/cryptsetup/issues/468
When reading from the integrity device, the function
dm_integrity_map_continue calls find_journal_node to find out if the
location to read is present in the journal. Then, it calculates how many
sectors are consecutively stored in the journal. Then, it locks the range
with add_new_range and wait_and_add_new_range.
The problem is that during wait_and_add_new_range, we hold no locks (we
don't hold ic->endio_wait.lock and we don't hold a range lock), so the
journal may change arbitrarily while wait_and_add_new_range sleeps.
The code then goes to __journal_read_write and hits
BUG_ON(journal_entry_get_sector(je) != logical_sector); because the
journal has changed.
In order to fix this bug, we need to re-check the journal location after
wait_and_add_new_range. We restrict the length to one block in order to
not complicate the code too much.
Fixes: 724376a04d ("dm integrity: implement fair range locks")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-zoned is observed to lock up or livelock in case of hardware
failure or some misconfiguration of the backing zoned device.
This patch adds a new dm-zoned target function that checks the status of
the backing device. If the request queue of the backing device is found
to be in dying state or the SCSI backing device enters offline state,
the health check code sets a dm-zoned target flag prompting all further
incoming I/O to be rejected. In order to detect backing device failures
timely, this new function is called in the request mapping path, at the
beginning of every reclaim run and before performing any metadata I/O.
The proper way out of this situation is to do
dmsetup remove <dm-zoned target>
and recreate the target when the problem with the backing device
is resolved.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Some errors are ignored in the I/O path during queueing chunks
for processing by chunk works. Since at least these errors are
transient in nature, it should be possible to retry the failed
incoming commands.
The fix -
Errors that can happen while queueing chunks are carried upwards
to the main mapping function and it now returns DM_MAPIO_REQUEUE
for any incoming requests that can not be properly queued.
Error logging/debug messages are added where needed.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There are several places in reclaim code where errors are not
propagated to the main function, dmz_reclaim(). This function
is responsible for unlocking zones that might be still locked
at the end of any failed reclaim iterations. As the result,
some device zones may be left permanently locked for reclaim,
degrading target's capability to reclaim zones.
This patch fixes these issues as follows -
Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and
dmz_reclaim_rnd_data() return error codes to the caller.
dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid
clashing with "struct dmz_reclaim" and is modified to return the
error to the caller.
dmz_get_zone_for_reclaim() now returns an error instead of NULL
pointer and reclaim code checks for that error.
Error logging/debug messages are added where necessary.
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch fixes a problem in dm-kcopyd that may leave jobs in
complete queue indefinitely in the event of backing storage failure.
This behavior has been observed while running 100% write file fio
workload against an XFS volume created on top of a dm-zoned target
device. If the underlying storage of dm-zoned goes to offline state
under I/O, kcopyd sometimes never issues the end copy callback and
dm-zoned reclaim work hangs indefinitely waiting for that completion.
This behavior was traced down to the error handling code in
process_jobs() function that places the failed job to complete_jobs
queue, but doesn't wake up the job handler. In case of backing device
failure, all outstanding jobs may end up going to complete_jobs queue
via this code path and then stay there forever because there are no
more successful I/O jobs to wake up the job handler.
This patch adds a wake() call to always wake up kcopyd job wait queue
for all I/O jobs that fail before dm_io() gets called for that job.
The patch also sets the write error status in all sub jobs that are
failed because their master job has failed.
Fixes: b73c67c2cb ("dm kcopyd: add sequential write feature")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Revert the commit bd293d071f. The proper
fix has been made available with commit d0a255e795 ("loop: set
PF_MEMALLOC_NOIO for the worker thread").
Note that the fix offered by commit bd293d071f doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0"
#0 [ffff8813dedfb938] __schedule at ffffffff8173f405
#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd293d071f ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e795 ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
`check_input_term` recursively calls itself with input from
device side (e.g., uac_input_terminal_descriptor.bCSourceID)
as argument (id). In `check_input_term`, if `check_input_term`
is called with the same `id` argument as the caller, it triggers
endless recursive call, resulting kernel space stack overflow.
This patch fixes the bug by adding a bitmap to `struct mixer_build`
to keep track of the checked ids and stop the execution if some id
has been checked (similar to how parse_audio_unit handles unitid
argument).
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In myri10ge_probe(), myri10ge_alloc_slices() is invoked to allocate slices
related structures. Later on, myri10ge_request_irq() is used to get an irq.
However, if this process fails, the allocated slices related structures are
not deallocated, leading to memory leaks. To fix this issue, revise the
target label of the goto statement to 'abort_with_slices'.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ctx->sk_write_space pointer is only set when TLS tx mode is enabled.
When running without TX mode its a null pointer but we still set the
sk sk_write_space pointer on close().
Fix the close path to only overwrite sk->sk_write_space when the current
pointer is to the tls_write_space function indicating the tls module should
clean it up properly as well.
Reported-by: Hillf Danton <hdanton@sina.com>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: 57c722e932 ("net/tls: swap sk_write_space on close")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If oct->fn_list.enable_io_queues() fails, no cleanup is executed, leading
to memory/resource leaks. To fix this issue, invoke
octeon_delete_instr_queue() before returning from the function.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull xfs fixes from Darrick Wong:
- Fix crashes when the attr fork isn't present due to errors but inode
inactivation tries to zap the attr data anyway.
- Convert more directory corruption debugging asserts to actual
EFSCORRUPTED returns instead of blowing up later on.
- Don't fail writeback just because we ran out of memory allocating
metadata log data.
* tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't crash on null attr fork xfs_bmapi_read
xfs: remove more ondisk directory corruption asserts
fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().
Pull iomap fixlet from Darrick Wong:
"A single update to the MAINTAINERS entry for iomap now that we've
removed fs/iomap.c"
* tag 'iomap-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
MAINTAINERS: iomap: Remove fs/iomap.c record
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-08-15
This series introduces two fixes to mlx5 driver.
1) Eran fixes a compatibility issue with ethtool flash.
2) Maxim fixes a race in XSK wakeup flow.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
synchronize_rcu() gets called multiple times each time a client is
destroyed. If the laundromat thread has a lot of clients to destroy,
the delay can be noticeable. This was causing pynfs test RENEW3 to
fail.
We could embed an rcu_head in each inode and do the kref_put in an rcu
callback. But simplest is just to take a lock here.
(I also wonder if the laundromat thread would be better replaced by a
bunch of scheduled work or timers or something.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cited patch deleted ethtool flash device support, as ethtool core can
fallback into devlink flash callback. However, this is supported only if
there is a devlink port registered over the corresponding netdevice.
As mlx5e do not have devlink port support over native netdevice, it broke
the ability to flash device via ethtool.
This patch re-add the ethtool callback to avoid user functionality breakage
when trying to flash device via ethtool.
Fixes: 9c8bca2637 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a missing spinlock around XSKICOSQ usage at the activation stage,
because there is a race between a configuration change and the
application calling sendto().
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When running tcp_fastopen_backup_key.sh the following issue was seen in
a busybox environment.
./tcp_fastopen_backup_key.sh: line 33: [: -ne: unary operator expected
Shellcheck showed the following issue.
$ shellcheck tools/testing/selftests/net/tcp_fastopen_backup_key.sh
In tools/testing/selftests/net/tcp_fastopen_backup_key.sh line 33:
if [ $val -ne 0 ]; then
^-- SC2086: Double quote to prevent globbing and word splitting.
Rework to do a string comparison instead.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch may fix two issues:
First, when IOSQE_IO_DRAIN set, the next IOs need to be inserted into
defer list to delay execution, but link io will be actively scheduled to
run by calling io_queue_sqe.
Second, when multiple LINK_IOs are inserted together with defer_list,
the LINK_IO is no longer keep order.
|-------------|
| LINK_IO | ----> insert to defer_list -----------
|-------------| |
| LINK_IO | ----> insert to defer_list ----------|
|-------------| |
| LINK_IO | ----> insert to defer_list ----------|
|-------------| |
| NORMAL_IO | ----> insert to defer_list ----------|
|-------------| |
|
queue_work at same time <-----|
Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We had a few issues with this code, and there's still a problem around
how we deal with error handling for chained/split bios. For now, just
revert the code and we'll try again with a thoroug solution. This
reverts commits:
e15c2ffa10 ("block: fix O_DIRECT error handling for bio fragments")
0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
893a1c9720 ("blk-mq: allow REQ_NOWAIT to return an error inline")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed
buffers") introduced an optimization to avoid using the slow
iov_iter_advance by manually populating the iov_iter iterator in some
cases.
However, the computation of the iterator count field was erroneous: The
first bvec was always accounted for an extent of page size even if the
bvec length was smaller.
In consequence, some I/O operations on fixed buffers were unable to
operate on the full extent of the buffer, consistently skipping some
bytes at the end of it.
Fixes: bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed buffers")
Cc: stable@vger.kernel.org
Signed-off-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
quirk_reset_lenovo_thinkpad_50_nvgpu() resets NVIDIA GPUs to work around
an apparent BIOS defect. It previously used pci_reset_function(), and
the available method was a bus reset, which was fine because there was
only one function on the bus. After b516ea586d ("PCI: Enable NVIDIA
HDA controllers"), there are now two functions (the HDA controller and
the GPU itself) on the bus, so the reset fails.
Use pci_reset_bus() explicitly instead of pci_reset_function() since it's
OK to reset both devices.
[bhelgaas: commit log, add e0547c81bf]
Fixes: b516ea586d ("PCI: Enable NVIDIA HDA controllers")
Fixes: e0547c81bf ("PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary")
Link: https://lore.kernel.org/r/20190801220117.14952-1-lyude@redhat.com
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Drake <drake@endlessm.com>
Cc: Aaron Plattner <aplattner@nvidia.com>
Cc: Peter Wu <peter@lekensteyn.nl>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Maik Freudenberg <hhfeuer@gmx.de>
Pull auxdisplay fixes from Miguel Ojeda:
"A few minor auxdisplay improvements:
- A couple of small header cleanups for charlcd (Masahiro Yamada)
- A trivial typo fix for the examples of cfag12864b (Masahiro Yamada)
- An Kconfig help text improvement for charlcd (Mans Rullgard)
- An error path fix for panel (zhengbin)"
* tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux:
auxdisplay: Fix a typo in cfag12864b-example.c
auxdisplay: charlcd: add include guard to charlcd.h
auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay
auxdisplay: charlcd: add help text for backlight initial state
auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach
Pull devicetree fixes from Rob Herring:
- Fix building DT binding examples for in tree builds
- Correct some refcounting in adjust_local_phandle_references()
- Update FSL FEC binding with deprecated properties
- Schema fix in stm32 pinctrl
- Fix typo in of_irq_parse_one docbook comment
* tag 'devicetree-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: irq: fix a trivial typo in a doc comment
dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema
dt-bindings: fec: explicitly mark deprecated properties
of: resolver: Add of_node_put() before return and break
dt-bindings: Fix generated example files getting added to schemas
_opp_supported_by_regulators() wrongly ignored errors from
regulator_is_supported_voltage(), so it considered errors as
success. Since
commit 4982094451 ("regulator: core: simplify return value on suported_voltage")
regulator_is_supported_voltage() returns a real boolean, so
errors make _opp_supported_by_regulators() return false.
That reveals a problem with the declaration of the VDD1/2
regulators on twl4030.
The VDD1/VDD2 regulators on twl4030 are neither defined with
voltage lists nor with the continuous flag set, so
regulator_is_supported_voltage() returns false and an error
before above mentioned commit (which was considered success)
The result is that after the above mentioned commit cpufreq
does not work properly e.g. dm3730.
[ 2.490997] core: _opp_supported_by_regulators: OPP minuV: 1012500 maxuV: 1012500, not supported by regulator
[ 2.501617] cpu cpu0: _opp_add: OPP not supported by regulators (300000000)
[ 2.509246] core: _opp_supported_by_regulators: OPP minuV: 1200000 maxuV: 1200000, not supported by regulator
[ 2.519775] cpu cpu0: _opp_add: OPP not supported by regulators (600000000)
[ 2.527313] core: _opp_supported_by_regulators: OPP minuV: 1325000 maxuV: 1325000, not supported by regulator
[ 2.537750] cpu cpu0: _opp_add: OPP not supported by regulators (800000000)
The patch fixes declaration of VDD1/2 regulators by
adding proper voltage lists.
Fixes: 4982094451 ("regulator: core: simplify return value on suported_voltage")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Tested-by: Adam Ford <aford173@gmail.com> #logicpd-torpedo-37xx-devkit
Link: https://lore.kernel.org/r/20190814214319.24087-1-andreas@kemnade.info
Signed-off-by: Mark Brown <broonie@kernel.org>
The USB buffer allocation code is the only place in the usb core (and in
fact the whole kernel) that uses is_device_dma_capable, while the URB
mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer
allocation to use the uses_dma flag used by the rest of the USB code,
and create a helper in hcd.h that checks this flag as well as the
CONFIG_HAS_DMA to simplify the caller a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arm64: dts: Amlogic fixes for v5.3-rc
- a few small DT fixes for g12a/g12b platforms
* tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
arm64: dts: meson-g12a-sei510: enable IR controller
arm64: dts: meson-g12a: add missing dwc2 phy-names
Fixes for omap variants for v5.3-rc cycle
We have another fix to disable voltage switching for am57xx SDIO as
the bootrom cannot handle all the voltages after a reset that thought
I had already sent a pull request for earlier but forgot. And we also
update dra74x iodelay configuration for mmc3 to use the recommended
values.
Then I noticed we had introduced few new boot warnings with the various
recent ti-sysc changes and wanted to fix those. I also noticed we still
have too many warnings to be able to spot the real ones easily and fixed
up few of those. Sure some of the warnings have been around for a long
time and few of the fixes could have waited for the merge window, but
having more usable dmesg log level output is a valuable.
Other fixes are IO size correction for am335x UARTs that cause issues
for at least FreeBSD using the same device tree file that checks that
the child IO range is not larger than the parent has.
For omap1 ams-delta keyboard we need to fix a irq ack that broke with
all the recent gpio changes.
And there are also few static checker warning fixes for recent am335x
PM changes and ti-sysc driver and one switch fall-though update.
* tag 'omap-for-v5.3/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
soc: ti: pm33xx: Make two symbols static
soc: ti: pm33xx: Fix static checker warnings
ARM: OMAP: dma: Mark expected switch fall-throughs
ARM: dts: Fix incomplete dts data for am3 and am4 mmc
bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
ARM: dts: dra74x: Fix iodelay configuration for mmc3
ARM: dts: am335x: Fix UARTs length
ARM: OMAP2+: Fix omap4 errata warning on other SoCs
ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
ARM: dts: Fix flags for gpio7
bus: ti-sysc: Fix using configured sysc mask value
bus: ti-sysc: Fix handling of forced idle
ARM: OMAP2+: Fix missing SYSC_HAS_RESET_STATUS for dra7 epwmss
ARM: dts: am57xx: Disable voltage switching for SD card
Link: https://lore.kernel.org/r/pull-1565844391-332885@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
On Motorola Mapphone devices such as Droid 4 there are five USB ports
that do not use the same layout as Gobi 1K/2K/etc devices listed in
qcserial.c. So we should use qcaux.c or option.c as noted by
Dan Williams <dan.j.williams@intel.com>.
As the Motorola USB serial ports have an interrupt endpoint as shown
with lsusb -v, we should use option.c instead of qcaux.c as pointed out
by Johan Hovold <johan@kernel.org>.
The ff/ff/ff interfaces seem to always be UARTs on Motorola devices.
For the other interfaces, class 0x0a (CDC Data) should not in general
be added as they are typically part of a multi-interface function as
noted earlier by Bjørn Mork <bjorn@mork.no>.
However, looking at the Motorola mapphone kernel code, the mdm6600 0x0a
class is only used for flashing the modem firmware, and there are no
other interfaces. So I've added that too with more details below as it
works just fine.
The ttyUSB ports on Droid 4 are:
ttyUSB0 DIAG, CQDM-capable
ttyUSB1 MUX or NMEA, no response
ttyUSB2 MUX or NMEA, no response
ttyUSB3 TCMD
ttyUSB4 AT-capable
The ttyUSB0 is detected as QCDM capable by ModemManager. I think
it's only used for debugging with ModemManager --debug for sending
custom AT commands though. ModemManager already can manage data
connection using the USB QMI ports that are already handled by the
qmi_wwan.c driver.
To enable the MUX or NMEA ports, it seems that something needs to be
done additionally to enable them, maybe via the DIAG or TCMD port.
It might be just a NVRAM setting somewhere, but I have no idea what
NVRAM settings may need changing for that.
The TCMD port seems to be a Motorola custom protocol for testing
the modem and to configure it's NVRAM and seems to work just fine
based on a quick test with a minimal tcmdrw tool I wrote.
The voice modem AT-capable port seems to provide only partial
support, and no PM support compared to the TS 27.010 based UART
wired directly to the modem.
The UARTs added with this change are the same product IDs as the
Motorola Mapphone Android Linux kernel mdm6600_id_table. I don't
have any mdm9600 based devices, so I have only tested these on
mdm6600 based droid 4.
Then for the class 0x0a (CDC Data) mode, the Motorola Mapphone Android
Linux kernel driver moto_flashqsc.c just seems to change the
port->bulk_out_size to 8K from the default. And is only used for
flashing the modem firmware it seems.
I've verified that flashing the modem with signed firmware works just
fine with the option driver after manually toggling the GPIO pins, so
I've added droid 4 modem flashing mode to the option driver. I've not
added the other devices listed in moto_flashqsc.c in case they really
need different port->bulk_out_size. Those can be added as they get
tested to work for flashing the modem.
After this patch the output of /sys/kernel/debug/usb/devices has
the following for normal 22b8:2a70 mode including the related qmi_wwan
interfaces:
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=22b8 ProdID=2a70 Rev= 0.00
S: Manufacturer=Motorola, Incorporated
S: Product=Flash MZ600
C:* #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=84(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=5ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=5ms
E: Ad=88(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=06(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E: Ad=89(I) Atr=03(Int.) MxPS= 64 Ivl=5ms
E: Ad=8a(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=07(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E: Ad=8b(I) Atr=03(Int.) MxPS= 64 Ivl=5ms
E: Ad=8c(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=08(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E: Ad=8d(I) Atr=03(Int.) MxPS= 64 Ivl=5ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=09(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
In 22b8:900e "qc_dload" mode the device shows up as:
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=22b8 ProdID=900e Rev= 0.00
S: Manufacturer=Motorola, Incorporated
S: Product=Flash MZ600
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
And in 22b8:4281 "ram_downloader" mode the device shows up as:
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=22b8 ProdID=4281 Rev= 0.00
S: Manufacturer=Motorola, Incorporated
S: Product=Flash MZ600
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=fc Driver=option
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Lars Melin <larsm17@gmail.com>
Cc: Marcel Partap <mpartap@gmx.net>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Michael Scott <hashcode0f@gmail.com>
Cc: NeKit <nekit1000@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sre@kernel.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
vmx_set_nested_state_test is trying to use the KVM_STATE_NESTED_EVMCS without
enabling enlightened VMCS first. Correct the outcome of the test, and actually
test that it succeeds after the capability is enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are two tests already enabling eVMCS and a third is coming.
Add a function that enables the capability and tests the result.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This test is only covering various edge cases of the
KVM_SET_NESTED_STATE ioctl. Running the VM does not really
add anything.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I -thought- I had fixed this entirely, but it looks like that I didn't
test this thoroughly enough as we apparently still make one big mistake
with nv50_msto_atomic_check() - we don't handle the following scenario:
* CRTC #1 has n VCPI allocated to it, is attached to connector DP-4
which is attached to encoder #1. enabled=y active=n
* CRTC #1 is changed from DP-4 to DP-5, causing:
* DP-4 crtc=#1→NULL (VCPI n→0)
* DP-5 crtc=NULL→#1
* CRTC #1 steals encoder #1 back from DP-4 and gives it to DP-5
* CRTC #1 maintains the same mode as before, just with a different
connector
* mode_changed=n connectors_changed=y
(we _SHOULD_ do VCPI 0→n here, but don't)
Once the above scenario is repeated once, we'll attempt freeing VCPI
from the connector that we didn't allocate due to the connectors
changing, but the mode staying the same. Sigh.
Since nv50_msto_atomic_check() has broken a few times now, let's rethink
things a bit to be more careful: limit both VCPI/PBN allocations to
mode_changed || connectors_changed, since neither VCPI or PBN should
ever need to change outside of routing and mode changes.
Changes since v1:
* Fix accidental reversal of clock and bpp arguments in
drm_dp_calc_pbn_mode() - William Lewis
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reported-by: Bohdan Milar <bmilar@redhat.com>
Tested-by: Bohdan Milar <bmilar@redhat.com>
Fixes: 232c9eec41 ("drm/nouveau: Use atomic VCPI helpers for MST")
References: 412e85b605 ("drm/nouveau: Only release VCPI slots on mode changes")
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Juston Li <juston.li@intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Karol Herbst <karolherbst@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <stable@vger.kernel.org> # v5.1+
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809005307.18391-1-lyude@redhat.com
In blocked_fl_write(), 't' is not deallocated if bitmap_parse_user() fails,
leading to a memory leak bug. To fix this issue, free t before returning
the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diverged from what the code does with commit 530210c781 ("of/irq: Replace
of_irq with of_phandle_args").
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Rob Herring <robh@kernel.org>
Fix the following BUG:
[ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core]
[ 187.065938] Call Trace:
[ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core]
[ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core]
[ 187.065993] ? pci_pm_restore+0xb0/0xb0
[ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci]
[ 187.066017] pci_pm_thaw+0x4c/0x90
[ 187.066030] dpm_run_callback+0x5b/0x150
[ 187.066043] device_resume+0x11b/0x220
Voltage regulators are optional, so functions must check they exist
before dereferencing.
Note this issue is hidden if CONFIG_REGULATORS is not set, because the
offending code is optimised away.
Notes for stable:
The issue first appears in commit 57d104c153 ("ufs: add UFS power
management support") but is inadvertently fixed in commit 60f0187031
("scsi: ufs: disable vccq if it's not needed by UFS device") which in
turn was reverted by commit 730679817d ("Revert "scsi: ufs: disable vccq
if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and
v5.1+
Fixes: 57d104c153 ("ufs: add UFS power management support")
Fixes: 730679817d ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In tcmu_handle_completion() function, the variable called read_len is
always initialized with a value taken from se_cmd structure. If this
function is called to complete an expired (timed out) out command, the
session command pointed by se_cmd is likely to be already deallocated by
the target core at that moment. As the result, this access triggers a
use-after-free warning from KASAN.
This patch fixes the code not to touch se_cmd when completing timed out
TCMU commands. It also resets the pointer to se_cmd at the time when the
TCMU_CMD_BIT_EXPIRED flag is set because it is going to become invalid
after calling target_complete_cmd() later in the same function,
tcmu_check_expired_cmd().
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If HBA initialization fails unexpectedly (exiting via probe_failed:), we
may fail to free vha->gnl.l. So that we don't attempt to double free, set
this pointer to NULL after a free and check for NULL at probe_failed: so we
know whether or not to call dma_free_coherent.
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This driver requires imported PRIME buffers to appear contiguously in
its IO address space. Make sure this is the case by setting the maximum
DMA segment size to a more suitable value than the default 64KB.
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
PRIME buffers should be imported using the DMA device. To this end, use
a custom import function that mimics drm_gem_prime_import_dev(), but
passes the correct device.
Fixes: 119f517362 ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
Pull fallthrough fixes from Gustavo A. R. Silva:
"Fix sh mainline builds:
- Fix fall-through warning in sh.
- Fix missing break bug in sh (this is a 10-year-old bug)
Currently, mainline builds for sh are broken. These patches fix that"
* tag 'Wimplicit-fallthrough-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
sh: kernel: hw_breakpoint: Fix missing break in switch statement
sh: kernel: disassemble: Mark expected switch fall-throughs
Pull afs fixes from David Howells:
- Fix the CB.ProbeUuid handler to generate its reply correctly.
- Fix a mix up in indices when parsing a Volume Location entry record.
- Fix a potential NULL-pointer deref when cleaning up a read request.
- Fix the expected data version of the destination directory in
afs_rename().
- Fix afs_d_revalidate() to only update d_fsdata if it's not the same
as the directory data version to reduce the likelihood of overwriting
the result of a competing operation. (d_fsdata carries the directory
DV or the least-significant word thereof).
- Fix the tracking of the data-version on a directory and make sure
that dentry objects get properly initialised, updated and
revalidated.
Also fix rename to update d_fsdata to match the new directory's DV if
the dentry gets moved over and unhash the dentry to stop
afs_d_revalidate() from interfering.
* tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix missing dentry data version updating
afs: Only update d_fsdata if different in afs_d_revalidate()
afs: Fix off-by-one in afs_rename() expected data version calculation
fs: afs: Fix a possible null-pointer dereference in afs_put_read()
afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()
afs: Fix the CB.ProbeUuid service handler to reply correctly
"bind4 allow specific IP & port" and "bind6 deny specific IP & port"
fail on s390 because of endianness issue: the 4 IP address bytes are
loaded as a word and compared with a constant, but the value of this
constant should be different on big- and little- endian machines, which
is not the case right now.
Use __bpf_constant_ntohl to generate proper value based on machine
endianness.
Fixes: 1d436885b2 ("selftests/bpf: Selftest for sys_bind post-hooks.")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The spsc_queue_peek function is accessing queue->head which belongs to
the consumer thread and shouldn't be accessed by the producer
This is fixing a rare race condition when destroying entities.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk.liu@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the __fstate_clean() function correctly set the
state of sstatus.FS in pt_regs to SR_FS_CLEAN.
Fixes: 7db91e57a0 ("RISC-V: Task implementation")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: expanded "Fixes" commit ID]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
The following two reasons cause FP registers are sometimes not
initialized before starting the user program.
1. Currently, the FP context is initialized in flush_thread() function
and we expect these initial values to be restored to FP register when
doing FP context switch. However, the FP context switch only occurs in
switch_to function. Hence, if this process does not be scheduled out
and scheduled in before entering the user space, the FP registers
have no chance to initialize.
2. In flush_thread(), the state of reg->sstatus.FS inherits from the
parent. Hence, the state of reg->sstatus.FS may be dirty. If this
process is scheduled out during flush_thread() and initializing the
FP register, the fstate_save() in switch_to will corrupt the FP context
which has been initialized until flush_thread().
To solve the 1st case, the initialization of the FP register will be
completed in start_thread(). It makes sure all FP registers are initialized
before starting the user program. For the 2nd case, the state of
reg->sstatus.FS in start_thread will be set to SR_FS_OFF to prevent this
process from corrupting FP context in doing context save. The FP state is
set to SR_FS_INITIAL in start_trhead().
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 7db91e57a0 ("RISC-V: Task implementation")
Cc: stable@vger.kernel.org
[paul.walmsley@sifive.com: fixed brace alignment issue reported by
checkpatch]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Pull rdma fixes from Doug Ledford:
"Fairly small pull request for -rc3. I'm out of town the rest of this
week, so I made sure to clean out as much as possible from patchworks
in enough time for 0-day to chew through it (Yay! for 0-day being back
online! :-)). Jason might send through any emergency stuff that could
pop up, otherwise I'm back next week.
The only real thing of note is the siw ABI change. Since we just
merged siw *this* release, there are no prior kernel releases to
maintain kernel ABI with. I told Bernard that if there is anything
else about the siw ABI he thinks he might want to change before it
goes set in stone, he should get it in ASAP. The siw module was around
for several years outside the kernel tree, and it had to be revamped
considerably for inclusion upstream, so we are making no attempts to
be backward compatible with the out of tree version. Once 5.3 is
actually released, we will have our baseline ABI to maintain.
Summary:
- Fix a memory registration release flow issue that was causing a
WARN_ON (mlx5)
- If the counters for a port aren't allocated, then we can't do
operations on the non-existent counters (core)
- Check the right variable for error code result (mlx5)
- Fix a use after free issue (mlx5)
- Fix an off by one memory leak (siw)
- Actually return an error code on error (core)
- Allow siw to be built on 32bit arches (siw, ABI change, but OK
since siw was just merged this merge window and there is no prior
released kernel to maintain compatibility with and we also updated
the rdma-core user space package to match)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Change CQ flags from 64->32 bits
RDMA/core: Fix error code in stat_get_doit_qp()
RDMA/siw: Fix a memory leak in siw_init_cpulist()
IB/mlx5: Fix use-after-free error while accessing ev_file pointer
IB/mlx5: Check the correct variable in error handling code
RDMA/counter: Prevent QP counter binding if counters unsupported
IB/mlx5: Fix implicit MR release flow
The `uac_mixer_unit_descriptor` shown as below is read from the
device side. In `parse_audio_mixer_unit`, `baSourceID` field is
accessed from index 0 to `bNrInPins` - 1, the current implementation
assumes that descriptor is always valid (the length of descriptor
is no shorter than 5 + `bNrInPins`). If a descriptor read from
the device side is invalid, it may trigger out-of-bound memory
access.
```
struct uac_mixer_unit_descriptor {
__u8 bLength;
__u8 bDescriptorType;
__u8 bDescriptorSubtype;
__u8 bUnitID;
__u8 bNrInPins;
__u8 baSourceID[];
}
```
This patch fixes the bug by add a sanity check on the length of
the descriptor.
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull dma-mapping fixes from Christoph Hellwig:
- fix the handling of the bus_dma_mask in dma_get_required_mask, which
caused a regression in this merge window (Lucas Stach)
- fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me)
- fix dma_mmap_coherent to not cause page attribute mismatches on
coherent architectures like x86 (me)
* tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix page attributes for dma_mmap_*
dma-direct: don't truncate dma_required_mask to bus addressing capabilities
dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING
Pull iommu fixes from Joerg Roedel:
- A couple more fixes for the Intel VT-d driver for bugs introduced
during the recent conversion of this driver to use IOMMU core default
domains.
- Fix for common dma-iommu code to make sure MSI mappings happen in the
correct domain for a device.
- Fix a corner case in the handling of sg-lists in dma-iommu code that
might cause dma_length to be truncated.
- Mark a switch as fall-through in arm-smmu code.
* tag 'iommu-fixes-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix possible use-after-free of private domain
iommu/vt-d: Detach domain before using a private one
iommu/dma: Handle SG length overflow better
iommu/vt-d: Correctly check format of page table in debugfs
iommu/vt-d: Detach domain when move device out of group
iommu/arm-smmu: Mark expected switch fall-through
iommu/dma: Handle MSI mappings separately
Merge misc VM fixes from Andrew Morton:
"A bunch of hotfixes, all affecting mm/.
The two-patch series from Andrea may be controversial. This restores
patches which were reverted in Dec 2018 due to a regression report [*].
After extensive discussion it is evident that the problems which these
patches solved were significantly more serious than the problems they
introduced. I am told that major distros are already carrying these
two patches for this reason"
[*] See
https://lore.kernel.org/lkml/alpine.DEB.2.21.1812061343240.144733@chino.kir.corp.google.com/https://lore.kernel.org/lkml/alpine.DEB.2.21.1812031545560.161134@chino.kir.corp.google.com/
for the google-specific issues brought up by David Rijentes. And as
Andrew says:
"I'm unaware of anyone else who will be adversely affected by this,
and google already carries over a thousand kernel patches - another
won't kill them.
There has been sporadic discussion about fixing these things for
real but it's clear that nobody apart from David is particularly
motivated"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
mm, vmscan: do not special-case slab reclaim when watermarks are boosted
Revert "mm, thp: restore node-local hugepage allocations"
Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used
seq_file: fix problem when seeking mid-record
mm: workingset: fix vmstat counters for shadow nodes
mm/usercopy: use memory range to be accessed for wraparound check
mm: kmemleak: disable early logging in case of error
mm/vmalloc.c: fix percpu free VM area search criteria
mm/memcontrol.c: fix use after free in mem_cgroup_iter()
mm/z3fold.c: fix z3fold_destroy_pool() race condition
mm/z3fold.c: fix z3fold_destroy_pool() ordering
mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
mm/hmm: fix bad subpage pointer in try_to_unmap_one
mm/hmm: fix ZONE_DEVICE anon page mapping reuse
mm: document zone device struct page field usage
new_entry is reassigned a new value next line. So
it's redundant and remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is probably overdue---KVM x86 has quite a few contributors that
usually review each other's patches, which is really helpful to me.
Formalize this by listing them as reviewers. I am including people
with various expertise:
- Joerg for SVM (with designated reviewers, it makes more sense to have
him in the main KVM/x86 stanza)
- Sean for MMU and VMX
- Jim for VMX
- Vitaly for Hyper-V and possibly SVM
- Wanpeng for LAPIC and paravirtualization.
Please ack if you are okay with this arrangement, otherwise speak up.
In other news, Radim is going to leave Red Hat soon. However, he has
not been very much involved in upstream KVM development for some time,
and in the immediate future he is still going to help maintain kvm/queue
while I am on vacation. Since not much is going to change, I will let
him decide whether he wants to keep the maintainer role after he leaves.
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wanpeng Li <wanpengli@tencent.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/s390 does not have a list of its own, and linux-s390 is in the
loop anyway thanks to the generic arch/s390 match. So use the generic
KVM list for s390 patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
recalculate_apic_map does not santize ldr and it's possible that
multiple bits are set. In that case, a previous valid entry
can potentially be overwritten by an invalid one.
This condition is hit when booting a 32 bit, >8 CPU, RHEL6 guest and then
triggering a crash to boot a kdump kernel. This is the sequence of
events:
1. Linux boots in bigsmp mode and enables PhysFlat, however, it still
writes to the LDR which probably will never be used.
2. However, when booting into kdump, the stale LDR values remain as
they are not cleared by the guest and there isn't a apic reset.
3. kdump boots with 1 cpu, and uses Logical Destination Mode but the
logical map has been overwritten and points to an inactive vcpu.
Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't fall through to print error message when receive sleep indication
in HCI_IBS_RX_ASLEEP state, this is allowed behavior.
Signed-off-by: Rocky Liao <rjliao@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch corrects the SPDX License Identifier style
in header file related to STM32 Driver for I2C hardware
bus support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
rxrpc_queue_local() attempts to queue the local endpoint it is given and
then, if successful, prints a trace line. The trace line includes the
current usage count - but we're not allowed to look at the local endpoint
at this point as we passed our ref on it to the workqueue.
Fix this by reading the usage count before queuing the work item.
Also fix the reading of local->debug_id for trace lines, which must be done
with the same consideration as reading the usage count.
Fixes: 09d2bf595d ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
When a local endpoint (struct rxrpc_local) ceases to be in use by any
AF_RXRPC sockets, it starts the process of being destroyed, but this
doesn't cause it to be removed from the namespace endpoint list immediately
as tearing it down isn't trivial and can't be done in softirq context, so
it gets deferred.
If a new socket comes along that wants to bind to the same endpoint, a new
rxrpc_local object will be allocated and rxrpc_lookup_local() will use
list_replace() to substitute the new one for the old.
Then, when the dying object gets to rxrpc_local_destroyer(), it is removed
unconditionally from whatever list it is on by calling list_del_init().
However, list_replace() doesn't reset the pointers in the replaced
list_head and so the list_del_init() will likely corrupt the local
endpoints list.
Fix this by using list_replace_init() instead.
Fixes: 730c5fd42c ("rxrpc: Fix local endpoint refcounting")
Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
I would like to maintain the i2c-imx driver. Since I work with
different i.MX variants and have access to the hardware, I can spend
some time on the reviewing of this driver.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Since commit e1ab9a468e ("i2c: imx: improve the error handling in
i2c_imx_dma_request()") when booting with the DMA driver as module (such
as CONFIG_FSL_EDMA=m) the following endless clk warnings are seen:
[ 153.077831] ------------[ cut here ]------------
[ 153.082528] WARNING: CPU: 0 PID: 15 at drivers/clk/clk.c:924 clk_core_disable_lock+0x18/0x24
[ 153.093077] i2c0 already disabled
[ 153.096416] Modules linked in:
[ 153.099521] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G W 5.2.0+ #321
[ 153.107290] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[ 153.113772] Workqueue: events deferred_probe_work_func
[ 153.118979] [<c0019560>] (unwind_backtrace) from [<c0014734>] (show_stack+0x10/0x14)
[ 153.126778] [<c0014734>] (show_stack) from [<c083f8dc>] (dump_stack+0x9c/0xd4)
[ 153.134051] [<c083f8dc>] (dump_stack) from [<c0031154>] (__warn+0xf8/0x124)
[ 153.141056] [<c0031154>] (__warn) from [<c0031248>] (warn_slowpath_fmt+0x38/0x48)
[ 153.148580] [<c0031248>] (warn_slowpath_fmt) from [<c040fde0>] (clk_core_disable_lock+0x18/0x24)
[ 153.157413] [<c040fde0>] (clk_core_disable_lock) from [<c058f520>] (i2c_imx_probe+0x554/0x6ec)
[ 153.166076] [<c058f520>] (i2c_imx_probe) from [<c04b9178>] (platform_drv_probe+0x48/0x98)
[ 153.174297] [<c04b9178>] (platform_drv_probe) from [<c04b7298>] (really_probe+0x1d8/0x2c0)
[ 153.182605] [<c04b7298>] (really_probe) from [<c04b7554>] (driver_probe_device+0x5c/0x174)
[ 153.190909] [<c04b7554>] (driver_probe_device) from [<c04b58c8>] (bus_for_each_drv+0x44/0x8c)
[ 153.199480] [<c04b58c8>] (bus_for_each_drv) from [<c04b746c>] (__device_attach+0xa0/0x108)
[ 153.207782] [<c04b746c>] (__device_attach) from [<c04b65a4>] (bus_probe_device+0x88/0x90)
[ 153.215999] [<c04b65a4>] (bus_probe_device) from [<c04b6a04>] (deferred_probe_work_func+0x60/0x90)
[ 153.225003] [<c04b6a04>] (deferred_probe_work_func) from [<c004f190>] (process_one_work+0x204/0x634)
[ 153.234178] [<c004f190>] (process_one_work) from [<c004f618>] (worker_thread+0x20/0x484)
[ 153.242315] [<c004f618>] (worker_thread) from [<c0055c2c>] (kthread+0x118/0x150)
[ 153.249758] [<c0055c2c>] (kthread) from [<c00090b4>] (ret_from_fork+0x14/0x20)
[ 153.257006] Exception stack(0xdde43fb0 to 0xdde43ff8)
[ 153.262095] 3fa0: 00000000 00000000 00000000 00000000
[ 153.270306] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 153.278520] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 153.285159] irq event stamp: 3323022
[ 153.288787] hardirqs last enabled at (3323021): [<c0861c4c>] _raw_spin_unlock_irq+0x24/0x2c
[ 153.297261] hardirqs last disabled at (3323022): [<c040d7a0>] clk_enable_lock+0x10/0x124
[ 153.305392] softirqs last enabled at (3322092): [<c000a504>] __do_softirq+0x344/0x540
[ 153.313352] softirqs last disabled at (3322081): [<c00385c0>] irq_exit+0x10c/0x128
[ 153.320946] ---[ end trace a506731ccd9bd703 ]---
This endless clk warnings behaviour is well explained by Andrey Smirnov:
"Allocating DMA after registering I2C adapter can lead to infinite
probing loop, for example, consider the following scenario:
1. i2c_imx_probe() is called and successfully registers an I2C
adapter via i2c_add_numbered_adapter()
2. As a part of i2c_add_numbered_adapter() new I2C slave devices
are added from DT which results in a call to
driver_deferred_probe_trigger()
3. i2c_imx_probe() continues and calls i2c_imx_dma_request() which
due to lack of proper DMA driver returns -EPROBE_DEFER
4. i2c_imx_probe() fails, removes I2C adapter and returns
-EPROBE_DEFER, which places it into deferred probe list
5. Deferred probe work triggered in #2 above kicks in and calls
i2c_imx_probe() again thus bringing us to step #1"
So revert commit e1ab9a468e ("i2c: imx: improve the error handling in
i2c_imx_dma_request()") and restore the old behaviour, in order to
avoid regressions on existing setups.
Cc: <stable@vger.kernel.org>
Reported-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Fixes: e1ab9a468e ("i2c: imx: improve the error handling in i2c_imx_dma_request()")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
TCP rst and fin packets do not qualify to place a flow into the
flowtable. Most likely there will be no more packets after connection
closure. Without this patch, this flow entry expires and connection
tracking picks up the entry in ESTABLISHED state using the fixup
timeout, which makes this look inconsistent to the user for a connection
that is actually already closed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
when do randbuilding, I got this error:
In file included from drivers/hwmon/pmbus/ucd9000.c:19:0:
./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range
gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/hwmon/pmbus/ucd9000.c:18:0:
./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here
gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
^~~~~~~~~~~~~~~~~~~~~~
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 964cb34188 ("gpio: move pincontrol calls to <linux/gpio/driver.h>")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
If the driver doesn't support open-drain/source config options, we
emulate this behavior when setting the direction by calling
gpiod_direction_input() if the default value is 0 (open-source) or
1 (open-drain), thus not actively driving the line in those cases.
This however clears the FLAG_IS_OUT bit for the GPIO line descriptor
and makes the LINEINFO ioctl() incorrectly report this line's mode as
'input' to user-space.
This commit modifies the ioctl() to always set the GPIOLINE_FLAG_IS_OUT
bit in the lineinfo structure's flags field. Since it's impossible to
use the input mode and open-drain/source options at the same time, we
can be sure the reported information will be correct.
Fixes: 521a2ad6f8 ("gpio: add userspace ABI for GPIO line information")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20190806114151.17652-1-brgl@bgdev.pl
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Make codec enter D3 before rebooting or poweroff can fix the noise
issue on some laptops. And in theory it is harmless for all codecs
to enter D3 before rebooting or poweroff, let us add a generic
reboot_notify, then realtek and conexant drivers can call this
function.
Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We have 3 new lenovo laptops which have conexant codec 0x14f11f86,
these 3 laptops also have the noise issue when rebooting, after
letting the codec enter D3 before rebooting or poweroff, the noise
disappers.
Instead of adding a new ID again in the reboot_notify(), let us make
this function apply to all conexant codec. In theory make codec enter
D3 before rebooting or poweroff is harmless, and I tested this change
on a couple of other Lenovo laptops which have different conexant
codecs, there is no side effect so far.
Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Eric reported a syzbot warning:
BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510
CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x191/0x1f0 lib/dump_stack.c:113
kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
__msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510
rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543
rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
__sys_sendmmsg+0x53a/0xae0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439
__x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297
entry_SYSCALL_64_after_hwframe+0x63/0xe7
The root cause is nlmsg_parse calling __nla_parse which means the
header struct size is not checked.
nlmsg_parse should be a wrapper around __nlmsg_parse with
NL_VALIDATE_STRICT for the validate argument very much like
nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL.
Fixes: 3de6440354 ("netlink: re-add parse/validate functions in strict mode")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
After configuring and restarting aneg we immediately try to read the
link status. On some systems the PHY may not yet have cleared the
"aneg complete" and "link up" bits, resulting in a false link-up
signal. See [0] for a report.
Clause 22 and 45 both require the PHY to keep the AN_RESTART
bit set until the PHY actually starts auto-negotiation.
Let's consider this in the generic functions for reading link status.
The commit marked as fixed is the first one where the patch applies
cleanly.
[0] https://marc.info/?t=156518400300003&r=1&w=2
Fixes: c1164bb1a6 ("net: phy: check PMAPMD link status only in genphy_c45_read_link")
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through
kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS
indirection. However, if mlx4_qp_alloc() fails, the allocated
'rss_map->indir_qp' is not deallocated, leading to a memory leak bug.
To fix the above issue, add the 'qp_alloc_err' label to free
'rss_map->indir_qp'.
Fixes: 4931c6ef04 ("net/mlx4_en: Optimized single ring steering")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The ibm,mac-address-filters property defines the maximum number of
addresses the hypervisor's multicast filter list can support. It is
encoded as a big-endian integer in the OF device tree, but the virtual
ethernet driver does not convert it for use by little-endian systems.
As a result, the driver is not behaving as it should on affected systems
when a large number of multicast addresses are assigned to the device.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Callbacks for a cmd reply run outside the protection of card->lock, to
allow for additional cmds to be issued & enqueued in parallel.
When qeth_send_control_data() bails out for a cmd without having
received a reply (eg. due to timeout), its callback may concurrently be
processing a reply that just arrived. In this case, the callback
potentially accesses a stale reply->reply_param area that eg. was
on-stack and has already been released.
To avoid this race, add some locking so that qeth_send_control_data()
can (1) wait for a concurrently running callback, and (2) zap any
pending callback that still wants to run.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Update the defconfig:
- Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable
VirtIORNG when running on QEMU
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Update the rv32_defconfig:
- Add 'CONFIG_DEVTMPFS_MOUNT=y' to match the RISC-V defconfig
- Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable
VirtIORNG when running on QEMU
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
As the annotation says in sctp_do_8_2_transport_strike():
"If the transport error count is greater than the pf_retrans
threshold, and less than pathmaxrtx ..."
It should be transport->error_count checked with pathmaxrxt,
instead of asoc->pf_retrans.
Fixes: 5aa93bcf66 ("sctp: Implement quick failover draft from tsvwg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Variable rc is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in
the kernel-v5.2.3 testing. This is caused by a race between hugetlb
page migration and page fault.
If a hugetlb page can not be allocated to satisfy a page fault, the task
is sent SIGBUS. This is normal hugetlbfs behavior. A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page. This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate. Without the mutex,
one would fail and SIGBUS even though the other fault would be
successful.
There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration. It
will then unmap the original page from all page tables. It does this
unmap by first clearing the pte and then writing a migration entry. The
page table lock is held for the duration of this clear and write
operation. However, the beginnings of the hugetlb page fault code
optimistically checks the pte without taking the page table lock. If
clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault. Note that the page
which will eventually satisfy this fault was already allocated by the
migration code. However, the allocation within the fault path could
fail which would result in the task incorrectly being sent SIGBUS.
Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables. However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock. This would
require significant rework of the migration code. Instead, the issue is
addressed in the hugetlb fault code. After failing to allocate a huge
page, take the page table lock and check for huge_pte_none before
returning an error. This is the same check that must be made further in
the code even if page allocation is successful.
Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com
Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner reported a problem pointing a finger at commit 1c30844d2d
("mm: reclaim small amounts of memory when an external fragmentation
event occurs").
The report is extensive:
https://lore.kernel.org/linux-mm/20190807091858.2857-1-david@fromorbit.com/
and it's worth recording the most relevant parts (colorful language and
typos included).
When running a simple, steady state 4kB file creation test to
simulate extracting tarballs larger than memory full of small
files into the filesystem, I noticed that once memory fills up
the cache balance goes to hell.
The workload is creating one dirty cached inode for every dirty
page, both of which should require a single IO each to clean and
reclaim, and creation of inodes is throttled by the rate at which
dirty writeback runs at (via balance dirty pages). Hence the ingest
rate of new cached inodes and page cache pages is identical and
steady. As a result, memory reclaim should quickly find a steady
balance between page cache and inode caches.
The moment memory fills, the page cache is reclaimed at a much
faster rate than the inode cache, and evidence suggests that
the inode cache shrinker is not being called when large batches
of pages are being reclaimed. In roughly the same time period
that it takes to fill memory with 50% pages and 50% slab caches,
memory reclaim reduces the page cache down to just dirty pages
and slab caches fill the entirety of memory.
The LRU is largely full of dirty pages, and we're getting spikes
of random writeback from memory reclaim so it's all going to shit.
Behaviour never recovers, the page cache remains pinned at just
dirty pages, and nothing I could tune would make any difference.
vfs_cache_pressure makes no difference - I would set it so high
it should trim the entire inode caches in a single pass, yet it
didn't do anything. It was clear from tracing and live telemetry
that the shrinkers were pretty much not running except when
there was absolutely no memory free at all, and then they did
the minimum necessary to free memory to make progress.
So I went looking at the code, trying to find places where pages
got reclaimed and the shrinkers weren't called. There's only one
- kswapd doing boosted reclaim as per commit 1c30844d2d ("mm:
reclaim small amounts of memory when an external fragmentation
event occurs").
The watermark boosting introduced by the commit is triggered in response
to an allocation "fragmentation event". The boosting was not intended
to target THP specifically and triggers even if THP is disabled.
However, with Dave's perfectly reasonable workload, fragmentation events
can be very common given the ratio of slab to page cache allocations so
boosting remains active for long periods of time.
As high-order allocations might use compaction and compaction cannot
move slab pages the decision was made in the commit to special-case
kswapd when watermarks are boosted -- kswapd avoids reclaiming slab as
reclaiming slab does not directly help compaction.
As Dave notes, this decision means that slab can be artificially
protected for long periods of time and messes up the balance with slab
and page caches.
Removing the special casing can still indirectly help avoid
fragmentation by avoiding fragmentation-causing events due to slab
allocation as pages from a slab pageblock will have some slab objects
freed. Furthermore, with the special casing, reclaim behaviour is
unpredictable as kswapd sometimes examines slab and sometimes does not
in a manner that is tricky to tune or analyse.
This patch removes the special casing. The downside is that this is not
a universal performance win. Some benchmarks that depend on the
residency of data when rereading metadata may see a regression when slab
reclaim is restored to its original behaviour. Similarly, some
benchmarks that only read-once or write-once may perform better when
page reclaim is too aggressive. The primary upside is that slab
shrinker is less surprising (arguably more sane but that's a matter of
opinion), behaves consistently regardless of the fragmentation state of
the system and properly obeys VM sysctls.
A fsmark benchmark configuration was constructed similar to what Dave
reported and is codified by the mmtest configuration
config-io-fsmark-small-file-stream. It was evaluated on a 1-socket
machine to avoid dealing with NUMA-related issues and the timing of
reclaim. The storage was an SSD Samsung Evo and a fresh trimmed XFS
filesystem was used for the test data.
This is not an exact replication of Dave's setup. The configuration
scales its parameters depending on the memory size of the SUT to behave
similarly across machines. The parameters mean the first sample
reported by fs_mark is using 50% of RAM which will barely be throttled
and look like a big outlier. Dave used fake NUMA to have multiple
kswapd instances which I didn't replicate. Finally, the number of
iterations differ from Dave's test as the target disk was not large
enough. While not identical, it should be representative.
fsmark
5.3.0-rc3 5.3.0-rc3
vanilla shrinker-v1r1
Min 1-files/sec 4444.80 ( 0.00%) 4765.60 ( 7.22%)
1st-qrtle 1-files/sec 5005.10 ( 0.00%) 5091.70 ( 1.73%)
2nd-qrtle 1-files/sec 4917.80 ( 0.00%) 4855.60 ( -1.26%)
3rd-qrtle 1-files/sec 4667.40 ( 0.00%) 4831.20 ( 3.51%)
Max-1 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-5 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-10 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-90 1-files/sec 4649.60 ( 0.00%) 4780.70 ( 2.82%)
Max-95 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%)
Max-99 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%)
Max 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Hmean 1-files/sec 5004.75 ( 0.00%) 5075.96 ( 1.42%)
Stddev 1-files/sec 1778.70 ( 0.00%) 1369.66 ( 23.00%)
CoeffVar 1-files/sec 33.70 ( 0.00%) 26.05 ( 22.71%)
BHmean-99 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%)
BHmean-95 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%)
BHmean-90 1-files/sec 5107.05 ( 0.00%) 5131.41 ( 0.48%)
BHmean-75 1-files/sec 5208.45 ( 0.00%) 5206.68 ( -0.03%)
BHmean-50 1-files/sec 5405.53 ( 0.00%) 5381.62 ( -0.44%)
BHmean-25 1-files/sec 6179.75 ( 0.00%) 6095.14 ( -1.37%)
5.3.0-rc3 5.3.0-rc3
vanillashrinker-v1r1
Duration User 501.82 497.29
Duration System 4401.44 4424.08
Duration Elapsed 8124.76 8358.05
This is showing a slight skew for the max result representing a large
outlier for the 1st, 2nd and 3rd quartile are similar indicating that
the bulk of the results show little difference. Note that an earlier
version of the fsmark configuration showed a regression but that
included more samples taken while memory was still filling.
Note that the elapsed time is higher. Part of this is that the
configuration included time to delete all the test files when the test
completes -- the test automation handles the possibility of testing
fsmark with multiple thread counts. Without the patch, many of these
objects would be memory resident which is part of what the patch is
addressing.
There are other important observations that justify the patch.
1. With the vanilla kernel, the number of dirty pages in the system is
very low for much of the test. With this patch, dirty pages is
generally kept at 10% which matches vm.dirty_background_ratio which
is normal expected historical behaviour.
2. With the vanilla kernel, the ratio of Slab/Pagecache is close to
0.95 for much of the test i.e. Slab is being left alone and
dominating memory consumption. With the patch applied, the ratio
varies between 0.35 and 0.45 with the bulk of the measured ratios
roughly half way between those values. This is a different balance to
what Dave reported but it was at least consistent.
3. Slabs are scanned throughout the entire test with the patch applied.
The vanille kernel has periods with no scan activity and then
relatively massive spikes.
4. Without the patch, kswapd scan rates are very variable. With the
patch, the scan rates remain quite steady.
4. Overall vmstats are closer to normal expectations
5.3.0-rc3 5.3.0-rc3
vanilla shrinker-v1r1
Ops Direct pages scanned 99388.00 328410.00
Ops Kswapd pages scanned 45382917.00 33451026.00
Ops Kswapd pages reclaimed 30869570.00 25239655.00
Ops Direct pages reclaimed 74131.00 5830.00
Ops Kswapd efficiency % 68.02 75.45
Ops Kswapd velocity 5585.75 4002.25
Ops Page reclaim immediate 1179721.00 430927.00
Ops Slabs scanned 62367361.00 73581394.00
Ops Direct inode steals 2103.00 1002.00
Ops Kswapd inode steals 570180.00 5183206.00
o Vanilla kernel is hitting direct reclaim more frequently,
not very much in absolute terms but the fact the patch
reduces it is interesting
o "Page reclaim immediate" in the vanilla kernel indicates
dirty pages are being encountered at the tail of the LRU.
This is generally bad and means in this case that the LRU
is not long enough for dirty pages to be cleaned by the
background flush in time. This is much reduced by the
patch.
o With the patch, kswapd is reclaiming 10 times more slab
pages than with the vanilla kernel. This is indicative
of the watermark boosting over-protecting slab
A more complete set of tests were run that were part of the basis for
introducing boosting and while there are some differences, they are well
within tolerances.
Bottom line, the special casing kswapd to avoid slab behaviour is
unpredictable and can lead to abnormal results for normal workloads.
This patch restores the expected behaviour that slab and page cache is
balanced consistently for a workload with a steady allocation ratio of
slab/pagecache pages. It also means that if there are workloads that
favour the preservation of slab over pagecache that it can be tuned via
vm.vfs_cache_pressure where as the vanilla kernel effectively ignores
the parameter when boosting is active.
Link: http://lkml.kernel.org/r/20190808182946.GM2739@techsingularity.net
Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2f0799a0ff ("mm, thp: restore node-local
hugepage allocations").
commit 2f0799a0ff was rightfully applied to avoid the risk of a
severe regression that was reported by the kernel test robot at the end
of the merge window. Now we understood the regression was a false
positive and was caused by a significant increase in fairness during a
swap trashing benchmark. So it's safe to re-apply the fix and continue
improving the code from there. The benchmark that reported the
regression is very useful, but it provides a meaningful result only when
there is no significant alteration in fairness during the workload. The
removal of __GFP_THISNODE increased fairness.
__GFP_THISNODE cannot be used in the generic page faults path for new
memory allocations under the MPOL_DEFAULT mempolicy, or the allocation
behavior significantly deviates from what the MPOL_DEFAULT semantics are
supposed to be for THP and 4k allocations alike.
Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag
set to "madvise") has never meant to provide an implicit MPOL_BIND on
the "current" node the task is running on, causing swap storms and
providing a much more aggressive behavior than even zone_reclaim_node =
3.
Any workload who could have benefited from __GFP_THISNODE has now to
enable zone_reclaim_mode=1||2||3. __GFP_THISNODE implicitly provided
the zone_reclaim_mode behavior, but it only did so if THP was enabled:
if THP was disabled, there would have been no chance to get any 4k page
from the current node if the current node was full of pagecache, which
further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE.
MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode
semantics, in fact the two are orthogonal, zone_reclaim_mode = 1|2|3
must work exactly the same with MADV_HUGEPAGE set or not.
The performance characteristic of memory depends on the hardware
details. The numbers below are obtained on Naples/EPYC architecture and
the N/A projection extends them to show what we should aim for in the
future as a good THP NUMA locality default. The benchmark used
exercises random memory seeks (note: the cost of the page faults is not
part of the measurement).
D0 THP | D0 4k | D1 THP | D1 4k | D2 THP | D2 4k | D3 THP | D3 4k | ...
0% | +43% | +45% | +106% | +131% | +224% | N/A | N/A
D0 means distance zero (i.e. local memory), D1 means distance one (i.e.
intra socket memory), D2 means distance two (i.e. inter socket memory),
etc...
For the guest physical memory allocated by qemu and for guest mode
kernel the performance characteristic of RAM is more complex and an
ideal default could be:
D0 THP | D1 THP | D0 4k | D2 THP | D1 4k | D3 THP | D2 4k | D3 4k | ...
0% | +58% | +101% | N/A | +222% | N/A | N/A | N/A
NOTE: the N/A are projections and haven't been measured yet, the
measurement in this case is done on a 1950x with only two NUMA nodes.
The THP case here means THP was used both in the host and in the guest.
After applying this commit the THP NUMA locality order that we'll get
out of MADV_HUGEPAGE is this:
D0 THP | D1 THP | D2 THP | D3 THP | ... | D0 4k | D1 4k | D2 4k | D3 4k | ...
Before this commit it was:
D0 THP | D0 4k | D1 4k | D2 4k | D3 4k | ...
Even if we ignore the breakage of large workloads that can't fit in a
single node that the __GFP_THISNODE implicit "current node" mbind
caused, the THP NUMA locality order provided by __GFP_THISNODE was still
not the one we shall aim for in the long term (i.e. the first one at
the top).
After this commit is applied, we can introduce a new allocator multi
order API and to replace those two alloc_pages_vmas calls in the page
fault path, with a single multi order call:
unsigned int order = (1 << HPAGE_PMD_ORDER) | (1 << 0);
page = alloc_pages_multi_order(..., &order);
if (!page)
goto out;
if (!(order & (1 << 0))) {
VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER);
/* THP fault */
} else {
VM_WARN_ON(order != 1 << 0);
/* 4k fallback */
}
The page allocator logic has to be altered so that when it fails on any
zone with order 9, it has to try again with a order 0 before falling
back to the next zone in the zonelist.
After that we need to do more measurements and evaluate if adding an
opt-in feature for guest mode is worth it, to swap "DN 4k | DN+1 THP"
with "DN+1 THP | DN 4k" at every NUMA distance crossing.
Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings".
The fixes for what was originally reported as "pathological THP
behavior" we rightfully reverted to be sure not to introduced
regressions at end of a merge window after a severe regression report
from the kernel bot. We can safely re-apply them now that we had time
to analyze the problem.
The mm process worked fine, because the good fixes were eventually
committed upstream without excessive delay.
The regression reported by the kernel bot however forced us to revert
the good fixes to be sure not to introduce regressions and to give us
the time to analyze the issue further. The silver lining is that this
extra time allowed to think more at this issue and also plan for a
future direction to improve things further in terms of THP NUMA
locality.
This patch (of 2):
This reverts commit 356ff8a9a7 ("Revert "mm, thp: consolidate THP
gfp handling into alloc_hugepage_direct_gfpmask"). So it reapplies
89c83fb539 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask").
Consolidation of the THP allocation flags at the same place was meant to
be a clean up to easier handle otherwise scattered code which is
imposing a maintenance burden. There were no real problems observed
with the gfp mask consolidation but the reversion was rushed through
without a larger consensus regardless.
This patch brings the consolidation back because this should make the
long term maintainability easier as well as it should allow future
changes to be less error prone.
[mhocko@kernel.org: changelog additions]
Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A compiler throws a warning on an arm64 system since commit 9849a5697d
("arch, mm: convert all architectures to use 5level-fixup.h"),
mm/kasan/init.c: In function 'kasan_free_p4d':
mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable]
p4d_t *p4d;
^~~
because p4d_none() in "5level-fixup.h" is compiled away while it is a
static inline function in "pgtable-nopud.h".
However, if converted p4d_none() to a static inline there, powerpc would
be unhappy as it reads those in assembler language in
"arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip
assembly include for the static inline C function.
While at it, converted a few similar functions to be consistent with the
ones in "pgtable-nopud.h".
Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If you use lseek or similar (e.g. pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the
next record). When seeking to a record boundary, the next record is
correctly returned.
This bug was introduced by a recent patch (identified below). Before
that patch, seq_read() would increment m->index when the last of the
buffer was returned (m->count == 0). After that patch, we rely on
->next to increment m->index after filling the buffer - but there was
one place where that didn't happen.
Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Sergei Turchanov <turchanov@farpost.com>
Tested-by: Sergei Turchanov <turchanov@farpost.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memcg counters for shadow nodes are broken because the memcg pointer is
obtained in a wrong way. The following approach is used:
virt_to_page(xa_node)->mem_cgroup
Since commit 4d96ba3530 ("mm: memcg/slab: stop setting
page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't
set for slab pages, so memcg_from_slab_page() should be used instead.
Also I doubt that it ever worked correctly: virt_to_head_page() should
be used instead of virt_to_page(). Otherwise objects residing on tail
pages are not accounted, because only the head page contains a valid
mem_cgroup pointer. That was a case since the introduction of these
counters by the commit 68d48e6a2d ("mm: workingset: add vmstat counter
for shadow nodes").
Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com
Fixes: 4d96ba3530 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the
range of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory
address, when trying to read 4 KB from memory that is mapped to the the
last possible page in the virtual address space, when in fact, accessing
that range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to
wrap around.
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org
Fixes: f5509cc18d ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Trilok Soni <tsoni@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to the vmalloc code by commit 68ad4a3304
("mm/vmalloc.c: keep track of free blocks for vmap allocation") can
cause spurious percpu allocation failures. These, in turn, can result
in panic()s in the slub code. One such possible panic was reported by
Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939.
Another related panic observed is,
RIP: 0033:0x7f46f7441b9b
Call Trace:
dump_stack+0x61/0x80
pcpu_alloc.cold.30+0x22/0x4f
mem_cgroup_css_alloc+0x110/0x650
cgroup_apply_control_enable+0x133/0x330
cgroup_mkdir+0x41b/0x500
kernfs_iop_mkdir+0x5a/0x90
vfs_mkdir+0x102/0x1b0
do_mkdirat+0x7d/0xf0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START
to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly
uses two lists (vmap_area_list & free_vmap_area_list) to track the used
and free VM areas in VMALLOC space. And pcpu_get_vm_areas(offsets[],
sizes[], nr_vms, align) function is used for allocating congruent VM
areas for percpu memory allocator. In order to not conflict with
VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the
VMALLOC space. So the search for free vm_area for the given requirement
starts near VMALLOC_END and moves upwards towards VMALLOC_START.
Prior to commit 68ad4a3304, the search for free vm_area in
pcpu_get_vm_areas() involves following two main steps.
Step 1:
Find a aligned "base" adress near VMALLOC_END.
va = free vm area near VMALLOC_END
Step 2:
Loop through number of requested vm_areas and check,
Step 2.1:
if (base < VMALLOC_START)
1. fail with error
Step 2.2:
// end is offsets[area] + sizes[area]
if (base + end > va->vm_end)
1. Move the base downwards and repeat Step 2
Step 2.3:
if (base + start < va->vm_start)
1. Move to previous free vm_area node, find aligned
base address and repeat Step 2
But Commit 68ad4a3304 removed Step 2.2 and modified Step 2.3 as below:
Step 2.3:
if (base + start < va->vm_start || base + end > va->vm_end)
1. Move to previous free vm_area node, find aligned
base address and repeat Step 2
Above change is the root cause of spurious percpu memory allocation
failures. For example, consider a case where a relatively large vm_area
(~ 30 TB) was ignored in free vm_area search because it did not pass the
base + end < vm->vm_end boundary check. Ignoring such large free
vm_area's would lead to not finding free vm_area within boundary of
VMALLOC_start to VMALLOC_END which in turn leads to allocation failures.
So modify the search algorithm to include Step 2.2.
Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com
Fixes: 68ad4a3304 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running syzkaller internally, we ran into the below bug on 4.9.x
kernel:
kernel BUG at mm/huge_memory.c:2124!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff880067b34900 task.stack: ffff880068998000
RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
Call Trace:
split_huge_page include/linux/huge_mm.h:100 [inline]
queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:90 [inline]
walk_pgd_range mm/pagewalk.c:116 [inline]
__walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
walk_page_range+0x154/0x370 mm/pagewalk.c:285
queue_pages_range+0x115/0x150 mm/mempolicy.c:694
do_mbind mm/mempolicy.c:1241 [inline]
SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
entry_SYSCALL_64_after_swapgs+0x5d/0xdb
Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP <ffff88006899f980>
with the below test:
uint64_t r[1] = {0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
intptr_t res = 0;
res = syscall(__NR_socket, 0x11, 3, 0x300);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000040 = 0x10000;
*(uint32_t*)0x20000044 = 1;
*(uint32_t*)0x20000048 = 0xc520;
*(uint32_t*)0x2000004c = 1;
syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
*(uint64_t*)0x20000340 = 2;
syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
return 0;
}
Actually the test does:
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
socket(AF_PACKET, SOCK_RAW, 768) = 3
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
The setsockopt() would allocate compound pages (16 pages in this test)
for packet tx ring, then the mmap() would call packet_mmap() to map the
pages into the user address space specified by the mmap() call.
When calling mbind(), it would scan the vma to queue the pages for
migration to the new node. It would split any huge page since 4.9
doesn't support THP migration, however, the packet tx ring compound
pages are not THP and even not movable. So, the above bug is triggered.
However, the later kernel is not hit by this issue due to commit
d44d363f65 ("mm: don't assume anonymous pages have SwapBacked flag"),
which just removes the PageSwapBacked check for a different reason.
But, there is a deeper issue. According to the semantic of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
MPOL_MF_STRICT was also specified, but the kernel was unable to move all
existing pages in the range. The tx ring of the packet socket is
definitely not movable, however, mbind() returns success for this case.
Although the most socket file associates with non-movable pages, but XDP
may have movable pages from gup. So, it sounds not fine to just check
the underlying file type of vma in vma_migratable().
Change migrate_page_add() to check if the page is movable or not, if it
is unmovable, just return -EIO. But do not abort pte walk immediately,
since there may be pages off LRU temporarily. We should migrate other
pages if MPOL_MF_MOVE* is specified. Set has_unmovable flag if some
paged could not be not moved, then return -EIO for mbind() eventually.
With this change the above test would return -EIO as expected.
[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.
There are three different sub-cases:
1. vma is not migratable
2. vma is migratable, but there are unmovable pages
3. vma is migratable, pages are movable, but migrate_pages() fails
If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").
If #3 happens, kernel would set policy and migrate pages with
best-effort, but won't rollback the migrated pages and reset the policy
back.
Before that commit, they behaves in the same way. It'd better to keep
their behavior consistent. But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.
Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.
Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.
The #2 is not handled correctly in the current kernel, the following
patch will fix it.
[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The RISC-V kernel implementation of flush_tlb_page() when CONFIG_SMP
is set is wrong. It passes zero to flush_tlb_range() as the final
address to flush, but it should be at least 'addr'.
Some other Linux architecture ports use the beginning address to
flush, plus PAGE_SIZE, as the final address to flush. This might
flush slightly more than what's needed, but it seems unlikely that
being more clever would improve anything. So let's just take that
implementation for now.
While here, convert the macro into a static inline function, primarily
to avoid unintentional multiple evaluations of 'addr'.
This second version of the patch fixes a coding style issue found by
Christoph Hellwig <hch@lst.de>.
Reported-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull tpm fixes from Jarkko Sakkinen:
"One more bug fix for the next release"
* tag 'tpmdd-next-20190813' of git://git.infradead.org/users/jjs/linux-tpmdd:
KEYS: trusted: allow module init if TPM is inactive or deactivated
Pull SCSI fix from James Bottomley:
"Single lpfc fix, for a single-cpu corner case"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Fix crash when cpu count is 1 and null irq affinity mask
Commit c78719203f ("KEYS: trusted: allow trusted.ko to initialize w/o a
TPM") allows the trusted module to be loaded even if a TPM is not found, to
avoid module dependency problems.
However, trusted module initialization can still fail if the TPM is
inactive or deactivated. tpm_get_random() returns an error.
This patch removes the call to tpm_get_random() and instead extends the PCR
specified by the user with zeros. The security of this alternative is
equivalent to the previous one, as either option prevents with a PCR update
unsealing and misuse of sealed data by a user space process.
Even if a PCR is extended with zeros, instead of random data, it is still
computationally infeasible to find a value as input for a new PCR extend
operation, to obtain again the PCR value that would allow unsealing.
Cc: stable@vger.kernel.org
Fixes: 240730437d ("KEYS: trusted: explicitly use tpm_chip structure...")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
This patch changes the driver/user shared (mmapped) CQ notification
flags field from unsigned 64-bits size to unsigned 32-bits size. This
enables building siw on 32-bit architectures.
This patch changes the siw-abi, but as siw was only just merged in
this merge window cycle, there are no released kernels with the prior
abi. We are making no attempt to be binary compatible with siw user
space libraries prior to the merge of siw into the upstream kernel,
only moving forward with upstream kernels and upstream rdma-core
provided siw libraries are we guaranteeing compatibility.
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change ct id hash calculation to only use invariants.
Currently the ct id hash calculation is based on some fields that can
change in the lifetime on a conntrack entry in some corner cases. The
current hash uses the whole tuple which contains an hlist pointer which
will change when the conntrack is placed on the dying list resulting in
a ct id change.
This patch also removes the reply-side tuple and extension pointer from
the hash calculation so that the ct id will will not change from
initialization until confirmation.
Fixes: 3c79107631 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Signed-off-by: Dirk Morris <dmorris@metaloft.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The G700 suffers from the same issue than the G502:
when plugging it in, the driver tries to contact it but it fails.
This timeout is problematic as it introduce a delay in the boot,
and having only the mouse event node means that the hardware
macros keys can not be relayed to the userspace.
Link: https://github.com/libratbag/libratbag/issues/797
Fixes: 91cf9a98ae ("HID: logitech-hidpp: make .probe usbhid capable")
Cc: stable@vger.kernel.org # v5.2
Reviewed-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
This partially reverts commit 27fc32fd94.
It turns out that the G502 has some issues with hid-logitech-hidpp:
when plugging it in, the driver tries to contact it but it fails.
So the driver bails out leaving only the mouse event node available.
This timeout is problematic as it introduce a delay in the boot,
and having only the mouse event node means that the hardware
macros keys can not be relayed to the userspace.
Filipe and I just gave a shot at the following devices:
G403 Wireless (0xC082)
G703 (0xC087)
G703 Hero (0xC090)
G903 (0xC086)
G903 Hero (0xC091)
G Pro (0xC088)
Reverting the devices we are not sure that works flawlessly.
Reviewed-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
"p runtime/jit: pass > 32bit index to tail_call" fails when
bpf_jit_enable=1, because the tail call is not executed.
This in turn is because the generated code assumes index is 64-bit,
while it must be 32-bit, and as a result prog array bounds check fails,
while it should pass. Even if bounds check would have passed, the code
that follows uses 64-bit index to compute prog array offset.
Fix by using clrj instead of clgrj for comparing index with array size,
and also by using llgfr for truncating index to 32 bits before using it
to compute prog array offset.
Fixes: 6651ee070b ("s390/bpf: implement bpf_tail_call() helper")
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
spi_nor_spansion_clear_sr_bp() depends on spansion_quad_enable().
While spansion_quad_enable() is selected as default when
initializing the flash parameters, the nor->quad_enable() method
can be overwritten later on when parsing BFPT.
Select the write protection disable mechanism at spi_nor_init() time,
when the nor->quad_enable() method is already known.
Fixes: 191f5c2ed4 ("mtd: spi-nor: use 16-bit WRR command when QE is set on spansion flashes")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix sparse warnings:
drivers/soc/ti/pm33xx.c:144:27: warning: symbol 'rtc_wake_src' was not declared. Should it be static?
drivers/soc/ti/pm33xx.c:160:5: warning: symbol 'am33xx_rtc_only_idle' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
If a CPU doesn't support the page size for which the kernel is
configured, then we will complain and refuse to bring it online. For
secondary CPUs (and the boot CPU on a system booting with EFI), we will
also print an error identifying the mismatch.
Consequently, the only time that the cpufeature code can detect a
granule size mismatch is for a granule other than the one that is
currently being used. Although we would rather such systems didn't
exist, we've unfortunately lost that battle and Kevin reports that
on his amlogic S922X (odroid-n2 board) we end up warning and taining
with defconfig because 16k pages are not supported by all of the CPUs.
In such a situation, we don't actually care about the feature mismatch,
particularly now that KVM only exposes the sanitised view of the CPU
registers (commit 93390c0a1b - "arm64: KVM: Hide unsupported AArch64
CPU features from guests"). Treat the granule fields as non-strict and
let Kevin run without a tainted kernel.
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
[catalin.marinas@arm.com: changelog updated with KVM sanitised regs commit]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/arm/plat-omap/dma.c: In function 'omap_set_dma_src_burst_mode':
arch/arm/plat-omap/dma.c:384:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:393:2: note: here
case OMAP_DMA_DATA_BURST_16:
^~~~
arch/arm/plat-omap/dma.c:394:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:402:2: note: here
default:
^~~~~~~
arch/arm/plat-omap/dma.c: In function 'omap_set_dma_dest_burst_mode':
arch/arm/plat-omap/dma.c:473:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:481:2: note: here
default:
^~~~~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 4e27f752ab ("ARM: OMAP2+: Drop mmc platform data for am330x and
am43xx") dropped legacy mmc platform data for am3 and am4, but missed the
fact that we never updated the dts files for mmc3 that is directly on l3
interconnect instead of l4 interconnect. This leads to a situation with
no legacy platform data and incomplete dts data.
Let's update the mmc instances on l3 interconnect to probe properly with
ti-sysc interconnect target module driver to make mmc3 work again. Let's
still keep legacy "ti,hwmods" property around for v5.2 kernel and only
drop it later on.
Note that there is no need to use property status = "disabled" for mmc3.
The default for dts is enabled, and runtime PM will idle unused instances
just fine.
Fixes: 4e27f752ab ("ARM: OMAP2+: Drop mmc platform data for am330x and am43xx")
Reported-by: David Lechner <david@lechnology.com>
Tested-by: David Lechner <david@lechnology.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The clocks are not yet parsed and prepared until after a successful
sysc_get_clocks(), so there is no need to unprepare the clocks upon
any failure of any of the prior functions in sysc_probe(). The current
code path would have been a no-op because of the clock validity checks
within sysc_unprepare(), but let's just simplify the cleanup path by
returning the error directly.
While at this, also fix the cleanup path for a sysc_init_resets()
failure which is executed after the clocks are prepared.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Non-serio path of Amstrad Delta FIQ deferred handler depended on
irq_ack() method provided by OMAP GPIO driver. That method has been
removed by commit 693de831c6 ("gpio: omap: remove irq_ack method").
Remove useless code from the deferred handler and reimplement the
missing operation inside the base FIQ handler.
Should another dependency - irq_unmask() - be ever removed from the OMAP
GPIO driver, WARN once if missing.
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
As seen on the AM335x TRM all the UARTs controller only are 0x1000 in size.
Fix this in the DTS.
Signed-off-by: Emmanuel Vadot <manu@freebsd.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We have errata i688 workaround produce warnings on SoCs other than
omap4 and omap5:
omap4_sram_init:Unable to allocate sram needed to handle errata I688
omap4_sram_init:Unable to get sram pool needed to handle errata I688
This is happening because there is no ti,omap4-mpu node, or no SRAM
to configure for the other SoCs, so let's remove the warning based
on the SoC revision checks.
As nobody has complained it seems that the other SoC variants do not
need this workaround.
Signed-off-by: Tony Lindgren <tony@atomide.com>
The original driver author seemed to be under the impression that a driver
cannot be removed if it does not have a .remove method. Or maybe if it is
a built-in platform driver.
This is not true. This crash can be created:
root@ubuntu:/sys/bus/platform/drivers/hisi-lpc# echo HISI0191\:00 > unbind
root@ubuntu:/sys/bus/platform/drivers/hisi-lpc# ipmitool raw 6 1
Unable to handle kernel paging request at virtual address ffff000010035010
Mem abort info:
ESR = 0x96000047
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000047
CM = 0, WnR = 1
swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000118b000
[ffff000010035010] pgd=0000041ffbfff003, pud=0000041ffbffe003, pmd=0000041ffbffd003, pte=0000000000000000
Internal error: Oops: 96000047 [#1] PREEMPT SMP
Modules linked in:
CPU: 17 PID: 1473 Comm: ipmitool Not tainted 5.2.0-rc5-00003-gf68c53b414a3-dirty #198
Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
pstate: 20000085 (nzCv daIf -PAN -UAO)
pc : hisi_lpc_target_in+0x7c/0x120
lr : hisi_lpc_target_in+0x70/0x120
sp : ffff00001efe3930
x29: ffff00001efe3930 x28: ffff841f9f599200
x27: 0000000000000002 x26: 0000000000000000
x25: 0000000000000080 x24: 00000000000000e4
x23: 0000000000000000 x22: 0000000000000064
x21: ffff801fb667d280 x20: 0000000000000001
x19: ffff00001efe39ac x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : ffff841febe60340
x7 : ffff801fb55c52e8 x6 : 0000000000000000
x5 : 0000000000ffc0e3 x4 : 0000000000000001
x3 : ffff801fb667d280 x2 : 0000000000000001
x1 : ffff000010035010 x0 : ffff000010035000
Call trace:
hisi_lpc_target_in+0x7c/0x120
hisi_lpc_comm_in+0x88/0x98
logic_inb+0x5c/0xb8
port_inb+0x18/0x20
bt_event+0x38/0x808
smi_event_handler+0x4c/0x5a0
check_start_timer_thread.part.4+0x40/0x58
sender+0x78/0x88
smi_send.isra.6+0x94/0x108
i_ipmi_request+0x2c4/0x8f8
ipmi_request_settime+0x124/0x160
handle_send_req+0x19c/0x208
ipmi_ioctl+0x2c0/0x990
do_vfs_ioctl+0xb8/0x8f8
ksys_ioctl+0x80/0xb8
__arm64_sys_ioctl+0x1c/0x28
el0_svc_common.constprop.0+0x64/0x160
el0_svc_handler+0x28/0x78
el0_svc+0x8/0xc
Code: 941d1511 aa0003f9 f94006a0 91004001 (b9000034)
---[ end trace aa842b86af7069e4 ]---
The problem here is that the host goes away but the associated logical PIO
region remains registered, as do the children devices.
Fix by adding a .remove method to tidy-up by removing the child devices
and unregistering the logical PIO region.
Cc: stable@vger.kernel.org
Fixes: adf38bb0b5 ("HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
If, after registering a logical PIO range, the driver probe later fails,
the logical PIO range memory will be released automatically.
This causes an issue, in that the logical PIO range is not unregistered
and the released range memory may be later referenced.
Fix by unregistering the logical PIO range.
And since we now unregister the logical PIO range for probe failure, avoid
the special ordering of setting logical PIO range ops, which was the
previous (poor) attempt at a safeguard against this.
Cc: stable@vger.kernel.org
Fixes: adf38bb0b5 ("HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Add a function to unregister a logical PIO range.
Logical PIO space can still be leaked when unregistering certain
LOGIC_PIO_CPU_MMIO regions, but this acceptable for now since there are no
callers to unregister LOGIC_PIO_CPU_MMIO regions, and the logical PIO
region allocation scheme would need significant work to improve this.
Cc: stable@vger.kernel.org
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
The code was originally written to not support unregistering logical PIO
regions.
To accommodate supporting unregistering logical PIO regions, subtly modify
LOGIC_PIO_CPU_MMIO region registration code, such that the "end" of the
registered regions is the "end" of the last region, and not the sum of
the sizes of all the registered regions.
Cc: stable@vger.kernel.org
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
The traversing of io_range_list with list_for_each_entry_rcu()
is not properly protected by rcu_read_lock() and rcu_read_unlock(),
so add them.
These functions mark the critical section scope where the list is
protected for the reader, it cannot be "reclaimed". Any updater - in
this case, the logical PIO registration functions - cannot update the
list until the reader exits this critical section.
In addition, the list traversing used in logic_pio_register_range()
does not need to use the rcu variant.
This is because we are already using io_range_mutex to guarantee mutual
exclusion from mutating the list.
Cc: stable@vger.kernel.org
Fixes: 031e360186 ("lib: Add generic PIO mapping method")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
It's large and doesn't need contiguous memory. Fixes
allocation failures in some cases.
v2: kvfree the memory.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fec's gpio phy reset properties have been deprecated.
Update the dt-bindings documentation to explicitly mark
them as such, and provide a short description of the
recommended alternative.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return or break from the middle of the loop, there is no
put, thus causing a memory leak. Hence add an of_node_put before the
return or break in three places.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
ITLB entry modifications must be followed by the isync instruction
before the new entries are possibly used. cpu_reset lacks one isync
between ITLB way 6 initialization and jump to the identity mapping.
Add missing isync to xtensa cpu_reset.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This driver does a funny dance disabling and re-enabling
RX and/or TX delays. In any of the RGMII-ID modes, it first
disables the delays, just to re-enable them again right
away. This looks like a needless exercise.
Just enable the respective delays when in any of the
relevant 'id' modes, and disable them otherwise.
Also, remove comments which don't add anything that can't be
seen by looking at the code.
Signed-off-by: André Draszik <git@andred.net>
CC: Andrew Lunn <andrew@lunn.ch>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: Heiner Kallweit <hkallweit1@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan writes:
Second set of IIO fix for the 5.3 cycle.
* adf4371
- Calculation of the value to program to control the output frequency
was incorrect.
* max9611
- Fix temperature reading in probe. A recent fix for a wrong mask
meant this code was looked at afresh. A second bug became obvious
in which the return value was used inplace of the desired register
value. This had no visible effect other than a communication test
not actually testing the communications.
* tag 'iio-fixes-for-5.3b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: max9611: Fix temperature reading in probe
iio: frequency: adf4371: Fix output frequency setting
The syzbot fuzzer has found two (!) races in the USB character device
registration and deregistration routines. This patch fixes the races.
The first race results from the fact that usb_deregister_dev() sets
usb_minors[intf->minor] to NULL before calling device_destroy() on the
class device. This leaves a window during which another thread can
allocate the same minor number but will encounter a duplicate name
error when it tries to register its own class device. A typical error
message in the system log would look like:
sysfs: cannot create duplicate filename '/class/usbmisc/ldusb0'
The patch fixes this race by destroying the class device first.
The second race is in usb_register_dev(). When that routine runs, it
first allocates a minor number, then drops minor_rwsem, and then
creates the class device. If the device creation fails, the minor
number is deallocated and the whole routine returns an error. But
during the time while minor_rwsem was dropped, there is a window in
which the minor number is allocated and so another thread can
successfully open the device file. Typically this results in
use-after-free errors or invalid accesses when the other thread closes
its open file reference, because the kernel then tries to release
resources that were already deallocated when usb_register_dev()
failed. The patch fixes this race by keeping minor_rwsem locked
throughout the entire routine.
Reported-and-tested-by: syzbot+30cf45ebfe0b0c4847a1@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1908121607590.1659-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c: In function ‘FPU_printall’:
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c:187:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
tagi = FPU_Special(r);
~~~~~^~~~~~~~~~~~~~~~
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/errors.c:188:3: note: here
case TAG_Valid:
^~~~
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c: In function ‘fyl2xp1’:
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c:1353:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (denormal_operand() < 0)
^
/home/tglx/work/kernel/linus/linux/arch/x86/math-emu/fpu_trig.c:1356:3: note: here
case TAG_Zero:
Remove the pointless 'break;' after 'continue;' while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix
arch/x86/kernel/apic/probe_32.c: In function ‘default_setup_apic_routing’:
arch/x86/kernel/apic/probe_32.c:146:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!APIC_XAPIC(version)) {
^
arch/x86/kernel/apic/probe_32.c:151:3: note: here
case X86_VENDOR_HYGON:
^~~~
for 32-bit builds.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190811154036.29805-1-bp@alien8.de
This patch will reset the download flag to default value
before retrieving the download mode type.
Fixes: 32646db8cc ("Bluetooth: btqca: inject command complete event during fw download")
Signed-off-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Tested-by: Claire Chang <tientzu@chromium.org>
Reviewed-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Zorro Lang reported a crash in generic/475 if we try to inactivate a
corrupt inode with a NULL attr fork (stack trace shortened somewhat):
RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
FS: 00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
Call Trace:
xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
xfs_da_read_buf+0xf5/0x2c0 [xfs]
xfs_da3_node_read+0x1d/0x230 [xfs]
xfs_attr_inactive+0x3cc/0x5e0 [xfs]
xfs_inactive+0x4c8/0x5b0 [xfs]
xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
destroy_inode+0xbc/0x190
xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
xfs_bulkstat_one+0x16/0x20 [xfs]
xfs_bulkstat+0x6fa/0xf20 [xfs]
xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
xfs_file_ioctl+0xee0/0x12a0 [xfs]
do_vfs_ioctl+0x193/0x1000
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x9f/0x4d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f11d39a3e5b
The "obvious" cause is that the attr ifork is null despite the inode
claiming an attr fork having at least one extent, but it's not so
obvious why we ended up with an inode in that state.
Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Let hidp_send_message return the number of successfully queued bytes
instead of an unconditional 0.
With the return value fixed to 0, other drivers relying on hidp, such as
hidraw, can not return meaningful values from their respective
implementations of write(). In particular, with the current behavior, a
hidraw device's write() will have different return values depending on
whether the device is connected via USB or Bluetooth, which makes it
harder to abstract away the transport layer.
Signed-off-by: Fabian Henneke <fabian.henneke@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
WCN399x chips are coex chips, it needs a VS pre shutdown
command while turning off the BT. So that chip can inform
BT is OFF to other active clients.
Signed-off-by: Harish Bandi <c-hbandi@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The opcode of the command injected by commit 32646db8cc ("Bluetooth:
btqca: inject command complete event during fw download") uses the CPU
byte format, however it should always be little endian. In practice it
shouldn't really matter, since all we need is an opcode != 0, but still
let's do things correctly and keep sparse happy.
Fixes: 32646db8cc ("Bluetooth: btqca: inject command complete event during fw download")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
On WCN3990 downloading the NVM sometimes fails with a "TLV response
size mismatch" error:
[ 174.949955] Bluetooth: btqca.c:qca_download_firmware() hci0: QCA Downloading qca/crnv21.bin
[ 174.958718] Bluetooth: btqca.c:qca_tlv_send_segment() hci0: QCA TLV response size mismatch
It seems the controller needs a short time after downloading the
firmware before it is ready for the NVM. A delay as short as 1 ms
seems sufficient, make it 10 ms just in case. No event is received
during the delay, hence we don't just silently drop an extra event.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: a1c49c434e ("Bluetooth: btusb: Add protocol support for MediaTek MT7668U USB devices")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
"masking, test in bounds 3" fails on s390, because
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0) ignores the top 32 bits of
BPF_REG_2. The reason is that JIT emits lcgfr instead of lcgr.
The associated comment indicates that the code was intended to
emit lcgr in the first place, it's just that the wrong opcode
was used.
Fix by using the correct opcode.
Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The error handling code doesn't free siw_cpu_info.tx_valid_cpus[0]. The
first iteration through the loop is a no-op so this is sort of an off
by one bug. Also Bernard pointed out that we can remove the NULL
assignment and simplify the code a bit.
Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190809140904.GB3552@mwanda
Signed-off-by: Doug Ledford <dledford@redhat.com>
Oded writes:
This tag contains a couple of important fixes:
- Four fixes when running on s390 architecture (BE). With these fixes, the
driver is fully functional on Big-endian architectures. The fixes
include:
- Validation/Patching of user packets
- Completion queue handling
- Internal H/W queues submission
- Device IRQ unmasking operation
- Fix to double free in an error path to avoid kernel corruption
- Fix to DRAM usage accounting when a user process is terminated
forcefully.
* tag 'misc-habanalabs-fixes-2019-08-12' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: fix device IRQ unmasking for BE host
habanalabs: fix endianness handling for internal QMAN submission
habanalabs: fix completion queue handling when host is BE
habanalabs: fix endianness handling for packets from user
habanalabs: fix DRAM usage accounting on context tear down
habanalabs: Avoid double free in error flow
`dt3k_ns_to_timer()` determines the prescaler and divisor to use to
produce a desired timing period. It is influenced by a rounding mode
and can round the divisor up, down, or to the nearest value. However,
the code for rounding up currently does the same as rounding down! Fix
ir by using the `DIV_ROUND_UP()` macro to calculate the divisor when
rounding up.
Also, change the types of the `divider`, `base` and `prescale` variables
from `int` to `unsigned int` to avoid mixing signed and unsigned types
in the calculations.
Also fix a typo in a nearby comment: "improvment" => "improvement".
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190812120814.21188-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In `dt3k_ns_to_timer()` the following lines near the end of the function
result in a signed integer overflow:
prescale = 15;
base = timer_base * (1 << prescale);
divider = 65535;
*nanosec = divider * base;
(`divider`, `base` and `prescale` are type `int`, `timer_base` and
`*nanosec` are type `unsigned int`. The value of `timer_base` will be
either 50 or 100.)
The main reason for the overflow is that the calculation for `base` is
completely wrong. It should be:
base = timer_base * (prescale + 1);
which matches an earlier instance of this calculation in the same
function.
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20190812111517.26803-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In read_per_ring_refs(), after 'req' and related memory regions are
allocated, xen_blkif_map() is invoked to map the shared frame, irq, and
etc. However, if this mapping process fails, no cleanup is performed,
leading to memory leaks. To fix this issue, invoke the cleanup before
returning the error.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_exit_queue will free elevator_data, while blk_mq_requeue_work
will access it. Move cancel of requeue_work to the front of
blk_exit_queue to avoid use-after-free.
blk_exit_queue blk_mq_requeue_work
__elevator_exit blk_mq_run_hw_queues
blk_mq_exit_sched blk_mq_run_hw_queue
dd_exit_queue blk_mq_hctx_has_pending
kfree(elevator_data) blk_mq_sched_has_work
dd_has_work
Fixes: fbc2a15e34 ("blk-mq: move cancel of requeue_work into blk_mq_release")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Felipe writes:
USB: fixes for v5.3-rc4
Just a three fixes this time around.
A race condition on mass storage gadget between disable() and
set_alt()
Clear a flag that was left set upon reset or disconnect
A fix for renesas_usb3 UDC's sysfs interface
* tag 'fixes-for-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt
usb: gadget: composite: Clear "suspended" on reset/disconnect
usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
The omapdrm driver uses dma_set_coherent_mask(), but that's not enough
anymore when LPAE is enabled.
From Christoph Hellwig <hch@lst.de>:
> The traditional arm DMA code ignores, but the generic dma-direct/swiotlb
> has stricter checks and thus fails mappings without a DMA mask. As we
> use swiotlb for arm with LPAE now, omapdrm needs to catch up and
> actually set a DMA mask.
Change the dma_set_coherent_mask() call to
dma_coerce_mask_and_coherent() so that the dev->dma_mask is also set.
Fixes: ad3c7b18c5 ("arm: use swiotlb for bounce buffering on LPAE configs")
Reported-by: "H. Nikolaus Schaller" <hns@goldelico.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c219e7e6-0f66-d6fd-e0cf-59c803386825@ti.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Backport requested for omap dma mask fix. I'm not sure it still
requires it, but just in case. :)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Currently, failure of cpuhp_setup_state() is ignored and the syscore ops
and the control interfaces can still be added even after the failure. But,
this error handling will cause a few issues:
1. The CPUs may have different values in the IA32_UMWAIT_CONTROL
MSR because there is no way to roll back the control MSR on
the CPUs which already set the MSR before the failure.
2. If the sysfs interface is added successfully, there will be a mismatch
between the global control value and the control MSR:
- The interface shows the default global control value. But,
the control MSR is not set to the value because the CPU online
function, which is supposed to set the MSR to the value,
is not installed.
- If the sysadmin changes the global control value through
the interface, the control MSR on all current online CPUs is
set to the new value. But, the control MSR on newly onlined CPUs
after the value change will not be set to the new value due to
lack of the CPU online function.
3. On resume from suspend/hibernation, the boot CPU restores the control
MSR to the global control value through the syscore ops. But, the
control MSR on all APs is not set due to lake of the CPU online
function.
To solve the issues and enforce consistent behavior on the failure
of the CPU hotplug setup, make the following changes:
1. Cache the original control MSR value which is configured by
hardware or BIOS before kernel boot. This value is likely to
be 0. But it could be a different number as well. Cache the
control MSR only once before the MSR is changed.
2. Add the CPU offline function so that the MSR is restored to the
original control value on all CPUs on the failure.
3. On the failure, exit from cpumait_init() so that the syscore ops
and the control interfaces are not added.
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1565401237-60936-1-git-send-email-fenghua.yu@intel.com
One of the modifications made by commit d916b1be94 ("nvme-pci: use
host managed power state for suspend") was adding a pci_save_state()
call to nvme_suspend() so as to instruct the PCI bus type to leave
devices handled by the nvme driver in D0 during suspend-to-idle.
That was done with the assumption that ASPM would transition the
device's PCIe link into a low-power state when the device became
inactive. However, if ASPM is disabled for the device, its PCIe
link will stay in L0 and in that case commit d916b1be94 is likely
to cause the energy used by the system while suspended to increase.
Namely, if the device in question works in accordance with the PCIe
specification, putting it into D3hot causes its PCIe link to go to
L1 or L2/L3 Ready, which is lower-power than L0. Since the energy
used by the system while suspended depends on the state of its PCIe
link (as a general rule, the lower-power the state of the link, the
less energy the system will use), putting the device into D3hot
during suspend-to-idle should be more energy-efficient that leaving
it in D0 with disabled ASPM.
For this reason, avoid leaving NVMe devices with disabled ASPM in D0
during suspend-to-idle. Instead, shut them down entirely and let
the PCI bus type put them into D3.
Fixes: d916b1be94 ("nvme-pci: use host managed power state for suspend")
Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#t
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Add a function checking whether or not PCIe ASPM has been enabled for
a given device.
It will be used by the NVMe driver to decide how to handle the
device during system suspend.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
When unmasking IRQs inside the ASIC, the driver passes an array of all the
IRQ to unmask. The ASIC's CPU is working in LE so when running in a BE
host, the driver needs to do the proper endianness swapping when preparing
this array.
In addition, this patch also fixes the endianness of a couple of kernel log
debug messages that print values of packets
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. e.g. if the PQ
is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it would work in architectures that have separate
address ranges for IO memory.
This patch makes the code that writes the PQE to be ASIC-specific so we
can handle this properly per ASIC.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>
This patch fix the CQ irq handler to work in hosts with BE architecture.
It adds the correct endian-swapping macros around the relevant memory
accesses.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Packets that arrive from the user and need to be parsed by the driver are
assumed to be in LE format.
This patch fix all the places where the code handles these packets and use
the correct endianness macros to handle them, as the driver handles the
packets in CPU format (LE or BE depending on the arch).
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The patch fix the DRAM usage accounting by adding a missing update of
the DRAM memory consumption, when a context is being torn down without an
organized release of the allocated memory.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In case kernel context init fails during device initialization, both
hl_ctx_put() and kfree() are called, ending with a double free of the
kernel context.
Calling kfree() is needed only when a failure happens between the
allocation of the kernel context and its initialization, so move it to
there and remove it from the error flow.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
If fsg_disable() and fsg_set_alt() are called too closely to each
other (for example due to a quick reset/reconnect), what can happen
is that fsg_set_alt sets common->new_fsg from an interrupt while
handle_exception is trying to process the config change caused by
fsg_disable():
fsg_disable()
...
handle_exception()
sets state back to FSG_STATE_NORMAL
hasn't yet called do_set_interface()
or is inside it.
---> interrupt
fsg_set_alt
sets common->new_fsg
queues a new FSG_STATE_CONFIG_CHANGE
<---
Now, the first handle_exception can "see" the updated
new_fsg, treats it as if it was a fsg_set_alt() response,
call usb_composite_setup_continue() etc...
But then, the thread sees the second FSG_STATE_CONFIG_CHANGE,
and goes back down the same path, wipes and reattaches a now
active fsg, and .. calls usb_composite_setup_continue() which
at this point is wrong.
Not only we get a backtrace, but I suspect the second set_interface
wrecks some state causing the host to get upset in my case.
This fixes it by replacing "new_fsg" by a "state argument" (same
principle) which is set in the same lock section as the state
update, and retrieved similarly.
That way, there is never any discrepancy between the dequeued
state and the observed value of it. We keep the ability to have
the latest reconfig operation take precedence, but we guarantee
that once "dequeued" the argument (new_fsg) will not be clobbered
by any new event.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
In some cases, one can get out of suspend with a reset or
a disconnect followed by a reconnect. Previously we would
leave a stale suspended flag set.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Since the role_store() uses strncmp(), it's possible to refer
out-of-memory if the sysfs data size is smaller than strlen("host").
This patch fixes it by using sysfs_streq() instead of strncmp().
Fixes: cc995c9ec1 ("usb: gadget: udc: renesas_usb3: add support for usb role swap")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
clang warns:
drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical
'&&' with constant operand [-Wconstant-logical-operand]
if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
^ ~~~~~~~~~~~~
drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a
bitwise operation
if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
^~
&
drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to
silence this warning
if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
~^~~~~~~~~~~~~~~
1 warning generated.
Explicitly check that NET_IP_ALIGN is not zero, which matches how this
is checked in other parts of the tree. Because NET_IP_ALIGN is a build
time constant, this check will be constant folded away during
optimization.
Fixes: 82a9928db5 ("tc35815: Enable StripCRC feature")
Link: https://github.com/ClangBuiltLinux/linux/issues/608
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We set the field 'addr_trial_end' to 'jiffies', instead of the current
value 0, at the moment the node address is initialized. This guarantees
we don't inadvertently enter an address trial period when the node
address is explicitly set by the user.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To identify timestamps for matching with their packets, Spectrum-1 uses a
five-tuple of (port, direction, domain number, message type, sequence ID).
If there are several clients from the same domain behind a single port
sending Delay_Req's, the only thing differentiating these packets, as far
as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
individual clients be similar, conflicts may arise. That is not a problem
to hardware, which will simply deliver timestamps on a first comes, first
served basis.
However the driver uses a simple hash table to store the unmatched pieces.
When a new conflicting piece arrives, it pushes out the previously stored
one, which if it is a packet, is delivered without timestamp. Later on as
the corresponding timestamps arrive, the first one is mismatched to the
second packet, and the second one is never matched and eventually is GCd.
To correct this issue, instead of using a simple rhashtable, use rhltable
to keep the unmatched entries.
Previously, a found unmatched entry would always be removed from the hash
table. That is not the case anymore--an incompatible entry is left in the
hash table. Therefore removal from the hash table cannot be used to confirm
the validity of the looked-up pointer, instead the lookup would simply need
to be redone. Therefore move it inside the critical section. This
simplifies a lot of the code.
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Reported-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust the function names in two doc comments to match the corresponding
functions.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
on an unbound socket on which rx->local is not yet set.
The following reproduced (includes omitted):
int main(void)
{
socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
return 0;
}
causes the following oops to occur:
BUG: kernel NULL pointer dereference, address: 0000000000000010
...
RIP: 0010:rxrpc_unuse_local+0x8/0x1b
...
Call Trace:
rxrpc_release+0x2b5/0x338
__sock_release+0x37/0xa1
sock_close+0x14/0x17
__fput+0x115/0x1e9
task_work_run+0x72/0x98
do_exit+0x51b/0xa7a
? __context_tracking_exit+0x4e/0x10e
do_group_exit+0xab/0xab
__x64_sys_exit_group+0x14/0x17
do_syscall_64+0x89/0x1d4
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
Fixes: 730c5fd42c ("rxrpc: Fix local endpoint refcounting")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to the commit in the fixes tag, the resource controller in netdevsim
tracked fib entries and rules per network namespace. Restore that behavior.
Fixes: 5fc494225c ("netdevsim: create devlink instance per netdevsim instance")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-11
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.
2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.
3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.
4) Fix bpftool to dump an error if pinning fails, from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove logically dead code and mark switch cases where we are expecting
to fall through.
Fix the following warnings (Building: defconfig sh):
arch/sh/kernel/disassemble.c:478:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
arch/sh/kernel/disassemble.c:487:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
arch/sh/kernel/disassemble.c:496:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Pull dax fixes from Dan Williams:
"A filesystem-dax and device-dax fix for v5.3.
The filesystem-dax fix is tagged for stable as the implementation has
been mistakenly throwing away all cow pages on any truncate or hole
punch operation as part of the solution to coordinate device-dma vs
truncate to dax pages.
The device-dax change fixes up a regression this cycle from the
introduction of a common 'internal per-cpu-ref' implementation.
Summary:
- Fix dax_layout_busy_page() to not discard private cow pages of
fs/dax private mappings.
- Update the memremap_pages core to properly cleanup on behalf of
internal reference-count users like device-dax"
* tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm/memremap: Fix reuse of pgmap instances with internal references
dax: dax_layout_busy_page() should not unmap cow pages
Pull NTB fix from Jon Mason:
"Bug fix for NTB MSI kernel compile warning"
* tag 'ntb-5.3-bugfixes' of git://github.com/jonmason/ntb:
NTB/msi: remove incorrect MODULE defines
Pull NVMe fixes from Sagi:
"Few nvme fixes for the next rc round.
- detect capacity changes on the mpath disk from Anthony
- probe/remove fix from Keith
- various fixes to pass blktests from Logan
- deadlock in reset/scan race fix
- nvme-rdma use-after-free fix
- deadlock fix when passthru commands race mpath disk info update"
* 'nvme-5.3-rc' of git://git.infradead.org/nvme:
nvme-pci: Fix async probe remove race
nvme: fix controller removal race with scan work
nvme-rdma: fix possible use-after-free in connect error flow
nvme: fix a possible deadlock when passthru commands sent to a multipath device
nvme-core: Fix extra device_put() call on error path
nvmet-file: fix nvmet_file_flush() always returning an error
nvmet-loop: Flush nvme_delete_wq when removing the port
nvmet: Fix use-after-free bug when a port is removed
nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
Pull RISC-V updates from Paul Walmsley:
"A few minor RISC-V updates for v5.3-rc4:
- Remove __udivdi3() from the 32-bit Linux port, converting the only
upstream user to use do_div(), per Linux policy
- Convert the RISC-V standard clocksource away from per-cpu data
structures, since only one is used by Linux, even on a multi-CPU
system
- A set of DT binding updates that remove an obsolete text binding in
favor of a YAML binding, fix a bogus compatible string in the
schema (thus fixing a "make dtbs_check" warning), and clarifies the
future values expected in one of the RISC-V CPU properties"
* tag 'riscv/for-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
dt-bindings: riscv: fix the schema compatible string for the HiFive Unleashed board
dt-bindings: riscv: remove obsolete cpus.txt
RISC-V: Remove udivdi3
riscv: delay: use do_div() instead of __udivdi3()
dt-bindings: Update the riscv,isa string description
RISC-V: Remove per cpu clocksource
Pull x86 fixes from Thomas Gleixner:
"A few fixes for x86:
- Don't reset the carefully adjusted build flags for the purgatory
and remove the unwanted flags instead. The 'reset all' approach led
to build fails under certain circumstances.
- Unbreak CLANG build of the purgatory by avoiding the builtin
memcpy/memset implementations.
- Address missing prototype warnings by including the proper header
- Fix yet more fall-through issues"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib/cpu: Address missing prototypes warning
x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS
x86/purgatory: Do not use __builtin_memcpy and __builtin_memset
x86: mtrr: cyrix: Mark expected switch fall-through
x86/ptrace: Mark expected switch fall-through
Pull perf tooling fixes from Thomas Gleixner:
"Perf tooling fixes all over the place:
- Fix the selection of the main thread COMM in db-export
- Fix the disassemmbly display for BPF in annotate
- Fix cpumap mask setup in perf ftrace when only one CPU is present
- Add the missing 'cpu_clk_unhalted.core' event
- Fix CPU 0 bindings in NUMA benchmarks
- Fix the module size calculations for s390
- Handle the gap between kernel end and module start on s390
correctly
- Build and typo fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf pmu-events: Fix missing "cpu_clk_unhalted.core" event
perf annotate: Fix s390 gap between kernel end and module start
perf record: Fix module size on s390
perf tools: Fix include paths in ui directory
perf tools: Fix a typo in a variable name in the Documentation Makefile
perf cpumap: Fix writing to illegal memory in handling cpumap mask
perf ftrace: Fix failure to set cpumask when only one cpu is present
perf db-export: Fix thread__exec_comm()
perf annotate: Fix printing of unaugmented disassembled instructions from BPF
perf bench numa: Fix cpu0 binding
Pull scheduler fixes from Thomas Gleixner:
"Three fixlets for the scheduler:
- Avoid double bandwidth accounting in the push & pull code
- Use a sane FIFO priority for the Pressure Stall Information (PSI)
thread.
- Avoid permission checks when setting the scheduler params for the
PSI thread"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/psi: Do not require setsched permission from the trigger creator
sched/psi: Reduce psimon FIFO priority
sched/deadline: Fix double accounting of rq/running bw in push & pull
Pull irq fix from Thomas Gleixner:
"A small fix for the affinity spreading code.
It failed to handle situations where a single vector was requested
either due to only one CPU being available or vector exhaustion
causing only a single interrupt to be granted.
The fix is to simply remove the requirement in the affinity spreading
code for more than one interrupt being available"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Create affinity mask for single vector
Pull objtool warning fix from Thomas Gleixner:
"The recent objtool fixes/enhancements unearthed a unbalanced CLAC in
the i915 driver.
Chris asked me to pick the fix up and route it through"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
drm/i915: Remove redundant user_access_end() from __copy_from_user() error path
Pull gfs2 fix from Andreas Gruenbacher:
"Fix incorrect lseek / fiemap results"
* tag 'gfs2-v5.3-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: gfs2_walk_metadata fix
A compilation -Wimplicit-fallthrough warning was enabled by commit
a035d552a9 ("Makefile: Globally enable fall-through warning")
Even though clang 10.0.0 does not currently support this warning without
a patch, clang currently does not support a value for this option.
Link: https://bugs.llvm.org/show_bug.cgi?id=39382
The gcc default for this warning is 3 so removing the =3 has no effect
for gcc and enables the warning for patched versions of clang.
Also remove the =3 from an existing use in a parisc Makefile:
arch/parisc/math-emu/Makefile
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-and-tested-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull char/misc driver fixes Greg KH:
"Here are some small char/misc driver fixes for 5.3-rc4.
Two of these are for the habanalabs driver for issues found when
running on a big-endian system (are they still alive?) The others are
tiny fixes reported by people, and a MAINTAINERS update about the
location of the fpga development tree.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
MAINTAINERS: Move linux-fpga tree to new location
nvmem: Use the same permissions for eeprom as for nvmem
habanalabs: fix host memory polling in BE architecture
habanalabs: fix F/W download in BE architecture
Pull driver core fixes from Greg KH:
"Here are two small fixes for some driver core issues that have been
reported. There is also a kernfs "fix" here, which was then reverted
because it was found to cause problems in linux-next.
The driver core fixes both resolve reported issues, one with gpioint
stuff that showed up in 5.3-rc1, and the other finally (and hopefully)
resolves a very long standing race when removing glue directories.
It's nice to get that issue finally resolved and the developers
involved should be applauded for the persistence it took to get this
patch finally accepted.
All of these have been in linux-next for a while with no reported
issues. Well, the one reported issue, hence the revert :)"
* tag 'driver-core-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "kernfs: fix memleak in kernel_ops_readdir()"
kernfs: fix memleak in kernel_ops_readdir()
driver core: Fix use-after-free and double free on glue directory
driver core: platform: return -ENXIO for missing GpioInt
Pull tty fix from Greg KH:
"Here is a single tty kgdb fix for 5.3-rc4.
It fixes an annoying log message that has caused kdb to become
useless. It's another fallout from commit ddde3c18b7 ("vt: More
locking checks") which tries to enforce locking checks more strictly
in the tty layer, unfortunatly when kdb is stopped, there's no need
for locks :)
This patch has been linux-next for a while with no reported issues"
* tag 'tty-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
kgdboc: disable the console lock when in kgdb
Pull staging / IIO driver fixes from Greg KH:
"Here are some small staging and IIO driver fixes for 5.3-rc4.
Nothing major, just resolutions for a number of small reported issues,
full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: gyroadc: fix uninitialized return code
docs: generic-counter.rst: fix broken references for ABI file
staging: android: ion: Bail out upon SIGKILL when allocating memory.
Staging: fbtft: Fix GPIO handling
staging: unisys: visornic: Update the description of 'poll_for_irq()'
staging: wilc1000: flush the workqueue before deinit the host
staging: gasket: apex: fix copy-paste typo
Staging: fbtft: Fix reset assertion when using gpio descriptor
Staging: fbtft: Fix probing of gpio descriptor
iio: imu: mpu6050: add missing available scan masks
iio: cros_ec_accel_legacy: Fix incorrect channel setting
IIO: Ingenic JZ47xx: Set clock divider on probe
iio: adc: max9611: Fix misuse of GENMASK macro
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.3-rc4.
The "biggest" one here is moving code from one file to another in
order to fix a long-standing race condition with the creation of sysfs
files for USB devices. Turns out that there are now userspace tools
out there that are hitting this long-known bug, so it's time to fix
them. Thankfully the tool-maker in this case fixed the issue :)
The other patches in here are all fixes for reported issues. Now that
syzbot knows how to fuzz USB drivers better, and is starting to now
fuzz the userspace facing side of them at the same time, there will be
more and more small fixes like these coming, which is a good thing.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: setup authorized_default attributes using usb_bus_notify
usb: iowarrior: fix deadlock on disconnect
Revert "USB: rio500: simplify locking"
usb: usbfs: fix double-free of usb memory upon submiturb error
usb: yurex: Fix use-after-free in yurex_delete
usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
xhci: Fix NULL pointer dereference at endpoint zero reset.
usb: host: xhci-rcar: Fix timeout in xhci_suspend()
usb: typec: ucsi: ccg: Fix uninitilized symbol error
usb: typec: tcpm: remove tcpm dir if no children
usb: typec: tcpm: free log buf memory when remove debug file
usb: typec: tcpm: Add NULL check before dereferencing config
All the way back to introducing dma_common_mmap we've defaulted to mark
the pages as uncached. But this is wrong for DMA coherent devices.
Later on DMA_ATTR_WRITE_COMBINE also got incorrect treatment as that
flag is only treated special on the alloc side for non-coherent devices.
Introduce a new dma_pgprot helper that deals with the check for coherent
devices so that only the remapping cases ever reach arch_dma_mmap_pgprot
and we thus ensure no aliasing of page attributes happens, which makes
the powerpc version of arch_dma_mmap_pgprot obsolete and simplifies the
remaining ones.
Note that this means arch_dma_mmap_pgprot is a bit misnamed now, but
we'll phase it out soon.
Fixes: 64ccc9c033 ("common: dma-mapping: add support for generic dma_mmap_* calls")
Reported-by: Shawn Anastasio <shawn@anastas.io>
Reported-by: Gavin Li <git@thegavinli.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # arm64
The dma required_mask needs to reflect the actual addressing capabilities
needed to handle the whole system RAM. When truncated down to the bus
addressing capabilities dma_addressing_limited() will incorrectly signal
no limitations for devices which are restricted by the bus_dma_mask.
Fixes: b4ebe60632 (dma-direct: implement complete bus_dma_mask handling)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The new DMA_ATTR_NO_KERNEL_MAPPING needs to actually assign
a dma_addr to work. Also skip it if the architecture needs
forced decryption handling, as that needs a kernel virtual
address.
Fixes: d98849aff8 (dma-direct: handle DMA_ATTR_NO_KERNEL_MAPPING in common code)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Pull pin control fixes from Linus Walleij:
- Delay acquisition of regmaps in the Aspeed G5 driver.
- Make a symbol static to reduce compiler noise.
* tag 'pinctrl-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed: Make aspeed_pinmux_ips static
pinctrl: aspeed-g5: Delay acquisition of regmaps
Pull powerpc fix from Michael Ellerman:
"Just one fix, a revert of a commit that was meant to be a minor
improvement to some inline asm, but ended up having no real benefit
with GCC and broke booting 32-bit machines when using Clang.
Thanks to: Arnd Bergmann, Christophe Leroy, Nathan Chancellor, Nick
Desaulniers, Segher Boessenkool"
* tag 'powerpc-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc: slightly improve cache helpers"
Pull fall-through fixes from Gustavo A. R. Silva:
"Mark more switch cases where we are expecting to fall through, fixing
fall-through warnings in arm, sparc64, mips, i386 and s390"
* tag 'Wimplicit-fallthrough-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
ARM: ep93xx: Mark expected switch fall-through
scsi: fas216: Mark expected switch fall-throughs
pcmcia: db1xxx_ss: Mark expected switch fall-throughs
video: fbdev: omapfb_main: Mark expected switch fall-throughs
watchdog: riowd: Mark expected switch fall-through
s390/net: Mark expected switch fall-throughs
crypto: ux500/crypt: Mark expected switch fall-throughs
watchdog: wdt977: Mark expected switch fall-through
watchdog: scx200_wdt: Mark expected switch fall-through
watchdog: Mark expected switch fall-throughs
ARM: signal: Mark expected switch fall-through
mfd: omap-usb-host: Mark expected switch fall-throughs
mfd: db8500-prcmu: Mark expected switch fall-throughs
ARM: OMAP: dma: Mark expected switch fall-throughs
ARM: alignment: Mark expected switch fall-throughs
ARM: tegra: Mark expected switch fall-through
ARM/hw_breakpoint: Mark expected switch fall-throughs
To avoid reducing the frequency of a CPU prematurely, we skip reducing
the frequency if the CPU had been busy recently.
This should not be done when the limits of the policy are changed, for
example due to thermal throttling. We should always get the frequency
within the new limits as soon as possible.
Trying to fix this by using only one flag, i.e. need_freq_update, can
lead to a race condition where the flag gets cleared without forcing us
to change the frequency at least once. And so this patch introduces
another flag to avoid that race condition.
Fixes: ecd2884291 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
dev_pm_qos_update_request() can return 1 on success, so don't treat
it as an error.
Fixes: 18c49926c4 ("cpufreq: Add QoS requests for userspace constraints")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In snd_hda_parse_generic_codec(), 'spec' is allocated through kzalloc().
Then, the pin widgets in 'codec' are parsed. However, if the parsing
process fails, 'spec' is not deallocated, leading to a memory leak.
To fix the above issue, free 'spec' before returning the error.
Fixes: 352f7f914e ("ALSA: hda - Merge Realtek parser code to generic parser")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull Kbuild fixes from Masahiro Yamada:
- revive single target %.ko
- do not create built-in.a where it is unneeded
- do not create modules.order where it is unneeded
- show a warning if subdir-y/m is used to visit a module Makefile
* tag 'kbuild-fixes-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: show hint if subdir-y/m is used to visit module Makefile
kbuild: generate modules.order only in directories visited by obj-y/m
kbuild: fix false-positive need-builtin calculation
kbuild: revive single target %.ko
Now that we swap the original proto and clear the ULP pointer
on close we have to make sure no callback will try to access
the freed state. sk_write_space is not part of sk_prot, remember
to swap it.
Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
Fixes: 95fa145479 ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
Fix the following warnings (Building: arm-ep93xx_defconfig arm):
arch/arm/mach-ep93xx/crunch.c: In function 'crunch_do':
arch/arm/mach-ep93xx/crunch.c:46:3: warning: this statement may
fall through [-Wimplicit-fallthrough=]
memset(crunch_state, 0, sizeof(*crunch_state));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm/mach-ep93xx/crunch.c:53:2: note: here
case THREAD_NOTIFY_EXIT:
^~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warnings (Building: rpc_defconfig arm):
drivers/scsi/arm/fas216.c: In function ‘fas216_disconnect_intr’:
drivers/scsi/arm/fas216.c:913:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (fas216_get_last_msg(info, info->scsi.msgin_fifo) == ABORT) {
^
drivers/scsi/arm/fas216.c:919:2: note: here
default: /* huh? */
^~~~~~~
drivers/scsi/arm/fas216.c: In function ‘fas216_kick’:
drivers/scsi/arm/fas216.c:1959:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fas216_allocate_tag(info, SCpnt);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/arm/fas216.c:1960:2: note: here
case TYPE_OTHER:
^~~~
drivers/scsi/arm/fas216.c: In function ‘fas216_busservice_intr’:
drivers/scsi/arm/fas216.c:1413:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fas216_stoptransfer(info);
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/arm/fas216.c:1414:2: note: here
case STATE(STAT_STATUS, PHASE_SELSTEPS):/* Sel w/ steps -> Status */
^~~~
drivers/scsi/arm/fas216.c:1424:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fas216_stoptransfer(info);
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/arm/fas216.c:1425:2: note: here
case STATE(STAT_MESGIN, PHASE_COMMAND): /* Command -> Message In */
^~~~
drivers/scsi/arm/fas216.c: In function ‘fas216_funcdone_intr’:
drivers/scsi/arm/fas216.c:1573:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((stat & STAT_BUSMASK) == STAT_MESGIN) {
^
drivers/scsi/arm/fas216.c:1579:2: note: here
default:
^~~~~~~
drivers/scsi/arm/fas216.c: In function ‘fas216_handlesync’:
drivers/scsi/arm/fas216.c:605:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
info->scsi.phase = PHASE_MSGOUT_EXPECT;
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
drivers/scsi/arm/fas216.c:607:2: note: here
case async:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: db1xxx_defconfig mips):
drivers/pcmcia/db1xxx_ss.c:257:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/pcmcia/db1xxx_ss.c:269:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: omap1_defconfig arm):
drivers/watchdog/wdt285.c:170:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/watchdog/ar7_wdt.c:237:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:449:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1549:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1547:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1545:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1543:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1540:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1538:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/video/fbdev/omap/omapfb_main.c:1535:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: sparc64):
drivers/watchdog/riowd.c: In function ‘riowd_ioctl’:
drivers/watchdog/riowd.c:136:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
riowd_writereg(p, riowd_timeout, WDTO_INDEX);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/watchdog/riowd.c:139:2: note: here
case WDIOC_GETTIMEOUT:
^~~~
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: s390):
drivers/s390/net/ctcm_fsms.c: In function ‘ctcmpc_chx_attnbusy’:
drivers/s390/net/ctcm_fsms.c:1703:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (grp->changed_side == 1) {
^
drivers/s390/net/ctcm_fsms.c:1707:2: note: here
case MPCG_STATE_XID0IOWAIX:
^~~~
drivers/s390/net/ctcm_mpc.c: In function ‘ctc_mpc_alloc_channel’:
drivers/s390/net/ctcm_mpc.c:358:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (callback)
^
drivers/s390/net/ctcm_mpc.c:360:2: note: here
case MPCG_STATE_XID0IOWAIT:
^~~~
drivers/s390/net/ctcm_mpc.c: In function ‘mpc_action_timeout’:
drivers/s390/net/ctcm_mpc.c:1469:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((fsm_getstate(rch->fsm) == CH_XID0_PENDING) &&
^
drivers/s390/net/ctcm_mpc.c:1472:2: note: here
default:
^~~~~~~
drivers/s390/net/ctcm_mpc.c: In function ‘mpc_send_qllc_discontact’:
drivers/s390/net/ctcm_mpc.c:2087:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (grp->estconnfunc) {
^
drivers/s390/net/ctcm_mpc.c:2092:2: note: here
case MPCG_STATE_FLOWC:
^~~~
drivers/s390/net/qeth_l2_main.c: In function ‘qeth_l2_process_inbound_buffer’:
drivers/s390/net/qeth_l2_main.c:328:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (IS_OSN(card)) {
^
drivers/s390/net/qeth_l2_main.c:337:3: note: here
default:
^~~~~~~
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: arm):
drivers/crypto/ux500/cryp/cryp.c: In function ‘cryp_save_device_context’:
drivers/crypto/ux500/cryp/cryp.c:316:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
ctx->key_4_r = readl_relaxed(&src_reg->key_4_r);
drivers/crypto/ux500/cryp/cryp.c:318:2: note: here
case CRYP_KEY_SIZE_192:
^~~~
drivers/crypto/ux500/cryp/cryp.c:320:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
ctx->key_3_r = readl_relaxed(&src_reg->key_3_r);
drivers/crypto/ux500/cryp/cryp.c:322:2: note: here
case CRYP_KEY_SIZE_128:
^~~~
drivers/crypto/ux500/cryp/cryp.c:324:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
ctx->key_2_r = readl_relaxed(&src_reg->key_2_r);
drivers/crypto/ux500/cryp/cryp.c:326:2: note: here
default:
^~~~~~~
In file included from ./include/linux/io.h:13:0,
from drivers/crypto/ux500/cryp/cryp_p.h:14,
from drivers/crypto/ux500/cryp/cryp.c:15:
drivers/crypto/ux500/cryp/cryp.c: In function ‘cryp_restore_device_context’:
./arch/arm/include/asm/io.h:92:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __raw_writel __raw_writel
^
./arch/arm/include/asm/io.h:299:29: note: in expansion of macro ‘__raw_writel’
#define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c)
^~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:363:3: note: in expansion of macro ‘writel_relaxed’
writel_relaxed(ctx->key_4_r, ®->key_4_r);
^~~~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:365:2: note: here
case CRYP_KEY_SIZE_192:
^~~~
In file included from ./include/linux/io.h:13:0,
from drivers/crypto/ux500/cryp/cryp_p.h:14,
from drivers/crypto/ux500/cryp/cryp.c:15:
./arch/arm/include/asm/io.h:92:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __raw_writel __raw_writel
^
./arch/arm/include/asm/io.h:299:29: note: in expansion of macro ‘__raw_writel’
#define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c)
^~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:367:3: note: in expansion of macro ‘writel_relaxed’
writel_relaxed(ctx->key_3_r, ®->key_3_r);
^~~~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:369:2: note: here
case CRYP_KEY_SIZE_128:
^~~~
In file included from ./include/linux/io.h:13:0,
from drivers/crypto/ux500/cryp/cryp_p.h:14,
from drivers/crypto/ux500/cryp/cryp.c:15:
./arch/arm/include/asm/io.h:92:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __raw_writel __raw_writel
^
./arch/arm/include/asm/io.h:299:29: note: in expansion of macro ‘__raw_writel’
#define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c)
^~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:371:3: note: in expansion of macro ‘writel_relaxed’
writel_relaxed(ctx->key_2_r, ®->key_2_r);
^~~~~~~~~~~~~~
drivers/crypto/ux500/cryp/cryp.c:373:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: arm):
drivers/watchdog/wdt977.c: In function ‘wdt977_ioctl’:
LD [M] drivers/media/platform/vicodec/vicodec.o
drivers/watchdog/wdt977.c:400:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
wdt977_keepalive();
^~~~~~~~~~~~~~~~~~
drivers/watchdog/wdt977.c:403:2: note: here
case WDIOC_GETTIMEOUT:
^~~~
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: i386):
drivers/watchdog/scx200_wdt.c: In function ‘scx200_wdt_ioctl’:
drivers/watchdog/scx200_wdt.c:188:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
scx200_wdt_ping();
^~~~~~~~~~~~~~~~~
drivers/watchdog/scx200_wdt.c:189:2: note: here
case WDIOC_GETTIMEOUT:
^~~~
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/watchdog/ar7_wdt.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 237:3
drivers/watchdog/pcwd.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 653:3
drivers/watchdog/sb_wdog.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 204:3
drivers/watchdog/wdt.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 391:3
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning:
arch/arm/kernel/signal.c: In function 'do_signal':
arch/arm/kernel/signal.c:598:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
restart -= 2;
~~~~~~~~^~~~
arch/arm/kernel/signal.c:599:3: note: here
case -ERESTARTNOHAND:
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/mfd/omap-usb-host.c: In function 'usbhs_runtime_resume':
drivers/mfd/omap-usb-host.c:303:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!IS_ERR(omap->hsic480m_clk[i])) {
^
drivers/mfd/omap-usb-host.c:313:3: note: here
case OMAP_EHCI_PORT_MODE_TLL:
^~~~
drivers/mfd/omap-usb-host.c: In function 'usbhs_runtime_suspend':
drivers/mfd/omap-usb-host.c:345:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!IS_ERR(omap->hsic480m_clk[i]))
^
drivers/mfd/omap-usb-host.c:349:3: note: here
case OMAP_EHCI_PORT_MODE_TLL:
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/mfd/db8500-prcmu.c: In function 'dsiclk_rate':
drivers/mfd/db8500-prcmu.c:1592:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
div *= 2;
~~~~^~~~
drivers/mfd/db8500-prcmu.c:1593:2: note: here
case PRCM_DSI_PLLOUT_SEL_PHI_2:
^~~~
drivers/mfd/db8500-prcmu.c:1594:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
div *= 2;
~~~~^~~~
drivers/mfd/db8500-prcmu.c:1595:2: note: here
case PRCM_DSI_PLLOUT_SEL_PHI:
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/arm/plat-omap/dma.c: In function 'omap_set_dma_src_burst_mode':
arch/arm/plat-omap/dma.c:384:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:393:2: note: here
case OMAP_DMA_DATA_BURST_16:
^~~~
arch/arm/plat-omap/dma.c:394:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:402:2: note: here
default:
^~~~~~~
arch/arm/plat-omap/dma.c: In function 'omap_set_dma_dest_burst_mode':
arch/arm/plat-omap/dma.c:473:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dma_omap2plus()) {
^
arch/arm/plat-omap/dma.c:481:2: note: here
default:
^~~~~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/arm/mm/alignment.c: In function 'thumb2arm':
arch/arm/mm/alignment.c:688:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((tinstr & (3 << 9)) == 0x0400) {
^
arch/arm/mm/alignment.c:700:2: note: here
default:
^~~~~~~
arch/arm/mm/alignment.c: In function 'do_alignment_t32_to_handler':
arch/arm/mm/alignment.c:753:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
poffset->un = (tinst2 & 0xff) << 2;
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
arch/arm/mm/alignment.c:754:2: note: here
case 0xe940:
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning:
arch/arm/mach-tegra/reset.c: In function 'tegra_cpu_reset_handler_enable':
arch/arm/mach-tegra/reset.c:72:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
tegra_cpu_reset_handler_set(reset_address);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm/mach-tegra/reset.c:74:2: note: here
case 0:
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/arm/kernel/hw_breakpoint.c: In function 'hw_breakpoint_arch_parse':
arch/arm/kernel/hw_breakpoint.c:609:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
^
arch/arm/kernel/hw_breakpoint.c:611:2: note: here
case 3:
^~~~
arch/arm/kernel/hw_breakpoint.c:613:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
^
arch/arm/kernel/hw_breakpoint.c:615:2: note: here
default:
^~~~~~~
arch/arm/kernel/hw_breakpoint.c: In function 'arch_build_bp_info':
arch/arm/kernel/hw_breakpoint.c:544:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE)
^
arch/arm/kernel/hw_breakpoint.c:547:2: note: here
default:
^~~~~~~
In file included from include/linux/kernel.h:11,
from include/linux/list.h:9,
from include/linux/preempt.h:11,
from include/linux/hardirq.h:5,
from arch/arm/kernel/hw_breakpoint.c:16:
arch/arm/kernel/hw_breakpoint.c: In function 'hw_breakpoint_pending':
include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/asm-generic/bug.h:136:2: note: in expansion of macro 'unlikely'
unlikely(__ret_warn_on); \
^~~~~~~~
arch/arm/kernel/hw_breakpoint.c:863:3: note: in expansion of macro 'WARN'
WARN(1, "Asynchronous watchpoint exception taken. Debugging results may be unreliable\n");
^~~~
arch/arm/kernel/hw_breakpoint.c:864:2: note: here
case ARM_ENTRY_SYNC_WATCHPOINT:
^~~~
arch/arm/kernel/hw_breakpoint.c: In function 'core_has_os_save_restore':
arch/arm/kernel/hw_breakpoint.c:910:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (oslsr & ARM_OSLSR_OSLM0)
^
arch/arm/kernel/hw_breakpoint.c:912:2: note: here
default:
^~~~~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Pull kvm fixes from Paolo Bonzini:
"Bugfixes (arm and x86) and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: kvm: Adding config fragments
KVM: selftests: Update gitignore file for latest changes
kvm: remove unnecessary PageReserved check
KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable
KVM: arm: Don't write junk to CP15 registers on reset
KVM: arm64: Don't write junk to sysregs on reset
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
x86: kvm: remove useless calls to kvm_para_available
KVM: no need to check return value of debugfs_create functions
KVM: remove kvm_arch_has_vcpu_debugfs()
KVM: Fix leak vCPU's VMCS value into other pCPU
KVM: Check preempted_in_kernel for involuntary preemption
KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire
arm64: KVM: hyp: debug-sr: Mark expected switch fall-through
KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC
KVM: arm: vgic-v3: Mark expected switch fall-through
arm64: KVM: regmap: Fix unexpected switch fall-through
KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
Pull input updates from Dmitry Torokhov:
- newer systems with Elan touchpads will be switched over to SMBus
- HP Spectre X360 will be using SMbus/RMI4
- checks for invalid USB descriptors in kbtab and iforce
- build fixes for applespi driver (misconfigs)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: iforce - add sanity checks
Input: applespi - use struct_size() helper
Input: kbtab - sanity check for endpoint type
Input: usbtouchscreen - initialize PM mutex before using it
Input: applespi - add dependency on LEDS_CLASS
Input: synaptics - enable RMI mode for HP Spectre X360
Input: elantech - annotate fall-through case in elantech_use_host_notify()
Input: elantech - enable SMBus on new (2018+) systems
Input: applespi - fix trivial typo in struct description
Input: applespi - select CRC16 module
Input: applespi - fix warnings detected by sparse
Currently, attempts to shutdown and re-enable a device-dax instance
trigger:
Missing reference count teardown definition
WARNING: CPU: 37 PID: 1608 at mm/memremap.c:211 devm_memremap_pages+0x234/0x850
[..]
RIP: 0010:devm_memremap_pages+0x234/0x850
[..]
Call Trace:
dev_dax_probe+0x66/0x190 [device_dax]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
device_driver_attach+0x4f/0x60
Given that the setup path initializes pgmap->ref, arrange for it to be
also torn down so devm_memremap_pages() is ready to be called again and
not be mistaken for the 3rd-party per-cpu-ref case.
Fixes: 24917f6b10 ("memremap: provide an optional internal refcount in struct dev_pagemap")
Reported-by: Fan Du <fan.du@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/156530042781.2068700.8733813683117819799.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.
Fixes: 776e726bfb ("netvsc: fix RCU warning in get_stats")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-08-08
This series introduces some fixes to mlx5 driver.
Highlights:
1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs.
2) From Aya, Fixes to mlx5 tx devlink health reporter.
3) From Maxim, aRFs parsing to use flow dissector to avoid relying on
invalid skb fields.
Please pull and let me know if there is any problem.
For -stable v4.3
('net/mlx5e: Only support tx/rx pause setting for port owner')
For -stable v4.9
('net/mlx5e: Use flow keys dissector to parse packets for ARFS')
For -stable v5.1
('net/mlx5e: Fix false negative indication on tx reporter CQE recovery')
('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter')
('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off')
Note: when merged with net-next this minor conflict will pop up:
++<<<<<<< (net-next)
+ if (is_eswitch_flow) {
+ flow->esw_attr->match_level = match_level;
+ flow->esw_attr->tunnel_match_level = tunnel_match_level;
++=======
+ if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+ flow->esw_attr->inner_match_level = inner_match_level;
+ flow->esw_attr->outer_match_level = outer_match_level;
++>>>>>>> (net)
To resolve, use hunks from net (2nd) and replace:
if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
with
if (is_eswitch_flow)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe_service_task() calls unregister_netdev() under rtnl_lock().
But unregister_netdev() internally calls rtnl_lock().
So deadlock would occur.
Fixes: 59dd45d550 ("ixgbe: firmware recovery mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
Fix collisions in socket cookie generation
This change makes the socket cookie generator as a global counter
instead of per netns in order to fix cookie collisions for BPF use
cases we ran into. See main patch #1 for more details.
Given the change is small/trivial and fixes an issue we're seeing
my preference would be net tree (though it cleanly applies to
net-next as well). Went for net tree instead of bpf tree here given
the main change is in net/core/sock_diag.c, but either way would be
fine with me.
v1 -> v2:
- Fix up commit description in patch #1, thanks Eric!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull in updates in BPF helper function description.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generating and retrieving socket cookies are a useful feature that is
exposed to BPF for various program types through bpf_get_socket_cookie()
helper.
The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.
Socket cookie consumers must assume the value as opqaue in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are two nodes named "regulator1" in the Draak DTS: a
3.3V regulator for the eMMC and the LVDS decoder, and a 12V regulator
for the backlight. This causes the former to be overwritten by the
latter.
Fix this by renaming all regulators with numerical suffixes to use named
suffixes, which are less likely to conflict.
Fixes: 4fbd4158fe ("arm64: dts: renesas: r8a77995: draak: Add backlight")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
David Howells says:
====================
Here's a couple of fixes for rxrpc:
(1) Fix refcounting of the local endpoint.
(2) Don't calculate or report packet skew information. This has been
obsolete since AFS 3.1 and so is a waste of resources.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit ff9b45c55b ("kbuild: modpost: read modules.order instead
of $(MODVERDIR)/*.mod"), a module is no longer built in the following
pattern:
[Makefile]
subdir-y := some-module
[some-module/Makefile]
obj-m := some-module.o
You cannot write Makefile this way in upstream because modules.order is
not correctly generated. subdir-y is used to descend to a sub-directory
that builds tools, device trees, etc.
For external modules, the modules order does not matter. So, the
Makefile above was known to work.
I believe the Makefile should be re-written as follows:
[Makefile]
obj-m := some-module/
[some-module/Makefile]
obj-m := some-module.o
However, people will have no idea if their Makefile suddenly stops
working. In fact, I received questions from multiple people.
Show a warning for a while if obj-m is specified in a Makefile visited
by subdir-y or subdir-m.
I touched the %/ rule to avoid false-positive warnings for the single
target.
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Tom Stonecypher <thomas.edwardx.stonecypher@intel.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
The modules.order files in directories visited by the chain of obj-y
or obj-m are merged to the upper-level ones, and become parts of the
top-level modules.order. On the other hand, there is no need to
generate modules.order in directories visited by subdir-y or subdir-m
since they would become orphan anyway.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The current implementation of need-builtin is false-positive,
for example, in the following Makefile:
obj-m := foo/
obj-y := foo/bar/
..., where foo/built-in.a is not required.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
I removed the single target %.ko in commit ff9b45c55b ("kbuild:
modpost: read modules.order instead of $(MODVERDIR)/*.mod") because
the modpost stage does not work reliably. For instance, the module
dependency, modversion, etc. do not work if we lack symbol information
from the other modules.
Yet, some people still want to build only one module in their interest,
and it may be still useful if it is used within those limitations.
Fixes: ff9b45c55b ("kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod")
Reported-by: Don Brace <don.brace@microsemi.com>
Reported-by: Arend Van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull drm fixes from Dave Airlie:
"Usual fixes roundup. Nothing too crazy or serious, one non-released
ioctl is removed in the amdkfd driver.
core:
- mode parser strncpy fix
i915:
- GLK DSI escape clock setting
- HDCP memleak fix
tegra:
- one gpiod/of regression fix
amdgpu:
- fix VCN to handle the latest navi10 firmware
- fix for fan control on navi10
- properly handle SMU metrics table on navi10
- fix a resume regression on Stoney
- kfd revert a GWS ioctl
vmwgfx:
- memory leak fix
rockchip:
- suspend fix"
* tag 'drm-fixes-2019-08-09' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: fix memory leak when too many retries have occurred
Revert "drm/amdkfd: New IOCTL to allocate queue GWS"
Revert "drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)"
drm/amdgpu: pin the csb buffer on hw init for gfx v8
drm/rockchip: Suspend DP late
drm/i915: Fix wrong escape clock divisor init for GLK
drm/i915: fix possible memory leak in intel_hdcp_auth_downstream()
drm/modes: Fix unterminated strncpy
drm/amd/powerplay: correct navi10 vcn powergate
drm/amd/powerplay: honor hw limit on fetching metrics data for navi10
drm/amd/powerplay: Allow changing of fan_control in smu_v11_0
drm/amd/amdgpu/vcn_v2_0: Move VCN 2.0 specific dec ring test to vcn_v2_0
drm/amd/amdgpu/vcn_v2_0: Mark RB commands as KMD commands
drm/tegra: Fix gpiod_get_from_of_node() regression
Pull arm64 fix from Catalin Marinas:
"Fix bad_pte warning caused by pte_mkdevmap() not setting PTE_SPECIAL"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: add missing PTE_SPECIAL in pte_mkdevmap on arm64
Pull s390 fixes from Vasily Gorbik:
- Map vdso also for statically linked binaries like all other
architectures.
- Fix no .bss usage compile-time check to account common objects with
the help of binutils size tool. Top level Makefile change acked-by
Masahiro.
- A fix to make perf happy with _etext symbol type.
- Fix dump_pagetables which is broken since p*d_offset implementation
change to comply with mm/gup.c expectations.
- Revert memory sharing for diag calls in protected virtualization,
since this is not required after all.
- Couple of other minor code cleanups.
* tag 's390-5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vdso: map vdso also for statically linked binaries
s390/build: use size command to perform empty .bss check
kbuild: add OBJSIZE variable for the size tool
s390: put _stext and _etext into .text section
s390/head64: cleanup unused labels
s390/unwind: remove stack recursion warning
s390/setup: adjust start_code of init_mm to _text
s390/mm: fix dump_pagetables top level page table walking
s390/protvirt: avoid memory sharing for diag 308 set/store
Pull block fixes from Jens Axboe:
- Revert of a bcache patch that caused an oops for some (Coly)
- ata rb532 unused warning fix (Gustavo)
- AoE kernel crash fix (He)
- Error handling fixup for blkdev_get() (Jan)
- libata read/write translation and SFF PIO fix (me)
- Use after free and error handling fix for O_DIRECT fragments. There's
still a nowait + sync oddity in there, we'll nail that start next
week. If all else fails, I'll queue a revert of the NOWAIT change.
(me)
- Loop GFP_KERNEL -> GFP_NOIO deadlock fix (Mikulas)
- Two BFQ regression fixes that caused crashes (Paolo)
* tag 'for-linus-20190809' of git://git.kernel.dk/linux-block:
bcache: Revert "bcache: use sysfs_match_string() instead of __sysfs_match_string()"
loop: set PF_MEMALLOC_NOIO for the worker thread
bdev: Fixup error handling in blkdev_get()
block, bfq: handle NULL return value by bfq_init_rq()
block, bfq: move update of waker and woken list to queue freeing
block, bfq: reset last_completed_rq_bfqq if the pointed queue is freed
block: aoe: Fix kernel crash due to atomic sleep when exiting
libata: add SG safety checks in SFF pio transfers
libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
block: fix O_DIRECT error handling for bio fragments
ata: rb532_cf: Fix unused variable warning in rb532_pata_driver_probe
Pull MMC fixes from Ulf Hansson:
- cavium: Fix DMA support
- sdhci-sprd: Fix soft reset when runtime resuming"
* tag 'mmc-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: cavium: Add the missing dma unmap when the dma has finished.
mmc: cavium: Set the correct dma max segment size for mmc_host
mmc: sdhci-sprd: Fix the incorrect soft reset operation when runtime resuming
Pull fbdev fix from Bartlomiej Zolnierkiewicz:
"fbdev patches will now go to upstream through drm-misc tree for
improved maintainership and better integration testing so update
MAINTAINERS file accordingly"
* tag 'fbdev-v5.3-rc4' of git://github.com/bzolnier/linux:
MAINTAINERS: handle fbdev changes through drm-misc tree
Pull pwm fix from Thierry Reding:
"A single fix for a backlight brightness regression introduced in
this merge window"
* tag 'pwm/for-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Fallback to the static lookup-list when acpi_pwm_get fails
Pull sound fixes from Takashi Iwai:
"Lots of small fixes at this time since we've received the ASoC fix
batch now.
- Some coverage in ASoC core mostly for minor issues like NULL checks
for DPCM and proper error handling in DAI instantiation
- A collection of small device-specific changes in various ASoC codec
and platform drivers
- OF-tree refcount fixes in a few ASoC drivers
- Fixes of memory leaks in the error paths of various ASoC / ALSA
drivers
- A workaround for a long-standing issue on AMD HD-audio device
- Updates of MAINTAINERS, mail addresses, file permission fixups"
* tag 'sound-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (38 commits)
ALSA: firewire: fix a memory leak bug
sound: fix a memory leak bug
ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457)
ALSA: hiface: fix multiple memory leak bugs
ALSA: hda - Don't override global PCM hw info flag
ALSA: usb-audio: fix a memory leak bug
ASoC: max98373: Remove executable bits
ASoC: amd: acp3x: use dma address for acp3x dma driver
ASoC: amd: acp3x: use dma_ops of parent device for acp3x dma driver
ASoC: max98373: add 88200 and 96000 sampling rate support
ASoC: sun4i-i2s: Incorrect SR and WSS computation
MAINTAINERS: Update Intel ASoC drivers maintainers
ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
ASoC: rockchip: Fix mono capture
ASoC: Intel: Fix some acpi vs apci typo in somme comments
ASoC: ti: davinci-mcasp: Fix clk PDIR handling for i2s master mode
ASoC: Fail card instantiation if DAI format setup fails
ASoC: SOF: Intel: hda: remove misleading error trace from IRQ thread
ASoC: qcom: apq8016_sbc: Fix oops with multiple DAI links
ASoC: dapm: fix a memory leak bug
...
Pull media fix from Mauro Carvalho Chehab:
"A fix at the vivid CEC support"
* tag 'media/v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vivid: fix missing cec adapter name
Pull power management fix from Rafael Wysocki:
"Revert a recent PCI power management change that caused problems to
occur on multiple systems (Mika Westerberg)"
* tag 'pm-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PCI: Add missing link delays required by the PCIe spec"
Pull crypto fixes from Herbert Xu:
"Fix a number of bugs in the ccp driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Ignore tag length when decrypting GCM ciphertext
crypto: ccp - Add support for valid authsize values less than 16
crypto: ccp - Fix oops by properly managing allocated structures
It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size. This will confuse fiemap as well as lseek with the
SEEK_DATA flag.
Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.
Fixes xfstest generic/490.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Jakub Kicinski says:
====================
First make sure we don't use "prog" in error messages because
the pinning operation could be performed on a map. Second add
back missing error message if pin syscall failed.
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Since scatterlist dimensions are all unsigned ints, in the relatively
rare cases where a device's max_segment_size is set to UINT_MAX, then
the "cur_len + s_length <= max_len" check in __finalise_sg() will always
return true. As a result, the corner case of such a device mapping an
excessively large scatterlist which is mergeable to or beyond a total
length of 4GB can lead to overflow and a bogus truncated dma_length in
the resulting segment.
As we already assume that any single segment must be no longer than
max_len to begin with, this can easily be addressed by reshuffling the
comparison.
Fixes: 809eac54cd ("iommu/dma: Implement scatterlist segment merging")
Reported-by: Nicolin Chen <nicoleotsuka@gmail.com>
Tested-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
KVM/arm fixes for 5.3, take #2
- Fix our system register reset so that we stop writing
non-sensical values to them, and track which registers
get reset instead.
- Sync VMCR back from the GIC on WFI so that KVM has an
exact vue of PMR.
- Reevaluate state of HW-mapped, level triggered interrupts
on enable.
KVM/arm fixes for 5.3
- A bunch of switch/case fall-through annotation, fixing one actual bug
- Fix PMU reset bug
- Add missing exception class debug strings
selftests kvm test cases need pre-required kernel configs for the test
to get pass.
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kvm_create_max_vcpus test has been moved to the main directory,
and sync_regs_test is now available on s390x, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.
This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!
Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.
Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.
The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.
Fixes: 4f95dd78a7 ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
fbdev patches will now go to upstream through drm-misc tree (IOW
starting with v5.4 merge window fbdev changes will be included in
DRM pull request) for improved maintainership and better integration
testing. Update MAINTAINERS file accordingly.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Flows that are in teardown state (due to RST / FIN TCP packet) still
have their offload flag set on. Hence, the conntrack garbage collector
may race to undo the timeout adjustment that the fixup routine performs,
leaving the conntrack entry in place with the internal offload timeout
(one day).
Update teardown flow state to ESTABLISHED and set tracking to liberal,
then once the offload bit is cleared, adjust timeout if it is more than
the default fixup timeout (conntrack might already have set a lower
timeout from the packet path).
Fixes: da5984e510 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update conntrack entry to pick up expired flows, otherwise the conntrack
entry gets stuck with the internal offload timeout (one day). The TCP
state also needs to be adjusted to ESTABLISHED state and tracking is set
to liberal mode in order to give conntrack a chance to pick up the
expired flow.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.
[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[ 24.657858] Mem abort info:
[ 24.660686] ESR = 0x96000004
[ 24.663769] Exception class = DABT (current EL), IL = 32 bits
[ 24.669725] SET = 0, FnV = 0
[ 24.672804] EA = 0, S1PTW = 0
[ 24.675975] Data abort info:
[ 24.678880] ISV = 0, ISS = 0x00000004
[ 24.682743] CM = 0, WnR = 0
[ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[ 24.692207] [0000000000040434] pgd=0000000000000000
[ 24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 24.889414] Call trace:
[ 24.891870] __nf_tables_abort+0x3f0/0x7a0
[ 24.895984] nf_tables_abort+0x20/0x40
[ 24.899750] nfnetlink_rcv_batch+0x17c/0x588
[ 24.904037] nfnetlink_rcv+0x13c/0x190
[ 24.907803] netlink_unicast+0x18c/0x208
[ 24.911742] netlink_sendmsg+0x1b0/0x350
[ 24.915682] sock_sendmsg+0x4c/0x68
[ 24.919185] ___sys_sendmsg+0x288/0x2c8
[ 24.923037] __sys_sendmsg+0x7c/0xd0
[ 24.926628] __arm64_sys_sendmsg+0x2c/0x38
[ 24.930744] el0_svc_common.constprop.0+0x94/0x158
[ 24.935556] el0_svc_handler+0x34/0x90
[ 24.939322] el0_svc+0x8/0xc
[ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---
Fixes: f6ac858589 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The OMAP 4 TRM specifies that when using double-index addressing
the address increases by the ES plus the EI value minus 1 within
a frame. When a full frame is transferred, the address increases
by the ES plus the frame index (FI) value minus 1.
The omap-dma code didn't account for the 'minus 1' in the FI register.
To get correct addressing, add 1 to the src_icg value.
This was found when testing a hacked version of the media m2m-deinterlace.c
driver on a Pandaboard.
The only other source that uses this feature is omap_vout_vrfb.c,
and that adds a + 1 when setting the dst_icg. This is a workaround
for the broken omap-dma.c behavior. So remove the workaround at the
same time that we fix omap-dma.c.
I tested the omap_vout driver with a Beagle XM board to check that
the '+ 1' in omap_vout_vrfb.c was indeed a workaround for the omap-dma
bug.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Link: https://lore.kernel.org/r/952e7f51-f208-9333-6f58-b7ed20d2ea0b@xs4all.nl
Signed-off-by: Vinod Koul <vkoul@kernel.org>
s390 does not map the vdso for statically linked binaries, assuming
that this doesn't make sense. See commit fc5243d98a ("[S390]
arch_setup_additional_pages arguments").
However with glibc commit d665367f596d ("linux: Enable vDSO for static
linking as default (BZ#19767)") and commit 5e855c895401 ("s390: Enable
VDSO for static linking") the vdso is also used for statically linked
binaries - if the kernel would make it available.
Therefore map the vdso always, just like all other architectures.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
A HW mapped level sensitive interrupt asserted by a device will not be put
into the ap_list if it is disabled at the VGIC level. When it is enabled
again, it will be inserted into the ap_list and written to a list register
on guest entry regardless of the state of the device.
We could argue that this can also happen on real hardware, when the command
to enable the interrupt reached the GIC before the device had the chance to
de-assert the interrupt signal; however, we emulate the distributor and
redistributors in software and we can do better than that.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
At the moment, the way we reset CP15 registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.
The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a CP15 register while junk is in there, bad thing may happen.
Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something.
In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the CP15 reg leave outside
of the cp15_regs[] array). It may well be axed in the near future.
Signed-off-by: Marc Zyngier <maz@kernel.org>
At the moment, the way we reset system registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.
The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a system register while junk is in there, bad thing may happen.
Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something. This requires fixing a couple of
sysreg refinition in the trap table.
In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the sysregs leave outside of
the sys_regs[] array). It may well be axed in the near future.
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
As spin_unlock_irq will enable interrupts.
Function tsi108_stat_carry is called from interrupt handler tsi108_irq.
Interrupts are enabled in interrupt handler.
Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq
in IRQ context to avoid this.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should also enable team's vlan tx offload in hw_enc_features,
pass the vlan packets to the slave devices with vlan tci, let the
slave handle vlan tunneling offload implementation.
Fixes: 3268e5cb49 ("team: Advertise tunneling offload features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.
Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.
Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.
Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.
Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).
See commit 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.
v2:
- remove superfluous decrypted mark copy (Willem);
- remove the stale doc entry (Boris);
- rely entirely on EOR marking to prevent coalescing (Boris);
- use an internal sendpages flag instead of marking the socket
(Boris).
v3 (Willem):
- reorganize the can_skb_orphan_partial() condition;
- fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak says:
====================
Fix batched event generation for skbedit action
When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.
For example, consider the following script adding 32 entries with all
supported skbedit parameters and cookie (in order to maximize netlink
messages size):
% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"
$TC actions flush action skbedit
for i in `seq 1 $1`;
do
cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
ptype host inheritdsfield \
index $i cookie aabbccddeeff112233445566778800a1 "
args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%
patch 1 adds callback in tc_action_ops of skbedit action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().
patch 2 updates the TDC test suite with relevant skbedit test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TDC tests with cases varifying ability of TC to install or delete
batches of skbedit actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
variable tx_vid set but not used [-Wunused-but-set-variable]
drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
variable rx_vid set but not used [-Wunused-but-set-variable]
They are not used since commit 6d7c7d948a ("net: dsa:
sja1105: Fix broken learning with vlan_filtering disabled")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As commit 30d8177e8a ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.
Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:
if (skb->encapsulation)
features &= dev->hw_enc_features;
clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e8a like this:
vlan_dev_hard_start_xmit
-->dev_queue_xmit
-->validate_xmit_skb
-->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
-->validate_xmit_vlan
-->__vlan_hwaccel_push_inside //skb->tci is cleared
...
--> bond_start_xmit
--> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
--> __skb_flow_dissect // nhoff point to IP header
--> case htons(ETH_P_8021Q)
// skb_vlan_tag_present is false, so
vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
//vlan point to ip header wrongly
Fixes: b2a103e6d0 ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPX is no longer supported, but the example in the documentation
might useful. Replace it with IPv6.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both IPX and TR have not been supported for a while now.
Remove them from the /proc/sys/net documentation.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this point nr_frags has been incremented but the frag does not yet
have a page assigned so freeing the skb results in a crash. Reset
nr_frags before freeing the skb to prevent this.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for removing __udivdi3() from the RISC-V
architecture-specific files, convert its one user to use do_div().
This avoids breaking the RV32 build after __udivdi3() is removed.
This second version removes the assignment of the remainder to an
unused temporary variable. Thanks to Nicolas Pitre <nico@fluxnic.net>
for the suggestion.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Since the RISC-V specification states that ISA description strings are
case-insensitive, there's no functional difference between mixed-case,
upper-case, and lower-case ISA strings. Thus, to simplify parsing,
specify that the letters present in "riscv,isa" must be all lowercase.
Suggested-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Before commit d4289fcc9b ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).
By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e334 ("Revert
"ipv4: use skb coalescing in defragmentation"")).
Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.
When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.
Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.
This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.
[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1
Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- NFSv4: Ensure we check the return value of update_open_stateid() so
we correctly track active open state.
- NFSv4: Fix for delegation state recovery to ensure we recover all
open modes that are active.
- NFSv4: Fix an Oops in nfs4_do_setattr
Fixes:
- NFS: Fix regression whereby fscache errors are appearing on 'nofsc'
mounts
- NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
- NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
- pNFS: Report errors from the call to nfs4_select_rw_stateid()
- NFSv4: Various other delegation and open stateid recovery fixes
- NFSv4: Fix state recovery behaviour when server connection times
out"
* tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Ensure state recovery handles ETIMEDOUT correctly
NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
NFSv4: Fix an Oops in nfs4_do_setattr
NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
NFSv4: Check the return value of update_open_stateid()
NFSv4.1: Only reap expired delegations
NFSv4.1: Fix open stateid recovery
NFSv4: Report the error from nfs4_select_rw_stateid()
NFSv4: When recovering state fails with EAGAIN, retry the same recovery
NFSv4: Print an error in the syslog when state is marked as irrecoverable
NFSv4: Fix delegation state recovery
NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
M2M scaler clocks require special handling of their parent bus clock during
power domain on/off sequences. MSCL clocks were not initially added to the
sub-CMU handler, because that time there was no driver for the M2M scaler
device and it was not possible to test it.
This patch fixes this issue. Parent clock for M2M scaler devices is now
properly preserved during MSC power domain on/off sequence. This gives M2M
scaler devices proper performance: fullHD XRGB32 image 1000 rotations test
takes 3.17s instead of 45.08s.
Fixes: b06a532bf1 ("clk: samsung: Add Exynos5 sub-CMU clock driver")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lkml.kernel.org/r/20190808121839.23892-1-m.szyprowski@samsung.com
Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
This patch fixes broken sound on Exynos5422/5800 platforms after
system/suspend resume cycle in cases where the audio root clock
is derived from MAU_EPLL_CLK.
In order to preserve state of the USER_MUX_MAU_EPLL_CLK clock mux
during system suspend/resume cycle for Exynos5800 we group the MAU
block input clocks in "MAU" sub-CMU and add the clock mux control
bit to .suspend_regs. This ensures that user configuration of the mux
is not lost after the PMU block changes the mux setting to OSC_DIV
when switching off the MAU power domain.
Adding the SRC_TOP9 register to exynos5800_clk_regs[] array is not
sufficient as at the time of the syscore_ops suspend call MAU power
domain is already turned off and we already save and subsequently
restore an incorrect register's value.
Fixes: b06a532bf1 ("clk: samsung: Add Exynos5 sub-CMU clock driver")
Reported-by: Jaafar Ali <jaafarkhalaf@gmail.com>
Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Jaafar Ali <jaafarkhalaf@gmail.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Link: https://lkml.kernel.org/r/20190808144929.18685-2-s.nawrocki@samsung.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
perf/urgent fixes:
db-export:
Adrian Hunter:
- Fix thread__exec_comm() picking of main thread COMM for pre-existing,
synthesized from /proc data records.
annotate:
Arnaldo Carvalho de Melo:
- Fix printing of unaugmented disassembled instructions from BPF, some
lines were leaving leftovers from the previous screen, due to use of
newlines by binutils's libopcode disassembler.
perf ftrace:
He Zhe:
- Fix cpumask problems when only one CPU is present.
PMU events:
Jin Yao:
- Add missing "cpu_clk_unhalted.core" event.
perf bench:
Jiri Olsa:
- Fix cpu0 binding in the NUMA benchmarks.
s390:
Thomas Richter:
- Fix module size calculations.
build system:
Masanari Iida:
- Fix a typo in a variable name in the Documentation Makefile
misc:
Ian Rogers:
- Fix include paths in ui directory.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove check of recovery bit, in the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when CQE error is detected.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If the sq is not in
error state, the sq is either in ready or reset state. Ready state is
good state which doesn't require recovery and reset state is a temporal
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shift the tisn field in the WQE control segment, per the
HW specification.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Setting speed to 56GBASE is allowed only with auto-negotiation enabled.
This patch prevent setting speed to 56GBASE when auto-negotiation disabled.
Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Only support changing tx/rx pause frame setting if the net device
is the vport group manager.
Fixes: 3c2d18ef22 ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We have an issue that OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. mlx5_core driver detects correctly the inner and outer
header match criteria but does not enable the inner header match criteria
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rule needs inner header criteria.
Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.
Fixes: 6363651d6d ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The events defined in pmu-events JSON are parsed and added into perf
tool. For fixed counters, we handle the encodings between JSON and perf
by using a static array fixed[].
But the fixed[] has missed an important event "cpu_clk_unhalted.core".
For example, on the Tremont platform,
[root@localhost ~]# perf stat -e cpu_clk_unhalted.core -a
event syntax error: 'cpu_clk_unhalted.core'
\___ parser error
With this patch, the event cpu_clk_unhalted.core can be parsed.
[root@localhost perf]# ./perf stat -e cpu_clk_unhalted.core -a -vvv
------------------------------------------------------------
perf_event_attr:
type 4
size 112
config 0x3c
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
...
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190729072755.2166-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
During execution of command 'perf top' the error message:
Not enough memory for annotating '__irf_end' symbol!)
is emitted from this call sequence:
__cmd_top
perf_top__mmap_read
perf_top__mmap_read_idx
perf_event__process_sample
hist_entry_iter__add
hist_iter__top_callback
perf_top__record_precise_ip
hist_entry__inc_addr_samples
symbol__inc_addr_samples
symbol__get_annotation
symbol__alloc_hist
In this function the size of symbol __irf_end is calculated. The size of
a symbol is the difference between its start and end address.
When the symbol was read the first time, its start and end was set to:
symbol__new: __irf_end 0xe954d0-0xe954d0
which is correct and maps with /proc/kallsyms:
root@s8360046:~/linux-4.15.0/tools/perf# fgrep _irf_end /proc/kallsyms
0000000000e954d0 t __irf_end
root@s8360046:~/linux-4.15.0/tools/perf#
In function symbol__alloc_hist() the end of symbol __irf_end is
symbol__alloc_hist sym:__irf_end start:0xe954d0 end:0x3ff80045a8
which is identical with the first module entry in /proc/kallsyms
This results in a symbol size of __irf_req for histogram analyses of
70334140059072 bytes and a malloc() for this requested size fails.
The root cause of this is function
__dso__load_kallsyms()
+-> symbols__fixup_end()
Function symbols__fixup_end() enlarges the last symbol in the kallsyms
map:
# fgrep __irf_end /proc/kallsyms
0000000000e954d0 t __irf_end
#
to the start address of the first module:
# cat /proc/kallsyms | sort | egrep ' [tT] '
....
0000000000e952d0 T __security_initcall_end
0000000000e954d0 T __initramfs_size
0000000000e954d0 t __irf_end
000003ff800045a8 T fc_get_event_number [scsi_transport_fc]
000003ff800045d0 t store_fc_vport_disable [scsi_transport_fc]
000003ff800046a8 T scsi_is_fc_rport [scsi_transport_fc]
000003ff800046d0 t fc_target_setup [scsi_transport_fc]
On s390 the kernel is located around memory address 0x200, 0x10000 or
0x100000, depending on linux version. Modules however start some- where
around 0x3ff xxxx xxxx.
This is different than x86 and produces a large gap for which histogram
allocation fails.
Fix this by detecting the kernel's last symbol and do no adjustment for
it. Introduce a weak function and handle s390 specifics.
Reported-by: Klaus Theurich <klaus.theurich@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190724122703.3996-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390 the modules loaded in memory have the text segment located after
the GOT and Relocation table. This can be seen with this output:
[root@m35lp76 perf]# fgrep qeth /proc/modules
qeth 151552 1 qeth_l2, Live 0x000003ff800b2000
...
[root@m35lp76 perf]# cat /sys/module/qeth/sections/.text
0x000003ff800b3990
[root@m35lp76 perf]#
There is an offset of 0x1990 bytes. The size of the qeth module is
151552 bytes (0x25000 in hex).
The location of the GOT/relocation table at the beginning of a module is
unique to s390.
commit 203d8a4aa6 ("perf s390: Fix 'start' address of module's map")
adjusts the start address of a module in the map structures, but does
not adjust the size of the modules. This leads to overlapping of module
maps as this example shows:
[root@m35lp76 perf] # ./perf report -D
0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x25000)
@ 0]: x /lib/modules/.../qeth.ko.xz
0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x8000)
@ 0]: x /lib/modules/.../ip6_tables.ko.xz
The module qeth.ko has an adjusted start address modified to b3990, but
its size is unchanged and the module ends at 0x3ff800d8990. This end
address overlaps with the next modules start address of 0x3ff800d85a0.
When the size of the leading GOT/Relocation table stored in the
beginning of the text segment (0x1990 bytes) is subtracted from module
qeth end address, there are no overlaps anymore:
0x3ff800d8990 - 0x1990 = 0x0x3ff800d7000
which is the same as
0x3ff800b2000 + 0x25000 = 0x0x3ff800d7000.
To fix this issue, also adjust the modules size in function
arch__fix_module_text_start(). Add another function parameter named size
and reduce the size of the module when the text segment start address is
changed.
Output after:
0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x23670)
@ 0]: x /lib/modules/.../qeth.ko.xz
0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x7a60)
@ 0]: x /lib/modules/.../ip6_tables.ko.xz
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 203d8a4aa6 ("perf s390: Fix 'start' address of module's map")
Link: http://lkml.kernel.org/r/20190724122703.3996-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The code to disassemble BPF programs uses binutil's disassembling
routines, and those use in turn fprintf to print to a memstream FILE,
adding a newline at the end of each line, which ends up confusing the
TUI routines called from:
annotate_browser__write()
annotate_line__write()
annotate_browser__printf()
ui_browser__vprintf()
SLsmg_vprintf()
The SLsmg_vprintf() function in the slang library gets confused with the
terminating newline, so make the disasm_line__parse() function that
parses the lines produced by the BPF specific disassembler (that uses
binutil's libopcodes) and the lines produced by the objdump based
disassembler used for everything else (and that doesn't adds this
terminating newline) trim the end of the line in addition of the
beginning.
This way when disasm_line->ops.raw, i.e. for instructions without a
special scnprintf() method, we'll not have that \n getting in the way of
filling the screen right after the instruction with spaces to avoid
leaving what was on the screen before and thus garbling the annotation
screen, breaking scrolling, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 6987561c9e ("perf annotate: Enable annotation of BPF programs")
Link: https://lkml.kernel.org/n/tip-unbr5a5efakobfr6rhxq99ta@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann
- Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cifs fixes from Steve French:
"Six small SMB3 fixes, two for stable"
* tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
smb3: update TODO list of missing features
smb3: send CAP_DFS capability during session setup
SMB3: Fix potential memory leak when processing compound chain
SMB3: Fix deadlock in validate negotiate hits reconnect
cifs: fix rmmod regression in cifs.ko caused by force_sig changes
A deadlock with this stacktrace was observed.
The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio
shrinker and the shrinker depends on I/O completion in the dm-bufio
subsystem.
In order to fix the deadlock (and other similar ones), we set the flag
PF_MEMALLOC_NOIO at loop thread entry.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0"
#0 [ffff8813dedfb938] __schedule at ffffffff8173f405
#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1"
#0 [ffff88272f5af228] __schedule at ffffffff8173f405
#1 [ffff88272f5af280] schedule at ffffffff8173fa27
#2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
#3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
#4 [ffff88272f5af330] mutex_lock at ffffffff81742133
#5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
#6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
#7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
#8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
#9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
#10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
#11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
#12 [ffff88272f5af760] new_slab at ffffffff811f4523
#13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
#14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
#15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
#16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
#17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
#18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
#19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
#20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
#21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
#22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
#23 [ffff88272f5afec0] kthread at ffffffff810a8428
#24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, the authorized_default and interface_authorized_default
attributes for HCD are set up after the uevent has been sent to userland.
This creates a race condition where userland may fail to access this
file when processing the event. Move the appending of these attributes
earlier relying on the usb_bus_notify dispatcher.
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190806110050.38918-1-tweek@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 89e524c04f ("loop: Fix mount(2) failure due to race with
LOOP_SET_FD") converted blkdev_get() to use the new helpers for
finishing claiming of a block device. However the conversion botched the
error handling in blkdev_get() and thus the bdev has been marked as held
even in case __blkdev_get() returned error. This led to occasional
warnings with block/001 test from blktests like:
kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0
Correct the error handling.
CC: stable@vger.kernel.org
Fixes: 89e524c04f ("loop: Fix mount(2) failure due to race with LOOP_SET_FD")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In stm32_mdma_irq_handler(), chan is checked on line 1368.
When chan is NULL, it is still used on line 1369:
dev_err(chan2dev(chan), "MDMA channel not initialized\n");
Thus, a possible null-pointer dereference may occur.
To fix this bug, "dev_dbg(mdma2dev(dmadev), ...)" is used instead.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes: a4ffb13c89 ("dmaengine: Add STM32 MDMA driver")
Link: https://lore.kernel.org/r/20190729020849.17971-1-baijiaju1990@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Since commit 13a857a4c4 ("block, bfq: detect wakers and
unconditionally inject their I/O"), every bfq_queue has a pointer to a
waker bfq_queue and a list of the bfq_queues it may wake. In this
respect, when a bfq_queue, say Q, remains with no I/O source attached
to it, Q cannot be woken by any other bfq_queue, and cannot wake any
other bfq_queue. Then Q must be removed from the woken list of its
possible waker bfq_queue, and all bfq_queues in the woken list of Q
must stop having a waker bfq_queue.
Q remains with no I/O source in two cases: when the last process
associated with Q exits or when such a process gets associated with a
different bfq_queue. Unfortunately, commit 13a857a4c4 ("block, bfq:
detect wakers and unconditionally inject their I/O") performed the
above updates only in the first case.
This commit fixes this bug by moving these updates to when Q gets
freed. This is a simple and safe way to handle all cases, as both the
above events, process exit and re-association, lead to Q being freed
soon, and because dangling references would come out only after Q gets
freed (if no update were performed).
Fixes: 13a857a4c4 ("block, bfq: detect wakers and unconditionally inject their I/O")
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 13a857a4c4 ("block, bfq: detect wakers and
unconditionally inject their I/O"), BFQ stores, in a per-device
pointer last_completed_rq_bfqq, the last bfq_queue that had an I/O
request completed. If some bfq_queue receives new I/O right after the
last request of last_completed_rq_bfqq has been completed, then
last_completed_rq_bfqq may be a waker bfq_queue.
But if the bfq_queue last_completed_rq_bfqq points to is freed, then
last_completed_rq_bfqq becomes a dangling reference. This commit
resets last_completed_rq_bfqq if the pointed bfq_queue is freed.
Fixes: 13a857a4c4 ("block, bfq: detect wakers and unconditionally inject their I/O")
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently empty .bss checks performed do not pay attention to "common
objects" in object files which end up in .bss section eventually.
The "size" tool is a part of binutils and since version 2.18 provides
"--common" command line option, which allows to account "common objects"
sizes in .bss section size. Utilize "size --common" to perform accurate
check that .bss section is unused. Besides that the size tool handles
object files without .bss section gracefully and doesn't require
additional objdump run.
The linux kernel requires binutils 2.20 since 4.13.
Kbuild exports OBJSIZE to reference the right size tool.
Link: http://lkml.kernel.org/r/patch-2.thread-2257a1.git-2257a1c53d4a.your-ad-here.call-01565088755-ext-5120@work.hours
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit 4a6ef8e37c ("pwm: Add support referencing PWMs from ACPI")
made pwm_get unconditionally return the acpi_pwm_get return value if
the device passed to pwm_get has an ACPI fwnode.
But even if the passed in device has an ACPI fwnode, it does not
necessarily have the necessary ACPI package defining its pwm bindings,
especially since the binding / API of this ACPI package has only been
introduced very recently.
Up until now X86/ACPI devices which use a separate pwm controller for
controlling their LCD screen's backlight brightness have been relying
on the static lookup-list to get their pwm.
pwm_get unconditionally returning the acpi_pwm_get return value breaks
this, breaking backlight control on these devices.
This commit fixes this by making pwm_get fall back to the static
lookup-list if acpi_pwm_get returns -ENOENT.
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=96571
Reported-by: youling257@gmail.com
Fixes: 4a6ef8e37c ("pwm: Add support referencing PWMs from ACPI")
Cc: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Currently when too many retries have occurred there is a memory
leak on the allocation for reply on the error return path. Fix
this by kfree'ing reply before returning.
Addresses-Coverity: ("Resource leak")
Fixes: a9cd9c044a ("drm/vmwgfx: Add a check to handle host message failure")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
In iso_packets_buffer_init(), 'b->packets' is allocated through
kmalloc_array(). Then, the aligned packet size is checked. If it is
larger than PAGE_SIZE, -EINVAL will be returned to indicate the error.
However, the allocated 'b->packets' is not deallocated on this path,
leading to a memory leak.
To fix the above issue, free 'b->packets' before returning the error code.
Fixes: 31ef9134eb ("ALSA: add LaCie FireWire Speakers/Griffin FireWave Surround driver")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org> # v2.6.39+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since commit c66d4bd110 ("genirq/affinity: Add new callback for
(re)calculating interrupt sets"), irq_create_affinity_masks() returns
NULL in case of single vector. This change has caused regression on some
drivers, such as lpfc.
The problem is that single vector requests can happen in some generic cases:
1) kdump kernel
2) irq vectors resource is close to exhaustion.
If in that situation the affinity mask for a single vector is not created,
every caller has to handle the special case.
There is no reason why the mask cannot be created, so remove the check for
a single vector and create the mask.
Fixes: c66d4bd110 ("genirq/affinity: Add new callback for (re)calculating interrupt sets")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190805011906.5020-1-ming.lei@redhat.com
This reverts commit cc798c8389.
Tony writes:
Somehow this causes a regression in Linux next for me where I'm
seeing lots of sysfs entries now missing under
/sys/bus/platform/devices.
For example, I now only see one .serial entry show up in sysfs.
Things work again if I revert commit cc798c8389 ("kernfs: fix
memleak inkernel_ops_readdir()"). Any ideas why that would be?
Tejun says:
Ugh, you're right. It can get double-put cuz ctx->pos is put by
release too.
So reverting it for now.
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Fixes: cc798c8389 ("kernfs: fix memleak in kernel_ops_readdir()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building with W=1, warnings about missing prototypes are emitted:
CC arch/x86/lib/cpu.o
arch/x86/lib/cpu.c:5:14: warning: no previous prototype for 'x86_family' [-Wmissing-prototypes]
5 | unsigned int x86_family(unsigned int sig)
| ^~~~~~~~~~
arch/x86/lib/cpu.c:18:14: warning: no previous prototype for 'x86_model' [-Wmissing-prototypes]
18 | unsigned int x86_model(unsigned int sig)
| ^~~~~~~~~
arch/x86/lib/cpu.c:33:14: warning: no previous prototype for 'x86_stepping' [-Wmissing-prototypes]
33 | unsigned int x86_stepping(unsigned int sig)
| ^~~~~~~~~~~~
Add the proper include file so the prototypes are there.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/42513.1565234837@turing-police
KBUILD_CFLAGS is very carefully built up in the top level Makefile,
particularly when cross compiling or using different build tools.
Resetting KBUILD_CFLAGS via := assignment is an antipattern.
The comment above the reset mentions that -pg is problematic. Other
Makefiles use `CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)` when
CONFIG_FUNCTION_TRACER is set. Prefer that pattern to wiping out all of
the important KBUILD_CFLAGS then manually having to re-add them. Seems
also that __stack_chk_fail references are generated when using
CONFIG_STACKPROTECTOR or CONFIG_STACKPROTECTOR_STRONG.
Fixes: 8fc5b4d412 ("purgatory: core purgatory functionality")
Reported-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190807221539.94583-2-ndesaulniers@google.com
Implementing memcpy and memset in terms of __builtin_memcpy and
__builtin_memset is problematic.
GCC at -O2 will replace calls to the builtins with calls to memcpy and
memset (but will generate an inline implementation at -Os). Clang will
replace the builtins with these calls regardless of optimization level.
$ llvm-objdump -dr arch/x86/purgatory/string.o | tail
0000000000000339 memcpy:
339: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
000000000000033b: R_X86_64_64 memcpy
343: ff e0 jmpq *%rax
0000000000000345 memset:
345: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
0000000000000347: R_X86_64_64 memset
34f: ff e0
Such code results in infinite recursion at runtime. This is observed
when doing kexec.
Instead, reuse an implementation from arch/x86/boot/compressed/string.c.
This requires to implement a stub function for warn(). Also, Clang may
lower memcmp's that compare against 0 to bcmp's, so add a small definition,
too. See also: commit 5f074f3e19 ("lib/string.c: implement a basic bcmp")
Fixes: 8fc5b4d412 ("purgatory: core purgatory functionality")
Reported-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: Manoj Gupta <manojgupta@google.com>
Suggested-by: Alistair Delva <adelva@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Cc: stable@vger.kernel.org
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=984056
Link: https://lkml.kernel.org/r/20190807221539.94583-1-ndesaulniers@google.com
In sound_insert_unit(), the controlling structure 's' is allocated through
kmalloc(). Then it is added to the sound driver list by invoking
__sound_insert_unit(). Later on, if __register_chrdev() fails, 's' is
removed from the list through __sound_remove_unit(). If 'index' is not less
than 0, -EBUSY is returned to indicate the error. However, 's' is not
deallocated on this execution path, leading to a memory leak bug.
To fix the above issue, free 's' before -EBUSY is returned.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When a configurations runs with a single cpu (such as a kdump kernel),
which causes the driver to request a single vector, when the driver
subsequently requests an irq affinity mask, the mask comes back null. The
driver currently does nothing in this scenario, which leaves mappings to
hardware queues incomplete and crashes the system.
Fix by recognizing the null mask and assigning the vector to the first cpu
in the system.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull hwmon fixes from Guenter Roeck:
"Fixes to lm75 and nct7802 drivers
In the lm75 driver, fix TMP75B chip description to ensure correct
initialization. In the nct7802 driver, fix in4 presence detection"
* tag 'hwmon-for-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (lm75) Fixup tmp75b clr_mask
hwmon: (nct7802) Fix wrong detection of in4 presence
The code to detect if in4 is present is wrong; if in4 is not present,
the in4_input sysfs attribute is still present.
In detail:
- Ihen RTD3_MD=11 (VSEN3 present), everything is as expected (no bug).
- If we have RTD3_MD!=11 (no VSEN3), we unexpectedly have a in4_input
file under /sys and the "sensors" command displays in4_input.
But as expected, we have no in4_min, in4_max, in4_alarm, in4_beep.
Fix is_visible function to detect and report in4_input visibility
as expected.
Reported-by: Gilles Buloz <Gilles.Buloz@kontron.com>
Cc: Gilles Buloz <Gilles.Buloz@kontron.com>
Cc: stable@vger.kernel.org
Fixes: 3434f37835 ("hwmon: Driver for Nuvoton NCT7802Y")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Once implicit MR is being called to be released by
ib_umem_notifier_release() its leaves were marked as "dying".
However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
when those leaves were marked as "dying".
As such ib_umem_release() for the leaves won't be called and their MRs
will be leaked as well.
When an application exits/killed without calling dereg_mr we might hit the
above flow.
This fatal scenario is reported by WARN_ON() upon
mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the
call trace can be seen below.
Originally the "dying" mark as part of ib_umem_notifier_release() was
introduced to prevent pagefault_mr() from returning a success response
once this happened. However, we already have today the completion
mechanism so no need for that in those flows any more. Even in case a
success response will be returned the firmware will not find the pages and
an error will be returned in the following call as a released mm will
cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().
Fix the above issue by dropping the "dying" from the above flows. The
other flows that are using "dying" are still needed it for their
synchronization purposes.
WARNING: CPU: 1 PID: 7218 at
drivers/infiniband/hw/mlx5/main.c:2004
mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G E
5.2.0-rc6+ #13
Call Trace:
uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
ib_uverbs_close+0x1f/0x80 [ib_uverbs]
__fput+0xbe/0x250
task_work_run+0x88/0xa0
do_exit+0x2cb/0xc30
? __fput+0x14b/0x250
do_group_exit+0x39/0xb0
get_signal+0x191/0x920
? _raw_spin_unlock_bh+0xa/0x20
? inet_csk_accept+0x229/0x2f0
do_signal+0x36/0x5e0
? put_unused_fd+0x5b/0x70
? __sys_accept4+0x1a6/0x1e0
? inet_hash+0x35/0x40
? release_sock+0x43/0x90
? _raw_spin_unlock_bh+0xa/0x20
? inet_listen+0x9f/0x120
exit_to_usermode_loop+0x5c/0xc6
do_syscall_64+0x182/0x1b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Abort processing of a command if we run out of mapped data in the
SG list. This should never happen, but a previous bug caused it to
be possible. Play it safe and attempt to abort nicely if we don't
have more SG segments left.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For passthrough requests, libata-scsi takes what the user passes in
as gospel. This can be problematic if the user fills in the CDB
incorrectly. One example of that is in request sizes. For read/write
commands, the CDB contains fields describing the transfer length of
the request. These should match with the SG_IO header fields, but
libata-scsi currently does no validation of that.
Check that the number of blocks in the CDB for passthrough requests
matches what was mapped into the request. If the CDB asks for more
data then the validated SG_IO header fields, error it.
Reported-by: Krishna Ram Prakash R <krp@gtux.in>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
0eb6ddfb86 tried to fix this up, but introduced a use-after-free
of dio. Additionally, we still had an issue with error handling,
as reported by Darrick:
"I noticed a regression in xfs/747 (an unreleased xfstest for the
xfs_scrub media scanning feature) on 5.3-rc3. I'll condense that down
to a simpler reproducer:
error-test: 0 209 linear 8:48 0
error-test: 209 1 error
error-test: 210 6446894 linear 8:48 210
Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO
for sector 209 and to pass the io to the scsi device everywhere else.
On 5.3-rc3, performing a directio pread of this range with a < 1M buffer
(in other words, a request for fewer than MAX_BIO_PAGES bytes) yields
EIO like you'd expect:
pread64(3, 0x7f880e1c7000, 1048576, 0) = -1 EIO (Input/output error)
pread: Input/output error
+++ exited with 0 +++
But doing it with a larger buffer succeeds(!):
pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880
read 1146880/1146880 bytes at offset 0
1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec)
+++ exited with 0 +++
(Note that the part of the buffer corresponding to the dm-error area is
uninitialized)
On 5.3-rc2, both commands would fail with EIO like you'd expect. The
only change between rc2 and rc3 is commit 0eb6ddfb86 ("block: Fix
__blkdev_direct_IO() for bio fragments").
AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets
split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB)
and a second one for the 96k after that."
Fix this by noting that it's always safe to dereference dio if we get
BLK_QC_T_EAGAIN returned, as end_io hasn't been run for that case. So
we can safely increment the dio size before calling submit_bio(), and
then decrement it on failure (not that it really matters, as the bio
and dio are going away).
For error handling, return to the original method of just using 'ret'
for tracking the error, and the size tracking in dio->size.
Fixes: 0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
Fixes: 6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure that the state recovery code handles ETIMEDOUT correctly,
and also that we set RPC_TASK_TIMEOUT when recovering open state.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Normally the range->len is set to default value (U64_MAX), but when it's
not default value, we should check if the range overflows.
And if it overflows, return -EINVAL before doing anything.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the 5.3 merge window, commit 7c7e301406 ("btrfs: sysfs: Replace
default_attrs in ktypes with groups"), we started using the member
"defaults_groups" for the kobject type "btrfs_raid_ktype". That leads
to a series of warnings when running some test cases of fstests, such
as btrfs/027, btrfs/124 and btrfs/176. The traces produced by those
warnings are like the following:
[116648.059212] kernfs: can not remove 'total_bytes', no directory
[116648.060112] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
(...)
[116648.066482] CPU: 3 PID: 28500 Comm: umount Tainted: G W 5.3.0-rc3-btrfs-next-54 #1
(...)
[116648.069376] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
(...)
[116648.072385] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
[116648.073437] RAX: 0000000000000000 RBX: ffffffffc0c11998 RCX: 0000000000000000
[116648.074201] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
[116648.074956] RBP: ffffffffc0b9ca2f R08: 0000000000000000 R09: 0000000000000001
[116648.075708] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
[116648.076434] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
[116648.077143] FS: 00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
[116648.077852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[116648.078546] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
[116648.079235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[116648.079907] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[116648.080585] Call Trace:
[116648.081262] remove_files+0x31/0x70
[116648.081929] sysfs_remove_group+0x38/0x80
[116648.082596] sysfs_remove_groups+0x34/0x70
[116648.083258] kobject_del+0x20/0x60
[116648.083933] btrfs_free_block_groups+0x405/0x430 [btrfs]
[116648.084608] close_ctree+0x19a/0x380 [btrfs]
[116648.085278] generic_shutdown_super+0x6c/0x110
[116648.085951] kill_anon_super+0xe/0x30
[116648.086621] btrfs_kill_super+0x12/0xa0 [btrfs]
[116648.087289] deactivate_locked_super+0x3a/0x70
[116648.087956] cleanup_mnt+0xb4/0x160
[116648.088620] task_work_run+0x7e/0xc0
[116648.089285] exit_to_usermode_loop+0xfa/0x100
[116648.089933] do_syscall_64+0x1cb/0x220
[116648.090567] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[116648.091197] RIP: 0033:0x7f9cdc073b37
(...)
[116648.100046] ---[ end trace 22e24db328ccadf8 ]---
[116648.100618] ------------[ cut here ]------------
[116648.101175] kernfs: can not remove 'used_bytes', no directory
[116648.101731] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
(...)
[116648.105649] CPU: 3 PID: 28500 Comm: umount Tainted: G W 5.3.0-rc3-btrfs-next-54 #1
(...)
[116648.107461] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
(...)
[116648.109336] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
[116648.109979] RAX: 0000000000000000 RBX: ffffffffc0c119a0 RCX: 0000000000000000
[116648.110625] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
[116648.111283] RBP: ffffffffc0b9ca41 R08: 0000000000000000 R09: 0000000000000001
[116648.111940] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
[116648.112603] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
[116648.113268] FS: 00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
[116648.113939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[116648.114607] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
[116648.115286] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[116648.115966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[116648.116649] Call Trace:
[116648.117326] remove_files+0x31/0x70
[116648.117997] sysfs_remove_group+0x38/0x80
[116648.118671] sysfs_remove_groups+0x34/0x70
[116648.119342] kobject_del+0x20/0x60
[116648.120022] btrfs_free_block_groups+0x405/0x430 [btrfs]
[116648.120707] close_ctree+0x19a/0x380 [btrfs]
[116648.121396] generic_shutdown_super+0x6c/0x110
[116648.122057] kill_anon_super+0xe/0x30
[116648.122702] btrfs_kill_super+0x12/0xa0 [btrfs]
[116648.123335] deactivate_locked_super+0x3a/0x70
[116648.123961] cleanup_mnt+0xb4/0x160
[116648.124586] task_work_run+0x7e/0xc0
[116648.125210] exit_to_usermode_loop+0xfa/0x100
[116648.125830] do_syscall_64+0x1cb/0x220
[116648.126463] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[116648.127080] RIP: 0033:0x7f9cdc073b37
(...)
[116648.135923] ---[ end trace 22e24db328ccadf9 ]---
These happen because, during the unmount path, we call kobject_del() for
raid kobjects that are not fully initialized, meaning that we set their
ktype (as btrfs_raid_ktype) through link_block_group() but we didn't set
their parent kobject, which is done through btrfs_add_raid_kobjects().
We have this split raid kobject setup since commit 75cb379d26
("btrfs: defer adding raid type kobject until after chunk relocation") in
order to avoid triggering reclaim during contextes where we can not
(either we are holding a transaction handle or some lock required by
the transaction commit path), so that we do the calls to kobject_add(),
which triggers GFP_KERNEL allocations, through btrfs_add_raid_kobjects()
in contextes where it is safe to trigger reclaim. That change expected
that a new raid kobject can only be created either when mounting the
filesystem or after raid profile conversion through the relocation path.
However, we can have new raid kobject created in other two cases at least:
1) During device replace (or scrub) after adding a device a to the
filesystem. The replace procedure (and scrub) do calls to
btrfs_inc_block_group_ro() which can allocate a new block group
with a new raid profile (because we now have more devices). This
can be triggered by test cases btrfs/027 and btrfs/176.
2) During a degraded mount trough any write path. This can be triggered
by test case btrfs/124.
Fixing this by adding extra calls to btrfs_add_raid_kobjects(), not only
makes things more complex and fragile, can also introduce deadlocks with
reclaim the following way:
1) Calling btrfs_add_raid_kobjects() at btrfs_inc_block_group_ro() or
anywhere in the replace/scrub path will cause a deadlock with reclaim
because if reclaim happens and a transaction commit is triggered,
the transaction commit path will block at btrfs_scrub_pause().
2) During degraded mounts it is essentially impossible to figure out where
to add extra calls to btrfs_add_raid_kobjects(), because allocation of
a block group with a new raid profile can happen anywhere, which means
we can't safely figure out which contextes are safe for reclaim, as
we can either hold a transaction handle or some lock needed by the
transaction commit path.
So it is too complex and error prone to have this split setup of raid
kobjects. So fix the issue by consolidating the setup of the kobjects in a
single place, at link_block_group(), and setup a nofs context there in
order to prevent reclaim being triggered by the memory allocations done
through the call chain of kobject_add().
Besides fixing the sysfs warnings during kobject_del(), this also ensures
the sysfs directories for the new raid profiles end up created and visible
to users (a bug that existed before the 5.3 commit 7c7e301406
("btrfs: sysfs: Replace default_attrs in ktypes with groups")).
Fixes: 75cb379d26 ("btrfs: defer adding raid type kobject until after chunk relocation")
Fixes: 7c7e301406 ("btrfs: sysfs: Replace default_attrs in ktypes with groups")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: allnoconfig i386):
arch/x86/kernel/ptrace.c:202:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (unlikely(value == 0))
^
arch/x86/kernel/ptrace.c:206:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190805195654.GA17831@embeddedor
A long-time problem on the recent AMD chip (X370, X470, B450, etc with
PCI ID 1022:1457) with Realtek codecs is the crackled or distorted
sound for capture streams, as well as occasional playback hiccups.
After lengthy debugging sessions, the workarounds we've found are like
the following:
- Set up the proper driver caps for this controller, similar as the
other AMD controller.
- Correct the DMA position reporting with the fixed FIFO size, which
is similar like as workaround used for VIA chip set.
- Even after the position correction, PulseAudio still shows
mysterious stalls of playback streams when a capture is triggered in
timer-scheduled mode. Since we have no clear way to eliminate the
stall, pass the BATCH PCM flag for PA to suppress the tsched mode as
a temporary workaround.
This patch implements the workarounds. For the driver caps, it
defines a new preset, AXZ_DCAPS_PRESET_AMD_SB. It enables the FIFO-
corrected position reporting (corresponding to the new position_fix=6)
and enforces the SNDRV_PCM_INFO_BATCH flag.
Note that the current implementation is merely a workaround.
Hopefully we'll find a better alternative in future, especially about
removing the BATCH flag hack again.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=195303
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In hiface_pcm_init(), 'rt' is firstly allocated through kzalloc(). Later
on, hiface_pcm_init_urb() is invoked to initialize 'rt->out_urbs[i]'. In
hiface_pcm_init_urb(), 'rt->out_urbs[i].buffer' is allocated through
kzalloc(). However, if hiface_pcm_init_urb() fails, both 'rt' and
'rt->out_urbs[i].buffer' are not deallocated, leading to memory leak bugs.
Also, 'rt->out_urbs[i].buffer' is not deallocated if snd_pcm_new() fails.
To fix the above issues, free 'rt' and 'rt->out_urbs[i].buffer'.
Fixes: a91c3fb2f8 ("Add M2Tech hiFace USB-SPDIF driver")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This reverts commit 9ed2c993d7.
SET_CONFIG_REG writes to memory if register shadowing is enabled,
causing a VM fault.
NGG streamout is unstable anyway, so all UMDs should use legacy
streamout. I think Mesa is the only driver using NGG streamout.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull networking fixes from David Miller:
"Yeah I should have sent a pull request last week, so there is a lot
more here than usual:
1) Fix memory leak in ebtables compat code, from Wenwen Wang.
2) Several kTLS bug fixes from Jakub Kicinski (circular close on
disconnect etc.)
3) Force slave speed check on link state recovery in bonding 802.3ad
mode, from Thomas Falcon.
4) Clear RX descriptor bits before assigning buffers to them in
stmmac, from Jose Abreu.
5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
loops, from Nishka Dasgupta.
6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.
7) Need to hold sock across skb->destructor invocation, from Cong
Wang.
8) IP header length needs to be validated in ipip tunnel xmit, from
Haishuang Yan.
9) Use after free in ip6 tunnel driver, also from Haishuang Yan.
10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
Heiner Kallweit.
11) Upon bridge device init failure, we need to delete the local fdb.
From Nikolay Aleksandrov.
12) Handle erros from of_get_mac_address() properly in stmmac, from
Martin Blumenstingl.
13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
Kadlecsik.
14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
some devices, so revert. From Johannes Berg.
15) Fix deadlock in rxrpc, from David Howells.
16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
Haibing.
17) Fix mvpp2 crash on module removal, from Matteo Croce.
18) Fix race in genphy_update_link, from Heiner Kallweit.
19) bpf_xdp_adjust_head() stopped working with generic XDP when we
fixes generic XDP to support stacked devices properly, fix from
Jesper Dangaard Brouer.
20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
David Ahern.
21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
net: dsa: sja1105: Fix memory leak on meta state machine error path
net: dsa: sja1105: Fix memory leak on meta state machine normal path
net: dsa: sja1105: Really fix panic on unregistering PTP clock
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
net: sched: sample: allow accessing psample_group with rtnl
net: sched: police: allow accessing police->params with rtnl
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
tc-testing: updated vlan action tests with batch create/delete
net sched: update vlan action for batched events operations
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: stmmac: xgmac: Fix XGMAC selftests
be2net: disable bh with spin_lock in be_process_mcc
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
...
There is only one clocksource in RISC-V. The boot cpu initializes
that clocksource. No need to keep a percpu data structure.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Vladimir Oltean says:
====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP
This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).
Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.
Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.
Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.
The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).
However, when dropping the initial link-local frame, its backing memory
was being leaked.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.
Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed. That is the job of kfree_skb(), which
also calls skb_unref().
But it turns out that freeing the cached stampable_skb is in fact not
necessary. It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp. So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.
Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.
It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.
Fixes: 29dd908d35 ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.
Fixes: d763778224 ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.
So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.
The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).
This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.
Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:
- It is no longer necessary to manipulate the VID of the 'bridge fdb
{add,del}' command when vlan_filtering is off. This is because now,
SVL is enabled for that case, so the actual VID does not matter*.
- It is still desirable to hide dsa_8021q VID's in the FDB dump
callback. But right now the dump callback should no longer hide
duplicates (one per each front panel port's pvid, plus one for the
VLAN that the CPU port is going to tag a TX frame with), because there
shouldn't be any (the switch will match a single FDB entry no matter
its VID anyway).
* Not really... It's no longer necessary to transform a 'bridge fdb add'
into 5 fdb add operations, but the user might still add a fdb entry with
any vid, and all of them would appear as duplicates in 'bridge fdb
show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
can prune the duplicates at insertion time.
** The VID of 0 is better than 1 because it is always guaranteed to be
in the ports' hardware filter. DSA also avoids putting the VID inside
the netlink response message towards the bridge driver when we return
this particular VID, which makes it suitable for FDB entries learnt
with vlan_filtering off.
Fixes: 227d07a07e ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there
is no put, thus causing a memory leak. Hence add an of_node_put() before
the return.
Additionally, the local variable ports in the function
qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(),
which gets a node but does not put it. If the function returns without
putting ports, it may cause a memory leak. Hence put ports before the
mid-loop return statement, and also outside the loop after its last usage
in this function.
Issues found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov says:
====================
action fixes for flow_offload infra compatibility
Fix rcu warnings due to usage of action helpers that expect rcu read lock
protection from rtnl-protected context of flow_offload infra.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao says:
====================
net: hisilicon: Fix a few problems with hip04_eth
During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."
ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.
tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak says:
====================
Fix batched event generation for vlan action
When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.
For example, consider the following script adding 32 entries with all
supported vlan parameters (in order to maximize netlink messages size):
% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"
$TC actions flush action vlan
for i in `seq 1 $1`;
do
cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
index $i cookie aabbccddeeff112233445566778800a1 "
args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%
patch 1 adds callback in tc_action_ops of vlan action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().
patch 2 updates the TDC test suite with relevant vlan test cases.
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TDC tests with cases varifying ability of TC to install or delete
batches of vlan actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes for 5.3:
- Various switch fall through annotations to fixup warnings & errors
resulting from -Wimplicit-fallthrough.
- A fix for systems (at least jazz) using an i8253 PIT as clocksource
when it's not suitably configured.
- Set struct cacheinfo's cpu_map_populated field to true, indicating
that we filled in cache info detected from cop0 registers &
avoiding complaints about that info being (intentionally) missing
in devicetree"
* tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: BCM63XX: Mark expected switch fall-through
MIPS: OProfile: Mark expected switch fall-throughs
MIPS: Annotate fall-through in Cavium Octeon code
MIPS: Annotate fall-through in kvm/emulate.c
mips: fix cacheinfo
MIPS: kernel: only use i8253 clocksource with periodic clockevent
Without this pin, the csb buffer will be filled with inconsistent
data after S3 resume. And that will causes gfx hang on gfxoff
exit since this csb will be executed then.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Tested-by: Paul Gover <pmw.gover@yahoo.co.uk>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jose Abreu says:
====================
net: stmmac: Fixes for -net
Couple of fixes for -net. More info in commit log.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not try to return a fragment entry from TC list. Otherwise we may not
clean properly allocated entries.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When queues >= 4 we use different registers but we were not subtracting
the offset of 4. Fix this.
Found out by Coverity.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixup the XGMAC selftests by correctly finishing the implementation of
set_filter callback.
Result:
$ ethtool -t enp4s0
The test result is PASS
The test extra info:
1. MAC Loopback 0
2. PHY Loopback -95
3. MMC Counters -95
4. EEE -95
5. Hash Filter MC 0
6. Perfect Filter UC 0
7. MC Filter 0
8. UC Filter 0
9. Flow Control 0
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for 5.3
Second set of fixes for 5.3. Lots of iwlwifi fixes have accumulated
which consists most of patches in this pull request. Only most notable
iwlwifi fixes are listed below.
mwifiex
* fix a regression related to WPA1 networks since v5.3-rc1
iwlwifi
* fix use-after-free issues
* fix DMA mapping API usage errors
* fix frame drop occurring due to reorder buffer handling in
RSS in certain conditions
* fix rate scale locking issues
* disable TX A-MSDU on older NICs as it causes problems and was
never supposed to be supported
* new PCI IDs
* GEO_TX_POWER_LIMIT API issue that many people were hitting
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull HID fixes from Jiri Kosina:
- functional regression fix for some of the Logitech unifying devices,
from Hans de Goede
- race condition fix in hid-sony for bug severely affecting
Valve/Android deployments, from Roderick Colenbrander
- several fixes for issues found by syzbot/kasan, from Oliver Neukum
and Hillf Danton
- functional regression fix for Wacom Cintiq device, from Aaron
Armstrong Skomra
- a few other assorted device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: sony: Fix race condition between rumble and device remove.
HID: hiddev: do cleanup in failure of opening a device
HID: hiddev: avoid opening a disconnected device
HID: input: fix a4tech horizontal wheel custom usage
HID: Add quirk for HP X1200 PIXART OEM mouse
HID: holtek: test for sanity of intfdata
HID: wacom: fix bit shift for Cintiq Companion 2
HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
HID: logitech-dj: Really fix return value of logi_dj_recv_query_hidpp_devices
HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
HID: logitech-dj: add the Powerplay receiver
HID: logitech-hidpp: add USB PID for a few more supported mice
HID: logitech-dj: rename "gaming" receiver to "lightspeed"
be_process_mcc() is invoked in 3 different places and
always with BHs disabled except the be_poll function
but since it's invoked from softirq with BHs
disabled it won't hurt.
v1->v2: added explanation to the patch
v2->v3: add a missing call from be_cmds.c
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A call to 'kfree_skb()' is missing in the error handling path of
'init_one()'.
This is already present in 'remove_one()' but is missing here.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sun4i-emac uses the "phy" property to find the PHY it's supposed to
use. This property was deprecated in favor of "phy-handle" in commit
8c5b094476 ("dt-bindings: net: sun4i-emac: Convert the binding to a
schemas").
Add support for this new property name, and fall back to the old one in
case the device tree hasn't been updated.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull pti updates from Thomas Gleixner:
"The performance deterioration departement is not proud at all to
present yet another set of speculation fences to mitigate the next
chapter in the 'what could possibly go wrong' story.
The new vulnerability belongs to the Spectre class and affects GS
based data accesses and has therefore been dubbed 'Grand Schemozzle'
for secret communication purposes. It's officially listed as
CVE-2019-1125.
Conditional branches in the entry paths which contain a SWAPGS
instruction (interrupts and exceptions) can be mis-speculated which
results in speculative accesses with a wrong GS base.
This can happen on entry from user mode through a mis-speculated
branch which takes the entry from kernel mode path and therefore does
not execute the SWAPGS instruction. The following speculative accesses
are done with user GS base.
On entry from kernel mode the mis-speculated branch executes the
SWAPGS instruction in the entry from user mode path which has the same
effect that the following GS based accesses are done with user GS
base.
If there is a disclosure gadget available in these code paths the
mis-speculated data access can be leaked through the usual side
channels.
The entry from user mode issue affects all CPUs which have speculative
execution. The entry from kernel mode issue affects only Intel CPUs
which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
has semantics which prevent that.
SMAP migitates both problems but only when the CPU is not affected by
the Meltdown vulnerability.
The mitigation is to issue LFENCE instructions in the entry from
kernel mode path for all affected CPUs and on the affected Intel CPUs
also in the entry from user mode path unless PTI is enabled because
the CR3 write is serializing.
The fences are as usual enabled conditionally and can be completely
disabled on the kernel command line. The Spectre V1 documentation is
updated accordingly.
A big "Thank You!" goes to Josh for doing the heavy lifting for this
round of hardware misfeature 'repair'. Of course also "Thank You!" to
everybody else who contributed in one way or the other"
* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: Add swapgs description to the Spectre v1 documentation
x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
x86/entry/64: Use JMP instead of JMPQ
x86/speculation: Enable Spectre v1 swapgs mitigations
x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
We have set the mmc_host.max_seg_size to 8M, but the dma max segment
size of PCI device is set to 64K by default in function pci_device_add().
The mmc_host.max_seg_size is used to set the max segment size of
the blk queue. Then this mismatch will trigger a calltrace like below
when a bigger than 64K segment request arrives at mmc dev. So we should
consider the limitation of the cvm_mmc_host when setting the
mmc_host.max_seg_size.
DMA-API: thunderx_mmc 0000:01:01.4: mapping sg segment longer than device claims to support [len=131072] [max=65536]
WARNING: CPU: 6 PID: 238 at kernel/dma/debug.c:1221 debug_dma_map_sg+0x2b8/0x350
Modules linked in:
CPU: 6 PID: 238 Comm: kworker/6:1H Not tainted 5.3.0-rc1-next-20190724-yocto-standard+ #62
Hardware name: Marvell OcteonTX CN96XX board (DT)
Workqueue: kblockd blk_mq_run_work_fn
pstate: 80c00009 (Nzcv daif +PAN +UAO)
pc : debug_dma_map_sg+0x2b8/0x350
lr : debug_dma_map_sg+0x2b8/0x350
sp : ffff00001770f9e0
x29: ffff00001770f9e0 x28: ffffffff00000000
x27: 00000000ffffffff x26: ffff800bc2c73180
x25: ffff000010e83700 x24: 0000000000000002
x23: 0000000000000001 x22: 0000000000000001
x21: 0000000000000000 x20: ffff800bc48ba0b0
x19: ffff800bc97e8c00 x18: ffffffffffffffff
x17: 0000000000000000 x16: 0000000000000000
x15: ffff000010e835c8 x14: 6874207265676e6f
x13: 6c20746e656d6765 x12: 7320677320676e69
x11: 7070616d203a342e x10: 31303a31303a3030
x9 : 303020636d6d5f78 x8 : 35363d78616d5b20
x7 : 00000000000002fd x6 : ffff000010fd57dc
x5 : 0000000000000000 x4 : ffff0000106c61f0
x3 : 00000000ffffffff x2 : 0000800bee060000
x1 : 7010678df3041a00 x0 : 0000000000000000
Call trace:
debug_dma_map_sg+0x2b8/0x350
cvm_mmc_request+0x3c4/0x988
__mmc_start_request+0x9c/0x1f8
mmc_start_request+0x7c/0xb0
mmc_blk_mq_issue_rq+0x5c4/0x7b8
mmc_mq_queue_rq+0x11c/0x278
blk_mq_dispatch_rq_list+0xb0/0x568
blk_mq_do_dispatch_sched+0x6c/0x108
blk_mq_sched_dispatch_requests+0x110/0x1b8
__blk_mq_run_hw_queue+0xb0/0x118
blk_mq_run_work_fn+0x28/0x38
process_one_work+0x210/0x490
worker_thread+0x48/0x458
kthread+0x130/0x138
ret_from_fork+0x10/0x1c
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Fixes: ba3869ff32 ("mmc: cavium: Add core MMC driver for Cavium SOCs")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The SD host controller specification defines 3 types software reset:
software reset for data line, software reset for command line and software
reset for all. Software reset for all means this reset affects the entire
Host controller except for the card detection circuit.
In sdhci_runtime_resume_host() we always do a software "reset for all",
which causes the Spreadtrum variant controller to work abnormally after
resuming. To fix the problem, let's do a software reset for the data and
the command part, rather than "for all".
However, as sdhci_runtime_resume() is a common sdhci function and we don't
want to change the behaviour for other variants, let's introduce a new
in-parameter for it. This enables the caller to decide if a "reset for all"
shall be done or not.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Fixes: fb8bd90f83 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct touchpad_protocol {
...
struct tp_finger fingers[0];
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
So, replace the following form:
sizeof(*tp) + tp->number_of_fingers * sizeof(tp->fingers[0]);
with:
struct_size(tp, fingers, tp->number_of_fingers)
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Now that -Wimplicit-fallthrough is passed to GCC by default, the
following warning shows up:
../drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_write_strtab_ent’:
../drivers/iommu/arm-smmu-v3.c:1189:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (disable_bypass)
^
../drivers/iommu/arm-smmu-v3.c:1191:3: note: here
default:
^~~~~~~
Rework so that the compiler doesn't warn about fall-through. Make it
clearer by calling 'BUG_ON()' when disable_bypass is set, and always
'break;'
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
MSI pages must always be mapped into a device's *current* domain, which
*might* be the default DMA domain, but might instead be a VFIO domain
with its own MSI cookie. This subtlety got accidentally lost in the
streamlining of __iommu_dma_map(), but rather than reintroduce more
complexity and/or special-casing, it turns out neater to just split this
path out entirely.
Since iommu_dma_get_msi_page() already duplicates much of what
__iommu_dma_map() does, it can easily just make the allocation and
mapping calls directly as well. That way we can further streamline the
helper back to exclusively operating on DMA domains.
Fixes: b61d271e59 ("iommu/dma: Move domain lookup into __iommu_dma_{map,unmap}")
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The commit bfcba288b9 ("ALSA - hda: Add support for link audio time
reporting") introduced the conditional PCM hw info setup, but it
overwrites the global azx_pcm_hw object. This will cause a problem if
any other HD-audio controller, as it'll inherit the same bit flag
although another controller doesn't support that feature.
Fix the bug by setting the PCM hw info flag locally.
Fixes: bfcba288b9 ("ALSA - hda: Add support for link audio time reporting")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Perf relies on _etext and _stext symbols being one of 't', 'T', 'v' or
'V'. Put them into .text section to guarantee that.
Also moves padding to page boundary inside .text which has an effect that
.text section is now padded with nops rather than 0's, which apparently
has been the initial intention for specifying 0x0700 fill expression.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Cleanup labels in head64 some of which are not being used since git
recorded history.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove pointless stack recursion on stack type ... warning, which
only confuses people. There is no way to make backchain unwinder 100%
reliable. When a task is interrupted in-between stack frame allocation
and backchain write instructions new stack frame backchain pointer is
left uninitialized (there are also sometimes additional instruction
in-between stack frame allocation and backchain write instructions due
to gcc shrink-wrapping). In attempt to unwind such stack the unwinder
would still try to use that invalid backchain value and perform all kind
of sanity checks on it to make sure we are not pointed out of stack. In
some cases that invalid backchain value would be 0 and we would falsely
treat next stackframe as pt_regs and again gprs[15] in those pt_regs
might happen to point at some address within the task's stack.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
After some investigation it doesn't look like init_mm fields
start_code/end_code are used anywhere besides potentially in dump_mm for
debugging purposes. Originally the value of 0 for start_code reflected
the presence of lowcore and early boot code. But with kaslr in place
start_code/end_code range should not span over unoccupied by the code
segment memory. So, adjust init_mm start_code to point at the beginning
of the code segment like other architectures do it.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since commit d1874a0c28 ("s390/mm: make the pxd_offset functions more
robust") behaviour of p4d_offset, pud_offset and pmd_offset has been
changed so that they cannot be used to iterate through top level page
table, because the index for the top level page table is now calculated
in pgd_offset. To avoid dumping the very first region/segment top level
table entry 2048 times simply iterate entry pointer like it is already
done in other page walking cases.
Fixes: d1874a0c28 ("s390/mm: make the pxd_offset functions more robust")
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This reverts commit db9492cef4 ("s390/protvirt: add memory sharing for
diag 308 set/store") which due to ultravisor implementation change is
not needed after all.
Fixes: db9492cef4 ("s390/protvirt: add memory sharing for diag 308 set/store")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
PSI defaults to a FIFO-99 thread, reduce this to FIFO-1.
FIFO-99 is the very highest priority available to SCHED_FIFO and
it not a suitable default; it would indicate the psi work is the
most important work on the machine.
Since Real-Time tasks will have pre-allocated memory and locked it in
place, Real-Time tasks do not care about PSI. All it needs is to be
above OTHER.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Valve reported a kernel crash on Ubuntu 18.04 when disconnecting a DS4
gamepad while rumble is enabled. This issue is reproducible with a
frequency of 1 in 3 times in the game Borderlands 2 when using an
automatic weapon, which triggers many rumble operations.
We found the issue to be a race condition between sony_remove and the
final device destruction by the HID / input system. The problem was
that sony_remove didn't clean some of its work_item state in
"struct sony_sc". After sony_remove work, the corresponding evdev
node was around for sufficient time for applications to still queue
rumble work after "sony_remove".
On pre-4.19 kernels the race condition caused a kernel crash due to a
NULL-pointer dereference as "sc->output_report_dmabuf" got freed during
sony_remove. On newer kernels this crash doesn't happen due the buffer
now being allocated using devm_kzalloc. However we can still queue work,
while the driver is an undefined state.
This patch fixes the described problem, by guarding the work_item
"state_worker" with an initialized variable, which we are setting back
to 0 on cleanup.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
CC: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In snd_usb_get_audioformat_uac3(), a structure for channel maps 'chmap' is
allocated through kzalloc() before the execution goto 'found_clock'.
However, this structure is not deallocated if the memory allocation for
'pd' fails, leading to a memory leak bug.
To fix the above issue, free 'fp->chmap' before returning NULL.
Fixes: 7edf3b5e6a ("ALSA: usb-audio: AudioStreaming Power Domain parsing")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ASoC: Fixes for v5.3
A relatively large batch of mostly unremarkable fixes here, a couple of
small core fixes for fairly obscure issues, more comment/email updates
with no code impact than usual and a bunch of small driver fixes.
The support for new sample rates in the max98373 driver is a fix for the
fact that the driver declared support for those rates but would in fact
return an error if these rates were selected.
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: haps_hs_defconfig arc):
arch/arc/kernel/unwind.c:827:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/arc/kernel/unwind.c:836:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
We had a report of a server which did not do a DFS referral
because the session setup Capabilities field was set to 0
(unlike negotiate protocol where we set CAP_DFS). Better to
send it session setup in the capabilities as well (this also
more closely matches Windows client behavior).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
When a reconnect happens in the middle of processing a compound chain
the code leaks a buffer from the memory pool. Fix this by properly
checking for a return code and freeing buffers in case of error.
Also maintain a buf variable to be equal to either smallbuf or bigbuf
depending on a response buffer size while parsing a chain and when
returning to the caller.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we skip SMB2_TREE_CONNECT command when checking during
reconnect because Tree Connect happens when establishing
an SMB session. For SMB 3.0 protocol version the code also calls
validate negotiate which results in SMB2_IOCL command being sent
over the wire. This may deadlock on trying to acquire a mutex when
checking for reconnect. Fix this by skipping SMB2_IOCL command
when doing the reconnect check.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Vivek:
"As of now dax_layout_busy_page() calls unmap_mapping_range() with last
argument as 1, which says even unmap cow pages. I am wondering who needs
to get rid of cow pages as well.
I noticed one interesting side affect of this. I mount xfs with -o dax and
mmaped a file with MAP_PRIVATE and wrote some data to a page which created
cow page. Then I called fallocate() on that file to zero a page of file.
fallocate() called dax_layout_busy_page() which unmapped cow pages as well
and then I tried to read back the data I wrote and what I get is old
data from persistent memory. I lost the data I had written. This
read basically resulted in new fault and read back the data from
persistent memory.
This sounds wrong. Are there any users which need to unmap cow pages
as well? If not, I am proposing changing it to not unmap cow pages.
I noticed this while while writing virtio_fs code where when I tried
to reclaim a memory range and that corrupted the executable and I
was running from virtio-fs and program got segment violation."
Dan:
"In fact the unmap_mapping_range() in this path is only to synchronize
against get_user_pages_fast() and force it to call back into the
filesystem to re-establish the mapping. COW pages should be left
untouched by dax_layout_busy_page()."
Cc: <stable@vger.kernel.org>
Fixes: 5fac7408d8 ("mm, fs, dax: handle layout changes to pinned dax mappings")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Marc Kleine-Budde says:
====================
pull-request: can 2019-08-02
this is a pull request of 4 patches for net/master.
The first two patches are by Wang Xiayang, they force that the string buffer
during a dev_info() is properly NULL terminated.
The last two patches are by Tomas Bortoli and fix both a potential info leak of
kernel memory to USB devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When powering off the Odroid N2, the tflash_vdd regulator is
automatically turned off by the kernel. This is a problem
when issuing the "reboot" command while using an SD card.
The boot ROM does not power this regulator back on, blocking
the reboot process at the boot ROM stage, preventing the
SD card from being detected.
Adding the "regulator-always-on" property fixes the problem.
Signed-off-by: Xavier Ruppen <xruppen@gmail.com>
Suggested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Fixes: c35f6dc5c3 ("arm64: dts: meson: Add minimal support for Odroid-N2")
[khilman: minor subject change: s/meson/amlogic/]
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
The G12A USB2 OTG capable PHY uses a 8bit large UTMI bus, and the OTG
controller gets the PHY but width by probing the associated phy.
By default it will use 16bit wide settings if a phy is not specified,
in our case we specified the phy, but not the phy-names.
The dwc2 bindings specifies that if phys is present, phy-names shall be
"usb2-phy".
Adding phy-names = "usb2-phy" solves the OTG PHY bus configuration.
Fixes: 9baf7d6be7 ("arm64: dts: meson: g12a: Add G12A USB nodes")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
While the individual CHARLCD_BL_xxx options have help texts, the
menu itself does not. Fix this.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mans Rullgard <mans@mansr.com>
[Added a bit more text to address Linus' suggestion]
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).
For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
the br_vlan_bridge_event stub when bridge vlans are disabled
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389
Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.
Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
desc_cnt and data_cnt should always be equal. In the case of a dropped
packet desc_cnt was still getting updated (correctly), data_cnt
was not. To eliminate this bug and prevent it from recurring this
patch combines them into one ring level cnt.
Signed-off-by: Catherine Sullivan <csully@google.com>
Reviewed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nexthop path in rt6_update_exception_stamp_rt needs to call
rcu_read_unlock if it fails to find a fib6_nh match rather than
just returning.
Fixes: e659ba31d8 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that shutdown never works, and at the same time document how
I tested to came to the conclusion that currently reuse is not possible.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like we were slightly overzealous with the shutdown()
cleanup. Even though the sock->sk_state can reach CLOSED again,
socket->state will not got back to SS_UNCONNECTED once
connections is ESTABLISHED. Meaning we will see EISCONN if
we try to reconnect, and EINVAL if we try to listen.
Only listen sockets can be shutdown() and reused, but since
ESTABLISHED sockets can never be re-connected() or used for
listen() we don't need to try to clean up the ULP state early.
Fixes: 32857cf57f ("net/tls: fix transition through disconnect with close")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull spi fixes from Mark Brown:
"A bunch of small, device specific things here plus a DT bindings fix
for the new validatable YAML binding format.
The most notable thing is the fix for GPIO chip selects which fixes a
corner case in updates of that code to modern APIs, unfortunately due
to a historical mess the code around GPIO support is obscure, fragile
and an ABI which makes and attempt to improve the situation painful"
* tag 'spi-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pxa2xx: Add support for Intel Tiger Lake
spi: bcm2835: Fix 3-wire mode if DMA is enabled
spi: pxa2xx: Balance runtime PM enable/disable on error
spi: gpio: Add SPI_MASTER_GPIO_SS flag
spi: spi-fsl-qspi: change i.MX7D RX FIFO size
spi: dt-bindings: spi-controller: remove unnecessary 'maxItems: 1' from reg
Pull regulator fixes from Mark Brown:
"A few small driver specific fixes here plus one core fix for a
refcounting problem with DT which will have little practical impact
unless overlays are used"
* tag 'regulator-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: of: Add of_node_put() before return in function
regulator: lp87565: Fix probe failure for "ti,lp87565"
regulator: axp20x: fix DCDC5 and DCDC6 for AXP803
regulator: axp20x: fix DCDCA and DCDCD for AXP806
Pull kselftest fixes from Shuah Khan:
"A fix to the Kselftest framework to save and restore errno and a fix
to livepatch to push and pop dynamic debug config"
* tag 'linux-kselftest-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/livepatch: push and pop dynamic debug config
kselftest: save-and-restore errno to allow for %m formatting
If getdents64 is killed or hits on segfault, it'll leave cgroups
directories in sysfs pinned leaking memory because the kernfs node
won't be freed on rmdir and the parent neither.
Repro:
# for i in `seq 1000`; do mkdir $i; done
# rmdir *
# for i in `seq 1000`; do mkdir $i; done
# rmdir *
# for i in `seq 1000`; do while :; do ls $i/ >/dev/null; done & done
# while :; do killall ls; done
kernfs_node_cache in /proc/slabinfo keeps going up as expected.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # goes way back to original sysfs days
Link: https://lore.kernel.org/r/20190805173404.GF136335@devbig004.ftw2.facebook.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jesper Dangaard Brouer says:
====================
net: fix regressions for generic-XDP
Thanks to Brandon Cazander, who wrote a very detailed bug report that
even used perf probe's on xdp-newbies mailing list, we discovered that
generic-XDP contains some regressions when using bpf_xdp_adjust_head().
First issue were that my selftests script, that use bpf_xdp_adjust_head(),
by mistake didn't use generic-XDP any-longer. That selftest should have
caught the real regression introduced in commit 458bf2f224 ("net: core:
support XDP generic on stacked devices.").
To verify this patchset fix the regressions, you can invoked manually via:
cd tools/testing/selftests/bpf/
sudo ./test_xdp_vlan_mode_generic.sh
sudo ./test_xdp_vlan_mode_native.sh
====================
Link: https://www.spinics.net/lists/xdp-newbies/msg01231.html
Fixes: 458bf2f224 ("net: core: support XDP generic on stacked devices.")
Reported by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When generic-XDP was moved to a later processing step by commit
458bf2f224 ("net: core: support XDP generic on stacked devices.")
a regression was introduced when using bpf_xdp_adjust_head.
The issue is that after this commit the skb->network_header is now
changed prior to calling generic XDP and not after. Thus, if the header
is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
also need to be updated again. Fix by calling skb_reset_network_header().
Fixes: 458bf2f224 ("net: core: support XDP generic on stacked devices.")
Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given the increasing number of BPF selftests, it makes sense to
reduce the time to execute these tests. The ping parameters are
adjusted to reduce the time from measures 9 sec to approx 2.8 sec.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In-order to test both native-XDP (xdpdrv) and generic-XDP (xdpgeneric)
create two wrapper test scripts, that start the test_xdp_vlan.sh script
with these modes.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change BPF selftest test_xdp_vlan.sh to (default) use generic XDP.
This selftest was created together with a fix for generic XDP, in commit
2972495699 ("net: fix generic XDP to handle if eth header was
mangled"). And was suppose to catch if generic XDP was broken again.
The tests are using veth and assumed that veth driver didn't support
native driver XDP, thus it used the (ip link set) 'xdp' attach that fell
back to generic-XDP. But veth gained native-XDP support in 948d4f214f
("veth: Add driver XDP"), which caused this test script to use
native-XDP.
Fixes: 948d4f214f ("veth: Add driver XDP")
Fixes: 97396ff0bc ("selftests/bpf: add XDP selftests for modifying and popping VLAN headers")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue
memory scheme") introduced an undefined behaviour below due to
"frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
when,
if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)
or after bailed out the loop,
for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
As the result, there could be some "frag" have uninitialized
value of "last_in_page".
Later, get_frag() obtains those "frag" and check "frag->last_in_page" in
mlx5e_put_rx_frag() and triggers the error during boot. Fix it by always
initializing "frag->last_in_page" to "false" in
mlx5e_init_frags_partition().
UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
load of value 170 is not a valid value for type 'bool' (aka '_Bool')
Call trace:
dump_backtrace+0x0/0x264
show_stack+0x20/0x2c
dump_stack+0xb0/0x104
__ubsan_handle_load_invalid_value+0x104/0x128
mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
net_rx_action+0x248/0x940
__do_softirq+0x350/0x7b8
irq_exit+0x200/0x26c
__handle_domain_irq+0xc8/0x128
gic_handle_irq+0x138/0x228
el1_irq+0xb8/0x140
arch_cpu_idle+0x1a4/0x348
do_idle+0x114/0x1b0
cpu_startup_entry+0x24/0x28
rest_init+0x1ac/0x1dc
arch_call_rest_init+0x10/0x18
start_kernel+0x4d4/0x57c
Fixes: 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to drop the adjust_link callback in order to finally migrate to
phylink.
Otherwise we get the following warning during startup:
"mv88e6xxx 2188000.ethernet-1:10: Using legacy PHYLIB callbacks. Please
migrate to PHYLINK!"
The warning is generated in the function dsa_port_link_register_of in
dsa/port.c:
int dsa_port_link_register_of(struct dsa_port *dp)
{
struct dsa_switch *ds = dp->ds;
if (!ds->ops->adjust_link)
return dsa_port_phylink_register(dp);
dev_warn(ds->dev,
"Using legacy PHYLIB callbacks. Please migrate to PHYLINK!\n");
[...]
}
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix two reset-gpio sanity checks which were never converted to use
gpio_is_valid(), and make sure to use -EINVAL to indicate a missing
reset line also for the UART-driver module parameter and for the USB
driver.
This specifically prevents the UART and USB drivers from incidentally
trying to request and use gpio 0, and also avoids triggering a WARN() in
gpio_to_desc() during probe when no valid reset line has been specified.
Fixes: e33a3f84f8 ("NFC: nfcmrvl: allow gpio 0 for reset signalling")
Reported-by: syzbot+cf35b76f35e068a1107f@syzkaller.appspotmail.com
Tested-by: syzbot+cf35b76f35e068a1107f@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
The max9611 driver reads the die temperature at probe time to validate
the communication channel. Use the actual read value to perform the test
instead of the read function return value, which was mistakenly used so
far.
The temperature reading test was only successful because the 0 return
value is in the range of supported temperatures.
Fixes: 69780a3bbc ("iio: adc: Add Maxim max9611 ADC driver")
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This patch corrects the SPDX License Identifier style
in header file related to STM32 Driver for I2C hardware
bus support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit commit 328e566479 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.
There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().
Things go really south if the guest does the following:
mov x0, #0 // or any small value masking interrupts
msr ICC_PMR_EL1, x0
[vcpu preempted, then rescheduled, VMCR sampled]
mov x0, #ff // allow all interrupts
msr ICC_PMR_EL1, x0
wfi // traps to EL2, so samping of VMCR
[interrupt arrives just after WFI]
Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.
To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.
This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.
Cc: stable@vger.kernel.org # 4.12
Fixes: 328e566479 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>
In commit fe64ba5c63 ("drm/rockchip: Resume DP early") we moved
resume to be early but left suspend at its normal time. This seems
like it could be OK, but casues problems if a suspend gets interrupted
partway through. The OS only balances matching suspend/resume levels.
...so if suspend was called then resume will be called. If suspend
late was called then resume early will be called. ...but if suspend
was called resume early might not get called. This leads to an
unbalance in the clock enables / disables.
Lets take the simple fix and just move suspend to be late to match.
This makes the PM core take proper care in keeping things balanced.
Fixes: fe64ba5c63 ("drm/rockchip: Resume DP early")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802184616.44822-1-dianders@chromium.org
We want to use DW AXI DMAC on HSDK board in our automated verification
to test cache & dma kernel code changes. This is perfect candidate
as we don't depend on any external peripherals like MMC card / USB
storage / etc.
To increase test coverage we want to test both options:
* DW AXI DMAC is connected through IOC port & dma direct ops used
* DW AXI DMAC is connected to DDR port & dma noncoherent ops used
Introduce 'arc_hsdk_axi_dmac_coherent' global variable which can be
modified by debugger (same way as we patch 'ioc_enable') to switch
between these options without recompiling the kernel.
Depend on this value we tweak memory bridge configuration and
"dma-coherent" DTS property of DW AXI DMAC.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Acked-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Some a4tech mice use the 'GenericDesktop.00b8' usage to inform whether
the previous wheel report was horizontal or vertical. Before
c01908a14b ("HID: input: add mapping for "Toggle Display" key") this
usage was being mapped to 'Relative.Misc'. After the patch it's simply
ignored (usage->type == 0 & usage->code == 0). Which ultimately makes
hid-a4tech ignore the WHEEL/HWHEEL selection event, as it has no
usage->type.
We shouldn't rely on a mapping for that usage as it's nonstandard and
doesn't really map to an input event. So we bypass the mapping and make
sure the custom event handling properly handles both reports.
Fixes: c01908a14b ("HID: input: add mapping for "Toggle Display" key")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The ioctl handler uses the intfdata of a second interface,
which may not be present in a broken or malicious device, hence
the intfdata needs to be checked for NULL.
[jkosina@suse.cz: fix newly added spurious space]
Reported-by: syzbot+965152643a75a56737be@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The Saitek X52 joystick has a pair of axes that are originally
(by the Windows driver) used as mouse pointer controls. The corresponding
usage->hid values are 0x50024 and 0x50026. Thus they are handled
as unknown axes and both get mapped to ABS_MISC. The quirk makes
the second axis to be mapped to ABS_MISC1 and thus made available
separately.
[jkosina@suse.cz: squashed two patches into one]
Signed-off-by: István Váradi <ivaradi@varadiistvan.hu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Most code in arch/x86/kernel/kvm.c is called through x86_hyper_kvm, and thus only
runs if KVM has been detected. There is no need to check again for the CPUID
base.
Cc: Sergio Lopez <slp@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
void instead of an integer, as we should not care at all about if this
function actually does anything or not.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: <kvm@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol
let us actually simplify the code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After commit d73eb57b80 (KVM: Boost vCPUs that are delivering interrupts), a
five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
Call Trace:
flush_tlb_mm_range+0x68/0x140
tlb_flush_mmu.part.75+0x37/0xe0
tlb_finish_mmu+0x55/0x60
zap_page_range+0x142/0x190
SyS_madvise+0x3cd/0x9c0
system_call_fastpath+0x1c/0x21
swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.
This patch fixes it by checking conservatively a subset of events.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
preempted_in_kernel is updated in preempt_notifier when involuntary preemption
ocurrs, it can be stale when the voluntarily preempted vCPUs are taken into
account by kvm_vcpu_on_spin() loop. This patch lets it just check preempted_in_kernel
for involuntary preemption.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_set_pending_timer() will take care to wake up the sleeping vCPU which
has pending timer, don't need to check this in apic_timer_expired() again.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit dbcbabf7da ("HID: logitech-dj: fix return value of
logi_dj_recv_query_hidpp_devices") made logi_dj_recv_query_hidpp_devices
return the return value of hid_hw_raw_request instead of unconditionally
returning 0.
But hid_hw_raw_request returns the report-size on a successful request
(and a negative error-code on failure) where as the callers of
logi_dj_recv_query_hidpp_devices expect a 0 return on success.
This commit fixes things so that either the negative error gets returned
or 0 on success, fixing HID++ receivers such as the Logitech nano receivers
no longer working.
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dbcbabf7da ("HID: logitech-dj: fix return value of logi_dj_recv_query_hidpp_devices")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This makes the previously added 'encap test' pass.
Because its possible that the xfrm dst entry becomes stale while such
a flow is offloaded, we need to call dst_check() -- the notifier that
handles this for non-tunneled traffic isn't sufficient, because SA or
or policies might have changed.
If dst becomes stale the flow offload entry will be tagged for teardown
and packets will be passed to 'classic' forwarding path.
Removing the entry right away is problematic, as this would
introduce a race condition with the gc worker.
In case flow is long-lived, it could eventually be offloaded again
once the gc worker removes the entry from the flow table.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
'flow offload' expression should not offload flows that will be subject
to ipsec, but it does.
This results in a connectivity blackhole for the affected flows -- first
packets will go through (offload happens after established state is
reached), but all remaining ones bypass ipsec encryption and are thus
discarded by the peer.
This can be worked around by adding "rt ipsec exists accept"
before the 'flow offload' rule matches.
This test case will fail, support for such flows is added in
next patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Exception handlers call FAKE_RET_FROM_EXCPN to
- clear AE bit: drop down from exception active to pure kernel mode
allowing further excptions
- set IE bit: re-enable interrupts
It additionally also clears U bit (user mode) and DE bit (delay slot
execution) which is redundant as hardware does that already on any taken
exception. Morevoer the current software clearing is bogus anyways as
the KFLAG instruction being used for purpose can't possibly write those
bits anyways.
So don't pretend to clear them.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: rewrote changelog]
Fixes: 72abe3bcf0 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
The global change from force_sig caused module unloading of cifs.ko
to fail (since the cifsd process could not be killed, "rmmod cifs"
now would always fail)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
People are reporing seeing fscache errors being reported concerning
duplicate cookies even in cases where they are not setting up fscache
at all. The rule needs to be that if fscache is not enabled, then it
should have no side effects at all.
To ensure this is the case, we disable fscache completely on all superblocks
for which the 'fsc' mount option was not set. In order to avoid issues
with '-oremount', we also disable the ability to turn fscache on via
remount.
Fixes: f1fe29b4a0 ("NFS: Use i_writecount to control whether...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Steve Dickson <steved@redhat.com>
Cc: David Howells <dhowells@redhat.com>
John Hubbard reports seeing the following stack trace:
nfs4_do_reclaim
rcu_read_lock /* we are now in_atomic() and must not sleep */
nfs4_purge_state_owners
nfs4_free_state_owner
nfs4_destroy_seqid_counter
rpc_destroy_wait_queue
cancel_delayed_work_sync
__cancel_work_timer
__flush_work
start_flush_work
might_sleep:
(kernel/workqueue.c:2975: BUG)
The solution is to separate out the freeing of the state owners
from nfs4_purge_state_owners(), and perform that outside the atomic
context.
Reported-by: John Hubbard <jhubbard@nvidia.com>
Fixes: 0aaaf5c424 ("NFS: Cache state owners after files are closed")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that we always check the return value of update_open_stateid()
so that we can retry if the update of local state failed. This fixes
infinite looping on state recovery.
Fixes: e23008ec81 ("NFSv4 reduce attribute requests for open reclaim")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v3.7+
Fix nfs_reap_expired_delegations() to ensure that we only reap delegations
that are actually expired, rather than triggering on random errors.
Fixes: 45870d6909 ("NFSv4.1: Test delegation stateids when server...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The logic for checking in nfs41_check_open_stateid() whether the state
is supported by a delegation is inverted. In addition, it makes more
sense to perform that check before we check for expired locks.
Fixes: 8a64c4ef10 ("NFSv4.1: Even if the stateid is OK,...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In pnfs_update_layout() ensure that we do report any fatal errors from
nfs4_select_rw_stateid().
Fixes: d9aba2b40d ("NFSv4: Don't use the zero stateid with layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server returns with EAGAIN when we're trying to recover from
a server reboot, we currently delay for 1 second, but then mark the
stateid as needing recovery after the grace period has expired.
Instead, we should just retry the same recovery process immediately
after the 1 second delay. Break out of the loop after 10 retries.
Fixes: 35a61606a6 ("NFS: Reduce indentation of the switch statement...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When error recovery fails due to a fatal error on the server, ensure
we log it in the syslog.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Once we clear the NFS_DELEGATED_STATE flag, we're telling
nfs_delegation_claim_opens() that we're done recovering all open state
for that stateid, so we really need to ensure that we test for all
open modes that are currently cached and recover them before exiting
nfs4_open_delegation_recall().
Fixes: 24311f8841 ("NFSv4: Recovery of recalled read delegations...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.3+
It is unsafe to dereference delegation outside the rcu lock, and in
any case, the refcount is guaranteed held if cred is non-zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes that did not make into my first pull request"
* tag 'tpmdd-next-20190805' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: tpm_ibm_vtpm: Fix unallocated banks
tpm: Fix null pointer dereference on chip register error path
Pull MTD fixes from Miquel Raynal:
"NAND:
- Fix Micron driver as some chips enable internal ECC correction
during their discovery while they advertize they do not have any.
Hyperbus:
- Restrict the build to only ARM64 SoCs (and compile testing) which
is what should have been done since the beginning.
- Fix Kconfig issue by selection something instead of implying it"
* tag 'mtd/fixes-for-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: hyperbus: Add hardware dependency to AM654 driver
mtd: hyperbus: Kconfig: Fix HBMC_AM654 dependencies
mtd: rawnand: micron: handle on-die "ECC-off" devices correctly
The nr_allocated_banks and allocated banks are initialized as part of
tpm_chip_register. Currently, this is done as part of auto startup
function. However, some drivers, like the ibm vtpm driver, do not run
auto startup during initialization. This results in uninitialized memory
issue and causes a kernel panic during boot.
This patch moves the pcr allocation outside the auto startup function
into tpm_chip_register. This ensures that allocated banks are initialized
in any case.
Fixes: 879b589210 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Michal Suchánek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.3:
- Wire up the new clone3 syscall.
- A fix for the PAPR SCM nvdimm driver, to fix a crash when firmware
gives us a device that's attached to a non-online NUMA node.
- A fix for a boot failure on 32-bit with KASAN enabled.
- Three fixes for implicit fall through warnings, some of which are
errors for us due to -Werror.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Kees Cook, Santosh
Sivaraj, Stephen Rothwell"
* tag 'powerpc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kasan: fix early boot failure on PPC32
drivers/macintosh/smu.c: Mark expected switch fall-through
powerpc/spe: Mark expected switch fall-throughs
powerpc/nvdimm: Pick nearby online node if the device node is not online
powerpc/kvm: Fall through switch case explicitly
powerpc: Wire up clone3 syscall
At the end of the v5.3 upstream kernel development cycle, Simon will be
stepping down from his role as Renesas SoC maintainer. Starting with
the v5.4 development cycle, Geert is taking over this role.
Add Geert as a co-maintainer, and add his git repository and branch.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Kbuild fixes from Masahiro Yamada:
- detect missing missing "WITH Linux-syscall-note" for uapi headers
- fix needless rebuild when using Clang
- fix false-positive cc-option in Kconfig when using Clang
- avoid including corrupted .*.cmd files in the modpost stage
- fix warning of 'make vmlinux'
- fix {m,n,x,g}config to not generate the broken .config on the second
save operation.
- some trivial Makefile fixes
* tag 'kbuild-fixes-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Clear "written" flag to avoid data loss
kbuild: Check for unknown options with cc-option usage in Kconfig and clang
lib/raid6: fix unnecessary rebuild of vpermxor*.c
kbuild: modpost: do not parse unnecessary rules for vmlinux modpost
kbuild: modpost: remove unnecessary dependency for __modpost
kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
kbuild: modpost: include .*.cmd files only when targets exist
kbuild: initialize CLANG_FLAGS correctly in the top Makefile
kbuild: detect missing "WITH Linux-syscall-note" for uapi headers
Pull SafeSetID maintainer update from Micah Morton:
"Add entry in MAINTAINERS file for SafeSetID LSM"
* tag 'safesetid-maintainers-correction-5.3-rc2' of git://github.com/micah-morton/linux:
Add entry in MAINTAINERS file for SafeSetID LSM
Prior to this commit, starting nconfig, xconfig or gconfig, and saving
the .config file more than once caused data loss, where a .config file
that contained only comments would be written to disk starting from the
second save operation.
This bug manifests itself because the SYMBOL_WRITTEN flag is never
cleared after the first call to conf_write, and subsequent calls to
conf_write then skip all of the configuration symbols due to the
SYMBOL_WRITTEN flag being set.
This commit resolves this issue by clearing the SYMBOL_WRITTEN flag
from all symbols before conf_write returns.
Fixes: 8e2442a5f8 ("kconfig: fix missing choice values in auto.conf")
Cc: linux-stable <stable@vger.kernel.org> # 4.19+
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull Xtensa fix from Max Filippov:
"Fix build for xtensa cores with coprocessors that was broken by
entry/return abstraction patch"
* tag 'xtensa-20190803' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix build for cores with coprocessors
Pull i2c fixes from Wolfram Sang:
"A set of driver fixes for the I2C subsystem"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: s3c2410: Mark expected switch fall-through
i2c: at91: fix clk_offset for sama5d2
i2c: at91: disable TXRDY interrupt after sending data
i2c: iproc: Fix i2c master read more than 63 bytes
eeprom: at24: make spd world-readable again
Add documentation to the Spectre document about the new swapgs variant of
Spectre v1.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are a lot of those warnings with GCC8+ 64-bit,
In file included from ./include/linux/sctp.h:42,
from net/core/skbuff.c:47:
./include/uapi/linux/sctp.h:395:1: warning: alignment 4 of 'struct
sctp_paddr_change' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:728:1: warning: alignment 4 of 'struct
sctp_setpeerprim' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:727:26: warning: 'sspp_addr' offset 4 in
'struct sctp_setpeerprim' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage sspp_addr;
^~~~~~~~~
./include/uapi/linux/sctp.h:741:1: warning: alignment 4 of 'struct
sctp_prim' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:740:26: warning: 'ssp_addr' offset 4 in
'struct sctp_prim' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage ssp_addr;
^~~~~~~~
./include/uapi/linux/sctp.h:792:1: warning: alignment 4 of 'struct
sctp_paddrparams' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:784:26: warning: 'spp_address' offset 4 in
'struct sctp_paddrparams' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage spp_address;
^~~~~~~~~~~
./include/uapi/linux/sctp.h:905:1: warning: alignment 4 of 'struct
sctp_paddrinfo' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:899:26: warning: 'spinfo_address' offset 4
in 'struct sctp_paddrinfo' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage spinfo_address;
^~~~~~~~~~~~~~
This is because the commit 20c9c825b1 ("[SCTP] Fix SCTP socket options
to work with 32-bit apps on 64-bit kernels.") added "packed, aligned(4)"
GCC attributes to some structures but one of the members, i.e, "struct
sockaddr_storage" in those structures has the attribute,
"aligned(__alignof__ (struct sockaddr *)" which is 8-byte on 64-bit
systems, so the commit overwrites the designed alignments for
"sockaddr_storage".
To fix this, "struct sockaddr_storage" needs to be aligned to 4-byte as
it is only used in those packed sctp structure which is part of UAPI,
and "struct __kernel_sockaddr_storage" is used in some other
places of UAPI that need not to change alignments in order to not
breaking userspace.
Use an implicit alignment for "struct __kernel_sockaddr_storage" so it
can keep the same alignments as a member in both packed and un-packed
structures without breaking UAPI.
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf tooling fixes from Thomas Gleixner:
"A set of updates for perf tools and documentation:
perf header:
- Prevent a division by zero
- Deal with an uninitialized warning proper
libbpf:
- Fix the missiong __WORDSIZE definition for musl & al
UAPI headers:
- Synchronize kernel headers
Documentation:
- Fix the memory units for perf.data size"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
libbpf: fix missing __WORDSIZE definition
perf tools: Fix perf.data documentation units for memory size
perf header: Fix use of unitialized value warning
perf header: Fix divide by zero error if f_header.attr_size==0
tools headers UAPI: Sync if_link.h with the kernel
tools headers UAPI: Sync sched.h with the kernel
tools headers UAPI: Sync usbdevice_fs.h with the kernels to get new ioctl
tools perf beauty: Fix usbdevfs_ioctl table generator to handle _IOC()
tools headers UAPI: Update tools's copy of drm.h headers
tools headers UAPI: Update tools's copy of mman.h headers
tools headers UAPI: Update tools's copy of kvm.h headers
tools include UAPI: Sync x86's syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open
Pull vdso timer fixes from Thomas Gleixner:
"A series of commits to deal with the regression caused by the generic
VDSO implementation.
The usage of clock_gettime64() for 32bit compat fallback syscalls
caused seccomp filters to kill innocent processes because they only
allow clock_gettime().
Handle the compat syscalls with clock_gettime() as before, which is
not a functional problem for the VDSO as the legacy compat application
interface is not y2038 safe anyway. It's just extra fallback code
which needs to be implemented on every architecture.
It's opt in for now so that it does not break the compile of already
converted architectures in linux-next. Once these are fixed, the
#ifdeffery goes away.
So much for trying to be smart and reuse code..."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64: compat: vdso: Use legacy syscalls as fallback
x86/vdso/32: Use 32bit syscall fallback
lib/vdso/32: Provide legacy syscall fallbacks
lib/vdso: Move fallback invocation to the callers
lib/vdso/32: Remove inconsistent NULL pointer checks
Pull irq fixes from Thomas Gleixner:
"A small bunch of fixes from the irqchip department:
- Fix a couple of UAF on error paths (RZA1, GICv3 ITS)
- Fix iMX GPCv2 trigger setting
- Add missing of_node_put() on error path in MBIGEN
- Add another bunch of /* fall-through */ to silence warnings"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/renesas-rza1: Fix an use-after-free in rza1_irqc_probe()
irqchip/irq-imx-gpcv2: Forward irq type to parent
irqchip/irq-mbigen: Add of_node_put() before return
irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail
irqchip/gic-v3: Mark expected switch fall-through
Pull xfs fixes from Darrick Wong:
- Avoid leaking kernel stack contents to userspace
- Fix a potential null pointer dereference in the dabtree scrub code
* tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix possible null-pointer dereferences in xchk_da_btree_block_check_sibling()
xfs: fix stack contents leakage in the v1 inumber ioctls
When the system is close-to-OOM, fsync() may fail due to -ENOMEM because
xfs_log_reserve() is using KM_MAYFAIL. It is a bad thing to fail writeback
operation due to user-triggerable OOM condition. Since we are not using
KM_MAYFAIL at xfs_trans_alloc() before calling xfs_log_reserve(), let's
use the same flags at xfs_log_reserve().
oom-torture: page allocation failure: order:0, mode:0x46c40(GFP_NOFS|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP), nodemask=(null)
CPU: 7 PID: 1662 Comm: oom-torture Kdump: loaded Not tainted 5.3.0-rc2+ #925
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
Call Trace:
dump_stack+0x67/0x95
warn_alloc+0xa9/0x140
__alloc_pages_slowpath+0x9a8/0xbce
__alloc_pages_nodemask+0x372/0x3b0
alloc_slab_page+0x3a/0x8d0
new_slab+0x330/0x420
___slab_alloc.constprop.94+0x879/0xb00
__slab_alloc.isra.89.constprop.93+0x43/0x6f
kmem_cache_alloc+0x331/0x390
kmem_zone_alloc+0x9f/0x110 [xfs]
kmem_zone_alloc+0x9f/0x110 [xfs]
xlog_ticket_alloc+0x33/0xd0 [xfs]
xfs_log_reserve+0xb4/0x410 [xfs]
xfs_trans_reserve+0x1d1/0x2b0 [xfs]
xfs_trans_alloc+0xc9/0x250 [xfs]
xfs_setfilesize_trans_alloc.isra.27+0x44/0xc0 [xfs]
xfs_submit_ioend.isra.28+0xa5/0x180 [xfs]
xfs_vm_writepages+0x76/0xa0 [xfs]
do_writepages+0x17/0x80
__filemap_fdatawrite_range+0xc1/0xf0
file_write_and_wait_range+0x53/0xa0
xfs_file_fsync+0x87/0x290 [xfs]
vfs_fsync_range+0x37/0x80
do_fsync+0x38/0x60
__x64_sys_fsync+0xf/0x20
do_syscall_64+0x4a/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: eb01c9cd87 ("[XFS] Remove the xlog_ticket allocator")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/acpi/scan.c: document why we don't need the device_hotplug_lock
memremap: move from kernel/ to mm/
lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
asm-generic: fix -Wtype-limits compiler warnings
cgroup: kselftest: relax fs_spec checks
mm/memory_hotplug.c: remove unneeded return for void function
mm/migrate.c: initialize pud_entry in migrate_vma()
coredump: split pipe command whitespace before expanding template
page flags: prioritize kasan bits over last-cpuid
ubsan: build ubsan.c more conservatively
kasan: remove clang version check for KASAN_STACK
mm: compaction: avoid 100% CPU usage during compaction when a task is killed
mm: migrate: fix reference check race between __find_get_block() and migration
mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
ocfs2: remove set but not used variable 'last_hash'
Revert "kmemleak: allow to coexist with fault injection"
kernel/signal.c: fix a kernel-doc markup
Pull RISC-V fixes from Paul Walmsley:
"Three minor RISC-V-related changes for v5.3-rc3:
- Add build ID to VDSO builds to avoid a double-free in perf when
libelf isn't used
- Align the RV64 defconfig to the output of "make savedefconfig" so
subsequent defconfig patches don't get out of hand
- Drop a superfluous DT property from the FU540 SoC DT data (since it
must be already set in board data that includes it)"
* tag 'riscv/for-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: defconfig: align RV64 defconfig to the output of "make savedefconfig"
riscv: dts: fu540-c000: drop "timebase-frequency"
riscv: Fix perf record without libelf support
Before this change the device tree description of qspi node for
second memory on BK4 board was wrong (applicable to old, removed
fsl-quadspi.c driver).
As a result this memory was not recognized correctly when used
with the new spi-fsl-qspi.c driver.
From the dt-bindings:
"Required SPI slave node properties:
- reg: There are two buses (A and B) with two chip selects each.
This encodes to which bus and CS the flash is connected:
<0>: Bus A, CS 0
<1>: Bus A, CS 1
<2>: Bus B, CS 0
<3>: Bus B, CS 1"
According to above with new driver the second SPI-NOR memory shall
have reg=<2> as it is connected to Bus B, CS 0.
Fixes: a67d2c52a8 ("ARM: dts: Add support for Liebherr's BK4 device (vf610 based)")
Suggested-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Commit d66acc39c7 ("bitops: Optimise get_order()") introduced a
compilation warning because "rx_frag_size" is an "ushort" while
PAGE_SHIFT here is 16.
The commit changed the get_order() to be a multi-line macro where
compilers insist to check all statements in the macro even when
__builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
is a module parameter.
In file included from ./arch/powerpc/include/asm/page_64.h:107,
from ./arch/powerpc/include/asm/page.h:242,
from ./arch/powerpc/include/asm/mmu.h:132,
from ./arch/powerpc/include/asm/lppaca.h:47,
from ./arch/powerpc/include/asm/paca.h:17,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./arch/powerpc/include/asm/processor.h:39,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
(((n) < (1UL << PAGE_SHIFT)) ? 0 : \
^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
^~~~~~~~~
Fix it by moving all of this multi-line macro into a proper function,
and killing __get_order() off.
[akpm@linux-foundation.org: remove __get_order() altogether]
[cai@lca.pw: v2]
Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
Fixes: d66acc39c7 ("bitops: Optimise get_order()")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: James Y Knight <jyknight@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On my laptop most memcg kselftests were being skipped because it claimed
cgroup v2 hierarchy wasn't mounted, but this isn't correct. Instead, it
seems current systemd HEAD mounts it with the name "cgroup2" instead of
"cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Save the offsets of the start of each argument to avoid having to update
pointers to each argument after every corename krealloc and to avoid
having to duplicate the memory for the dump command.
Executable names containing spaces were previously being expanded from
%e or %E and then split in the middle of the filename. This is
incorrect behaviour since an argument list can represent arguments with
spaces.
The splitting could lead to extra arguments being passed to the core
dump handler that it might have interpreted as options or ignored
completely.
Core dump handlers that are not aware of this Linux kernel issue will be
using %e or %E without considering that it may be split and so they will
be vulnerable to processes with spaces in their names breaking their
argument list. If their internals are otherwise well written, such as
if they are written in shell but quote arguments, they will work better
after this change than before. If they are not well written, then there
is a slight chance of breakage depending on the details of the code but
they will already be fairly broken by the split filenames.
Core dump handlers that are aware of this Linux kernel issue will be
placing %e or %E as the last item in their core_pattern and then
aggregating all of the remaining arguments into one, separated by
spaces. Alternatively they will be obtaining the filename via other
methods. Both of these will be compatible with the new arrangement.
A side effect from this change is that unknown template types (for
example %z) result in an empty argument to the dump handler instead of
the argument being dropped. This is a desired change as:
It is easier for dump handlers to process empty arguments than dropped
ones, especially if they are written in shell or don't pass each
template item with a preceding command-line option in order to
differentiate between individual template types. Most core_patterns in
the wild do not use options so they can confuse different template types
(especially numeric ones) if an earlier one gets dropped in old kernels.
If the kernel introduces a new template type and a core_pattern uses it,
the core dump handler might not expect that the argument can be dropped
in old kernels.
For example, this can result in security issues when %d is dropped in
old kernels. This happened with the corekeeper package in Debian and
resulted in the interface between corekeeper and Linux having to be
rewritten to use command-line options to differentiate between template
types.
The core_pattern for most core dump handlers is written by the handler
author who would generally not insert unknown template types so this
change should be compatible with all the core dump handlers that exist.
Link: http://lkml.kernel.org/r/20190528051142.24939-1-pabs3@bonedaddy.net
Fixes: 74aadce986 ("core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe")
Signed-off-by: Paul Wise <pabs3@bonedaddy.net>
Reported-by: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Reported-by: Paul Wise <pabs3@bonedaddy.net> [https://lore.kernel.org/linux-fsdevel/c8b7ecb8508895bf4adb62a748e2ea2c71854597.camel@bonedaddy.net/]
Suggested-by: Jakub Wilk <jwilk@jwilk.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc specific -fno-conserve-stack option. According to
Andrey Ryabinin, that option is not even needed, dropping it here fixes
the stackprotector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting
while making no progress as a fatal signal is pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and
isolating a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by commit
cf66f0700c ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
buffer_migrate_page_norefs() can race with bh users in the following
way:
CPU1 CPU2
buffer_migrate_page_norefs()
buffer_migrate_lock_buffers()
checks bh refs
spin_unlock(&mapping->private_lock)
__find_get_block()
spin_lock(&mapping->private_lock)
grab bh ref
spin_unlock(&mapping->private_lock)
move page do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be
acquired and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with
a similar page migration implementation as mainline. The data
corruption could not be reproduced with this patch applied. A small
number of migration-intensive tests were run and no performance problems
were noted.
[mgorman@techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888ca "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running ltp's oom test with kmemleak enabled, the below warning was
triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
passed in:
WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
...
kmemleak_alloc+0x4e/0xb0
kmem_cache_alloc+0x2a7/0x3e0
mempool_alloc_slab+0x2d/0x40
mempool_alloc+0x118/0x2b0
bio_alloc_bioset+0x19d/0x350
get_swap_bio+0x80/0x230
__swap_writepage+0x5ff/0xb20
The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, however kmemleak
has __GFP_NOFAIL set all the time due to d9570ee3bd ("kmemleak:
allow to coexist with fault injection"). But, it doesn't make any sense
to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
time.
According to the discussion on the mailing list, the commit should be
reverted for short term solution. Catalin Marinas would follow up with
a better solution for longer term.
The failure rate of kmemleak metadata allocation may increase in some
circumstances, but this should be expected side effect.
Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: d9570ee3bd ("kmemleak: allow to coexist with fault injection")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Why]
Before this change, the fan control state on smu_v11 was not able to be
changed because the capability check for checking if the fan control
capability existed was inverted.
[How]
The capability check for fan control in smu_v11_0_auto_fan_control was
inverted, to correctly check for the absence, instead of presence of fan
control capabilities.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Matt Coffin <mcoffin13@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull more drm fixes from Daniel Vetter:
"Dave sends his pull, everyone realizes they've been asleep at the
wheel and hits send on their own pulls :-/
Normally I'd just ignore these all because w/e for me and Dave. But
this time around the latecomers also included drm-intel-fixes, which
failed to send out a -fixes pull thus far for this release (screwed up
vacation coverage, despite that 2/3 maintainers were around ... they
all look appropriately guilty), and that really is overdue to get
landed.
And since I had to do a pull request anyway I pulled the other two
late ones too.
intel fixes (didn't have any ever since the main merge window pull):
- gvt fixes (2 cc: stable)
- fix gpu reset vs mm-shrinker vs wakeup fun (needed a few patches)
- two gem locking fixes (one cc: stable)
- pile of misc fixes all over with minor impact, 6 cc: stable, others
from this window
exynos:
- misc minor fixes
misc:
- some build/Kconfig fixes
- regression fix for vm scalability perf test which seems to mostly
exercise dmesg/console logging ...
- the vgem cache flush fix for arm64 broke the world on x86, so
that's reverted again
* tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm: (42 commits)
Revert "drm/vgem: fix cache synchronization on arm/arm64"
drm/exynos: fix missing decrement of retry counter
drm/exynos: add CONFIG_MMU dependency
drm/exynos: remove redundant assignment to pointer 'node'
drm/exynos: using dev_get_drvdata directly
drm/bochs: Use shadow buffer for bochs framebuffer console
drm/fb-helper: Instanciate shadow FB if configured in device's mode_config
drm/fb-helper: Map DRM client buffer only when required
drm/client: Support unmapping of DRM client buffers
drm/i915: Only recover active engines
drm/i915: Add a wakeref getter for iff the wakeref is already active
drm/i915: Lift intel_engines_resume() to callers
drm/vgem: fix cache synchronization on arm/arm64
drm/i810: Use CONFIG_PREEMPTION
drm/bridge: tc358764: Fix build error
drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m
drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.
drm/i915/gvt: grab runtime pm first for forcewake use
drm/i915/gvt: fix incorrect cache entry for guest page mapping
drm/i915/gvt: Checking workload's gma earlier
...
Pull selinux fix from Paul Moore:
"One more small fix for a potential memory leak in an error path"
* tag 'selinux-pr-20190801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix memory leak in policydb_init()
In phy_start_aneg() autoneg is started, and immediately after that
link and autoneg status are read. As reported in [0] it can happen that
at time of this read the PHY has reset the "aneg complete" bit but not
yet the "link up" bit, what can result in a false link-up detection.
To fix this don't report link as up if we're in aneg mode and PHY
doesn't signal "aneg complete".
[0] https://marc.info/?t=156413509900003&r=1&w=2
Fixes: 4950c2ba49 ("net: phy: fix autoneg mismatch case in genphy_read_status")
Reported-by: liuyonglong <liuyonglong@huawei.com>
Tested-by: liuyonglong <liuyonglong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like FSL_ENETC, when CONFIG_FSL_ENETC_VF is set,
we should select PHYLIB, otherwise building still fails:
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open':
enetc.c:(.text+0x2744): undefined reference to `phy_start'
enetc.c:(.text+0x282c): undefined reference to `phy_disconnect'
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close':
enetc.c:(.text+0x28f8): undefined reference to `phy_stop'
enetc.c:(.text+0x2904): undefined reference to `phy_disconnect'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o:(.rodata+0x3f8): undefined reference to `phy_ethtool_get_link_ksettings'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o:(.rodata+0x400): undefined reference to `phy_ethtool_set_link_ksettings'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
strncpy() does not ensure NULL-termination when the input string
size equals to the destination buffer size 30.
The output string is passed to qed_int_deassertion_aeu_bit()
which calls DP_INFO() and relies NULL-termination.
Use strlcpy instead. The other conditional branch above strncpy()
needs no fix as snprintf() ensures NULL-termination.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
board is controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/iphase.c:2765 ia_ioctl() warn: potential spectre issue 'ia_dev' [r] (local cap)
drivers/atm/iphase.c:2774 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2782 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2816 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2823 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2830 ia_ioctl() warn: potential spectre issue '_ia_dev' [r] (local cap)
drivers/atm/iphase.c:2845 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2856 ia_ioctl() warn: possible spectre second half. 'iadev'
Fix this by sanitizing board before using it to index ia_dev and _ia_dev
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):
1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.
After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.
Concurrently, hvs_close_connection() runs in another thread:
calls vsock_remove_sock() to decrease the refcnt by 2;
call sock_put() to decrease the refcnt to 0, and free the sk;
next, the "release_sock(sk)" may hang due to use-after-free.
In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.
The issue can be resolved if an extra reference is taken when the
connection is established.
Fixes: a9eeb998c2 ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hbmc-am654 driver is for the TI AM654, which is an ARM64 SoC, so
don't propose this driver on other architectures unless
build-testing.
Fixes: b07079f164 ("mtd: hyperbus: Add driver for TI's HyperBus memory controller")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
On x86_64, when CONFIG_OF is not disabled:
WARNING: unmet direct dependencies detected for MUX_MMIO
Depends on [n]: MULTIPLEXER [=y] && (OF [=n] || COMPILE_TEST [=n])
Selected by [y]:
- HBMC_AM654 [=y] && MTD [=y] && MTD_HYPERBUS [=y]
due to
config HBMC_AM654
tristate "HyperBus controller driver for AM65x SoC"
select MULTIPLEXER
select MUX_MMIO
Fix this by making HBMC_AM654 imply MUX_MMIO instead of select so
that dependencies are taken care of. MUX_MMIO is optional for
functioning of driver.
Fixes: b07079f164 ("mtd: hyperbus: Add driver for TI's HyperBus memory controller")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Some devices are not supposed to support on-die ECC but experience
shows that internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" returns that it is not.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable@vger.kernel.org
Fixes: dbc44edbf8 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Pull xen fixes from Juergen Gross:
- a small cleanup
- a fix for a build error on ARM with some configs
- a fix of a patch for the Xen gntdev driver
- three patches for fixing a potential problem in the swiotlb-xen
driver which Konrad was fine with me carrying them through the Xen
tree
* tag 'for-linus-5.3a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: remember having called xen_create_contiguous_region()
xen/swiotlb: simplify range_straddles_page_boundary()
xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
xen: avoid link error on ARM
xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero()
xen/pciback: remove set but not used variable 'old_state'
Pull arm64 fixes from Catalin Marinas:
- Update the compat layer to allow single-byte watchpoints on all
addresses (similar to the native support)
- arm_pmu: fix the restoration of the counters on the
CPU_PM_ENTER_FAILED path
- Fix build regression with vDSO and Makefile not stripping
CROSS_COMPILE_COMPAT
- Fix the CTR_EL0 (cache type register) sanitisation on heterogeneous
machines (e.g. big.LITTLE)
- Fix the interrupt controller priority mask value when pseudo-NMIs are
enabled
- arm64 kprobes fixes: recovering of the PSTATE.D flag in the
single-step exception handler, NOKPROBE annotations for
unwind_frame() and walk_stackframe(), remove unneeded
rcu_read_lock/unlock from debug handlers
- Several gcc fall-through warnings
- Unused variable warnings
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Make debug exception handlers visible from RCU
arm64: kprobes: Recover pstate.D in single-step exception handler
arm64/mm: fix variable 'tag' set but not used
arm64/mm: fix variable 'pud' set but not used
arm64: Remove unneeded rcu_read_lock from debug handlers
arm64: unwind: Prohibit probing on return_address()
arm64: Lower priority mask for GIC_PRIO_IRQON
arm64/efi: fix variable 'si' set but not used
arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG}
arm64: vdso: Fix Makefile regression
arm64: module: Mark expected switch fall-through
arm64: smp: Mark expected switch fall-through
arm64: hw_breakpoint: Fix warnings about implicit fallthrough
drivers/perf: arm_pmu: Fix failure path in PM notifier
arm64: compat: Allow single-byte watchpoints on all addresses
Pull parisc fixes from Helge Deller:
"A few small fixes for the parisc architecture:
- Fix fall-through warnings in parisc math emu code
- Fix vmlinuz linking failure with debug-enabled kernels
- Fix a race condition in kernel live-patching code
- Add missing archclean Makefile target & defconfig adjustments"
* 'parisc-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Add archclean Makefile target
parisc: Strip debug info from kernel before creating compressed vmlinuz
parisc: Fix build of compressed kernel even with debug enabled
parisc: fix race condition in patching code
parisc: rename default_defconfig to defconfig
parisc: Fix fall-through warnings in fpudispatch.c
parisc: Mark expected switch fall-throughs in fault.c
Pull s390 updates from Vasily Gorbik:
- Default configs updates
- Minor qdio cleanup
- Sparse warnings fixes
- Implicit-fallthrough warnings fixes
* tag 's390-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: adjust switch fall through comments for -Wimplicit-fallthrough
vfio-ccw: make vfio_ccw_async_region_ops static
s390/3215: add switch fall through comment for -Wimplicit-fallthrough
s390/tape: add fallthrough annotations
s390/mm: add fallthrough annotations
s390/mm: make gmap_test_and_clear_dirty_pmd static
s390/kexec: add missing include to machine_kexec_reloc.c
s390/perf: make cf_diag_csd static
s390/lib: add missing include
s390/boot: add missing declarations and includes
s390: update configs
s390: clean up qdio.h
Pull SCSI fixes from James Bottomley:
"Seven fixes to four drivers with no core changes.
The mpt3sas one is theoretical until we get a CPU that goes up to 64
bits physical, the qla2xxx one fixes an oops in a driver
initialization error leg and the others are mostly cosmetic"
[ The fcoe patches may be worth highlighting - they may be "just"
cleanups, but they simplify and fix the odd fc_rport_priv structure
handling rules so that the new gcc-9 warnings about memset crossing
structure boundaries are gone.
The old code was hard for humans to understand too, and really
confused the compiler sanity checks - Linus ]
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix possible fcport null-pointer dereferences
scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
scsi: hpsa: remove printing internal cdb on tag collision
scsi: hpsa: correct scsi command status issue after reset
scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv
scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
scsi: libfc: Whitespace cleanup in libfc.h
Pull block fixes from Jens Axboe:
"Here's a small collection of fixes that should go into this series.
This contains:
- io_uring potential use-after-free fix (Jackie)
- loop regression fix (Jan)
- O_DIRECT fragmented bio regression fix (Damien)
- Mark Denis as the new floppy maintainer (Denis)
- ataflop switch fall-through annotation (Gustavo)
- libata zpodd overflow fix (Kees)
- libata ahci deferred probe fix (Miquel)
- nbd invalidation BUG_ON() fix (Munehisa)
- dasd endless loop fix (Stefan)"
* tag 'for-linus-20190802' of git://git.kernel.dk/linux-block:
s390/dasd: fix endless loop after read unit address configuration
block: Fix __blkdev_direct_IO() for bio fragments
MAINTAINERS: floppy: take over maintainership
nbd: replace kill_bdev() with __invalidate_device() again
ata: libahci: do not complain in case of deferred probe
io_uring: fix KASAN use after free in io_sq_wq_submit_work
loop: Fix mount(2) failure due to race with LOOP_SET_FD
libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
ataflop: Mark expected switch fall-through
Pull device mapper fixes from Mike Snitzer:
"Fix NULL pointer and various whitespace issues with DM's recent DAX
code changes from commit in 5.3 merge"
* tag 'for-5.3/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: fix various whitespace issues with recent DAX code
dm table: fix dax_dev NULL dereference in device_synchronous()
Pull rdma fixes from Doug Ledford:
"Here's our second -rc pull request. Nothing particularly special in
this one. The client removal deadlock fix is kindy tricky, but we had
multiple eyes on it and no one could find a fault in it. A couple
Spectre V1 fixes too. Otherwise, all just normal -rc fodder:
- A couple Spectre V1 fixes (umad, hfi1)
- Fix a tricky deadlock in the rdma core code with refcounting
instead of locks (client removal patches)
- Build errors (hns)
- Fix a scheduling while atomic issue (mlx5)
- Use after free fix (mad)
- Fix error path return code (hns)
- Null deref fix (siw_crypto_hash)
- A few other misc. minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp()
RDMA/mlx5: Release locks during notifier unregister
IB/hfi1: Fix Spectre v1 vulnerability
IB/mad: Fix use-after-free in ib mad completion handling
RDMA/restrack: Track driver QP types in resource tracker
IB/mlx5: Fix MR registration flow to use UMR properly
RDMA/devices: Remove the lock around remove_client_context
RDMA/devices: Do not deadlock during client removal
IB/core: Add mitigation for Spectre V1
Do not dereference 'siw_crypto_shash' before checking
RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes
RDMA/hns: Fix build error
Pull btrfs fixes from David Sterba:
- tiny race window during 2 transactions aborting at the same time can
accidentally lead to a commit
- regression fix, possible deadlock during fiemap
- fix for an old bug when incremental send can fail on a file that has
been deduplicated in a special way
* tag 'for-5.3-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix deadlock between fiemap and transaction commits
Btrfs: fix race leading to fs corruption after transaction abort
Btrfs: fix incremental send failure after deduplication
We shouldn't assume CPU physical address we get from page_to_phys()
is same as DMA address we get from dma_alloc_coherent(). On x86_64,
we won't run into any problem with the assumption when dma_ops is
nommu_dma_ops. However, DMA address is IOVA when IOMMU is enabled.
And it's most likely different from CPU physical address when AMD
IOMMU is not in passthrough mode.
This patch fixes page faults when IOMMU is enabled.
Signed-off-by: Vijendar Mukunda <vijendar.mukunda@amd.com>
Link: https://lore.kernel.org/r/1564753899-17124-2-git-send-email-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
AMD platform device acp3x_rv_i2s created by parent PCI device
driver. Pass struct device of the parent to
snd_pcm_lib_preallocate_pages() so dma_alloc_coherent() can use
correct dma_ops. Otherwise, it will use default dma_ops which
is nommu_dma_ops on x86_64 even when IOMMU is enabled and
set to non passthrough mode.
Signed-off-by: Vijendar Mukunda <vijendar.mukunda@amd.com>
Link: https://lore.kernel.org/r/1564753899-17124-1-git-send-email-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
TCPM may receive PD messages associated with unknown or unsupported
alternate modes. If that happens, calls to typec_match_altmode()
will return NULL. The tcpm code does not currently take this into
account. This results in crashes.
Unable to handle kernel NULL pointer dereference at virtual address 000001f0
pgd = 41dad9a1
[000001f0] *pgd=00000000
Internal error: Oops: 5 [#1] THUMB2
Modules linked in: tcpci tcpm
CPU: 0 PID: 2338 Comm: kworker/u2:0 Not tainted 5.1.18-sama5-armv7-r2 #6
Hardware name: Atmel SAMA5
Workqueue: 2-0050 tcpm_pd_rx_handler [tcpm]
PC is at typec_altmode_attention+0x0/0x14
LR is at tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm]
...
[<c03fbee8>] (typec_altmode_attention) from [<bf8030fb>]
(tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm])
[<bf8030fb>] (tcpm_pd_rx_handler [tcpm]) from [<c012082b>]
(process_one_work+0x123/0x2a8)
[<c012082b>] (process_one_work) from [<c0120a6d>]
(worker_thread+0xbd/0x3b0)
[<c0120a6d>] (worker_thread) from [<c012431f>] (kthread+0xcf/0xf4)
[<c012431f>] (kthread) from [<c01010f9>] (ret_from_fork+0x11/0x38)
Ignore PD messages if the associated alternate mode is not supported.
Fixes: e9576fe8e6 ("usb: typec: tcpm: Support for Alternate Modes")
Cc: stable <stable@vger.kernel.org>
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1564761822-13984-1-git-send-email-linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Usb core will reset the default control endpoint "ep0" before resetting
a device. if the endpoint has a valid pointer back to the usb device
then the xhci driver reset callback will try to clear the toggle for
the endpoint.
ep0 didn't use to have this pointer set as ep0 was always allocated
by default together with a xhci slot for the usb device. Other endpoints
got their usb device pointer set in xhci_add_endpoint()
This changed with commit ef513be0a9 ("usb: xhci: Add Clear_TT_Buffer")
which sets the pointer for any endpoint on a FS/LS device behind a
HS hub that halts, including ep0.
If xHC controller needs to be reset at resume, then all the xhci slots
will be lost. Slots will be reenabled and reallocated at device reset,
but unlike other endpoints the ep0 is reset before device reset, while
the xhci slot may still be invalid, causing NULL pointer dereference.
Fix it by checking that the endpoint has both a usb device pointer and
valid xhci slot before trying to clear the toggle.
This issue was not seen earlier as ep0 didn't use to have a valid usb
device pointer, and other endpoints were only reset after device reset
when xhci slots were properly reenabled.
Reported-by: Bob Gleitsmann <rjgleits@bellsouth.net>
Reported-by: Enric Balletbo Serra <eballetbo@gmail.com>
Fixes: ef513be0a9 ("usb: xhci: Add Clear_TT_Buffer")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/1564758044-24748-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a USB device is connected to the host controller and
the system enters suspend, the following error happens
in xhci_suspend():
xhci-hcd ee000000.usb: WARN: xHC CMD_RUN timeout
Since the firmware/internal CPU control the USBSTS.STS_HALT
and the process speed is down when the roothub port enters U3,
long delay for the handshake of STS_HALT is neeed in xhci_suspend().
So, this patch adds to set the XHCI_SLOW_SUSPEND.
Fixes: 435cc1138e ("usb: host: xhci-plat: set resume_quirk() for R-Car controllers")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/1564734815-17964-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull gfs2 fix from Andreas Gruenbacher:
"Fix gfs2 cluster coherency bug"
* tag 'gfs2-v5.3-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Inode dirtying fix
Pull power management fix from Rafael Wysocki:
"Fix recent regression affecting ACPI device power management"
* tag 'pm-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Fix regression in acpi_device_set_power()
Pull sound fixes from Takashi Iwai:
- A further fix for syzcaller issues with USB-audio, addressing NULL
dereference that was introduced by the recent fix
- Avoid a long delay at boot with HD-audio when i915 module was built
but not installed, found on some Debian systems
- A fix of small race window at PCM draining
* tag 'sound-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix gpf in snd_usb_pipe_sanity_check
ALSA: pcm: fix lost wakeup event scenarios in snd_pcm_drain
ALSA: hda: Fix 1-minute detection delay when i915 module is not available
Pull drm fixes from Dave Airlie:
"Thanks to Daniel for handling the email the last couple of weeks, flus
and break-ins combined to derail me. Surprised nothing materialised
today to take me out again.
Just more amdgpu navi fixes, msm fixes and a single nouveau regression
fix:
amdgpu:
- navi10 temperature and pstate fixes
- vcn dynamic power management fix
- CS ioctl error handling fix
- debugfs info leak fix
- amdkfd VegaM fix
msm:
- dma sync call fix
- mdp5 dsi command mode fix
- fall-through fixes
- disabled GPU fix
nouveau:
- regression fix for displayport MST support"
* tag 'drm-fixes-2019-08-02' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau: Only release VCPI slots on mode changes
drm: msm: Fix add_gpu_components
drm/msm: Annotate intentional switch statement fall throughs
drm/msm: add support for per-CRTC max_vblank_count on mdp5
drm/msm: Use the correct dma_sync calls in msm_gem
drm/amd/powerplay: correct UVD/VCE/VCN power status retrieval
drm/amd/powerplay: correct Navi10 VCN powergate control (v2)
drm/amd/powerplay: support VCN powergate status retrieval for SW SMU
drm/amd/powerplay: support VCN powergate status retrieval on Raven
drm/amd/powerplay: add new sensor type for VCN powergate status
drm/amdgpu: fix a potential information leaking bug
drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep
drm/amd/powerplay: enable SW SMU reset functionality
drm/amd/powerplay: fix null pointer dereference around dpm state relates
drm/amdgpu/powerplay: use proper revision id for navi
drm/amd/powerplay: fix temperature granularity error in smu11
drm/amd/powerplay: add callback function of get_thermal_temperature_range
drm/amdkfd: Fix byte align on VegaM
Pull clk fixes from Stephen Boyd:
"A few fixes for code that came in during the merge window or that
started getting exercised differently this time around:
- Select regmap MMIO kconfig in spreadtrum driver to avoid compile
errors
- Complete kerneldoc on devm_clk_bulk_get_optional()
- Register an essential clk earlier on mediatek mt8183 SoCs so the
clocksource driver can use it
- Fix divisor math in the at91 driver
- Plug a race in Renesas reset control logic"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: renesas: cpg-mssr: Fix reset control race condition
clk: sprd: Select REGMAP_MMIO to avoid compile errors
clk: mediatek: mt8183: Register 13MHz clock earlier for clocksource
clk: Add missing documentation of devm_clk_bulk_get_optional() argument
clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
Pull arm swiotlb support from Christoph Hellwig:
"This fixes a cascade of regressions that originally started with the
addition of the ia64 port, but only got fatal once we removed most
uses of block layer bounce buffering in Linux 4.18.
The reason is that while the original i386/PAE code that was the first
architecture that supported > 4GB of memory without an iommu decided
to leave bounce buffering to the subsystems, which in those days just
mean block and networking as no one else consumed arbitrary userspace
memory.
Later with ia64, x86_64 and other ports we assumed that either an
iommu or something that fakes it up ("software IOTLB" in beautiful
Intel speak) is present and that subsystems can rely on that for
dealing with addressing limitations in devices. Except that the ARM
LPAE scheme that added larger physical address to 32-bit ARM did not
follow that scheme and thus only worked by chance and only for block
and networking I/O directly to highmem.
Long story, short fix - add swiotlb support to arm when build for LPAE
platforms, which actuallys turns out to be pretty trivial with the
modern dma-direct / swiotlb code to fix the Linux 4.18-ish regression"
* tag 'arm-swiotlb-5.3' of git://git.infradead.org/users/hch/dma-mapping:
arm: use swiotlb for bounce buffering on LPAE configs
dma-mapping: check pfn validity in dma_common_{mmap,get_sgtable}
Pull dma-mapping regression fixes from Christoph Hellwig:
"Two related regression fixes for changes from this merge window to fix
alignment issues introduced in the CMA allocation rework (Nicolin
Chen)"
* tag 'dma-mapping-5.3-3' of git://git.infradead.org/users/hch/dma-mapping:
dma-contiguous: page-align the size in dma_free_contiguous()
dma-contiguous: do not overwrite align in dma_alloc_contiguous()
VCN 2.0 firmware now requires a packet start command to be sent before
any other decode ring buffer command.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sets the CMD_SOURCE bit for VCN 2.0 Decoder Ring Buffer commands. This
bit was previously set by the RBC HW on older firmware. Newer firmware
uses a SW RBC and this bit has to be set by the driver.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Silence the following warnings when built with -Wimplicit-fallthrough=3
enabled by default since 5.3-rc2:
In file included from ./include/linux/preempt.h:11,
from ./include/linux/spinlock.h:51,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:6,
from ./include/linux/slab.h:15,
from drivers/s390/crypto/ap_queue.c:13:
drivers/s390/crypto/ap_queue.c: In function 'ap_sm_recv':
./include/linux/list.h:577:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
577 | for (pos = list_first_entry(head, typeof(*pos), member); \
| ^~~
drivers/s390/crypto/ap_queue.c:147:3: note: in expansion of macro 'list_for_each_entry'
147 | list_for_each_entry(ap_msg, &aq->pendingq, list) {
| ^~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/ap_queue.c:155:2: note: here
155 | case AP_RESPONSE_NO_PENDING_REPLY:
| ^~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ep11_xcrb':
drivers/s390/crypto/zcrypt_msgtype6.c:871:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
871 | if (msg->cprbx.cprb_ver_id == 0x04)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:874:2: note: here
874 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_rng':
drivers/s390/crypto/zcrypt_msgtype6.c:901:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
901 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:907:2: note: here
907 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_xcrb':
drivers/s390/crypto/zcrypt_msgtype6.c:838:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
838 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:844:2: note: here
844 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ica':
drivers/s390/crypto/zcrypt_msgtype6.c:801:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
801 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:808:2: note: here
808 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
Acked-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
strncpy() does not ensure NULL-termination when the input string size
equals to the destination buffer size IFNAMSIZ. The output string is
passed to dev_info() which relies on the NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
strncpy() does not ensure NULL-termination when the input string size
equals to the destination buffer size IFNAMSIZ. The output string
'name' is passed to dev_info which relies on NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Make debug exceptions visible from RCU so that synchronize_rcu()
correctly track the debug exception handler.
This also introduces sanity checks for user-mode exceptions as same
as x86's ist_enter()/ist_exit().
The debug exception can interrupt in idle task. For example, it warns
if we put a kprobe on a function called from idle task as below.
The warning message showed that the rcu_read_lock() caused this
problem. But actually, this means the RCU is lost the context which
is already in NMI/IRQ.
/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
/sys/kernel/debug/tracing # [ 135.122237]
[ 135.125035] =============================
[ 135.125310] WARNING: suspicious RCU usage
[ 135.125581] 5.2.0-08445-g9187c508bdc7 #20 Not tainted
[ 135.125904] -----------------------------
[ 135.126205] include/linux/rcupdate.h:594 rcu_read_lock() used illegally while idle!
[ 135.126839]
[ 135.126839] other info that might help us debug this:
[ 135.126839]
[ 135.127410]
[ 135.127410] RCU used illegally from idle CPU!
[ 135.127410] rcu_scheduler_active = 2, debug_locks = 1
[ 135.128114] RCU used illegally from extended quiescent state!
[ 135.128555] 1 lock held by swapper/0/0:
[ 135.128944] #0: (____ptrval____) (rcu_read_lock){....}, at: call_break_hook+0x0/0x178
[ 135.130499]
[ 135.130499] stack backtrace:
[ 135.131192] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-08445-g9187c508bdc7 #20
[ 135.131841] Hardware name: linux,dummy-virt (DT)
[ 135.132224] Call trace:
[ 135.132491] dump_backtrace+0x0/0x140
[ 135.132806] show_stack+0x24/0x30
[ 135.133133] dump_stack+0xc4/0x10c
[ 135.133726] lockdep_rcu_suspicious+0xf8/0x108
[ 135.134171] call_break_hook+0x170/0x178
[ 135.134486] brk_handler+0x28/0x68
[ 135.134792] do_debug_exception+0x90/0x150
[ 135.135051] el1_dbg+0x18/0x8c
[ 135.135260] default_idle_call+0x0/0x44
[ 135.135516] cpu_startup_entry+0x2c/0x30
[ 135.135815] rest_init+0x1b0/0x280
[ 135.136044] arch_call_rest_init+0x14/0x1c
[ 135.136305] start_kernel+0x4d4/0x500
[ 135.136597]
So make debug exception visible to RCU can fix this warning.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
kprobes manipulates the interrupted PSTATE for single step, and
doesn't restore it. Thus, if we put a kprobe where the pstate.D
(debug) masked, the mask will be cleared after the kprobe hits.
Moreover, in the most complicated case, this can lead a kernel
crash with below message when a nested kprobe hits.
[ 152.118921] Unexpected kernel single-step exception at EL1
When the 1st kprobe hits, do_debug_exception() will be called.
At this point, debug exception (= pstate.D) must be masked (=1).
But if another kprobes hits before single-step of the first kprobe
(e.g. inside user pre_handler), it unmask the debug exception
(pstate.D = 0) and return.
Then, when the 1st kprobe setting up single-step, it saves current
DAIF, mask DAIF, enable single-step, and restore DAIF.
However, since "D" flag in DAIF is cleared by the 2nd kprobe, the
single-step exception happens soon after restoring DAIF.
This has been introduced by commit 7419333fa1 ("arm64: kprobe:
Always clear pstate.D in breakpoint exception handler")
To solve this issue, this stores all DAIF bits and restore it
after single stepping.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 7419333fa1 ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler")
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Currently the retry counter is not being decremented, leading to a
potential infinite spin if the scalar_reads don't change state.
Addresses-Coverity: ("Infinite loop")
Fixes: 280e54c9f6 ("drm/exynos: scaler: Reset hardware before starting the operation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Compile-testing this driver on a NOMMU configuration shows a link failure:
drivers/gpu/drm/exynos/exynos_drm_gem.o: In function `exynos_drm_gem_fault':
exynos_drm_gem.c:(.text+0x484): undefined reference to `vmf_insert_mixed'
Add a CONFIG_MMU dependency to ensure we only enable this in configurations
that build correctly.
Many other drm drivers have the same dependency. It would be nice to
make this work in MMU-less configurations, but evidently nobody has
ever needed this so far.
Fixes: 156bdac990 ("drm/exynos: trigger build of all modules")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Inki Dae <daeinki@gmail.com>
The pointer 'node' is being assigned with a value that is never
read and is re-assigned later. The assignment is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Inki Dae <daeinki@gmail.com>
Several drivers cast a struct device pointer to a struct
platform_device pointer only to then call platform_get_drvdata().
To improve readability, these constructs can be simplified
by using dev_get_drvdata() directly.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
AES GCM input buffers for decryption contain AAD+CTEXT+TAG. Only
decrypt the ciphertext, and use the tag for comparison.
Fixes: 36cf515b9b ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES GCM encryption allows for authsize values of 4, 8, and 12-16 bytes.
Validate the requested authsize, and retain it to save in the request
context.
Fixes: 36cf515b9b ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A plaintext or ciphertext length of 0 is allowed in AES, in which case
no encryption occurs. Ensure that we don't clean up data structures
that were never allocated.
Fixes: 36cf515b9b ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
After getting a storage server event that causes the DASD device driver
to update its unit address configuration during a device shutdown there is
the possibility of an endless loop in the device driver.
In the system log there will be ongoing DASD error messages with RC: -19.
The reason is that the loop starting the ruac request only terminates when
the retry counter is decreased to 0. But in the sleep_on function there are
early exit paths that do not decrease the retry counter.
Prevent an endless loop by handling those cases separately.
Remove the unnecessary do..while loop since the sleep_on function takes
care of retries by itself.
Fixes: 8e09f21574 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
Cc: stable@vger.kernel.org # 2.6.25+
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Looks like a regression got introduced into nv50_mstc_atomic_check()
that somehow didn't get found until now. If userspace changes
crtc_state->active to false but leaves the CRTC enabled, we end up
calling drm_dp_atomic_find_vcpi_slots() using the PBN calculated in
asyh->dp.pbn. However, if the display is inactive we end up calculating
a PBN of 0, which inadvertently causes us to have an allocation of 0.
>From there, if userspace then disables the CRTC afterwards we end up
accidentally attempting to free the VCPI twice:
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
RIP: 0010:drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
Call Trace:
drm_atomic_helper_check_modeset+0x3f3/0xa60 [drm_kms_helper]
? drm_atomic_check_only+0x43/0x780 [drm]
drm_atomic_helper_check+0x15/0x90 [drm_kms_helper]
nv50_disp_atomic_check+0x83/0x1d0 [nouveau]
drm_atomic_check_only+0x54d/0x780 [drm]
? drm_atomic_set_crtc_for_connector+0xec/0x100 [drm]
drm_atomic_commit+0x13/0x50 [drm]
drm_atomic_helper_set_config+0x81/0x90 [drm_kms_helper]
drm_mode_setcrtc+0x194/0x6a0 [drm]
? vprintk_emit+0x16a/0x230
? drm_ioctl+0x163/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
drm_ioctl_kernel+0xaa/0xf0 [drm]
drm_ioctl+0x208/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
nouveau_drm_ioctl+0x63/0xb0 [nouveau]
do_vfs_ioctl+0x405/0x660
? recalc_sigpending+0x17/0x50
? _copy_from_user+0x37/0x60
ksys_ioctl+0x5e/0x90
? exit_to_usermode_loop+0x92/0xe0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x59/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
---[ end trace 4c395c0c51b1f88d ]---
[drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for
[MST PORT:00000000e288eb7d] found in mst state 000000008e642070
So, fix this by doing what we probably should have done from the start: only
call drm_dp_atomic_find_vcpi_slots() when crtc_state->mode_changed is set, so
that VCPI allocations remain for as long as the CRTC is enabled.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 232c9eec41 ("drm/nouveau: Use atomic VCPI helpers for MST")
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Juston Li <juston.li@intel.com>
Cc: Karol Herbst <karolherbst@gmail.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <stable@vger.kernel.org> # v5.1+
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190801220216.15323-1-lyude@redhat.com
Commit 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
broke older tipc tools that use compat interface (e.g. tipc-config from
tipcutils package):
% tipc-config -p
operation not supported
The commit started to reject TIPC netlink compat messages that do not
have attributes. It is too restrictive because some of such messages are
valid (they don't need any arguments):
% grep 'tx none' include/uapi/linux/tipc_config.h
#define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */
#define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */
#define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */
#define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */
#define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */
#define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */
#define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */
#define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */
This patch relaxes the original fix and rejects messages without
arguments only if such arguments are expected by a command (reg_type is
non zero).
Fixes: 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
Cc: stable@vger.kernel.org
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f850a48a07 ("ACPI: PM: Allow transitions to D0 to occur in
special cases") overlooked the fact that acpi_power_transition() may
change the power.state value for the target device and if that
happens, it may confuse acpi_device_set_power() and cause it to
omit the _PS0 evaluation which on some systems is necessary to
change power states of devices from low-power to D0.
Fix that by saving the current value of power.state for the
target device before passing it to acpi_power_transition() and
using the saved value in a subsequent check.
Fixes: f850a48a07 ("ACPI: PM: Allow transitions to D0 to occur in special cases")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/i2c/busses/i2c-s3c2410.c: In function 'i2c_s3c_irq_nextbyte':
drivers/i2c/busses/i2c-s3c2410.c:431:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (i2c->state == STATE_READ)
^
drivers/i2c/busses/i2c-s3c2410.c:439:2: note: here
case STATE_WRITE:
^~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
In SAMA5D2 datasheet, TWIHS_CWGR register rescription mentions clock
offset of 3 cycles (compared to 4 in eg. SAMA5D3).
Cc: stable@vger.kernel.org # 5.2.x
[needs applying to i2c-at91.c instead for earlier kernels]
Fixes: 0ef6f3213d ("i2c: at91: add support for new alternative command mode")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Driver was not disabling TXRDY interrupt after last TX byte.
This caused interrupt storm until transfer timeouts for slow
or broken device on the bus. The patch fixes the interrupt storm
on my SAMA5D2-based board.
Cc: stable@vger.kernel.org # 5.2.x
[v5.2 introduced file split; the patch should apply to i2c-at91.c before the split]
Fixes: fac368a040 ("i2c: at91: add new driver")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Tested-by: Raag Jadav <raagjadav@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Add 2 tests that check JIT code generation to jumps to 1st insn.
1st test is similar to syzbot reproducer.
The backwards branch is never taken at runtime.
2nd test has branch to 1st insn that executes.
The test is written as two bpf functions, since it's not possible
to construct valid single bpf program that jumps to 1st insn.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Introduction of bounded loops exposed old bug in x64 JIT.
JIT maintains the array of offsets to the end of all instructions to
compute jmp offsets.
addrs[0] - offset of the end of the 1st insn (that includes prologue).
addrs[1] - offset of the end of the 2nd insn.
JIT didn't keep the offset of the beginning of the 1st insn,
since classic BPF didn't have backward jumps and valid extended BPF
couldn't have a branch to 1st insn, because it didn't allow loops.
With bounded loops it's possible to construct a valid program that
jumps backwards to the 1st insn.
Fix JIT by computing:
addrs[0] - offset of the end of prologue == start of the 1st insn.
addrs[1] - offset of the end of 1st insn.
v1->v2:
- Yonghong noticed a bug in jit linfo.
Fix it by passing 'addrs + 1' to bpf_prog_fill_jited_linfo(),
since it expects insn_to_jit_off array to be offsets to last byte.
Reported-by: syzbot+35101610ff3e83119b1b@syzkaller.appspotmail.com
Fixes: 2589726d12 ("bpf: introduce bounded loops")
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
5d01ab7bac ("libbpf: fix erroneous multi-closing of BTF FD")
introduced backwards-compatibility issue, manifesting itself as -E2BIG
error returned on program load due to unknown non-zero btf_fd attribute
value for BPF_PROG_LOAD sys_bpf() sub-command.
This patch fixes bug by ensuring that we only ever associate BTF FD with
program if there is a BTF.ext data that was successfully loaded into
kernel, which automatically means kernel supports func_info/line_info
and associated BTF FD for progs (checked and ensured also by BTF
sanitization code).
Fixes: 5d01ab7bac ("libbpf: fix erroneous multi-closing of BTF FD")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The recent fix to properly handle IOCB_NOWAIT for async O_DIRECT IO
(patch 6a43074e2f) introduced two problems with BIO fragment handling
for direct IOs:
1) The dio size processed is calculated by incrementing the ret variable
by the size of the bio fragment issued for the dio. However, this size
is obtained directly from bio->bi_iter.bi_size AFTER the bio submission
which may result in referencing the bi_size value after the bio
completed, resulting in an incorrect value use.
2) The ret variable is not incremented by the size of the last bio
fragment issued for the bio, leading to an invalid IO size being
returned to the user.
Fix both problem by using dio->size (which is incremented before the bio
submission) to update the value of ret after bio submissions, including
for the last bio fragment issued.
Fixes: 6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
Reported-by: Masato Suzuki <masato.suzuki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While running the linux-next with CONFIG_DEBUG_LOCKS_ALLOC enabled,
I get the following splat.
BUG: key ffffcb5636929298 has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 1 PID: 53 at kernel/locking/lockdep.c:3669 lockdep_init_map+0x164/0x1f0
CPU: 1 PID: 53 Comm: kworker/1:1 Tainted: G W 5.2.0-next-20190712-00015-g00ad4634222e-dirty #603
Workqueue: events amba_deferred_retry_func
pstate: 60c00005 (nZCv daif +PAN +UAO)
pc : lockdep_init_map+0x164/0x1f0
lr : lockdep_init_map+0x164/0x1f0
[ trimmed ]
Call trace:
lockdep_init_map+0x164/0x1f0
__kernfs_create_file+0x9c/0x158
sysfs_add_file_mode_ns+0xa8/0x1d0
sysfs_add_file_to_group+0x88/0xd8
etm_perf_add_symlink_sink+0xcc/0x138
coresight_register+0x110/0x280
tmc_probe+0x160/0x420
[ trimmed ]
---[ end trace ab4cc669615ba1b0 ]---
Fix this by initialising the dynamically allocated attribute properly.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Fixes: bb8e370bdc ("coresight: perf: Add "sinks" group to PMU directory")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Fixed a typograhic error in the changelog]
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20190801172323.18359-2-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull irqchip fixes from Marc Zyngier:
A small bunch of fixes from the irqchip department:
- Fix a couple of UAF on error paths (RZA1, GICv3 ITS)
- Fix iMX GPCv2 trigger setting
- Add missing of_node_put on error path in MBIGEN
- Add another bunch of /* fall-through */ to silence warnings
Geert Uytterhoeven says:
====================
net: Manufacturer names and spelling fixes
This is a set of fixes for (some blatantly) wrong manufacturer names and
various spelling issues, mostly in Kconfig help texts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The help text refers to AMD instead of Broadcom, presumably because it
was copied from the former.
Fixes: adfc5217e9 ("broadcom: Move the Broadcom drivers")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The help text refers to IBM instead of Apple, presumably because it was
copied from the former.
Fixes: 8fb6b09081 ("bmac/mace/macmace/mac89x0/cs89x0: Move the Macintosh (Apple) drivers")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The help text refers to Western Digital instead of National
Semiconductor 8390, presumably because it was copied from the former.
Fixes: 644570b830 ("8390: Move the 8390 related drivers")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
add_gpu_components() adds found GPU nodes from the DT to the match list,
regardless of the status of the nodes. This is a problem, because if the
nodes are disabled, they should not be on the match list because they will
not be matched. This prevents display from initing if a GPU node is
defined, but it's status is disabled.
Fix this by checking the node's status before adding it to the match list.
Fixes: dc3ea265b8 (drm/msm: Drop the gpu binding)
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626180015.45242-1-jeffrey.l.hugo@gmail.com
We encountered a use-after-free bug when unloading the driver:
[ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
[ 3562.118385]
[ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G OE 5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
[ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
[ 3562.124383] Call Trace:
[ 3562.125640] dump_stack+0x9a/0xeb
[ 3562.126911] print_address_description+0xe3/0x2e0
[ 3562.128223] ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.129545] __kasan_report+0x15c/0x1df
[ 3562.130866] ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.132174] kasan_report+0xe/0x20
[ 3562.133514] ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.134835] ? find_mad_agent+0xa00/0xa00 [ib_core]
[ 3562.136158] ? qlist_free_all+0x51/0xb0
[ 3562.137498] ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
[ 3562.138833] ? quarantine_reduce+0x1fa/0x270
[ 3562.140171] ? kasan_unpoison_shadow+0x30/0x40
[ 3562.141522] ib_mad_recv_done+0xdf6/0x3000 [ib_core]
[ 3562.142880] ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.144277] ? ib_mad_send_done+0x1810/0x1810 [ib_core]
[ 3562.145649] ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
[ 3562.147008] ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.148380] ? debug_object_deactivate+0x2b9/0x4a0
[ 3562.149814] __ib_process_cq+0xe2/0x1d0 [ib_core]
[ 3562.151195] ib_cq_poll_work+0x45/0xf0 [ib_core]
[ 3562.152577] process_one_work+0x90c/0x1860
[ 3562.153959] ? pwq_dec_nr_in_flight+0x320/0x320
[ 3562.155320] worker_thread+0x87/0xbb0
[ 3562.156687] ? __kthread_parkme+0xb6/0x180
[ 3562.158058] ? process_one_work+0x1860/0x1860
[ 3562.159429] kthread+0x320/0x3e0
[ 3562.161391] ? kthread_park+0x120/0x120
[ 3562.162744] ret_from_fork+0x24/0x30
...
[ 3562.187615] Freed by task 31682:
[ 3562.188602] save_stack+0x19/0x80
[ 3562.189586] __kasan_slab_free+0x11d/0x160
[ 3562.190571] kfree+0xf5/0x2f0
[ 3562.191552] ib_mad_port_close+0x200/0x380 [ib_core]
[ 3562.192538] ib_mad_remove_device+0xf0/0x230 [ib_core]
[ 3562.193538] remove_client_context+0xa6/0xe0 [ib_core]
[ 3562.194514] disable_device+0x14e/0x260 [ib_core]
[ 3562.195488] __ib_unregister_device+0x79/0x150 [ib_core]
[ 3562.196462] ib_unregister_device+0x21/0x30 [ib_core]
[ 3562.197439] mlx4_ib_remove+0x162/0x690 [mlx4_ib]
[ 3562.198408] mlx4_remove_device+0x204/0x2c0 [mlx4_core]
[ 3562.199381] mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
[ 3562.200356] mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
[ 3562.201329] __x64_sys_delete_module+0x2d2/0x400
[ 3562.202288] do_syscall_64+0x95/0x470
[ 3562.203277] entry_SYSCALL_64_after_hwframe+0x49/0xbe
The problem was that the MAD PD was deallocated before the MAD CQ.
There was completion work pending for the CQ when the PD got deallocated.
When the mad completion handling reached procedure
ib_mad_post_receive_mads(), we got a use-after-free bug in the following
line of code in that procedure:
sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
(the pd pointer in the above line is no longer valid, because the
pd has been deallocated).
We fix this by allocating the PD before the CQ in procedure
ib_mad_port_open(), and deallocating the PD after freeing the CQ
in procedure ib_mad_port_close().
Since the CQ completion work queue is flushed during ib_free_cq(),
no completions will be pending for that CQ when the PD is later
deallocated.
Note that freeing the CQ before deallocating the PD is the practice
in the ULPs.
Fixes: 4be90bc60d ("IB/mad: Remove ib_get_dma_mr calls")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190801121449.24973-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
The check for QP type different than XRC has excluded driver QP
types from the resource tracker.
As a result, "rdma resource show" user command would not show opened
driver QPs which does not reflect the real state of the system.
Check QP type explicitly instead of assuming enum values/ordering.
Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190801104354.11417-1-galpress@amazon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Due to the complexity of client->remove() callbacks it is desirable to not
hold any locks while calling them. Remove the last one by tracking only
the highest client ID and running backwards from there over the xarray.
Since the only purpose of that lock was to protect the linked list, we can
drop the lock.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
lockdep reports:
WARNING: possible circular locking dependency detected
modprobe/302 is trying to acquire lock:
0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
but task is already holding lock:
000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&device->client_data_rwsem){++++}:
down_read+0x3f/0x160
ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
cm_process_work+0x29/0x110 [ib_cm]
cm_req_handler+0x10f5/0x1c00 [ib_cm]
cm_work_handler+0x54c/0x311d [ib_cm]
process_one_work+0x4aa/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30
-> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
process_one_work+0x45f/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30
-> #0 ((wq_completion)ib_cm){+.+.}:
lock_acquire+0xc8/0x1d0
flush_workqueue+0x102/0x990
cm_remove_one+0x30e/0x3c0 [ib_cm]
remove_client_context+0x94/0xd0 [ib_core]
disable_device+0x10a/0x1f0 [ib_core]
__ib_unregister_device+0x5a/0xe0 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
__mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
mlx5_remove_device+0x144/0x150 [mlx5_core]
mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
__x64_sys_delete_module+0x227/0x350
do_syscall_64+0xc3/0x6a4
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Which is due to the read side of the client_data_rwsem being obtained
recursively through a work queue flush during cm client removal.
The lock is being held across the remove in remove_client_context() so
that the function is a fence, once it returns the client is removed. This
is required so that the two callers do not proceed with destruction until
the client completes removal.
Instead of using client_data_rwsem use the existing device unregistration
refcount and add a similar client unregistration (client->uses) refcount.
This will fence the two unregistration paths without holding any locks.
Cc: <stable@vger.kernel.org>
Fixes: 921eab1143 ("RDMA/devices: Re-organize device.c locking")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
When CONFIG_KASAN_SW_TAGS=n, set_tag() is compiled away. GCC throws a
warning,
mm/kasan/common.c: In function '__kasan_kmalloc':
mm/kasan/common.c:464:5: warning: variable 'tag' set but not used
[-Wunused-but-set-variable]
u8 tag = 0xff;
^~~
Fix it by making __tag_set() a static inline function the same as
arch_kasan_set_tag() in mm/kasan/kasan.h for consistency because there
is a macro in arch/arm64/include/asm/kasan.h,
#define arch_kasan_set_tag(addr, tag) __tag_set(addr, tag)
However, when CONFIG_DEBUG_VIRTUAL=n and CONFIG_SPARSEMEM_VMEMMAP=y,
page_to_virt() will call __tag_set() with incorrect type of a
parameter, so fix that as well. Also, still let page_to_virt() return
"void *" instead of "const void *", so will not need to add a similar
cast in lowmem_page_address().
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
Michael reported an issue with perf bench numa failing with binding to
cpu0 with '-0' option.
# perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd
# Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd"
binding to node 0, mask: 0000000000000001 => -1
perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed.
Aborted (core dumped)
This happens when the cpu0 is not part of node0, which is the benchmark
assumption and we can see that's not the case for some powerpc servers.
Using correct node for cpu0 binding.
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The mdp5 drm/kms driver currently does not work on command-mode DSI
panels due to 'vblank wait timed out' errors. This causes a latency
of seconds, or tens of seconds in some cases, before content is shown
on the panel. This hardware does not have the something that we can use
as a frame counter available when running in command mode, so we need to
fall back to using timestamps by setting the max_vblank_count to zero.
This can be done on a per-CRTC basis, so the convert mdp5 to use
drm_crtc_set_max_vblank_count().
This change was tested on a LG Nexus 5 (hammerhead) phone.
Suggested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Reviewed-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Brian Masney <masneyb@onstation.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190531094619.31704-3-masneyb@onstation.org
GCC throws a warning,
arch/arm64/mm/mmu.c: In function 'pud_free_pmd_page':
arch/arm64/mm/mmu.c:1033:8: warning: variable 'pud' set but not used
[-Wunused-but-set-variable]
pud_t pud;
^~~
because pud_table() is a macro and compiled away. Fix it by making it a
static inline function and for pud_sect() as well.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
Remove rcu_read_lock()/rcu_read_unlock() from debug exception
handlers since we are sure those are not preemptible and
interrupts are off.
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Prohibit probing on return_address() and subroutines which
is called from return_address(), since the it is invoked from
trace_hardirqs_off() which is also kprobe blacklisted.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
On a system with two security states, if SCR_EL3.FIQ is cleared,
non-secure IRQ priorities get shifted to fit the secure view but
priority masks aren't.
On such system, it turns out that GIC_PRIO_IRQON masks the priority of
normal interrupts, which obviously ends up in a hang.
Increase GIC_PRIO_IRQON value (i.e. lower priority) to make sure
interrupts are not blocked by it.
Cc: Oleg Nesterov <oleg@redhat.com>
Fixes: bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority masking")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[will: fixed Fixes: tag]
Signed-off-by: Will Deacon <will@kernel.org>
Pull MMC fixes from Ulf Hansson:
- sdhci-sprd: Add a missing pm_runtime_put_noidle() to fix deferred
probe
- dw_mmc: Fix occasional hang after tuning on eMMC
- meson-mx-sdio: Fix misuse of GENMASK macro
- mmc_spi: Fix CRC problems for writes by using BDI_CAP_STABLE_WRITES
* tag 'mmc-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mmc_spi: Enable stable writes
mmc: meson-mx-sdio: Fix misuse of GENMASK macro
mmc: dw_mmc: Fix occasional hang after tuning on eMMC
mmc: host: sdhci-sprd: Fix the missing pm_runtime_put_noidle()
Pull GPIO fixes from Linus Walleij:
"Three GPIO fixes, all touching the core, so quite important:
- Fix the request of active low GPIO line events.
- Don't issue WARN() stuff on NULL descriptors if the GPIOLIB is
disabled.
- Preserve the descriptor flags when setting the initial direction on
lines"
* tag 'gpio-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: Preserve desc->flags when setting state
gpio: don't WARN() on NULL descs if gpiolib is disabled
gpiolib: fix incorrect IRQ requesting of an active-low lineevent
The local variable search in regulator_of_get_init_node takes the value
returned by either of_get_child_by_name or of_node_get, both of which
get a node. If this node is not put before returning, it could cause a
memory leak. Hence put search before a mid-loop return statement.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Link: https://lore.kernel.org/r/20190724083231.10276-1-nishkadg.linux@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The bochs driver (and virtual hardware) requires buffer objects to
reside in video ram to display them to the screen. So it can not
display the framebuffer console because the respective buffer object
is permanently pinned in system memory.
Using a shadow buffer for the console solves this problem. The console
emulation will pin the buffer object only during updates from the shadow
buffer. Otherwise, the bochs driver can freely relocated the buffer
between system memory and video ram.
v2:
* select shadow FB via struct drm_mode_config.prefer_shadow_fbdev
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://patchwork.freedesktop.org/patch/315833/
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Generic framebuffer emulation uses a shadow buffer for framebuffers with
dirty() function. If drivers want to use the shadow FB without such a
function, they can now set prefer_shadow or prefer_shadow_fbdev in their
mode_config structures. The former flag is exported to userspace, the
latter flag is fbdev-only.
v3:
* only schedule dirty worker if fbdev uses shadow fb
* test shadow fb settings with boolean operators
* use bool for struct drm_mode_config.prefer_shadow_fbdev
* fix documentation comments
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://patchwork.freedesktop.org/patch/315834/
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch changes DRM clients to not map the buffer by default. The
buffer, like any buffer object, should be mapped and unmapped when
needed.
An unmapped buffer object can be evicted to system memory and does
not consume video ram until displayed. This allows to use generic fbdev
emulation with drivers for low-memory devices, such as ast and mgag200.
This change affects the generic framebuffer console. HW-based consoles
map their console buffer once and keep it mapped. Userspace can mmap this
buffer into its address space. The shadow-buffered framebuffer console
only needs the buffer object to be mapped during updates. While not being
updated from the shadow buffer, the buffer object can remain unmapped.
Userspace will always mmap the shadow buffer.
v2:
* change DRM client to not map buffer by default
* manually map client buffer for fbdev with HW framebuffer
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://patchwork.freedesktop.org/patch/315830/
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
DRM clients, such as the fbdev emulation, have their buffer objects
mapped by default. Mapping a buffer implicitly prevents its relocation.
Hence, the buffer may permanently consume video memory while it's
allocated. This is a problem for drivers of low-memory devices, such as
ast, mgag200 or older framebuffer hardware, which will then not have
enough memory to display other content (e.g., X11).
This patch introduces drm_client_buffer_vmap() and _vunmap(). Internal
DRM clients can use these functions to unmap and remap buffer objects
as needed.
There's no reference counting for vmap operations. Callers are expected
to either keep buffers mapped (as it is now), or call vmap and vunmap
in pairs around code that accesses the mapped memory.
v2:
* remove several duplicated NULL-pointer checks
v3:
* style and typo fixes
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://patchwork.freedesktop.org/patch/315831/
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Use SMBUS_MASTER_DATA_READ.MASTER_RD_STATUS bit to check for RX
FIFO empty condition because SMBUS_MASTER_FIFO_CONTROL.MASTER_RX_PKT_COUNT
is not updated for read >= 64 bytes. This fixes the issue when trying to
read from the I2C slave more than 63 bytes.
Fixes: c24b8d574b ("i2c: iproc: Extend I2C read up to 255 bytes")
Cc: stable@kernel.org
Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Apparently we don't have an archclean target in our
arch/parisc/Makefile, so files in there never get cleaned out by make
mrproper. This, in turn means that the sizes.h file in
arch/parisc/boot/compressed never gets removed and worse, when you
transition to an O=build/parisc[64] build model it overrides the
generated file. The upshot being my bzImage was building with a SZ_end
that was too small.
I fixed it by making mrproper clean everything.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Helge Deller <deller@gmx.de>
With debug info enabled (CONFIG_DEBUG_INFO=y) the resulting vmlinux may get
that huge that we need to increase the start addresss for the decompression
text section otherwise one will face a linker error.
Reported-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Sven Schnelle <svens@stackframe.org>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Helge Deller <deller@gmx.de>
If we issue a reset to a currently idle engine, leave it idle
afterwards. This is useful to excise a linkage between reset and the
shrinker. When waking the engine, we need to pin the default context
image which we use for overwriting a guilty context -- if the engine is
idle we do not need this pinned image! However, this pinning means that
waking the engine acquires the FS_RECLAIM, and so may trigger the
shrinker. The shrinker itself may need to wait upon the GPU to unbind
and object and so may require services of reset; ergo we should avoid
the engine wake up path.
The danger in skipping the recovery for idle engines is that we leave the
engine with no context defined, which may interfere with the operation of
the power context on some older platforms. In practice, we should only
be resetting an active GPU but it something to look out for on Ironlake
(if memory serves).
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-2-chris@chris-wilson.co.uk
(cherry picked from commit 18398904ca)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Instead of always calling xen_destroy_contiguous_region() in case the
memory is DMA-able for the used device, do so only in case it has been
made DMA-able via xen_create_contiguous_region() before.
This will avoid a lot of xen_destroy_contiguous_region() calls for
64-bit capable devices.
As the memory in question is owned by swiotlb-xen the PG_owner_priv_1
flag of the first allocated page can be used for remembering.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
range_straddles_page_boundary() is open coding several macros from
include/xen/page.h. Use those instead. Additionally there is no need
to have check_pages_physically_contiguous() as a separate function as
it is used only once, so merge it into range_straddles_page_boundary().
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
The condition in xen_swiotlb_free_coherent() for deciding whether to
call xen_destroy_contiguous_region() is wrong: in case the region to
be freed is not contiguous calling xen_destroy_contiguous_region() is
the wrong thing to do: it would result in inconsistent mappings of
multiple PFNs to the same MFN. This will lead to various strange
crashes or data corruption.
Instead of calling xen_destroy_contiguous_region() in that case a
warning should be issued as that situation should never occur.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Having static variable `cpus` in libbpf_num_possible_cpus function
without guarding it with mutex makes this function thread-unsafe.
If multiple threads accessing this function, in the current form; it
leads to incrementing the static variable value `cpus` in the multiple
of total available CPUs.
Used local stack variable to calculate the number of possible CPUs and
then updated the static variable using WRITE_ONCE().
Changes since v1:
* added stack variable to calculate cpus
* serialized static variable update using WRITE_ONCE()
* fixed Fixes tag
Fixes: 6446b31555 ("bpf: add a new API libbpf_num_possible_cpus()")
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ensure the controller is not in the NEW state when nvme_probe() exits.
This will always allow a subsequent nvme_remove() to set the state to
DELETING, fixing a potential race between the initial asynchronous probe
and device removal.
Reported-by: Li Zhong <lizhongfs@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
With multipath enabled, nvme_scan_work() can read from the device
(through nvme_mpath_add_disk()) and hang [1]. However, with fabrics,
once ctrl->state is set to NVME_CTRL_DELETING, the reads will hang
(see nvmf_check_ready()) and the mpath stack device make_request
will block if head->list is not empty. However, when the head->list
consistst of only DELETING/DEAD controllers, we should actually not
block, but rather fail immediately.
In addition, before we go ahead and remove the namespaces, make sure
to clear the current path and kick the requeue list so that the
request will fast fail upon requeuing.
[1]:
--
INFO: task kworker/u4:3:166 blocked for more than 120 seconds.
Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:3 D 0 166 2 0x80004000
Workqueue: nvme-wq nvme_scan_work
Call Trace:
__schedule+0x851/0x1400
schedule+0x99/0x210
io_schedule+0x21/0x70
do_read_cache_page+0xa57/0x1330
read_cache_page+0x4a/0x70
read_dev_sector+0xbf/0x380
amiga_partition+0xc4/0x1230
check_partition+0x30f/0x630
rescan_partitions+0x19a/0x980
__blkdev_get+0x85a/0x12f0
blkdev_get+0x2a5/0x790
__device_add_disk+0xe25/0x1250
device_add_disk+0x13/0x20
nvme_mpath_set_live+0x172/0x2b0
nvme_update_ns_ana_state+0x130/0x180
nvme_set_ns_ana_state+0x9a/0xb0
nvme_parse_ana_log+0x1c3/0x4a0
nvme_mpath_add_disk+0x157/0x290
nvme_validate_ns+0x1017/0x1bd0
nvme_scan_work+0x44d/0x6a0
process_one_work+0x7d7/0x1240
worker_thread+0x8e/0xff0
kthread+0x2c3/0x3b0
ret_from_fork+0x35/0x40
INFO: task kworker/u4:1:1034 blocked for more than 120 seconds.
Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:1 D 0 1034 2 0x80004000
Workqueue: nvme-delete-wq nvme_delete_ctrl_work
Call Trace:
__schedule+0x851/0x1400
schedule+0x99/0x210
schedule_timeout+0x390/0x830
wait_for_completion+0x1a7/0x310
__flush_work+0x241/0x5d0
flush_work+0x10/0x20
nvme_remove_namespaces+0x85/0x3d0
nvme_do_delete_ctrl+0xb4/0x1e0
nvme_delete_ctrl_work+0x15/0x20
process_one_work+0x7d7/0x1240
worker_thread+0x8e/0xff0
kthread+0x2c3/0x3b0
ret_from_fork+0x35/0x40
--
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
When the user issues a command with side effects, we will end up freezing
the namespace request queue when updating disk info (and the same for
the corresponding mpath disk node).
However, we are not freezing the mpath node request queue,
which means that mpath I/O can still come in and block on blk_queue_enter
(called from nvme_ns_head_make_request -> direct_make_request).
This is a deadlock, because blk_queue_enter will block until the inner
namespace request queue is unfroze, but that process is blocked because
the namespace revalidation is trying to update the mpath disk info
and freeze its request queue (which will never complete because
of the I/O that is blocked on blk_queue_enter).
Fix this by freezing all the subsystem nsheads request queues before
executing the passthru command. Given that these commands are infrequent
we should not worry about this temporary I/O freeze to keep things sane.
Here is the matching hang traces:
--
[ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds.
[ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42
[ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 374.487274] systemd-udevd D 0 17994 1 0x00000000
[ 374.493407] Call Trace:
[ 374.496145] __schedule+0x2ef/0x620
[ 374.500047] schedule+0x38/0xa0
[ 374.503569] blk_queue_enter+0x139/0x220
[ 374.507959] ? remove_wait_queue+0x60/0x60
[ 374.512540] direct_make_request+0x60/0x130
[ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core]
[ 374.523740] ? generic_make_request_checks+0x307/0x6f0
[ 374.529484] generic_make_request+0x10d/0x2e0
[ 374.534356] submit_bio+0x75/0x140
[ 374.538163] ? guard_bio_eod+0x32/0xe0
[ 374.542361] submit_bh_wbc+0x171/0x1b0
[ 374.546553] block_read_full_page+0x1ed/0x330
[ 374.551426] ? check_disk_change+0x70/0x70
[ 374.556008] ? scan_shadow_nodes+0x30/0x30
[ 374.560588] blkdev_readpage+0x18/0x20
[ 374.564783] do_read_cache_page+0x301/0x860
[ 374.569463] ? blkdev_writepages+0x10/0x10
[ 374.574037] ? prep_new_page+0x88/0x130
[ 374.578329] ? get_page_from_freelist+0xa2f/0x1280
[ 374.583688] ? __alloc_pages_nodemask+0x179/0x320
[ 374.588947] read_cache_page+0x12/0x20
[ 374.593142] read_dev_sector+0x2d/0xd0
[ 374.597337] read_lba+0x104/0x1f0
[ 374.601046] find_valid_gpt+0xfa/0x720
[ 374.605243] ? string_nocheck+0x58/0x70
[ 374.609534] ? find_valid_gpt+0x720/0x720
[ 374.614016] efi_partition+0x89/0x430
[ 374.618113] ? string+0x48/0x60
[ 374.621632] ? snprintf+0x49/0x70
[ 374.625339] ? find_valid_gpt+0x720/0x720
[ 374.629828] check_partition+0x116/0x210
[ 374.634214] rescan_partitions+0xb6/0x360
[ 374.638699] __blkdev_reread_part+0x64/0x70
[ 374.643377] blkdev_reread_part+0x23/0x40
[ 374.647860] blkdev_ioctl+0x48c/0x990
[ 374.651956] block_ioctl+0x41/0x50
[ 374.655766] do_vfs_ioctl+0xa7/0x600
[ 374.659766] ? locks_lock_inode_wait+0xb1/0x150
[ 374.664832] ksys_ioctl+0x67/0x90
[ 374.668539] __x64_sys_ioctl+0x1a/0x20
[ 374.672732] do_syscall_64+0x5a/0x1c0
[ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds.
[ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42
[ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 374.760170] nvmeadm D 0 49141 36333 0x00004080
[ 374.766301] Call Trace:
[ 374.769038] __schedule+0x2ef/0x620
[ 374.772939] schedule+0x38/0xa0
[ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100
[ 374.781614] ? remove_wait_queue+0x60/0x60
[ 374.786192] blk_mq_freeze_queue+0x1a/0x20
[ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core]
[ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core]
[ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core]
[ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
[ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core]
[ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core]
[ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core]
[ 374.833114] do_vfs_ioctl+0xa7/0x600
[ 374.837117] ? __audit_syscall_entry+0xdd/0x130
[ 374.842184] ksys_ioctl+0x67/0x90
[ 374.845891] __x64_sys_ioctl+0x1a/0x20
[ 374.850082] do_syscall_64+0x5a/0x1c0
[ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9
--
Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Tested-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Presently, nvmet_file_flush() always returns a call to
errno_to_nvme_status() but that helper doesn't take into account the
case when errno=0. So nvmet_file_flush() always returns an error code.
All other callers of errno_to_nvme_status() check for success before
calling it.
To fix this, ensure errno_to_nvme_status() returns success if the
errno is zero. This should prevent future mistakes like this from
happening.
Fixes: c6aa3542e0 ("nvmet: add error log support for file backend")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
After calling nvme_loop_delete_ctrl(), the controllers will not
yet be deleted because nvme_delete_ctrl() only schedules work
to do the delete.
This means a race can occur if a port is removed but there
are still active controllers trying to access that memory.
To fix this, flush the nvme_delete_wq before returning from
nvme_loop_remove_port() so that any controllers that might
be in the process of being deleted won't access a freed port.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
When a port is removed through configfs, any connected controllers
are still active and can still send commands. This causes a
use-after-free bug which is detected by KASAN for any admin command
that dereferences req->port (like in nvmet_execute_identify_ctrl).
To fix this, disconnect all active controllers when a subsystem is
removed from a port. This ensures there are no active controllers
when the port is eventually removed.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
When permanent entries were introduced by the commit below, they were
exempt from timing out and thus igmp leave wouldn't affect them unless
fast leave was enabled on the port which was added before permanent
entries existed. It shouldn't matter if fast leave is enabled or not
if the user added a permanent entry it shouldn't be deleted on igmp
leave.
Before:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
< join and leave 229.1.1.1 on eth4 >
$ bridge mdb show
$
After:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
< join and leave 229.1.1.1 on eth4 >
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In phy_led_trigger_change_speed(), there is an if statement on line 48
to check whether phy->last_triggered is NULL:
if (!phy->last_triggered)
When phy->last_triggered is NULL, it is used on line 52:
led_trigger_event(&phy->last_triggered->trigger, LED_OFF);
Thus, a possible null-pointer dereference may occur.
To fix this bug, led_trigger_event(&phy->last_triggered->trigger,
LED_OFF) is called when phy->last_triggered is not NULL.
This bug is found by a static analysis tool STCheck written by
the OSLAB group in Tsinghua University.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build bot reports some recent TLS tests are failing
with CONFIG_TLS=n. Correct the expected return code
and skip TLS installation if not supported.
Tested with CONFIG_TLS=n and CONFIG_TLS=m.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: cf32526c88 ("selftests/tls: add a test for ULP but no keys")
Fixes: 65d41fb317 ("selftests/tls: add a bidirectional test")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain ttys operations (pty_unix98_ops) lack tiocmget() and tiocmset()
functions which are called by the certain HCI UART protocols (hci_ath,
hci_bcm, hci_intel, hci_mrvl, hci_qca) via hci_uart_set_flow_control()
or directly. This leads to an execution at NULL and can be triggered by
an unprivileged user. Fix this by adding a helper function and a check
for the missing tty operations in the protocols code.
This fixes CVE-2019-10207. The Fixes: lines list commits where calls to
tiocm[gs]et() or hci_uart_set_flow_control() were added to the HCI UART
protocols.
Link: https://syzkaller.appspot.com/bug?id=1b42faa2848963564a5b1b7f8c837ea7b55ffa50
Reported-by: syzbot+79337b501d6aa974d0f6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # v2.6.36+
Fixes: b3190df628 ("Bluetooth: Support for Atheros AR300x serial chip")
Fixes: 118612fb91 ("Bluetooth: hci_bcm: Add suspend/resume PM functions")
Fixes: ff2895592f ("Bluetooth: hci_intel: Add Intel baudrate configuration support")
Fixes: 162f812f23 ("Bluetooth: hci_uart: Add Marvell support")
Fixes: fa9ad876b8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth chip wcn3990")
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Yu-Chen, Cho <acho@suse.com>
Tested-by: Yu-Chen, Cho <acho@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To properly clear the slab on free with slab_want_init_on_free, we walk
the list of free objects using get_freepointer/set_freepointer.
The value we get from get_freepointer may not be valid. This isn't an
issue since an actual value will get written later but this means
there's a chance of triggering a bug if we use this value with
set_freepointer:
kernel BUG at mm/slub.c:306!
invalid opcode: 0000 [#1] PREEMPT PTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-05754-g6471384a #4
RIP: 0010:kfree+0x58a/0x5c0
Code: 48 83 05 78 37 51 02 01 0f 0b 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 d6 37 51 02 01 <0f> 0b 48 83 05 d4 37 51 02 01 48 83 05 d4 37 51 02 01 48 83 05 d4
RSP: 0000:ffffffff82603d90 EFLAGS: 00010002
RAX: ffff8c3976c04320 RBX: ffff8c3976c04300 RCX: 0000000000000000
RDX: ffff8c3976c04300 RSI: 0000000000000000 RDI: ffff8c3976c04320
RBP: ffffffff82603db8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8c3976c04320 R11: ffffffff8289e1e0 R12: ffffd52cc8db0100
R13: ffff8c3976c01a00 R14: ffffffff810f10d4 R15: ffff8c3976c04300
FS: 0000000000000000(0000) GS:ffffffff8266b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8c397ffff000 CR3: 0000000125020000 CR4: 00000000000406b0
Call Trace:
apply_wqattrs_prepare+0x154/0x280
apply_workqueue_attrs_locked+0x4e/0xe0
apply_workqueue_attrs+0x36/0x60
alloc_workqueue+0x25a/0x6d0
workqueue_init_early+0x246/0x348
start_kernel+0x3c7/0x7ec
x86_64_start_reservations+0x40/0x49
x86_64_start_kernel+0xda/0xe4
secondary_startup_64+0xb6/0xc0
Modules linked in:
---[ end trace f67eb9af4d8d492b ]---
Fix this by ensuring the value we set with set_freepointer is either NULL
or another value in the chain.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Fixes: 6471384af2 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Align the RV64 defconfig to the output of "make savedefconfig" to
avoid unnecessary deltas for future defconfig patches. This patch
should have no runtime functional impact.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
On FU540-based systems, the "timebase-frequency" (RTCCLK) is sourced
from an external crystal located on the PCB. Thus the
timebase-frequency DT property should be defined by the board that
uses the SoC, not the SoC itself. Drop the superfluous
timebase-frequency property from the SoC DT data. (It's already
present in the board DT data.)
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
This patch fix following perf record error by linking vdso.so with
build id.
perf.data perf.data.old
[ perf record: Woken up 1 times to write data ]
free(): double free detected in tcache 2
Aborted
perf record use filename__read_build_id(util/symbol-minimal.c) to get
build id when libelf is not supported. When vdso.so is linked without
build id, the section size of PT_NOTE will be zero, buf size will
realloc to zero and cause memory corruption.
Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Pull tracing fixes from Steven Rostedt:
"Two minor fixes:
- Fix trace event header include guards, as several did not match the
#define to the #ifdef
- Remove a redundant test to ftrace_graph_notrace_addr() that was
accidentally added"
* tag 'trace-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
fgraph: Remove redundant ftrace_graph_notrace_addr() test
tracing: Fix header include guards in trace event headers
GCC throws out this warning on arm64.
drivers/firmware/efi/libstub/arm-stub.c: In function 'efi_entry':
drivers/firmware/efi/libstub/arm-stub.c:132:22: warning: variable 'si'
set but not used [-Wunused-but-set-variable]
Fix it by making free_screen_info() a static inline function.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull IPMI fix from Corey Minyard:
"One necessary fix for an uninitialized variable in the new IPMB driver.
Nothing else has come in besides things that need to wait until later"
* tag 'for-linus-5.3-2' of git://github.com/cminyard/linux-ipmi:
Fix uninitialized variable in ipmb_dev_int.c
If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
their architecturally maximum values, which defeats the use of
FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
machines.
Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
saturate at zero.
Fixes: 3c739b5710 ("arm64: Keep track of CPU feature registers")
Cc: <stable@vger.kernel.org> # 4.4.x-
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Using an old .config in combination with "make oldconfig" can cause
an incorrect detection of the compat compiler:
$ grep CROSS_COMPILE_COMPAT .config
CONFIG_CROSS_COMPILE_COMPAT_VDSO=""
$ make oldconfig && make
arch/arm64/Makefile:58: gcc not found, check CROSS_COMPILE_COMPAT.
Stop.
Accordingly to the section 7.2 of the GNU Make manual "Syntax of
Conditionals", "When the value results from complex expansions of
variables and functions, expansions you would consider empty may
actually contain whitespace characters and thus are not seen as
empty. However, you can use the strip function to avoid interpreting
whitespace as a non-empty value."
Fix the issue adding strip to the CROSS_COMPILE_COMPAT string
evaluation.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With the recent iomap write page reclaim deadlock fix, it turns out that the
GLF_DIRTY flag isn't always set when it needs to be anymore: previously, this
happened as a side effect of always adding the inode buffer head to the current
transaction with gfs2_trans_add_meta, but this isn't happening consistently
anymore. Fix by removing an additional unnecessary gfs2_trans_add_meta call
and by setting the GLF_DIRTY flag in gfs2_iomap_end.
(The GLF_DIRTY flag causes inode_go_sync to flush the transaction log when
syncing out the glock of that inode. When the flag isn't set, inode_go_sync
will skip inodes, including ones with an i_state of I_DIRTY_PAGES, which will
lead to cluster incoherency.)
In addition, in gfs2_iomap_page_done, if the metadata has changed, mark the
inode as I_DIRTY_DATASYNC to have the inode added to the current transaction:
we don't expect metadata to change here, but let's err on the safe side.
Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock");
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In "consolidate the capability checks in sget_{fc,userns}())" the
wrong argument had been passed to mount_capable() by sget_fc().
That mistake had been further obscured later, when switching
mount_capable() to fs_context has moved the calculation of
bogus argument from sget_fc() to mount_capable() itself. It
should've been fc->user_ns all along.
Screwed-up-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Christian Brauner <christian@brauner.io>
Tested-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Since linux 4.9 it is not possible to use buffers on the stack for DMA transfers.
During usb probe the driver crashes with "transfer buffer is on stack" message.
This fix k-allocates a buffer to be used on "read_reg_atomic", which is a macro
that calls "usb_control_msg" under the hood.
Kernel 4.19 backtrace:
usb_hcd_submit_urb+0x3e5/0x900
? sched_clock+0x9/0x10
? log_store+0x203/0x270
? get_random_u32+0x6f/0x90
? cache_alloc_refill+0x784/0x8a0
usb_submit_urb+0x3b4/0x550
usb_start_wait_urb+0x4e/0xd0
usb_control_msg+0xb8/0x120
hfcsusb_probe+0x6bc/0xb40 [hfcsusb]
usb_probe_interface+0xc2/0x260
really_probe+0x176/0x280
driver_probe_device+0x49/0x130
__driver_attach+0xa9/0xb0
? driver_probe_device+0x130/0x130
bus_for_each_dev+0x5a/0x90
driver_attach+0x14/0x20
? driver_probe_device+0x130/0x130
bus_add_driver+0x157/0x1e0
driver_register+0x51/0xe0
usb_register_driver+0x5d/0x120
? 0xf81ed000
hfcsusb_drv_init+0x17/0x1000 [hfcsusb]
do_one_initcall+0x44/0x190
? free_unref_page_commit+0x6a/0xd0
do_init_module+0x46/0x1c0
load_module+0x1dc1/0x2400
sys_init_module+0xed/0x120
do_fast_syscall_32+0x7a/0x200
entry_SYSENTER_32+0x6b/0xbe
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole block is protected by "if NET_VENDOR_MEDIATEK", so there is
no need for individual driver config symbols to duplicate this
dependency.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Just a few fixes:
* revert NETIF_F_LLTX usage as it caused problems
* avoid warning on WMM parameters from AP that are too short
* fix possible null-ptr dereference in hwsim
* fix interface combinations with 4-addr and crypto control
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree:
1) memleak in ebtables from the error path for the 32/64 compat layer,
from Florian Westphal.
2) Fix inverted meta ifname/ifidx matching when no interface is set
on either from the input/output path, from Phil Sutter.
3) Remove goto label in nft_meta_bridge, also from Phil.
4) Missing include guard in xt_connlabel, from Masahiro Yamada.
5) Two patch to fix ipset destination MAC matching coming from
Stephano Brivio, via Jozsef Kadlecsik.
6) Fix set rename and listing concurrency problem, from Shijie Luo.
Patch also coming via Jozsef Kadlecsik.
7) ebtables 32/64 compat missing base chain policy in rule count,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to use GFP_ATOMIC here, GFP_KERNEL should be enough.
The 'kcalloc()' just a few lines above, already uses GFP_KERNEL.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no good reason to use GFP_ATOMIC here. Other memory allocations
are performed with GFP_KERNEL (see other 'dma_alloc_coherent()' below and
'kzalloc()' in 'et131x_rx_dma_memory_alloc()')
Use GFP_KERNEL which should be enough.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
I would like to maintain the floppy driver. After the recent fixes,
I think I know the code pretty well. Nowadays I've got 2 physical 3.5"
readers to test all the changes.
Signed-off-by: Denis Efremov <efremov@linux.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ido Schimmel says:
====================
mlxsw: Two small fixes
Patch #1 from Jiri fixes the error path of the module initialization
function. Found during manual code inspection.
Patch #2 from Petr further reduces the default shared buffer pool sizes
in order to work around a problem that was originally described in
commit e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on
Spectrum-2").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on
Spectrum-2"), pool size was reduced to mitigate a problem in port buffer
usage of ports split four ways. It turns out that this work around does not
solve the issue, and a further reduction is required.
Thus reduce the size of pool 0 by another 2.7 MiB, and round down to the
whole number of cells.
Fixes: e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of sp2 pci driver registration fail, fix the error path to
start with sp1 pci driver unregister.
Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the particular version of clang a user has doesn't enable
-Werror=unknown-warning-option by default, even though it is the
default[1], then make sure to pass the option to the Kconfig cc-option
command so that testing options from Kconfig files works properly.
Otherwise, depending on the default values setup in the clang toolchain
we will silently assume options such as -Wmaybe-uninitialized are
supported by clang, when they really aren't.
A compilation issue only started happening for me once commit
589834b3a0 ("kbuild: Add -Werror=unknown-warning-option to
CLANG_FLAGS") was applied on top of commit b303c6df80 ("kbuild:
compute false-positive -Wmaybe-uninitialized cases in Kconfig"). This
leads kbuild to try and test for the existence of the
-Wmaybe-uninitialized flag with the cc-option command in
scripts/Kconfig.include, and it doesn't see an error returned from the
option test so it sets the config value to Y. Then the Makefile tries to
pass the unknown option on the command line and
-Werror=unknown-warning-option catches the invalid option and breaks the
build. Before commit 589834b3a0 ("kbuild: Add
-Werror=unknown-warning-option to CLANG_FLAGS") the build works fine,
but any cc-option test of a warning option in Kconfig files silently
evaluates to true, even if the warning option flag isn't supported on
clang.
Note: This doesn't change cc-option usages in Makefiles because those
use a different rule that includes KBUILD_CFLAGS by default (see the
__cc-option command in scripts/Kbuild.incluide). The KBUILD_CFLAGS
variable already has the -Werror=unknown-warning-option flag set. Thanks
to Doug for pointing out the different rule.
[1] https://clang.llvm.org/docs/DiagnosticsReference.html#wunknown-warning-option
Cc: Peter Smith <peter.smith@linaro.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The following four files are every time rebuilt:
UNROLL lib/raid6/vpermxor1.c
UNROLL lib/raid6/vpermxor2.c
UNROLL lib/raid6/vpermxor4.c
UNROLL lib/raid6/vpermxor8.c
Fix the suffixes in the targets.
Fixes: 72ad21075d ("lib/raid6: refactor unroll rules with pattern rules")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Since commit ff9b45c55b ("kbuild: modpost: read modules.order instead
of $(MODVERDIR)/*.mod"), 'make vmlinux' emits a warning, like this:
$ make defconfig vmlinux
[ snip ]
LD vmlinux.o
cat: modules.order: No such file or directory
MODPOST vmlinux.o
MODINFO modules.builtin.modinfo
KSYM .tmp_kallsyms1.o
KSYM .tmp_kallsyms2.o
LD vmlinux
SORTEX vmlinux
SYSMAP System.map
When building only vmlinux, KBUILD_MODULES is not set. Hence, the
modules.order is not generated. For the vmlinux modpost, it is not
necessary at all.
Separate scripts/Makefile.modpost for the vmlinux/modules stages.
This works more efficiently because the vmlinux modpost does not
need to include .*.cmd files.
Fixes: ff9b45c55b ("kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
__modpost is a phony target. The dependency on FORCE is pointless.
All the objects have been built in the previous stage, so the
dependency on the objects are not necessary either.
Count the number of modules in a more straightforward way.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
KBUILD_EXTRA_SYMBOLS makes sense only when building external modules.
Moreover, the modpost sets 'external_module' if the -e option is given.
I replaced $(patsubst %, -e %,...) with simpler $(addprefix -e,...)
while I was here.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
If a build rule fails, the .DELETE_ON_ERROR special target removes the
target, but does nothing for the .*.cmd file, which might be corrupted.
So, .*.cmd files should be included only when the corresponding targets
exist.
Commit 392885ee82 ("kbuild: let fixdep directly write to .*.cmd
files") missed to fix up this file.
Fixes: 392885ee82 ("kbuild: let fixdep directly write to .*.cmd")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit abbbdf1249 ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc036 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc036 ("nbd: stop using the bdev everywhere")
Cc: linux-block@vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
Cc: nbd@other.debian.org
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Retrieving PHYs can defer the probe, do not spawn an error when
-EPROBE_DEFER is returned, it is normal behavior.
Fixes: b1a9edbda0 ("ata: libahci: allow to use multiple PHYs")
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Assume the following ftrace code sequence that was patched in earlier by
ftrace_make_call():
PAGE A:
ffc: addr of ftrace_caller()
PAGE B:
000: 0x6fc10080 /* stw,ma r1,40(sp) */
004: 0x48213fd1 /* ldw -18(r1),r1 */
008: 0xe820c002 /* bv,n r0(r1) */
00c: 0xe83f1fdf /* b,l,n .-c,r1 */
When a Code sequences that is to be patched spans a page break, we might
have already cleared the part on the PAGE A. If an interrupt is coming in
during the remap of the fixed mapping to PAGE B, it might execute the
patched function with only parts of the FTRACE code cleared. To prevent
this, clear the jump to our mini trampoline first, and clear the remaining
parts after this. This might also happen when patch_text() patches a
function that it calls during remap.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org> # 5.2+
Signed-off-by: Helge Deller <deller@gmx.de>
'default_defconfig' is an awkward name since 'defconfig' is the default.
Let's simply say 'defconfig' like other architectures. You can drop the
KBUILD_DEFCONFIG define by following the standard naming.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Helge Deller <deller@gmx.de>
In fpudispatch.c we see a lot of fall-through warnings, but for this file we
prefer to not mark the switches and instead keep it in it's original state as
it's copied from HP-UX.
Fixes: a035d552a9 ("Makefile: Globally enable fall-through warning")
Signed-off-by: Helge Deller <deller@gmx.de>
If applespi is enabled, but LEDs class support is not, the build fails:
drivers/input/keyboard/applespi.o: In function `applespi_probe':
applespi.c:(.text+0x1fcd): undefined reference to `devm_led_classdev_register_ext'
Add "depends on LEDS_CLASS" to the Konfig
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 038b1a05ea ("Input: add Apple SPI keyboard and trackpad driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This reverts commit 6c5875843b.
It triggers a probable compiler bug on clang which leads to crashes.
With GCC it allows the compiler to use a more efficient register
allocation but current GCC versions never do that at any of the current
call sites, so there's no benefit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have a lot of fixes, most of them are also applicable to stable.
Notably:
* fix use-after-free issues
* fix DMA mapping API usage errors
* fix frame drop occurring due to reorder buffer handling in
RSS in certain conditions
* fix rate scale locking issues
* disable TX A-MSDU on older NICs as it causes problems and was
never supposed to be supported
* new PCI IDs
* GEO_TX_POWER_LIMIT API issue that many people were hitting
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: powerpc):
drivers/macintosh/smu.c: In function 'smu_queue_i2c':
drivers/macintosh/smu.c:854:21: warning: this statement may fall through [-Wimplicit-fallthrough=]
cmd->info.devaddr &= 0xfe;
~~~~~~~~~~~~~~~~~~^~~~~~~
drivers/macintosh/smu.c:855:2: note: here
case SMU_I2C_TRANSFER_STDSUB:
^~~~
Fixes: 0365ba7fb1 ("[PATCH] ppc64: SMU driver update & i2c support")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190730143704.060a2606@canb.auug.org.au
No VCN DPM bit check as that's different from VCN PG. Also
no extra check for possible double enablement/disablement
as that's already done by VCN.
v2: check return value of smu_feature_set_enabled
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Coccinelle reports a path that the array "data" is never initialized.
The path skips the checks in the conditional branches when either
of callback functions, read_wave_vgprs and read_wave_sgprs, is not
registered. Later, the uninitialized "data" array is read
in the while-loop below and passed to put_user().
Fix the path by allocating the array with kcalloc().
The patch is simplier than adding a fall-back branch that explicitly
calls memset(data, 0, ...). Also it does not need the multiplication
1024*sizeof(*data) as the size parameter for memset() though there is
no risk of integer overflow.
Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We always need to drop the ctx reference and should check
for errors first and then dereference the fence pointer.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Building the privcmd code as a loadable module on ARM, we get
a link error due to the private cache management functions:
ERROR: "__sync_icache_dcache" [drivers/xen/xen-privcmd.ko] undefined!
Move the code into a new that is always built in when Xen is enabled,
as suggested by Juergen Gross and Boris Ostrovsky.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
'commit df9bde015a ("xen/gntdev.c: convert to use vm_map_pages()")'
breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
will:
- use map->pages starting at vma->vm_pgoff instead of 0
- verify map->count against vma_pages()+vma->vm_pgoff instead of just
vma_pages().
In practice, this breaks using a single gntdev FD for mapping multiple
grants.
relevant strace output:
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid 857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)
details here:
https://github.com/QubesOS/qubes-issues/issues/5199
The reason is -> ( copying Marek's word from discussion)
vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.
When trying to map grant with index (aka vma->vm_pgoff) > 1,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).
Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.
Marek has tested and confirmed the same.
Cc: stable@vger.kernel.org # v5.2+
Fixes: df9bde015a ("xen/gntdev.c: convert to use vm_map_pages()")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
DPM state relates are not supported on the new SW SMU ASICs. But still
it's not OK to trigger null pointer dereference on accessing them.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
in this patch,
drm/amd/powerplay: add callback function of get_thermal_temperature_range
the driver missed temperature granularity change on other temperature.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. the thermal temperature is asic related data, move the code logic to
xxx_ppt.c.
2. replace data structure PP_TemperatureRange with
smu_temperature_range.
3. change temperature uint from temp*1000 to temp (temperature uint).
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull dax fix from Dan Williams:
"Fix a botched manual patch update that got dropped between testing and
application"
* 'dax-fix-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Fix missed wakeup in put_unlocked_entry()
If a device doesn't support DAX its 'dax_dev' is NULL. Fix
device_synchronous() to first check if dax_dev is NULL before
dereferencing it.
Fixes: 2e9ee0955d ("dm: enable synchronous dax")
Reported-by: jencce.kernel@gmail.com
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.
Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.
The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20190728131648.971361611@linutronix.de
The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.
Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.
The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.
Fixes: 7ac8707479 ("x86/vdso: Switch to generic vDSO implementation")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190728131648.879156507@linutronix.de
To address the regression which causes seccomp to deny applications the
access to clock_gettime64() and clock_getres64() syscalls because they
are not enabled in the existing filters.
That trips over the fact that 32bit VDSOs use the new clock_gettime64() and
clock_getres64() syscalls in the fallback path.
Add a conditional to invoke the 32bit legacy fallback syscalls instead of
the new 64bit variants. The conditional can go away once all architectures
are converted.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907301134470.1738@nanos.tec.linutronix.de
To allow syscall fallbacks using the legacy 32bit syscall for 32bit VDSO
builds, move the fallback invocation out into the callers.
Split the common code out of __cvdso_clock_gettime/getres() and invoke the
syscall fallback in the 64bit and 32bit variants.
Preparatory work for using legacy syscalls in 32bit VDSO. No functional
change.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20190728131648.695579736@linutronix.de
The 32bit variants of vdso_clock_gettime()/getres() have a NULL pointer
check for the timespec pointer. That's inconsistent vs. 64bit.
But the vdso implementation will never be consistent versus the syscall
because the only case which it can handle is NULL. Any other invalid
pointer will cause a segfault. So special casing NULL is not really useful.
Remove it along with the superflouos syscall fallback invocation as that
will return -EFAULT anyway. That also gets rid of the dubious typecast
which only works because the pointer is NULL.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190728131648.587523358@linutronix.de
The livepatching self-tests tweak the dynamic debug config to verify
the kernel log during the tests. Enhance set_dynamic_debug() so that
the config changes are restored when the script exits.
Note this functionality needs to keep in sync with:
- dynamic_debug input/output formatting
- functions affected by set_dynamic_debug()
For example, push_dynamic_debug() transforms:
kernel/livepatch/transition.c:530 [livepatch]klp_init_transition =_ "'%s': initializing %s transition\012"
to the following:
file kernel/livepatch/transition.c line 530 =_
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Previously, using "%m" in a ksft_* format string can result in strange
output because the errno value wasn't saved before calling other libc
functions. The solution is to simply save and restore the errno before
we format the user-supplied format string.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
sincen only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
sockaddr_pppox)), which is different on 64-bit architectures.
Guillaume Nault adds:
And the implementation was broken until 2016 (see 29e73269aa ("pppoe:
fix reference counting in PPPoE proxy")), and nobody ever noticed. I
should probably have removed this ioctl entirely instead of fixing it.
Clearly, it has never been used.
Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.
All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.
This should apply to all stable kernels.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull pidfd fixes from Christian Brauner:
"This makes setting the exit_state in exit_notify() consistent after
fixing the pidfd polling race pre-rc1. Related to the race fix, this
adds a WARN_ON() to do_notify_pidfd() to catch any future exit_state
races.
Last, this removes an obsolete comment from the pidfd tests"
* tag 'for-linus-20190730' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
exit: make setting exit_state consistent
pidfd: Add warning if exit_state is 0 during notification
pidfd: remove obsolete comments from test
Pull f2fs fixes from Jaegeuk Kim:
"This set of patches adjust to follow recent setflags changes and fix
two regressions"
* tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: use EINVAL for superblock with invalid magic
f2fs: fix to read source block before invalidating it
f2fs: remove redundant check from f2fs_setflags_common()
f2fs: use generic checking function for FS_IOC_FSSETXATTR
f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
Pull kselftest fixes from Shuah Khan:
"Minor fixes to tests and one major fix to livepatch test to add skip
handling to avoid false fail reports when livepatch is disabled"
* tag 'linux-kselftest-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/livepatch: add test skip handling
selftests: mlxsw: Fix typo in qos_mc_aware.sh
selftests/x86: fix spelling mistake "FAILT" -> "FAIL"
selftests: kmod: Fix typo in kmod.sh
Pull rdma fixes from Jason Gunthorpe:
"A few regression and bug fixes for the patches merged in the last
cycle:
- hns fixes a subtle crash from the ib core SGL rework
- hfi1 fixes various error handling, oops and protocol errors
- bnxt_re fixes a regression where nvmeof doesn't work on some
configurations
- mlx5 fixes a serious 'use after free' bug in how MR caching is
handled
- some edge case crashers in the new statistic core code
- more siw static checker fixups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
IB/counters: Always initialize the port counter object
IB/core: Fix querying total rdma stats
IB/mlx5: Prevent concurrent MR updates during invalidation
IB/mlx5: Fix clean_mr() to work in the expected order
IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
IB/mlx5: Fix unreg_umr to ignore the mkey state
RDMA/siw: Remove set but not used variables 'rv'
IB/mlx5: Replace kfree with kvfree
RDMA/bnxt_re: Honor vlan_id in GID entry comparison
IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn
IB/hfi1: Field not zero-ed when allocating TID flow memory
IB/hfi1: Unreserve a flushed OPFN request
IB/hfi1: Check for error on call to alloc_rsm_map_table
RDMA/hns: Fix sg offset non-zero issue
RDMA/siw: Fix error return code in siw_init_module()
Pull HMM fixes from Jason Gunthorpe:
"Fix the locking around nouveau's use of the hmm_range_* APIs. It works
correctly in the success case, but many of the the edge cases have
missing unlocks or double unlocks.
The diffstat is a bit big as Christoph did a comprehensive job to move
the obsolete API from the core header and into the driver before
fixing its flow, but the risk of regression from this code motion is
low"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
nouveau: unlock mmap_sem on all errors from nouveau_range_fault
nouveau: remove the block parameter to nouveau_range_fault
mm/hmm: move hmm_vma_range_done and hmm_vma_fault to nouveau
mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
Commit 33ec3e53e7 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:
for i in {a..z}{a..z}; do
dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
mkfs.ext2 $i.image
mkdir mnt$i
done
echo "Run"
for i in {a..z}{a..z}; do
mount -o loop -t ext2 $i.image mnt$i &
done
Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.
Fixes: 33ec3e53e7 ("loop: Don't change loop device under exclusive opener")
Cc: stable@vger.kernel.org
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 837158b847 ("dt-bindings: Check the examples against the
schemas") started generating YAML encoded DT files to validate the
examples against the schema. When running 'make dt_binding_check' in
tree after the 1st time, the generated example .dt.yaml files are
mistakenly added to the list of schema files. Exclude *.example.dt.yaml
files from the search for schema files.
Fixes: 837158b847 ("dt-bindings: Check the examples against the schemas")
Reported-by: Guido Günther <agx@sigxcpu.org>
Tested-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Rob Herring <robh@kernel.org>
In xchk_da_btree_block_check_sibling(), there is an if statement on
line 274 to check whether ds->state->altpath.blk[level].bp is NULL:
if (ds->state->altpath.blk[level].bp)
When ds->state->altpath.blk[level].bp is NULL, it is used on line 281:
xfs_trans_brelse(..., ds->state->altpath.blk[level].bp);
struct xfs_buf_log_item *bip = bp->b_log_item;
ASSERT(bp->b_transp == tp);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, ds->state->altpath.blk[level].bp is checked before
being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Since commit b191d6491b ("pidfd: fix a poll race when setting exit_state")
we unconditionally set exit_state to EXIT_ZOMBIE before calling into
do_notify_parent(). This was done to eliminate a race when querying
exit_state in do_notify_pidfd().
Back then we decided to do the absolute minimal thing to fix this and
not touch the rest of the exit_notify() function where exit_state is
set.
Since this fix has not caused any issues change the setting of
exit_state to EXIT_DEAD in the autoreap case to account for the fact hat
exit_state is set to EXIT_ZOMBIE unconditionally. This fix was planned
but also explicitly requested in [1] and makes the whole code more
consistent.
/* References */
[1]: https://lore.kernel.org/lkml/CAHk-=wigcxGFR2szue4wavJtH5cYTTeNES=toUBVGsmX0rzX+g@mail.gmail.com
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
This patch corrects the SPDX License Identifier style
in header files related to Drivers for Intel(R) Trace Hub
controller.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells says:
====================
Here are a couple of fixes for rxrpc:
(1) Fix a potential deadlock in the peer keepalive dispatcher.
(2) Fix a missing notification when a UDP sendmsg error occurs in rxrpc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If PHYLIB is not set, build enetc will fails:
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open':
enetc.c: undefined reference to `phy_disconnect'
enetc.c: undefined reference to `phy_start'
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close':
enetc.c: undefined reference to `phy_stop'
enetc.c: undefined reference to `phy_disconnect'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_get_link_ksettings'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_set_link_ksettings'
drivers/net/ethernet/freescale/enetc/enetc_mdio.o: In function `enetc_mdio_probe':
enetc_mdio.c: undefined reference to `mdiobus_alloc_size'
enetc_mdio.c: undefined reference to `mdiobus_free'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent changes that introduced support for Page Pool in stmmac, Jon
reported that NFS boot was no longer working on an ARM64 based platform
that had the IP behind an IOMMU.
As Page Pool API does not guarantee DMA syncing because of the use of
DMA_ATTR_SKIP_CPU_SYNC flag, we have to explicit sync the whole buffer upon
re-allocation because we are always re-using same pages.
In fact, ARM64 code invalidates the DMA area upon two situations [1]:
- sync_single_for_cpu(): Invalidates if direction != DMA_TO_DEVICE
- sync_single_for_device(): Invalidates if direction == DMA_FROM_DEVICE
So, as we must invalidate both the current RX buffer and the newly allocated
buffer we propose this fix.
[1] arch/arm64/mm/cache.S
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 2af6106ae9 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Tested-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently are duplicated checks on orig_egr_types which are
redundant, I believe this is a typo and should actually be
orig_ing_types || orig_egr_types instead of the expression
orig_egr_types || orig_egr_types. Fix these.
Addresses-Coverity: ("Same on both sides")
Fixes: c6b36bdd04 ("mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is perfectly ok to not have an gpio attached to the fixed-link node. So
the driver should not throw an error message when the gpio is missing.
Fixes: 5468e82f70 ("net: phy: fixed-phy: Drop GPIO from fixed_phy_add()")
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qla2x00_alloc_fcport(), fcport is assigned to NULL in the error
handling code on line 4880:
fcport = NULL;
Then fcport is used on lines 4883-4886:
INIT_WORK(&fcport->del_work, qla24xx_delete_sess_fn);
INIT_WORK(&fcport->reg_work, qla_register_fcport_fn);
INIT_LIST_HEAD(&fcport->gnl_entry);
INIT_LIST_HEAD(&fcport->list);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, qla2x00_alloc_fcport() directly returns NULL
in the error handling code.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
per hardware design, if DMA-able range contains all 64-bits
set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.
E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000
bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then
HBA will fault the firmware.
Driver will set 63-bit DMA mask to ensure the above address will not be
used.
Cc: <stable@vger.kernel.org> # 5.1.20+
Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a race condition between removing glue directory and adding a new
device under the glue dir. It can be reproduced in following test:
CPU1: CPU2:
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add_internal()
create_dir() // create glue_dir
device_add()
get_device_parent()
kobject_get() // get glue_dir
device_del()
cleanup_glue_dir()
kobject_del(glue_dir)
kobject_add()
kobject_add_internal()
create_dir() // in glue_dir
sysfs_create_dir_ns()
kernfs_create_dir_ns(sd)
sysfs_remove_dir() // glue_dir->sd=NULL
sysfs_put() // free glue_dir->sd
// sd is freed
kernfs_new_node(sd)
kernfs_get(glue_dir)
kernfs_add_one()
kernfs_put()
Before CPU1 remove last child device under glue dir, if CPU2 add a new
device under glue dir, the glue_dir kobject reference count will be
increase to 2 via kobject_get() in get_device_parent(). And CPU2 has
been called kernfs_create_dir_ns(), but not call kernfs_new_node().
Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in
glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call
kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase
it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next
call kernfs_add_one() by CPU2 will fail(This is also use-after-free)
and call kernfs_put() to decrease reference count. Because the reference
count is decremented to 0, it will also call kmem_cache_free() to free
the glue_dir->sd again. This will result in double free.
In order to avoid this happening, we also should make sure that kernfs_node
for glue_dir is released in CPU1 only when refcount for glue_dir kobj is
1 to fix this race.
The following calltrace is captured in kernel 4.14 with the following patch
applied:
commit 726e410979 ("drivers: core: Remove glue dirs from sysfs earlier")
--------------------------------------------------------------------------
[ 3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494
Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get().
....
[ 3.633986] Call trace:
[ 3.633991] kernfs_create_dir_ns+0xa8/0xb0
[ 3.633994] sysfs_create_dir_ns+0x54/0xe8
[ 3.634001] kobject_add_internal+0x22c/0x3f0
[ 3.634005] kobject_add+0xe4/0x118
[ 3.634011] device_add+0x200/0x870
[ 3.634017] _request_firmware+0x958/0xc38
[ 3.634020] request_firmware_into_buf+0x4c/0x70
....
[ 3.634064] kernel BUG at .../mm/slub.c:294!
Here is BUG_ON(object == fp) in set_freepointer().
....
[ 3.634346] Call trace:
[ 3.634351] kmem_cache_free+0x504/0x6b8
[ 3.634355] kernfs_put+0x14c/0x1d8
[ 3.634359] kernfs_create_dir_ns+0x88/0xb0
[ 3.634362] sysfs_create_dir_ns+0x54/0xe8
[ 3.634366] kobject_add_internal+0x22c/0x3f0
[ 3.634370] kobject_add+0xe4/0x118
[ 3.634374] device_add+0x200/0x870
[ 3.634378] _request_firmware+0x958/0xc38
[ 3.634381] request_firmware_into_buf+0x4c/0x70
--------------------------------------------------------------------------
Fixes: 726e410979 ("drivers: core: Remove glue dirs from sysfs earlier")
Signed-off-by: Muchun Song <smuchun@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: mips):
arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_stop’:
arch/mips/oprofile/op_model_mipsxx.c:217:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl3(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:218:2: note: here
case 3:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:219:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl2(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:220:2: note: here
case 2:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:221:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl1(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:222:2: note: here
case 1:
^~~~
arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_start’:
arch/mips/oprofile/op_model_mipsxx.c:197:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl3(WHAT | reg.control[3]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:198:2: note: here
case 3:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:199:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl2(WHAT | reg.control[2]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:200:2: note: here
case 2:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:201:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfctrl1(WHAT | reg.control[1]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:202:2: note: here
case 1:
^~~~
arch/mips/oprofile/op_model_mipsxx.c: In function ‘reset_counters’:
arch/mips/oprofile/op_model_mipsxx.c:299:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr3(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:300:2: note: here
case 3:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:302:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr2(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:303:2: note: here
case 2:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:305:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr1(0);
^~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:306:2: note: here
case 1:
^~~~
arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_perfcount_handler’:
arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((control & MIPS_PERFCTRL_IE) && \
^
arch/mips/oprofile/op_model_mipsxx.c:248:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(3)
^~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here
case n + 1: \
^
arch/mips/oprofile/op_model_mipsxx.c:249:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(2)
^~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((control & MIPS_PERFCTRL_IE) && \
^
arch/mips/oprofile/op_model_mipsxx.c:249:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(2)
^~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here
case n + 1: \
^
arch/mips/oprofile/op_model_mipsxx.c:250:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(1)
^~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((control & MIPS_PERFCTRL_IE) && \
^
arch/mips/oprofile/op_model_mipsxx.c:250:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(1)
^~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here
case n + 1: \
^
arch/mips/oprofile/op_model_mipsxx.c:251:2: note: in expansion of macro ‘HANDLE_COUNTER’
HANDLE_COUNTER(0)
^~~~~~~~~~~~~~
CC usr/include/linux/pmu.h.s
arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_setup’:
arch/mips/oprofile/op_model_mipsxx.c:174:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr3(reg.counter[3]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:175:2: note: here
case 3:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:177:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr2(reg.counter[2]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:178:2: note: here
case 2:
^~~~
arch/mips/oprofile/op_model_mipsxx.c:180:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
w_c0_perfcntr1(reg.counter[1]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/mips/oprofile/op_model_mipsxx.c:181:2: note: here
case 1:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: oprofile-list@lists.sf.net
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Accessing the hdr of an skb that was consumed already isn't
a good idea.
First ask if the skb is a QoS packet, then keep that data
on stack, and then consume the skb.
This was spotted by KASAN.
Cc: stable@vger.kernel.org
Fixes: 08f7d8b69a ("iwlwifi: mvm: bring back mvm GSO code")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The index for the elements of the ACPI object we dereference
was static. This means that if we called the function twice
we wouldn't start from 3 again, but rather from the latest
index we reached in the previous call.
This was dutifully reported by KASAN.
Fix this.
Cc: stable@vger.kernel.org
Fixes: 6996490501 ("iwlwifi: mvm: add support for EWRD (Dynamic SAR) ACPI table")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to remember how to unmap a memory (as single or
as page), we maintain a bit per Transmit Buffer (TBs) in
the meta data (structure iwl_cmd_meta).
We maintain a bitmap: 1 bit per TB.
If the TB is set, we will free the memory as a page.
This bitmap was never cleared. Fix this.
Cc: stable@vger.kernel.org
Fixes: 3cd1980b0c ("iwlwifi: pcie: introduce new tfd and tb formats")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We erroneously added a check for FW API version 41 before sending
GEO_TX_POWER_LIMIT, but this was already implemented in version 38.
Additionally, it was cherry-picked to older versions, namely 17, 26
and 29, so check for those as well.
Cc: stable@vger.kernel.org
Fixes: eca1e56cee ("iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
lq_info is an arary of size 2, active_tbl index is u8.
When accessing lq_info[1 - active_tbl], theoretically it's possible
that the access will be made to a negative index value.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
An earlier patch made sure that the queues are not lagging
too far behind. This means that iwl_mvm_release_frames
should not be called with a head_sn too far behind NSSN.
Don't take the risk to change completely the entry
condition to iwl_mvm_release_frames, but don't update
the head_sn is the NSSN is more than 2048 packets ahead
of us. Since this just cannot be right. This means that
the scenario described here happened. We are queue 0.
Q:0 Q:1
head_sn: 0 -> 2047
head_sn: 2048
Lots of packets arrive:
head_sn: 2047 -> 2150
send NSSN_SYNC notification
Handle notification
from the firmware and
do NOT move the head_sn
back to 2048
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The solution with the worker still had a bug, as in order
to get sta, rcu_read_lock should be used and thus no mutex
can be used inside iwl_mvm_rs_rate_init.
Also, spin_lock is a simpler solution, no need to spawn a
dedicated worker.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The only place where the command was sent as SYNC is during
init and this is not really critical. This change is required
for replacing RS mutex with a spinlock (in the subsequent patch),
since SYNC comamnd requres sleeping and thus the flow cannot
be done when holding a spinlock.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The comparison of the u32 variable wgds_tbl_idx with less than zero is
always going to be false because it is unsigned. Fix this by making
wgds_tbl_idx a plain signed int.
Addresses-Coverity: ("Unsigned compared against 0")
Fixes: 4fd445a2c8 ("iwlwifi: mvm: Add log information about SAR status")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This code clearly never could have worked, since it locks
while already locked. Add an unlocked __iwl_mvm_mac_set_key()
variant that doesn't do locking to fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The driver should call iwl_dbg_tlv_free even if debugfs is not defined
since ini mode does not depend on debugfs ifdef.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Fixes: 68f6f492c4 ("iwlwifi: trans: support loading ini TLVs from external file")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the missing GPLv2 SPDX license identifier.
It appears this single file was missing from 7f904d7e1f ("treewide:
Replace GPLv2 boilerplate/reference with SPDX - rule 505"), which
addressed all other files in scripts/coccinelle. Hence I added
GPL-2.0-only consitently with the mentioned patch.
Cc: linux-spdx@vger.kernel.org
Cc: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iwl_mvm_rs_tx_status can be called from two places in the code, but the
mutex is taken only on one of the calls. Split it into a wrapper taking
locks and an internal __iwl_mvm_rs_tx_status function.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The FSF does not reside in "675 Mass Ave, Cambridge" anymore...
let's replace the old GPL boilerplate code with a proper SPDX
identifier instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to support MSI-X efficiently, we want to avoid
communication across Rx queues. Each Rx queue should have
all the data it needs to process a packet.
The reordering buffer is a challenge in the MSI-X world
since we can have a single BA session whose packets are
directed to different queues. This is why each queue has
its own reordering buffer. The hardware is able to hint
the driver whether we have a hole or not, which allows
the driver to know whether it can release a packet or not.
This indication is called NSSN. Roughly, if the packet's
SN is lower than the NSSN, we can release the packet to
the stack. The NSSN is the SN of the newest packet received
without any holes + 1.
This is working as long as we don't have packets that we
release because of a timeout. When that happens, we could
have taken the decision to release a packet after we have
been waiting for its predecessor for too long. If this
predecessor comes later, we have to drop it because we
can't release packets out of order. In that case, the
hardware will give us an indication that we can we release
the packet (SN < NSSN), but the packet still needs to be
dropped.
This is why we sometimes need to ignore the NSSN and we
track the head_sn in software.
Here is a specific example of this:
1) Rx queue 1 got packets: 480, 482, 483
2) We release 480 to to the stack and wait for 481
3) NSSN is now 481
4) The timeout expires
5) We release 482 and 483, NSSN is still 480
6) 481 arrives its NSSN is 484.
We need to drop 481 even if 481 < 484. This is why we'll
update the head_sn to 484 at step 2. The flow now is:
1) Rx queue 1 got packets: 480, 482, 483
2) We release 480 to to the stack and wait for 481
3) NSSN is now 481 / head_sn is 481
4) The timeout expires
5) We release 482 and 483, NSSN is still 480 but head_sn is 484.
6) 481 arrives its NSSN is 484, but head_sn is 484 and we drop it.
This code introduces another problem in case all the traffic
goes well (no hole, no timeout):
Rx queue 1: 0 -> 483 (head_sn = 484)
Rx queue 2: 501 -> 4095 (head_sn = 0)
Rx queue 2: 0 -> 480 (head_sn = 481)
Rx queue 1: 481 but head_sn = 484 and we drop it.
At this point, the SN of queue 1 is far behind: more than
4040 packets behind. Queue 1 will consider 481 "old"
because 481 is in [501-64:501] whereas it is a very new
packet.
In order to fix that, send an Rx notification from time to
time (twice across the full set of 4096 packets) to make
sure no Rx queue is lagging too far behind.
What will happen then is:
Rx queue 1: 0 -> 483 (head_sn = 484)
Rx queue 2: 501 -> 2047 (head_sn = 2048)
Rx queue 1: Sync nofication (head_sn = 2048)
Rx queue 2: 2048 -> 4095 (head_sn = 0)
Rx queue 1: Sync notification (head_sn = 0)
Rx queue 2: 1 -> 481 (head_sn = 482)
Rx queue 1: 481 and head_sn = 0.
In queue 1's data, head_sn is now 0, the packet coming in
is 481, it'll understand that the new packet is new and it
won't be dropped.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We will soon be using a new notification that will be
initiated by the driver, sent to the firmware and sent
back to all the RSS queues by the firmware. This new
notification will be useful to synchronize the NSSN across
all the queues.
For now, don't send the notification, just add the code to
handle it. Later patch will add the code to actually send
it.
While at it, validate the baid coming from the firmware to
avoid accessing an array with a bad index in the driver.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Firmware versions before 41 don't support the GEO_TX_POWER_LIMIT
command, and sending it to the firmware will cause a firmware crash.
We allow this via debugfs, so we need to return an error value in case
it's not supported.
This had already been fixed during init, when we send the command if
the ACPI WGDS table is present. Fix it also for the other,
userspace-triggered case.
Cc: stable@vger.kernel.org
Fixes: 7fe90e0e3d ("iwlwifi: mvm: refactor geo init")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rate perform uses the lq_sta table to calculate the next rate to scale
while rate init resets the same table,
Rate perform is done in soft irq context in parallel to rate init
that can be called in case we are doing changes like AP changes BW
or moving state for auth to assoc.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On older NICs, we occasionally see issues with A-MSDU support,
where the commands in the FIFO get confused and then we see an
assert EDC because the next command in the FIFO isn't TX.
We've tried to isolate this issue and understand where it comes
from, but haven't found any errors in building the A-MSDU in
software.
At least for now, disable A-MSDU support on older hardware so
that users can use it again without fearing the assert.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=203315.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The fiemap handler locks a file range that can have unflushed delalloc,
and after locking the range, it tries to attach to a running transaction.
If the running transaction started its commit, that is, it is in state
TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
flushoncommit option or the transaction is creating a snapshot for the
subvolume that contains the file that fiemap is operating on, we end up
deadlocking. This happens because fiemap is blocked on the transaction,
waiting for it to complete, and the transaction is waiting for the flushed
dealloc to complete, which requires locking the file range that the fiemap
task already locked. The following stack traces serve as an example of
when this deadlock happens:
(...)
[404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[404571.515956] Call Trace:
[404571.516360] ? __schedule+0x3ae/0x7b0
[404571.516730] schedule+0x3a/0xb0
[404571.517104] lock_extent_bits+0x1ec/0x2a0 [btrfs]
[404571.517465] ? remove_wait_queue+0x60/0x60
[404571.517832] btrfs_finish_ordered_io+0x292/0x800 [btrfs]
[404571.518202] normal_work_helper+0xea/0x530 [btrfs]
[404571.518566] process_one_work+0x21e/0x5c0
[404571.518990] worker_thread+0x4f/0x3b0
[404571.519413] ? process_one_work+0x5c0/0x5c0
[404571.519829] kthread+0x103/0x140
[404571.520191] ? kthread_create_worker_on_cpu+0x70/0x70
[404571.520565] ret_from_fork+0x3a/0x50
[404571.520915] kworker/u8:6 D 0 31651 2 0x80004000
[404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
(...)
[404571.537000] fsstress D 0 13117 13115 0x00004000
[404571.537263] Call Trace:
[404571.537524] ? __schedule+0x3ae/0x7b0
[404571.537788] schedule+0x3a/0xb0
[404571.538066] wait_current_trans+0xc8/0x100 [btrfs]
[404571.538349] ? remove_wait_queue+0x60/0x60
[404571.538680] start_transaction+0x33c/0x500 [btrfs]
[404571.539076] btrfs_check_shared+0xa3/0x1f0 [btrfs]
[404571.539513] ? extent_fiemap+0x2ce/0x650 [btrfs]
[404571.539866] extent_fiemap+0x2ce/0x650 [btrfs]
[404571.540170] do_vfs_ioctl+0x526/0x6f0
[404571.540436] ksys_ioctl+0x70/0x80
[404571.540734] __x64_sys_ioctl+0x16/0x20
[404571.540997] do_syscall_64+0x60/0x1d0
[404571.541279] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[404571.543729] btrfs D 0 14210 14208 0x00004000
[404571.544023] Call Trace:
[404571.544275] ? __schedule+0x3ae/0x7b0
[404571.544526] ? wait_for_completion+0x112/0x1a0
[404571.544795] schedule+0x3a/0xb0
[404571.545064] schedule_timeout+0x1ff/0x390
[404571.545351] ? lock_acquire+0xa6/0x190
[404571.545638] ? wait_for_completion+0x49/0x1a0
[404571.545890] ? wait_for_completion+0x112/0x1a0
[404571.546228] wait_for_completion+0x131/0x1a0
[404571.546503] ? wake_up_q+0x70/0x70
[404571.546775] btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
[404571.547159] btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
[404571.547449] ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
[404571.547703] ? remove_wait_queue+0x60/0x60
[404571.547969] btrfs_mksubvol+0x605/0x640 [btrfs]
[404571.548226] ? __sb_start_write+0xd4/0x1c0
[404571.548512] ? mnt_want_write_file+0x24/0x50
[404571.548789] btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
[404571.549048] btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
[404571.549307] btrfs_ioctl+0x133f/0x3150 [btrfs]
[404571.549549] ? mem_cgroup_charge_statistics+0x4c/0xd0
[404571.549792] ? mem_cgroup_commit_charge+0x84/0x4b0
[404571.550064] ? __handle_mm_fault+0xe3e/0x11f0
[404571.550306] ? do_raw_spin_unlock+0x49/0xc0
[404571.550608] ? _raw_spin_unlock+0x24/0x30
[404571.550976] ? __handle_mm_fault+0xedf/0x11f0
[404571.551319] ? do_vfs_ioctl+0xa2/0x6f0
[404571.551659] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[404571.552087] do_vfs_ioctl+0xa2/0x6f0
[404571.552355] ksys_ioctl+0x70/0x80
[404571.552621] __x64_sys_ioctl+0x16/0x20
[404571.552864] do_syscall_64+0x60/0x1d0
[404571.553104] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
If we were joining the transaction instead of attaching to it, we would
not risk a deadlock because a join only blocks if the transaction is in a
state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
flush performed by a transaction is done before it reaches that state,
when it is in the state TRANS_STATE_COMMIT_START. However a transaction
join is intended for use cases where we do modify the filesystem, and
fiemap only needs to peek at delayed references from the current
transaction in order to determine if extents are shared, and, besides
that, when there is no current transaction or when it blocks to wait for
a current committing transaction to complete, it creates a new transaction
without reserving any space. Such unnecessary transactions, besides doing
unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
rotation of the precious backup roots.
So fix this by adding a new transaction join variant, named join_nostart,
which behaves like the regular join, but it does not create a transaction
when none currently exists or after waiting for a committing transaction
to complete.
Fixes: 03628cdbc6 ("Btrfs: do not start a transaction during fiemap")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When one transaction is finishing its commit, it is possible for another
transaction to start and enter its initial commit phase as well. If the
first ends up getting aborted, we have a small time window where the second
transaction commit does not notice that the previous transaction aborted
and ends up committing, writing a superblock that points to btrees that
reference extent buffers (nodes and leafs) that were not persisted to disk.
The consequence is that after mounting the filesystem again, we will be
unable to load some btree nodes/leafs, either because the content on disk
is either garbage (or just zeroes) or corresponds to the old content of a
previouly COWed or deleted node/leaf, resulting in the well known error
messages "parent transid verify failed on ...".
The following sequence diagram illustrates how this can happen.
CPU 1 CPU 2
<at transaction N>
btrfs_commit_transaction()
(...)
--> sets transaction state to
TRANS_STATE_UNBLOCKED
--> sets fs_info->running_transaction
to NULL
(...)
btrfs_start_transaction()
start_transaction()
wait_current_trans()
--> returns immediately
because
fs_info->running_transaction
is NULL
join_transaction()
--> creates transaction N + 1
--> sets
fs_info->running_transaction
to transaction N + 1
--> adds transaction N + 1 to
the fs_info->trans_list list
--> returns transaction handle
pointing to the new
transaction N + 1
(...)
btrfs_sync_file()
btrfs_start_transaction()
--> returns handle to
transaction N + 1
(...)
btrfs_write_and_wait_transaction()
--> writeback of some extent
buffer fails, returns an
error
btrfs_handle_fs_error()
--> sets BTRFS_FS_STATE_ERROR in
fs_info->fs_state
--> jumps to label "scrub_continue"
cleanup_transaction()
btrfs_abort_transaction(N)
--> sets BTRFS_FS_STATE_TRANS_ABORTED
flag in fs_info->fs_state
--> sets aborted field in the
transaction and transaction
handle structures, for
transaction N only
--> removes transaction from the
list fs_info->trans_list
btrfs_commit_transaction(N + 1)
--> transaction N + 1 was not
aborted, so it proceeds
(...)
--> sets the transaction's state
to TRANS_STATE_COMMIT_START
--> does not find the previous
transaction (N) in the
fs_info->trans_list, so it
doesn't know that transaction
was aborted, and the commit
of transaction N + 1 proceeds
(...)
--> sets transaction N + 1 state
to TRANS_STATE_UNBLOCKED
btrfs_write_and_wait_transaction()
--> succeeds writing all extent
buffers created in the
transaction N + 1
write_all_supers()
--> succeeds
--> we now have a superblock on
disk that points to trees
that refer to at least one
extent buffer that was
never persisted
So fix this by updating the transaction commit path to check if the flag
BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
transaction in the fs_info->trans_list. If the flag is set, just fail the
transaction commit with -EROFS, as we do in other places. The exact error
code for the previous transaction abort was already logged and reported.
Fixes: 49b25e0540 ("btrfs: enhance transaction abort infrastructure")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing an incremental send operation we can fail if we previously did
deduplication operations against a file that exists in both snapshots. In
that case we will fail the send operation with -EIO and print a message
to dmesg/syslog like the following:
BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
extent for inode 257 without updated inode item, send root is 258, \
parent root is 257
This requires that we deduplicate to the same file in both snapshots for
the same amount of times on each snapshot. The issue happens because a
deduplication only updates the iversion of an inode and does not update
any other field of the inode, therefore if we deduplicate the file on
each snapshot for the same amount of time, the inode will have the same
iversion value (stored as the "sequence" field on the inode item) on both
snapshots, therefore it will be seen as unchanged between in the send
snapshot while there are new/updated/deleted extent items when comparing
to the parent snapshot. This makes the send operation return -EIO and
print an error message.
Example reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
# Create our first file. The first half of the file has several 64Kb
# extents while the second half as a single 512Kb extent.
$ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
$ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo
# Create the base snapshot and the parent send stream from it.
$ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
$ btrfs send -f /tmp/1.snap /mnt/mysnap1
# Create our second file, that has exactly the same data as the first
# file.
$ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar
# Create the second snapshot, used for the incremental send, before
# doing the file deduplication.
$ btrfs subvolume snapshot -r /mnt /mnt/mysnap2
# Now before creating the incremental send stream:
#
# 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
# will drop several extent items and add a new one, also updating
# the inode's iversion (sequence field in inode item) by 1, but not
# any other field of the inode;
#
# 2) Deduplicate into a different subrange of file foo in snapshot
# mysnap2. This will replace an extent item with a new one, also
# updating the inode's iversion by 1 but not any other field of the
# inode.
#
# After these two deduplication operations, the inode items, for file
# foo, are identical in both snapshots, but we have different extent
# items for this inode in both snapshots. We want to check this doesn't
# cause send to fail with an error or produce an incorrect stream.
$ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
$ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo
# Create the incremental send stream.
$ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
ERROR: send ioctl failed with -5: Input/output error
This issue started happening back in 2015 when deduplication was updated
to not update the inode's ctime and mtime and update only the iversion.
Back then we would hit a BUG_ON() in send, but later in 2016 send was
updated to return -EIO and print the error message instead of doing the
BUG_ON().
A test case for fstests follows soon.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
Fixes: 1c919a5e13 ("btrfs: don't update mtime/ctime on deduped inodes")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
CLANG_FLAGS is initialized by the following line:
CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
..., which is run only when CROSS_COMPILE is set.
Some build targets (bindeb-pkg etc.) recurse to the top Makefile.
When you build the kernel with Clang but without CROSS_COMPILE,
the same compiler flags such as -no-integrated-as are accumulated
into CLANG_FLAGS.
If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
Kbuild will recompile everything needlessly due to the build command
change.
Fix this by correctly initializing CLANG_FLAGS.
Fixes: 238bcbc4e0 ("kbuild: consolidate Clang compiler flags")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Commit "vivid: reorder CEC allocation and control set-up" missed that the CEC adapter
needs a valid vfd->name, and that was now filled in after the CEC adapter was
created, leading to an empty adapter name.
Fill in the name earlier.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 4ee895e71a ("media: vivid: reorder CEC allocation and control set-up")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
After commit ddde3c18b7 ("vt: More locking checks") kdb / kgdb has
become useless because my console is filled with spews of:
WARNING: CPU: 0 PID: 0 at .../drivers/tty/vt/vt.c:3846 con_is_visible+0x50/0x74
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc1+ #48
Hardware name: Rockchip (Device Tree)
Backtrace:
[<c020ce9c>] (dump_backtrace) from [<c020d188>] (show_stack+0x20/0x24)
[<c020d168>] (show_stack) from [<c0a8fc14>] (dump_stack+0xb0/0xd0)
[<c0a8fb64>] (dump_stack) from [<c0232c58>] (__warn+0xec/0x11c)
[<c0232b6c>] (__warn) from [<c0232dc4>] (warn_slowpath_null+0x4c/0x58)
[<c0232d78>] (warn_slowpath_null) from [<c06338a0>] (con_is_visible+0x50/0x74)
[<c0633850>] (con_is_visible) from [<c0634078>] (con_scroll+0x108/0x1ac)
[<c0633f70>] (con_scroll) from [<c0634160>] (lf+0x44/0x88)
[<c063411c>] (lf) from [<c06363ec>] (vt_console_print+0x1a4/0x2bc)
[<c0636248>] (vt_console_print) from [<c02f628c>] (vkdb_printf+0x420/0x8a4)
[<c02f5e6c>] (vkdb_printf) from [<c02f6754>] (kdb_printf+0x44/0x60)
[<c02f6714>] (kdb_printf) from [<c02fa6f4>] (kdb_main_loop+0xf4/0x6e0)
[<c02fa600>] (kdb_main_loop) from [<c02fd5f0>] (kdb_stub+0x268/0x398)
[<c02fd388>] (kdb_stub) from [<c02f3ba0>] (kgdb_cpu_enter+0x1f8/0x674)
[<c02f39a8>] (kgdb_cpu_enter) from [<c02f4330>] (kgdb_handle_exception+0x1c4/0x1fc)
[<c02f416c>] (kgdb_handle_exception) from [<c0210fe0>] (kgdb_compiled_brk_fn+0x30/0x3c)
[<c0210fb0>] (kgdb_compiled_brk_fn) from [<c020d7ac>] (do_undefinstr+0x180/0x1a0)
[<c020d62c>] (do_undefinstr) from [<c0201b44>] (__und_svc_finish+0x0/0x3c)
...
[<c02f3224>] (kgdb_breakpoint) from [<c02f3310>] (sysrq_handle_dbg+0x58/0x6c)
[<c02f32b8>] (sysrq_handle_dbg) from [<c062abf0>] (__handle_sysrq+0xac/0x154)
Let's disable this warning when we're in kgdb to avoid the spew. The
whole system is stopped when we're in kgdb so we can't exactly wait
for someone else to drop the lock. Presumably the best we can do is
to disable the warning and hope for the best.
Fixes: ddde3c18b7 ("vt: More locking checks")
Cc: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20190725183551.169208-1-dianders@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 63d7ef3610 ("mwifiex: Don't abort on small, spec-compliant
vendor IEs") adjusted the ieee_types_vendor_header struct, which
inadvertently messed up the offsets used in
mwifiex_is_wpa_oui_present(). Add that offset back in, mirroring
mwifiex_is_rsn_oui_present().
As it stands, commit 63d7ef3610 breaks compatibility with WPA (not
WPA2) 802.11n networks, since we hit the "info: Disable 11n if AES is
not supported by AP" case in mwifiex_is_network_compatible().
Fixes: 63d7ef3610 ("mwifiex: Don't abort on small, spec-compliant vendor IEs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Fix the fact that a notification isn't sent to the recvmsg side to indicate
a call failed when sendmsg() fails to transmit a DATA packet with the error
ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.
Without this notification, the afs client just sits there waiting for the
call to complete in some manner (which it's not now going to do), which
also pins the rxrpc call in place.
This can be seen if the client has a scope-level IPv6 address, but not a
global-level IPv6 address, and we try and transmit an operation to a
server's IPv6 address.
Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.
Fixes: c54e43d752 ("rxrpc: Fix missing start of call timeout")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Mark switch cases where we are expecting to fall through.
Fixes errors such as below, seen with mpc85xx_defconfig:
arch/powerpc/kernel/align.c: In function 'emulate_spe':
arch/powerpc/kernel/align.c:178:8: error: this statement may fall through
ret |= __get_user_inatomic(temp.v[3], p++);
^~
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190730141917.21817-1-mpe@ellerman.id.au
There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby
rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces
the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which
the tries to take the already held lock.
Fix this by providing a version of rxrpc_put_peer() that can be called in
situations where the lock is already held.
The bug may produce the following lockdep report:
============================================
WARNING: possible recursive locking detected
5.2.0-next-20190718 #41 Not tainted
--------------------------------------------
kworker/0:3/21678 is trying to acquire lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
__rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435
but task is already holding lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430
Fixes: 330bdcfadc ("rxrpc: Fix the keepalive generator [ver #2]")
Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold
the data version of the parent directory when it was created or when
d_revalidate() last caused it to be updated. This is compared to the
->invalid_before field in the directory inode, rather than the actual data
version number, thereby allowing changes due to local edits to be ignored.
Only if the server data version gets bumped unexpectedly (eg. by a
competing client), do we need to revalidate stuff.
However, the d_fsdata field should also be updated if an rpc op is
performed that modifies that particular dentry. Such ops return the
revised data version of the directory(ies) involved, so we should use that.
This is particularly problematic for rename, since a dentry from one
directory may be moved directly into another directory (ie. mv a/x b/x).
It would then be sporting the wrong data version - and if this is in the
future, for the destination directory, revalidations would be missed,
leading to foreign renames and hard-link deletion being missed.
Fix this by the following means:
(1) Return the data version number from operations that read the directory
contents - if they issue the read. This starts in afs_dir_iterate()
and is used, ignored or passed back by its callers.
(2) In afs_lookup*(), set the dentry version to the version returned by
(1) before d_splice_alias() is called and the dentry published.
(3) In afs_d_revalidate(), set the dentry version to that returned from
(1) if an rpc call was issued. This means that if a parallel
procedure, such as mkdir(), modifies the directory, we won't
accidentally use the data version from that.
(4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to
the directory data version before d_instantiate() is called.
(5) In afs_{rmdir,unlink}, update the target dentry's version to the
directory data version as soon as we've updated the directory inode.
(6) In afs_rename(), we need to unhash the old dentry before we start so
that we don't get afs_d_revalidate() reverting the version change in
cross-directory renames.
We then need to set both the old and the new dentry versions the data
version of the new directory before we call d_move() as d_move() will
rehash them.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
In the in-kernel afs filesystem, d_fsdata is set with the data version of
the parent directory. afs_d_revalidate() will update this to the current
directory version, but it shouldn't do this if it the value it read from
d_fsdata is the same as no lock is held and cmpxchg() is not used.
Fix the code to only change the value if it is different from the current
directory version.
Fixes: 260a980317 ("[AFS]: Add "directory write" support.")
Signed-off-by: David Howells <dhowells@redhat.com>
When afs_rename() calculates the expected data version of the target
directory in a cross-directory rename, it doesn't increment it as it
should, so it always thinks that the target inode is unexpectedly modified
on the server.
Fixes: a58823ac45 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells <dhowells@redhat.com>
In afs_read_dir(), there is an if statement on line 255 to check whether
req->pages is NULL:
if (!req->pages)
goto error;
If req->pages is NULL, afs_put_read() on line 337 is executed.
In afs_put_read(), req->pages[i] is used on line 195.
Thus, a possible null-pointer dereference may occur in this case.
To fix this possible bug, an if statement is added in afs_put_read() to
check req->pages.
This bug is found by a static analysis tool STCheck written by us.
Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
afs_deliver_vl_get_entry_by_name_u() scans through the vl entry
received from the volume location server and builds a return list
containing the sites that are currently valid. When assigning
values for the return list, the index into the vl entry (i) is used
rather than the one for the new list (entry->nr_server). If all
sites are usable, this works out fine as the indices will match.
If some sites are not valid, for example if AFS_VLSF_DONTUSE is
set, fs_mask and the uuid will be set for the wrong return site.
Fix this by using entry->nr_server as the index into the arrays
being filled in rather than i.
This can lead to EDESTADDRREQ errors if none of the returned sites
have a valid fs_mask.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Fix the service handler function for the CB.ProbeUuid RPC call so that it
replies in the correct manner - that is an empty reply for success and an
abort of 1 for failure.
Putting 0 or 1 in an integer in the body of the reply should result in the
fileserver throwing an RX_PROTOCOL_ERROR abort and discarding its record of
the client; older servers, however, don't necessarily check that all the
data got consumed, and so might incorrectly think that they got a positive
response and associate the client with the wrong host record.
If the client is incorrectly associated, this will result in callbacks
intended for a different client being delivered to this one and then, when
the other client connects and responds positively, all of the callback
promises meant for the client that issued the improper response will be
lost and it won't receive any further change notifications.
Fixes: 9396d496d7 ("afs: support the CB.ProbeUuid RPC op")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
If CONFIG_DRM_TOSHIBA_TC358764=y but CONFIG_DRM_KMS_HELPER=m,
building fails:
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x228): undefined reference to `drm_atomic_helper_connector_reset'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x240): undefined reference to `drm_helper_probe_single_connector_modes'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x268): undefined reference to `drm_atomic_helper_connector_duplicate_state'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x270): undefined reference to `drm_atomic_helper_connector_destroy_state'
Like TC358767, select DRM_KMS_HELPER to fix this, and
change to select DRM_PANEL to avoid recursive dependency.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: f38b7cca6d ("drm/bridge: tc358764: Add DSI to LVDS bridge driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190729090520.25968-1-yuehaibing@huawei.com
Revert this for now, it has been reported multiple times that it
completely breaks connectivity on various devices.
Cc: stable@vger.kernel.org
Fixes: 8dbb000ee7 ("mac80211: set NETIF_F_LLTX when using intermediate tx queues")
Reported-by: Jean Delvare <jdelvare@suse.de>
Reported-by: Peter Lebbing <peter@digitalbrains.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jozsef Kadlecsik says:
====================
ipset patches for the nf tree
- When the support of destination MAC addresses for hash:mac sets was
introduced, it was forgotten to add the same functionality to hash:ip,mac
types of sets. The patch from Stefano Brivio adds the missing part.
- When the support of destination MAC addresses for hash:mac sets was
introduced, a copy&paste error was made in the code of the hash:ip,mac
and bitmap:ip,mac types: the MAC address in these set types is in
the second position and not in the first one. Stefano Brivio's patch
fixes the issue.
- There was still a not properly handled concurrency handling issue
between renaming and listing sets at the same time, reported by
Shijie Luo.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebtables doesn't include the base chain policies in the rule count,
so we need to add them manually when we call into the x_tables core
to allocate space for the comapt offset table.
This lead syzbot to trigger:
WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649
Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
Fixes: 2035f3ff8e ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, nvdimm subsystem expects the device numa node for SCM device to be
an online node. It also doesn't try to bring the device numa node online. Hence
if we use a non-online numa node as device node we hit crashes like below. This
is because we try to access uninitialized NODE_DATA in different code paths.
cpu 0x0: Vector: 300 (Data Access) at [c0000000fac53170]
pc: c0000000004bbc50: ___slab_alloc+0x120/0xca0
lr: c0000000004bc834: __slab_alloc+0x64/0xc0
sp: c0000000fac53400
msr: 8000000002009033
dar: 73e8
dsisr: 80000
current = 0xc0000000fabb6d80
paca = 0xc000000003870000 irqmask: 0x03 irq_happened: 0x01
pid = 7, comm = kworker/u16:0
Linux version 5.2.0-06234-g76bd729b2644 (kvaneesh@ltc-boston123) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #135 SMP Thu Jul 11 05:36:30 CDT 2019
enter ? for help
[link register ] c0000000004bc834 __slab_alloc+0x64/0xc0
[c0000000fac53400] c0000000fac53480 (unreliable)
[c0000000fac53500] c0000000004bc818 __slab_alloc+0x48/0xc0
[c0000000fac53560] c0000000004c30a0 __kmalloc_node_track_caller+0x3c0/0x6b0
[c0000000fac535d0] c000000000cfafe4 devm_kmalloc+0x74/0xc0
[c0000000fac53600] c000000000d69434 nd_region_activate+0x144/0x560
[c0000000fac536d0] c000000000d6b19c nd_region_probe+0x17c/0x370
[c0000000fac537b0] c000000000d6349c nvdimm_bus_probe+0x10c/0x230
[c0000000fac53840] c000000000cf3cc4 really_probe+0x254/0x4e0
[c0000000fac538d0] c000000000cf429c driver_probe_device+0x16c/0x1e0
[c0000000fac53950] c000000000cf0b44 bus_for_each_drv+0x94/0x130
[c0000000fac539b0] c000000000cf392c __device_attach+0xdc/0x200
[c0000000fac53a50] c000000000cf231c bus_probe_device+0x4c/0xf0
[c0000000fac53a90] c000000000ced268 device_add+0x528/0x810
[c0000000fac53b60] c000000000d62a58 nd_async_device_register+0x28/0xa0
[c0000000fac53bd0] c0000000001ccb8c async_run_entry_fn+0xcc/0x1f0
[c0000000fac53c50] c0000000001bcd9c process_one_work+0x46c/0x860
[c0000000fac53d20] c0000000001bd4f4 worker_thread+0x364/0x5f0
[c0000000fac53db0] c0000000001c7260 kthread+0x1b0/0x1c0
[c0000000fac53e20] c00000000000b954 ret_from_kernel_thread+0x5c/0x68
The patch tries to fix this by picking the nearest online node as the SCM node.
This does have a problem of us losing the information that SCM node is
equidistant from two other online nodes. If applications need to understand these
fine-grained details we should express then like x86 does via
/sys/devices/system/node/nodeX/accessY/initiators/
With the patch we get
# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 130865 MB
node 1 free: 129130 MB
node distances:
node 0 1
0: 10 20
1: 20 10
# cat /sys/bus/nd/devices/region0/numa_node
0
# dmesg | grep papr_scm
[ 91.332305] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Region registered with target node 2 and online node 0
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190729095128.23707-1-aneesh.kumar@linux.ibm.com
syzbot found the following crash on:
general protection fault: 0000 [#1] SMP KASAN
RIP: 0010:snd_usb_pipe_sanity_check+0x80/0x130 sound/usb/helper.c:75
Call Trace:
snd_usb_motu_microbookii_communicate.constprop.0+0xa0/0x2fb sound/usb/quirks.c:1007
snd_usb_motu_microbookii_boot_quirk sound/usb/quirks.c:1051 [inline]
snd_usb_apply_boot_quirk.cold+0x163/0x370 sound/usb/quirks.c:1280
usb_audio_probe+0x2ec/0x2010 sound/usb/card.c:576
usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
really_probe+0x281/0x650 drivers/base/dd.c:548
....
It was introduced in commit 801ebf1043 for checking pipe and endpoint
types. It is fixed by adding a check of the ep pointer in question.
BugLink: https://syzkaller.appspot.com/bug?extid=d59c4387bfb6eced94e2
Reported-by: syzbot <syzbot+d59c4387bfb6eced94e2@syzkaller.appspotmail.com>
Fixes: 801ebf1043 ("ALSA: usb-audio: Sanity checks for each pipe and EP types")
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When CONFIG_NVME_MULTIPATH is set, only the hidden gendisk associated
with the per-controller ns is run through revalidate_disk when a
rescan is triggered, while the visible blockdev never gets its size
(bdev->bd_inode->i_size) updated to reflect any capacity changes that
may have occurred.
This prevents online resizing of nvme block devices and in extension of
any filesystems atop that will are unable to expand while mounted, as
userspace relies on the blockdev size for obtaining the disk capacity
(via BLKGETSIZE/64 ioctls).
Fix this by explicitly revalidating the actual namespace gendisk in
addition to the per-controller gendisk, when multipath is enabled.
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Some arches (e.g., arm64, x86) have moved towards non-executable
module_alloc() allocations for security hardening reasons. That means
that the module loader will need to set the text section of a module to
executable, regardless of whether or not CONFIG_STRICT_MODULE_RWX is set.
When CONFIG_STRICT_MODULE_RWX=y, module section allocations are always
page-aligned to handle memory rwx permissions. On some arches with
CONFIG_STRICT_MODULE_RWX=n however, when setting the module text to
executable, the BUG_ON() in frob_text() gets triggered since module
section allocations are not page-aligned when CONFIG_STRICT_MODULE_RWX=n.
Since the set_memory_* API works with pages, and since we need to call
set_memory_x() regardless of whether CONFIG_STRICT_MODULE_RWX is set, we
might as well page-align all module section allocations for ease of
managing rwx permissions of module sections (text, rodata, etc).
Fixes: 2eef1399a8 ("modules: fix BUG when load module with rodata=n")
Reported-by: Martin Kaiser <lists@kaiser.cx>
Reported-by: Bartosz Golaszewski <brgl@bgdev.pl>
Tested-by: David Lechner <david@lechnology.com>
Tested-by: Martin Kaiser <martin@kaiser.cx>
Tested-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Oded writes:
This tag contains two fixes when running in BE architecture:
- Fix for F/W download. The F/W is in LE so use a function that doesn't
do bytw-swapping.
- Fix for polling on host memory locations that are written by the device.
The device always works in LE, so we need to do byte-swap when polling
on those locations.
* tag 'misc-habanalabs-fixes-2019-07-29' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: fix host memory polling in BE architecture
habanalabs: fix F/W download in BE architecture
Windows guest can't run after force-TDR with host log:
...
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
...
The error is raised by set_context_ppgtt_from_shadow(), when it checks
and found the shadow_mm isn't marked as shadowed.
In work thread before each submission, a shadow_mm is set to shadowed in:
shadow_ppgtt_mm()
<-intel_vgpu_pin_mm()
<-prepare_workload()
<-dispatch_workload()
<-workload_thread()
However checking whether or not shadow_mm is shadowed is prior to it:
set_context_ppgtt_from_shadow()
<-dispatch_workload()
<-workload_thread()
In normal case, create workload will check the existence of shadow_mm,
if not it will create a new one and marked as shadowed. If already exist
it will reuse the old one. Since shadow_mm is reused, checking of shadowed
in set_context_ppgtt_from_shadow() actually always see the state set in
creation, but not the state set in intel_vgpu_pin_mm().
When force-TDR, all engines are reset, since it's not dmlr level, all
ppgtt_mm are invalidated but not destroyed. Invalidation will mark all
reused shadow_mm as not shadowed but still keeps in ppgtt_mm_list_head.
If workload submission phase those shadow_mm are reused with shadowed
not set, then set_context_ppgtt_from_shadow() will report error.
Pin for context after shadow_mm pinned and shadow pdps settled.
v2:
Move set_context_ppgtt_from_shadow() after prepare_workload(). (zhenyu)
v3:
Move set_context_ppgtt_from_shadow() after shadow pdps updated.(zhenyu)
Fixes: 4f15665ccb ("drm/i915: Add ppgtt to GVT GEM context")
Cc: stable@vger.kernel.org
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
GPU hang observed during the guest OCL conformance test which is caused
by THP GTT feature used durning the test.
It was observed the same GFN with different size (4K and 2M) requested
from the guest in GVT. So during the guest page dma map stage, it is
required to unmap first with orginal size and then remap again with
requested size.
Fixes: b901b252b6 ("drm/i915/gvt: Add 2M huge gtt support")
Cc: stable@vger.kernel.org
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Workload contains RB and WA_CTX which are in ggtt space,
if they aren't in valid ggtt space, the workload shouldn't be
shadowed and scanned. So checking them earlier to avoid shadow
them.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Instead of silently return virtual ggtt entries that guest is allowed
to access, this patch add extra range check. If guest read out of
range, it will print a warning and return 0. If guest write out
of range, the write will be dropped without any message.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Instead of using the generic 'fc_rport_priv' structure as argument and then
having to painstakingly outcast this to fcoe_rport we should be passing the
fcoe_rport structure itself and reduce complexity.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gcc-9 complains for a memset across pointer boundaries, which happens as
the code tries to allocate a flexible array on the stack. Turns out we
cannot do this without relying on gcc-isms, so with this patch we'll embed
the fc_rport_priv structure into fcoe_rport, can use the normal
'container_of' outcast, and will only have to do a memset over one
structure.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: arm):
drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_phy_detect’:
drivers/net/ethernet/smsc/smc911x.c:677:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (cfg & HW_CFG_EXT_PHY_DET_) {
^
drivers/net/ethernet/smsc/smc911x.c:715:3: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrin reported a KASAN issue:
BUG: KASAN: global-out-of-bounds in ata_exec_internal_sg+0x50f/0xc70
Read of size 16 at addr ffffffff91f41f80 by task scsi_eh_1/149
...
The buggy address belongs to the variable:
cdb.48319+0x0/0x40
Much like commit 18c9a99bce ("libata: zpodd: small read overflow in
eject_tray()"), this fixes a cdb[] buffer length, this time in
zpodd_get_mech_type():
We read from the cdb[] buffer in ata_exec_internal_sg(). It has to be
ATAPI_CDB_LEN (16) bytes long, but this buffer is only 12 bytes.
Reported-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in>
Fixes: afe7595118 ("libata: identify and init ZPODD devices")
Link: https://lore.kernel.org/lkml/201907181423.E808958@keescook/
Tested-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: m68k):
drivers/block/ataflop.c: In function ‘fd_locked_ioctl’:
drivers/block/ataflop.c:1728:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
set_capacity(floppy->disk, MAX_DISK_SIZE * 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/block/ataflop.c:1729:2: note: here
case FDFMTEND:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf header:
Vince Weaver:
- Fix divide by zero error if f_header.attr_size==0, found using a perf tool fuzzer.
Numfor Mbiziwo-Tiapo:
- Silence use of uninitialized value warning pointed out by clang's MSAN tool.
libbpf:
Andrii Nakryiko:
- Fix missing __WORDSIZE definition in some systems, such as musl libc (Alpine Linux).
tools header UAPI:
Arnaldo Carvalho de Melo:
- Sync headers to address perf build warnings:
- syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open.
- With new ioctls: kvm.h, drm.h and usbdevice_fs.h.
- No tooling change: mman.h, sched.h and if_link.h.
Documentation:
Vince Weaver:
- Fix perf.data documentation units for memory size, its kB, not bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: i386):
drivers/net/hamradio/baycom_epp.c: In function ‘transmit’:
drivers/net/hamradio/baycom_epp.c:491:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (i) {
^
drivers/net/hamradio/baycom_epp.c:504:3: note: here
default: /* fall through */
^~~~~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: i386):
drivers/net/wan/sdla.c: In function ‘sdla_errors’:
drivers/net/wan/sdla.c:414:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (cmd == SDLA_INFORMATION_WRITE)
^
drivers/net/wan/sdla.c:417:3: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IS_ERR() already calls unlikely(), so this extra unlikely() call
around IS_ERR() is not needed.
Signed-off-by: Enrico Weigelt <info@metux.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum systems have a configurable limit on how far into the packet they
parse. By default, the limit is 96 bytes.
An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
sequence ID of a PTP event is only available 32 bytes into payload, for a
total of 94 bytes. When an additional 802.1q header is present as
well (such as when ptp4l is running on a VLAN port), the parsing limit is
exceeded. Such packets are not recognized as PTP, and are not timestamped.
Therefore generalize the current VXLAN-specific parsing depth setting to
allow reference-counted requests from other modules as well. Keep it in the
VXLAN module, because the MPRS register also configures UDP destination
port number used for VXLAN, and is thus closely tied to the VXLAN code
anyway.
Then invoke the new interfaces from both VXLAN (in obvious places), as well
as from PTP code, when the (global) timestamping configuration changes from
disabled to enabled or vice versa.
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.
This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.
Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
ip netns exec A ipset create test_hash hash:ip,mac
ip netns exec A ipset add test_hash 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset correctly matches a test packet:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Pull virtio/vhost fixes from Michael Tsirkin:
- Fixes in the iommu and balloon devices.
- Disable the meta-data optimization for now - I hope we can get it
fixed shortly, but there's no point in making users suffer crashes
while we are working on that.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: disable metadata prefetch optimization
iommu/virtio: Update to most recent specification
balloon: fix up comments
mm/balloon_compaction: avoid duplicate page removal
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/net/ethernet/toshiba/spider_net.c: In function 'spider_net_release_tx_chain':
drivers/net/ethernet/toshiba/spider_net.c:783:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!brutal) {
^
drivers/net/ethernet/toshiba/spider_net.c:792:3: note: here
case SPIDER_NET_DESCR_RESPONSE_ERROR:
^~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/net/ethernet/ibm/ehea/ehea_main.c: In function 'ehea_mem_notifier':
include/linux/printk.h:311:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ibm/ehea/ehea_main.c:3253:3: note: in expansion of macro 'pr_info'
pr_info("memory offlining canceled");
^~~~~~~
drivers/net/ethernet/ibm/ehea/ehea_main.c:3256:2: note: here
case MEM_ONLINE:
^~~~
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 platform driver fixes from Andy Shevchenko:
"Business as usual, a few fixes and new IDs:
- PC Engines APU got one fix for software dependencies to
automatically load them and another fix for mapping of key button
in the front to issue restart event.
- OLPC driver is now probed automatically based on module device
table.
- Intel PMC core driver supports Intel Ice Lake NNPI processor.
- WMI driver missed description of a new field in the structure that
has been added"
* tag 'platform-drivers-x86-v5.3-3' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: pcengines-apuv2: use KEY_RESTART for front button
platform/x86: intel_pmc_core: Add ICL-NNPI support to PMC Core
Platform: OLPC: add SPI MODULE_DEVICE_TABLE
platform/x86: wmi: add missing struct parameter description
platform/x86: pcengines-apuv2: Fix softdep statement
The hardware can only offload checksum calculation on first port due to
the Tx FIFO size limitation, and has a maximum L3 offset of 128 bytes.
Document this in a comment and move duplicated code in a function.
Fixes: 576193f2d5 ("net: mvpp2: jumbo frames support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MTU change code can call napi_disable() with the device already down,
leading to a deadlock. Also, lot of code is duplicated unnecessarily.
Rework mvpp2_change_mtu() to avoid the deadlock and remove duplicated code.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are two error return paths that leak memory allocated
to fib_work. Fix this by kfree'ing fib_work before returning.
Addresses-Coverity: ("Resource leak")
Fixes: 19a9d136f1 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
Fixes: dbcc4fa718 ("rocker: Fail attempts to use routes with nexthop objects")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
added support for reading the MAC address from an nvmem-cell. This
required changing the logic to return an error pointer upon failure.
If stmmac is loaded before the nvmem provider driver then
of_get_mac_address() return an error pointer with -EPROBE_DEFER.
Propagate this error so the stmmac driver will be probed again after the
nvmem provider driver is loaded.
Default to a random generated MAC address in case of any other error,
instead of using the error pointer as MAC address.
Fixes: d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
net/iucv/af_iucv.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/net/arcnet/com20020-isa.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 205:13, 203:10, 209:7, 201:11,
207:8
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
lost wakeup can occur after enabling irq, therefore put task
into interruptible before enabling interrupts,
without this change, task can be put to sleep and snd_pcm_drain
will delay
Fixes: f2b3614cef ("ALSA: PCM - Don't check DMA time-out too shortly")
Signed-off-by: Yuki Tsunashima <ytsunashima@jp.adit-jv.com>
Signed-off-by: Suresh Udipi <sudipi@jp.adit-jv.com>
[ported from 4.9]
Signed-off-by: Adam Miartus <amiartus@de.adit-jv.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On initialization failure we have to delete the local fdb which was
inserted due to the default pvid creation. This problem has been present
since the inception of default_pvid. Note that currently there are 2 cases:
1) in br_dev_init() when br_multicast_init() fails
2) if register_netdevice() fails after calling ndo_init()
This patch takes care of both since br_vlan_flush() is called on both
occasions. Also the new fdb delete would be a no-op on normal bridge
device destruction since the local fdb would've been already flushed by
br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
called last when adding a port thus nothing can fail after it.
Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dequeue_func(), there is an if statement on line 74 to check whether
skb is NULL:
if (skb)
When skb is NULL, it is used on line 77:
prefetch(&skb->end);
Thus, a possible null-pointer dereference may occur.
To fix this bug, skb->end is used when skb is not NULL.
This bug is found by a static analysis tool STCheck written by us.
Fixes: 76e3cc126b ("codel: Controlled Delay AQM")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the mailing list xdp-newbies@vger.kernel.org from the XDP
kernel maintainers entry.
Being in the kernel MAINTAINERS file successfully caused the list to
receive kbuild bot warnings, syzbot reports and sometimes developer
patches. The level of details in these messages, doesn't match the
target audience of the XDP-newbies list. This is based on a survey on
the mailing list, where 73% voted for removal from MAINTAINERS file.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: powerpc allyesconfig):
drivers/net/arcnet/arc-rimi.c: In function 'arcrimi_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/arc-rimi.c:365:3: note: in expansion of macro 'pr_err'
pr_err("Too many arguments\n");
^~~~~~
drivers/net/arcnet/arc-rimi.c:366:2: note: here
case 3: /* Node ID */
^~~~
drivers/net/arcnet/arc-rimi.c:367:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
node = ints[3];
~~~~~^~~~~~~~~
drivers/net/arcnet/arc-rimi.c:368:2: note: here
case 2: /* IRQ */
^~~~
drivers/net/arcnet/arc-rimi.c:369:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
irq = ints[2];
~~~~^~~~~~~~~
drivers/net/arcnet/arc-rimi.c:370:2: note: here
case 1: /* IO address */
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: powerpc allyesconfig):
drivers/net/arcnet/com90io.c: In function 'com90io_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/com90io.c:365:3: note: in expansion of macro 'pr_err'
pr_err("Too many arguments\n");
^~~~~~
drivers/net/arcnet/com90io.c:366:2: note: here
case 2: /* IRQ */
^~~~
drivers/net/arcnet/com90io.c:367:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
irq = ints[2];
~~~~^~~~~~~~~
drivers/net/arcnet/com90io.c:368:2: note: here
case 1: /* IO address */
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings (Building: powerpc allyesconfig):
drivers/net/arcnet/com90xx.c: In function 'com90xx_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/com90xx.c:695:3: note: in expansion of macro 'pr_err'
pr_err("Too many arguments\n");
^~~~~~
drivers/net/arcnet/com90xx.c:696:2: note: here
case 3: /* Mem address */
^~~~
drivers/net/arcnet/com90xx.c:697:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
shmem = ints[3];
~~~~~~^~~~~~~~~
drivers/net/arcnet/com90xx.c:698:2: note: here
case 2: /* IRQ */
^~~~
drivers/net/arcnet/com90xx.c:699:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
irq = ints[2];
~~~~^~~~~~~~~
drivers/net/arcnet/com90xx.c:700:2: note: here
case 1: /* IO address */
^~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The condition checking whether put_unlocked_entry() needs to wake up
following waiter got broken by commit 23c84eb783 ("dax: Fix missed
wakeup with PMD faults"). We need to wake the waiter whenever the passed
entry is valid (i.e., non-NULL and not special conflict entry). This
could lead to processes never being woken up when waiting for entry
lock. Fix the condition.
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/20190729120228.GC17833@quack2.suse.cz
Fixes: 23c84eb783 ("dax: Fix missed wakeup with PMD faults")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If INFINIBAND_HNS_HIP08 is selected and HNS3 is m,
but INFINIBAND_HNS is y, building fails:
drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_exit':
hns_roce_hw_v2.c:(.exit.text+0xd): undefined reference to `hnae3_unregister_client'
drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_init':
hns_roce_hw_v2.c:(.init.text+0xd): undefined reference to `hnae3_register_client'
Also if INFINIBAND_HNS_HIP06 is selected and HNS_DSAF
is m, but INFINIBAND_HNS is y, building fails:
drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function `hns_roce_v1_reset':
hns_roce_hw_v1.c:(.text+0x39fa): undefined reference to `hns_dsaf_roce_reset'
hns_roce_hw_v1.c:(.text+0x3a25): undefined reference to `hns_dsaf_roce_reset'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: dd74282df5 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb ("RDMA/hns: Split hw v1 driver from hns roce driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190724065443.53068-1-yuehaibing@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since vfio_ccw_async_region_ops is not exported and has no reason to be
globally visible make it static to avoid the following sparse warning:
drivers/s390/cio/vfio_ccw_async.c:73:30: warning: symbol 'vfio_ccw_async_region_ops' was not declared. Should it be static?
Fixes: d5afd5d135 ("vfio-ccw: add handling for async channel instructions")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Silence the following warning when built with -Wimplicit-fallthrough=3
enabled by default since 5.3-rc2:
drivers/s390/char/con3215.c: In function 'raw3215_irq':
drivers/s390/char/con3215.c:399:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
399 | if (dstat == 0x08)
| ^
drivers/s390/char/con3215.c:401:2: note: here
401 | case 0x04:
| ^~~~
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since gmap_test_and_clear_dirty_pmd is not exported and has no reason to
be globally visible make it static to avoid the following sparse warning:
arch/s390/mm/gmap.c:2427:6: warning: symbol 'gmap_test_and_clear_dirty_pmd' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Include <asm/kexec.h> into machine_kexec_reloc.c to expose
arch_kexec_do_relocs declaration and avoid the following sparse warnings:
arch/s390/kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
arch/s390/boot/../kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since there is really no reason for cf_diag_csd per cpu variable to be
globally visible make it static to avoid the following sparse warning:
arch/s390/kernel/perf_cpum_cf_diag.c:37:1: warning: symbol 'cf_diag_csd' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Include <asm/xor.h> into arch/s390/lib/xor.c to expose xor_block_xc
declaration and avoid the following sparse warning:
arch/s390/lib/xor.c:128:27: warning: symbol 'xor_block_xc' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add __swsusp_reset_dma declaration to avoid the following sparse warnings:
arch/s390/kernel/setup.c:107:15: warning: symbol '__swsusp_reset_dma' was not declared. Should it be static?
arch/s390/boot/startup.c:52:15: warning: symbol '__swsusp_reset_dma' was not declared. Should it be static?
Add verify_facilities declaration to avoid the following sparse warning:
arch/s390/boot/als.c:105:6: warning: symbol 'verify_facilities' was not declared. Should it be static?
Include "boot.h" into arch/s390/boot/kaslr.c to expose get_random_base
function declaration and avoid the following sparse warning:
arch/s390/boot/kaslr.c:90:15: warning: symbol 'get_random_base' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Fix two typos, document missing fields in the driver initialization
data and remove the copy&pasted 'pfmt' field from the qdr struct.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The keycode KEY_RESTART is more appropriate for the front button,
as most people use it for things like restart or factory reset.
Signed-off-by: Enrico Weigelt <info@metux.net>
Fixes: f8eb0235f6 ("x86: pcengines apuv2 gpio/leds/keys platform driver")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
In hwsim_dump_radio_nl(), when genlmsg_put() on line 3617 fails, hdr is
assigned to NULL. Then hdr is used on lines 3622 and 3623:
genl_dump_check_consistent(cb, hdr);
genlmsg_end(skb, hdr);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, hdr is used here when it is not NULL.
This bug is found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20190729082332.28895-1-baijiaju1990@gmail.com
[put braces on all branches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In a very similar spirit to commit c470bdc1aa ("mac80211: don't WARN
on bad WMM parameters from buggy APs"), an AP may not transmit a
fully-formed WMM IE. For example, it may miss or repeat an Access
Category. The above loop won't catch that and will instead leave one of
the four ACs zeroed out. This triggers the following warning in
drv_conf_tx()
wlan0: invalid CW_min/CW_max: 0/0
and it may leave one of the hardware queues unconfigured. If we detect
such a case, let's just print a warning and fall back to the defaults.
Tested with a hacked version of hostapd, intentionally corrupting the
IEs in hostapd_eid_wmm().
Cc: stable@vger.kernel.org
Signed-off-by: Brian Norris <briannorris@chromium.org>
Link: https://lore.kernel.org/r/20190726224758.210953-1-briannorris@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
hashmap.h depends on __WORDSIZE being defined. It is defined by
glibc/musl in different headers. It's an explicit goal for musl to be
"non-detectable" at compilation time, so instead include glibc header if
glibc is explicitly detected and fall back to musl header otherwise.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Fixes: e3b9242240 ("libbpf: add resizable non-thread safe internal hashmap")
Link: https://lkml.kernel.org/r/20190718173021.2418606-1-andriin@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On VLV/CHV there is some kind of linkage between the cdclk frequency
and the DP link frequency. The spec says:
"For DP audio configuration, cdclk frequency shall be set to
meet the following requirements:
DP Link Frequency(MHz) | Cdclk frequency(MHz)
270 | 320 or higher
162 | 200 or higher"
I suspect that would more accurately be expressed as
"cdclk >= DP link clock", and in any case we can express it like
that in the code because of the limited set of cdclk (200, 266,
320, 400 MHz) and link frequencies (162 and 270 MHz) we support.
Without this we can end up in a situation where the cdclk
is too low and enabling DP audio will kill the pipe. Happens
eg. with 2560x1440 modes where the 266MHz cdclk is sufficient
to pump the pixels (241.5 MHz dotclock) but is too low for
the DP audio due to the link frequency being 270 MHz.
v2: Spell out the cdclk and link frequencies we actually support
Cc: stable@vger.kernel.org
Tested-by: Stefan Gottwald <gottwald@igel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111149
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190717114536.22937-1-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit bffb31f73b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we hit an error while allocating the page tables, we have to unwind
the incomplete updates, and wish to free the unused pd. However, we are
not allowed to be hoding the spinlock at that point, and so must use the
later free to defer it until after we drop the lock.
<3> [414.363795] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/i915_gem_gtt.c:472
<3> [414.364167] in_atomic(): 1, irqs_disabled(): 0, pid: 3905, name: i915_selftest
<4> [414.364406] 3 locks held by i915_selftest/3905:
<4> [414.364408] #0: 0000000034fe8aa8 (&dev->mutex){....}, at: device_driver_attach+0x18/0x50
<4> [414.364415] #1: 000000006bd8a560 (&dev->struct_mutex){+.+.}, at: igt_ctx_exec+0xb7/0x410 [i915]
<4> [414.364476] #2: 000000003dfdc766 (&(&pd->lock)->rlock){+.+.}, at: gen8_ppgtt_alloc_pdp+0x448/0x540 [i915]
<3> [414.364529] Preemption disabled at:
<4> [414.364530] [<0000000000000000>] 0x0
<4> [414.364696] CPU: 0 PID: 3905 Comm: i915_selftest Tainted: G U 5.2.0-rc7-CI-CI_DRM_6403+ #1
<4> [414.364698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [414.364699] Call Trace:
<4> [414.364704] dump_stack+0x67/0x9b
<4> [414.364708] ___might_sleep+0x167/0x250
<4> [414.364777] vm_free_page+0x24/0xc0 [i915]
<4> [414.364852] free_pd+0xf/0x20 [i915]
<4> [414.364897] gen8_ppgtt_alloc_pdp+0x489/0x540 [i915]
<4> [414.364946] gen8_ppgtt_alloc_4lvl+0x8e/0x2e0 [i915]
<4> [414.364992] ppgtt_bind_vma+0x2e/0x60 [i915]
<4> [414.365039] i915_vma_bind+0xe8/0x2c0 [i915]
<4> [414.365088] __i915_vma_do_pin+0xa1/0xd20 [i915]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111050
Fixes: 1d1b5490b9 ("drm/i915/gtt: Replace struct_mutex serialisation for allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703171913.16585-3-chris@chris-wilson.co.uk
(cherry picked from commit 068610895e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When a register is readonly there is not much we can tell about its
value (apart from its default value?). This can be covered by tests
exercising the value of the register from userspace.
For PS_INVOCATION_COUNT we've got the following piglit tests :
KHR-GL45.pipeline_statistics_query_tests_ARB.functional_fragment_shader_invocations
Vulkan CTS tests :
dEQP-VK.query_pool.statistics_query.fragment_shader_invocations.*
v2: Use a local to shrink under 80cols.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 86554f48e5 ("drm/i915/selftests: Verify whitelist of context registers")
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190629131350.31185-1-chris@chris-wilson.co.uk
(cherry picked from commit 361b690513)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When building our local version of perf with MSAN (Memory Sanitizer) and
running the perf record command, MSAN throws a use of uninitialized
value warning in "tools/perf/util/util.c:333:6".
This warning stems from the "buf" variable being passed into "write".
It originated as the variable "ev" with the type union perf_event*
defined in the "perf_event__synthesize_attr" function in
"tools/perf/util/header.c".
In the "perf_event__synthesize_attr" function they allocate space with a malloc
call using ev, then go on to only assign some of the member variables before
passing "ev" on as a parameter to the "process" function therefore "ev"
contains uninitialized memory. Changing the malloc call to zalloc to initialize
all the members of "ev" which gets rid of the warning.
To reproduce this warning, build perf by running:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory\
-fsanitize-memory-track-origins"
(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)
then running:
tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate\
-i - --stdio
Please see the cover letter for why false positive warnings may be
generated.
Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190724234500.253358-2-nums@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
6d101f24f1 ("USB: add usbfs ioctl to retrieve the connection parameters")
And address this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
Which ends up autogenerating a ioctl_cmd->string table used by 'perf
trace':
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > before
$ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 15:26:55.513636844 -0300
+++ after 2019-07-26 15:29:11.650518677 -0300
@@ -23,6 +23,7 @@
[2] = "BULK",
[30] = "DROP_PRIVILEGES",
[31] = "GET_SPEED",
+ [32] = "CONNINFO_EX",
[3] = "RESETEP",
[4] = "SETINTERFACE",
[5] = "SETCONFIGURATION",
$
Now 'perf trace' ioctl beautifier will translate this new ioctl to a
string and at some point will allow filtering the 'ioctl' syscall with
something like this in a system wide strace-like sessin:
# perf trace -e ioctl/cmd=USBDEVFS_CONNINFO_EX/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-tkdfbgzqypwco96b309c0ovd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Picking the changes from:
c5d3e39caa ("drm/i915: Engine discovery query")
a88b6e4cba ("drm/i915: Allow specification of parallel execbuf")
ee1136908e ("drm/i915/execlists: Virtual engine bonding")
6d06779e86 ("drm/i915: Load balancing across a virtual engine")
b81dde7194 ("drm/i915: Allow userspace to clone contexts on creation")
8319f44c05 ("drm/i915: Re-expose SINGLE_TIMELINE flags for context creation")
e620f7b3a2 ("drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]")
976b55f0e1 ("drm/i915: Allow a context to define its set of engines")
7f3f317a66 ("drm/i915: Restore control over ppgtt for context creation ABI")
75b3f1cb50 ("drm: Fix drm.h uapi header for GNU/kFreeBSD")
Silencing these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated
tables will be able to translate this new ioctl code into a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/i915_drm.h tools/include/uapi/drm/i915_drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 13:02:22.052723640 -0300
+++ after 2019-07-26 13:02:35.354906036 -0300
@@ -163,4 +163,6 @@
[DRM_COMMAND_BASE + 0x37] = "I915_PERF_ADD_CONFIG",
[DRM_COMMAND_BASE + 0x38] = "I915_PERF_REMOVE_CONFIG",
[DRM_COMMAND_BASE + 0x39] = "I915_QUERY",
+ [DRM_COMMAND_BASE + 0x3a] = "I915_GEM_VM_CREATE",
+ [DRM_COMMAND_BASE + 0x3b] = "I915_GEM_VM_DESTROY",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Cc: James Clarke <jrtc27@jrtc27.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lkml.kernel.org/n/tip-a9173whgu3h1vo24jgdg5do8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gen2 doesn't have a frame counter and apparently we no longer provide
a fake .get_vblank_counter() hook for it. That means all tracepoints
calling that hook will oops. Update the tracepoints to use
intel_crtc_get_vblank_counter() which will gracefully fall back to
using the software counter. This is actually a better approach since
we now get (hopefully accurate) frame numbers in the traces.
This also gets rid of the raw driver->get_vblank_counter() calls, which
we need to do in order to switch to the per-crtc vblank vfuncs.
v2: Deal with new tracepoints
v3: Use a distinct variable name for the internal crtc iterator (Chris)
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 967dd48417 ("drm: remove drm_vblank_no_hw_counter assignment from driver code")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619170842.20579-2-ville.syrjala@linux.intel.com
(cherry picked from commit 4c888e7bd2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Kernel crashes during suspend due to wrong conversion in
suspend and resume functions.
Use the proper helper to get ishtp_cl_device instance.
Cc: <stable@vger.kernel.org> # 5.2.x: b12bbdc5: HID: intel-ish-hid: fix wrong driver_data usage
Signed-off-by: Hyungwoo Yang <hyungwoo.yang@intel.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
When fall-through warnings was enabled by default the following warnings
was starting to show up:
../arch/arm64/kernel/module.c: In function ‘apply_relocate_add’:
../arch/arm64/kernel/module.c:316:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:317:3: note: here
case R_AARCH64_MOVW_UABS_G0:
^~~~
../arch/arm64/kernel/module.c:322:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:323:3: note: here
case R_AARCH64_MOVW_UABS_G1:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
When fall-through warnings was enabled by default the following warning
was starting to show up:
In file included from ../include/linux/kernel.h:15,
from ../include/linux/list.h:9,
from ../include/linux/kobject.h:19,
from ../include/linux/of.h:17,
from ../include/linux/irqdomain.h:35,
from ../include/linux/acpi.h:13,
from ../arch/arm64/kernel/smp.c:9:
../arch/arm64/kernel/smp.c: In function ‘__cpu_up’:
../include/linux/printk.h:302:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/arm64/kernel/smp.c:156:4: note: in expansion of macro ‘pr_crit’
pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
^~~~~~~
../arch/arm64/kernel/smp.c:157:3: note: here
case CPU_STUCK_IN_KERNEL:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Now that -Wimplicit-fallthrough is passed to GCC by default, the kernel
build has suddenly got noisy. Annotate the two fall-through cases in our
hw_breakpoint implementation, since they are both intentional.
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Handling of the CPU_PM_ENTER_FAILED transition in the Arm PMU PM
notifier code incorrectly skips restoration of the counters. Fix the
logic so that CPU_PM_ENTER_FAILED follows the same path as CPU_PM_EXIT.
Cc: <stable@vger.kernel.org>
Fixes: da4e4f18af ("drivers/perf: arm_pmu: implement CPU_PM notifier")
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit d968d2b801 ("ARM: 7497/1: hw_breakpoint: allow single-byte
watchpoints on all addresses") changed the validation requirements for
hardware watchpoints on arch/arm/. Update our compat layer to implement
the same relaxation.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
When fall-through warnings was enabled by default the following warnings
was starting to show up:
../arch/arm64/kvm/hyp/debug-sr.c: In function ‘__debug_save_state’:
../arch/arm64/kvm/hyp/debug-sr.c:20:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
case 15: ptr[15] = read_debug(reg, 15); \
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
save_debug(dbg->dbg_bcr, dbgbcr, brps);
^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:21:2: note: here
case 14: ptr[14] = read_debug(reg, 14); \
^~~~
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
save_debug(dbg->dbg_bcr, dbgbcr, brps);
^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:21:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
case 14: ptr[14] = read_debug(reg, 14); \
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
save_debug(dbg->dbg_bcr, dbgbcr, brps);
^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:22:2: note: here
case 13: ptr[13] = read_debug(reg, 13); \
^~~~
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
save_debug(dbg->dbg_bcr, dbgbcr, brps);
^~~~~~~~~~
Rework to add a 'Fall through' comment where the compiler warned
about fall-through, hence silencing the warning.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[maz: fixed commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
This patch fix a bug in the host memory polling macro. The bug is that the
memory being polled can be written by the device, which always writes it
in LE. However, if the host is running Linux in BE mode, we need to
convert the value that was written by the device before matching it to the
required value that the caller has given to the macro.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
writeX macros might perform byte-swapping in BE architectures. As our F/W
is in LE format, we need to make sure no byte-swapping will occur.
There is a standard kernel function (called memcpy_toio) for copying data
to I/O area which is used in a lot of drivers to download F/W to PCIe
adapters. That function also makes sure the data is copied "as-is",
without byte-swapping.
This patch use that function to copy the F/W to the GOYA ASIC instead of
writeX macros.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
According to the original dma_direct_alloc_pages() code:
{
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
if (!dma_release_from_contiguous(dev, page, count))
__free_pages(page, get_order(size));
}
The count parameter for dma_release_from_contiguous() was page
aligned before the right-shifting operation, while the new API
dma_free_contiguous() forgets to have PAGE_ALIGN() at the size.
So this patch simply adds it to prevent any corner case.
Fixes: fdaeec198ada ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The dma_alloc_contiguous() limits align at CONFIG_CMA_ALIGNMENT for
cma_alloc() however it does not restore it for the fallback routine.
This will result in a size mismatch between the allocation and free
when running into the fallback routines after cma_alloc() fails, if
the align is larger than CONFIG_CMA_ALIGNMENT.
This patch adds a cma_align to take care of cma_alloc() and prevent
the align from being overwritten.
Fixes: fdaeec198ada ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
Reported-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The kernel mount_block_root() function expects -EACESS or -EINVAL for a
unmountable filesystem when trying to mount the root with different
filesystem types.
However, in 5.3-rc1 the behavior when F2FS code cannot find valid block
changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes
mount_block_root() fail when trying to probe F2FS.
When the magic number of the superblock mismatches, it has a high
probability that it's just not a F2FS. In this case return -EINVAL seems
to be a better result, and this return value can make mount_block_root()
probing work again.
Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in
other cases (the magic matches but the superblock cannot be recognized).
Fixes: 10f966bbf5 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Explicitly initialize the onstack structures to zero so we don't leak
kernel memory into userspace when converting the in-core inumbers
structure to the v1 inogrp ioctl structure. Add a comment about why we
have to use memset to ensure that the padding holes in the structures
are set to zero.
Fixes: 5f19c7fc68 ("xfs: introduce v5 inode group structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
UAPI headers licensed under GPL are supposed to have exception
"WITH Linux-syscall-note" so that they can be included into non-GPL
user space application code.
Unfortunately, people often miss to add it. Break 'make headers'
when any of exported headers lacks the exception note so that the
0-day bot can easily catch it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Wire up the new clone3 syscall added in commit 7f192e3cd3 ("fork:
add clone3").
This requires a ppc_clone3 wrapper, in order to save the non-volatile
GPRs before calling into the generic syscall code. Otherwise we hit
the BUG_ON in CHECK_FULL_REGS in copy_thread().
Lightly tested using Christian's test code on a Power8 LE VM.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Christian Brauner <christian@brauner.io>
Link: https://lore.kernel.org/r/20190724140259.23554-1-mpe@ellerman.id.au
While sorting out some devicetree issues I found that the pinctrl driver
was failing to acquire its GFX regmap even though the phandle was
present in the devicetree:
[ 0.124190] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: No GFX phandle found, some mux configurations may fail
Without access to the GFX regmap we fail to configure the mux for the
VPO function:
[ 1.548866] pinctrl core: add 1 pinctrl maps
[ 1.549826] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: found group selector 164 for VPO
[ 1.550638] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 144 (V20) for 1e6e6000.display
[ 1.551346] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 145 (U19) for 1e6e6000.display
...
[ 1.562057] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 218 (T22) for 1e6e6000.display
[ 1.562541] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 219 (R20) for 1e6e6000.display
[ 1.563113] Muxing pin 144 for VPO
[ 1.563456] Want SCU8C[0x00000001]=0x1, got 0x0 from 0x00000000
[ 1.564624] aspeed_gfx 1e6e6000.display: Error applying setting, reverse things back
This turned out to be a simple problem of timing: The ASPEED pinctrl
driver is probed during arch_initcall(), while GFX is processed much
later. As such the GFX syscon is not yet registered during the pinctrl
probe() and we get an -EPROBE_DEFER when we try to look it up, however
we must not defer probing the pinctrl driver for the inability to mux
some GFX-related functions.
Switch to lazily grabbing the regmaps when they're first required by the
mux configuration. This generates a bit of noise in the patch as we have
to drop the `const` qualifier on arguments for several function
prototypes, but has the benefit of working.
I've smoke tested this for the ast2500-evb under qemu with a dummy
graphics device. We now succeed in our attempts to configure the SoC's
VPO pinmux function.
Fixes: 7d29ed88ac ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20190724080155.12209-1-andrew@aj.id.au
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpio fixes for v5.3-rc3
- fix for user space handling of active-low flag for GPIO events
- fix the stubs for gpiolib: don't WARN() on NULL gpio descriptors
if gpiolib is not compiled
Intel provided the following information:
On all current Atom processors, instructions that use a segment register
value (e.g. a load or store) will not speculatively execute before the
last writer of that segment retires. Thus they will not use a
speculatively written segment value.
That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
entry paths can be excluded from the extra LFENCE if PTI is disabled.
Create a separate bug flag for the through SWAPGS speculation and mark all
out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
are excluded from the whole mitigation mess anyway.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Pull structleak fix from Kees Cook:
"Disable gcc-based stack variable auto-init under KASAN (Arnd
Bergmann).
This fixes a bunch of build warnings under KASAN and the
gcc-plugin-based stack auto-initialization features (which are
arguably redundant, so better to let KASAN control this)"
* tag 'meminit-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
structleak: disable STRUCTLEAK_BYREF in combination with KASAN_STACK
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.3-rc2 to resolve
some reported issues.
Nothing major at all, some binder bugfixes for issues found, some new
mei device ids, firmware building warning fixes, habanalabs fixes, a
few other build fixes, and a MAINTAINERS update.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
test_firmware: fix a memory leak bug
hpet: Fix division by zero in hpet_time_div()
eeprom: make older eeprom drivers select NVMEM_SYSFS
vmw_balloon: Remove Julien from the maintainers list
fpga-manager: altera-ps-spi: Fix build error
mei: me: add mule creek canyon (EHL) device ids
binder: prevent transactions to context manager from its own process.
binder: Set end of SG buffer area properly.
firmware: Fix missing inline
firmware: fix build errors in paged buffer handling code
habanalabs: don't reset device when getting VRHOT
habanalabs: use %pad for printing a dma_addr_t
Pull tty fixes from Greg KH:
"Here are two tty/vt fixes:
- delete the netx-serial driver as the arch has been removed, no need
to keep the serial driver for it around either.
- vt console_lock fix to resolve a reported noisy warning at runtime
Both of these have been in linux-next with no reported issues"
* tag 'tty-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: Grab console_lock around con_is_bound in show_bind
tty: serial: netx: Delete driver
Pull SPDX fixes from Greg KH:
"Here are some small SPDX fixes for 5.3-rc2 for things that came in
during the 5.3-rc1 merge window that we previously missed.
Only three small patches here:
- two uapi patches to resolve some SPDX tags that were not correct
- fix an invalid SPDX tag in the iomap Makefile file
All have been properly reviewed on the public mailing lists"
* tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
iomap: fix Invalid License ID
treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again
treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
Pull USB fixes from Greg KH:
"Here are some small fixes for 5.3-rc2. All of these resolve some
reported issues, some more than others :)
Included in here is:
- xhci fix for an annoying issue with odd devices
- reversion of some usb251xb patches that should not have been merged
- usb pci quirk additions and fixups
- usb storage fix
- usb host controller error test fix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix crash if scatter gather is used with Immediate Data Transfer (IDT).
usb: usb251xb: Reallow swap-dx-lanes to apply to the upstream port
Revert "usb: usb251xb: Add US port lanes inversion property"
Revert "usb: usb251xb: Add US lanes inversion dts-bindings"
usb: wusbcore: fix unbalanced get/put cluster_id
usb/hcd: Fix a NULL vs IS_ERR() bug in usb_hcd_setup_local_mem()
usb-storage: Add a limitation for blk_queue_max_hw_sectors()
usb: pci-quirks: Minor cleanup for AMD PLL quirk
usb: pci-quirks: Correct AMD PLL quirk detection
Pull ARM SoC fixes from Olof Johansson:
"Here's the first batch of fixes for this release cycle.
Main diffstat here is the re-deletion of netx. I messed up and most
likely didn't remove the files from the index when I test-merged this
and saw conflicts, and from there on out 'git rerere' remembered the
mistake and I missed checking it. Here it's done again as expected.
Besides that:
- A defconfig refresh + enabling of new drivers for u8500
- i.MX fixlets for i2c/SAI/pinmux
- sleep.S build fix for Davinci
- Broadcom devicetree build/warning fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: defconfig: u8500: Add new drivers
ARM: defconfig: u8500: Refresh defconfig
ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux
ARM: davinci: fix sleep.S build error on ARMv4
arm64: dts: imx8mq: fix SAI compatible
arm64: dts: imx8mm: Correct SAI3 RXC/TXFS pin's mux option #1
ARM: dts: imx6ul: fix clock frequency property name of I2C buses
ARM: Delete netx a second time
ARM: dts: imx7ulp: Fix usb-phy unit address format
If gpiolib is disabled, we use the inline stubs from gpio/consumer.h
instead of regular definitions of GPIO API. The stubs for 'optional'
variants of gpiod_get routines return NULL in this case as if the
relevant GPIO wasn't found. This is correct so far.
Calling other (non-gpio_get) stubs from this header triggers a warning
because the GPIO descriptor couldn't have been requested. The warning
however is unconditional (WARN_ON(1)) and is emitted even if the passed
descriptor pointer is NULL.
We don't want to force the users of 'optional' gpio_get to check the
returned pointer before calling e.g. gpiod_set_value() so let's only
WARN on non-NULL descriptors.
Cc: stable@vger.kernel.org
Reported-by: Claus H. Stovgaard <cst@phaseone.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Jonathan writes:
First set of IIO fixes in the 5.3 cycle.
* cros_ec_accel_legacy
- Fix a false double entry for channel scale as both per channel
and shared.
* gyro_adc
- Fix uninitialized return code that got detected by GCC 9.0 having
be previously missed.
* ingenic_adc
- Set the clock divider on probe to avoid an issue seen with false
button press detections on JZ4725B SoCs.
* max9611
- Backwards parameters in GENMASK.
* mpu6050
- Enforce the fact only certain scan modes are actually possible.
One counter fix also picked up for William,
* generic-counter.rst
- Fix some references.
* tag 'iio-fixes-for-5.3a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: gyroadc: fix uninitialized return code
docs: generic-counter.rst: fix broken references for ABI file
iio: imu: mpu6050: add missing available scan masks
iio: cros_ec_accel_legacy: Fix incorrect channel setting
IIO: Ingenic JZ47xx: Set clock divider on probe
iio: adc: max9611: Fix misuse of GENMASK macro
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes and functional updates:
- Prevent stale huge I/O TLB mappings on 32bit. A long standing bug
which got exposed by KPTI support for 32bit
- Prevent bogus access_ok() warnings in arch_stack_walk_user()
- Add display quirks for Lenovo devices which have height and width
swapped
- Add the missing CR2 fixup for 32 bit async pagefaults. Fallout of
the CR2 bug fix series.
- Unbreak handling of force enabled HPET by moving the 'is HPET
counting' check back to the original place.
- A more accurate check for running on a hypervisor platform in the
MDS mitigation code. Not perfect, but more accurate than the
previous one.
- Update a stale and confusing comment vs. IRQ stacks"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mds: Apply more accurate check on hypervisor platform
x86/hpet: Undo the early counter is counting check
x86/entry/32: Pass cr2 to do_async_page_fault()
x86/irq/64: Update stale comment
x86/sysfb_efi: Add quirks for some devices with swapped width and height
x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user()
mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
x86/mm: Sync also unmappings in vmalloc_sync_all()
x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Pull scheduler fixes from Thomas Gleixner:
"Two fixes for the fair scheduling class:
- Prevent freeing memory which is accessible by concurrent readers
- Make the RCU annotations for numa groups consistent"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Use RCU accessors consistently for ->numa_group
sched/fair: Don't free p->numa_faults with concurrent readers
Pull perf fixes from Thomas Gleixner:
"A pile of perf related fixes:
Kernel:
- Fix SLOTS PEBS event constraints for Icelake CPUs
- Add the missing mask bit to allow counting hardware generated
prefetches on L3 for Icelake CPUs
- Make the test for hypervisor platforms more accurate (as far as
possible)
- Handle PMUs correctly which override event->cpu
- Yet another missing fallthrough annotation
Tools:
perf.data:
- Fix loading of compressed data split across adjacent records
- Fix buffer size setting for processing CPU topology perf.data
header.
perf stat:
- Fix segfault for event group in repeat mode
- Always separate "stalled cycles per insn" line, it was being
appended to the "instructions" line.
perf script:
- Fix --max-blocks man page description.
- Improve man page description of metrics.
- Fix off by one in brstackinsn IPC computation.
perf probe:
- Avoid calling freeing routine multiple times for same pointer.
perf build:
- Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings
treated as errors, breaking the build"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Mark expected switch fall-throughs
perf/core: Fix creating kernel counters for PMUs that override event->cpu
perf/x86: Apply more accurate check on hypervisor platform
perf/x86/intel: Fix invalid Bit 13 for Icelake MSR_OFFCORE_RSP_x register
perf/x86/intel: Fix SLOTS PEBS event constraint
perf build: Do not use -Wshadow on gcc < 4.8
perf probe: Avoid calling freeing routine multiple times for same pointer
perf probe: Set pev->nargs to zero after freeing pev->args entries
perf session: Fix loading of compressed data split across adjacent records
perf stat: Always separate stalled cycles per insn
perf stat: Fix segfault for event group in repeat mode
perf tools: Fix proper buffer size for feature processing
perf script: Fix off by one in brstackinsn IPC computation
perf script: Improve man page description of metrics
perf script: Fix --max-blocks man page description
Pull locking fixes from Thomas Gleixner:
"A set of locking fixes:
- Address the fallout of the rwsem rework. Missing ACQUIREs and a
sanity check to prevent a use-after-free
- Add missing checks for unitialized mutexes when mutex debugging is
enabled.
- Remove the bogus code in the generic SMP variant of
arch_futex_atomic_op_inuser()
- Fixup the #ifdeffery in lockdep to prevent compile warnings"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: Test for initialized mutex
locking/lockdep: Clean up #ifdef checks
locking/lockdep: Hide unused 'class' variable
locking/rwsem: Add ACQUIRE comments
tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop
lcoking/rwsem: Add missing ACQUIRE to read_slowpath sleep loop
locking/rwsem: Add missing ACQUIRE to read_slowpath exit when queue is empty
locking/rwsem: Don't call owner_on_cpu() on read-owner
futex: Cleanup generic SMP variant of arch_futex_atomic_op_inuser()
Pull objtool fix from Thomas Gleixner:
"A single robustness fix for objtool to handle unbalanced CLAC
invocations under all circumstances"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Improve UACCESS coverage
It was reported that after resuming from suspend network fails with
error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
can work around the issue, but the only actual fix is to disable MSI.
So let's mimic the behavior of the vendor driver and disable MSI on
all chip versions before RTL8168d.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
Fixes: 6c6aa15fde ("r8169: improve interrupt handling")
Reported-by: Dušan Dragić <dragic.dusan@gmail.com>
Tested-by: Dušan Dragić <dragic.dusan@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a6851c613f.
It was reported that RTL8111b successfully finishes 1000/Full autoneg
but no data flows. Reverting the original patch fixes the issue.
It seems to be a HW issue with the integrated RTL8211B PHY. This PHY
version used also e.g. on RTL8168d, so better revert the original patch.
Reported-by: Bernhard Held <berny156@gmx.de>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In phylink_parse_fixedlink() the pl->link_config.advertising bits are AND
with pl->supported, pl->supported is zeroed and only the speed/duplex
modes and MII bits are set.
So pl->link_config.advertising always loses the flow control/pause bits.
By setting Pause and Asym_Pause bits in pl->supported, the flow control
work again when devicetree "pause" is set in fixes-link node and the MAC
advertise that is supports pause.
Results with this patch.
Legend:
- DT = 'Pause' is set in the fixed-link in devicetree.
- validate() = ‘Yes’ means phylink_set(mask, Pause) is set in the
validate().
- flow = results reported my link is Up line.
+-----+------------+-------+
| DT | validate() | flow |
+-----+------------+-------+
| Yes | Yes | rx/tx |
| No | Yes | off |
| Yes | No | off |
+-----+------------+-------+
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: René van Dorst <opensource@vdorst.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Dutch consumer grade ISDN network will be shut down on September 1,
2019. This means I'll be converted to some sort of VOIP shortly. At that
point it would be unwise to try to maintain the gigaset driver, even for
odd fixes as I do. So I'll stop maintaining it as a seperate driver and
bump support to CAPI in staging. De facto this means the driver will be
unmaintained, since no-one seems to be working on CAPI.
I've lighty tested the hardware specific modules of this driver (bas-gigaset,
ser-gigaset, and usb-gigaset) for v5.3-rc1. The basic functionality appears to
be working. It's unclear whether anyone still cares. I'm aware of only one
person sort of using the driver a few years ago.
Thanks to Karsten Keil for the ISDN subsystems gigaset was using (I4L and
CAPI). And many thanks to Hansjoerg Lipp and Tilman Schmidt for writing and
upstreaming this driver.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_rdma_cm_event_handler_cmn(), there are some if statements to
check whether conn is NULL, such as on lines 65, 96 and 112.
But conn is not checked before being used on line 108:
trans->cm_connect_complete(conn, event);
and on lines 140-143:
rdsdebug("DISCONNECT event - dropping connection "
"%pI6c->%pI6c\n", &conn->c_laddr,
&conn->c_faddr);
rds_conn_drop(conn);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, conn is checked before being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-9 complains about a blatant uninitialized variable use that
all earlier compiler versions missed:
drivers/iio/adc/rcar-gyroadc.c:510:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
Return -EINVAL instead here and a few lines above it where
we accidentally return 0 on failure.
Cc: stable@vger.kernel.org
Fixes: 059c53b323 ("iio: adc: Add Renesas GyroADC driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
In start_isoc_chain(), usb_alloc_urb() on line 1392 may fail
and return NULL. At this time, fifo->iso[i].urb is assigned to NULL.
Then, fifo->iso[i].urb is used at some places, such as:
LINE 1405: fill_isoc_urb(fifo->iso[i].urb, ...)
urb->number_of_packets = num_packets;
urb->transfer_flags = URB_ISO_ASAP;
urb->actual_length = 0;
urb->interval = interval;
LINE 1416: fifo->iso[i].urb->...
LINE 1419: fifo->iso[i].urb->...
Thus, possible null-pointer dereferences may occur.
To fix these bugs, "continue" is added to avoid using fifo->iso[i].urb
when it is NULL.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Wimplicit-fallthrough enablement from Gustavo A. R. Silva:
"This marks switch cases where we are expecting to fall through, and
globally enables the -Wimplicit-fallthrough option in the main
Makefile.
Finally, some missing-break fixes that have been tagged for -stable:
- drm/amdkfd: Fix missing break in switch statement
- drm/amdgpu/gfx10: Fix missing break in switch statement
With these changes, we completely get rid of all the fall-through
warnings in the kernel"
* tag 'Wimplicit-fallthrough-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
Makefile: Globally enable fall-through warning
drm/i915: Mark expected switch fall-throughs
drm/amd/display: Mark expected switch fall-throughs
drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning
drm/amdgpu/gfx10: Fix missing break in switch statement
drm/amdkfd: Fix missing break in switch statement
perf/x86/intel: Mark expected switch fall-throughs
mtd: onenand_base: Mark expected switch fall-through
afs: fsclient: Mark expected switch fall-throughs
afs: yfsclient: Mark expected switch fall-throughs
can: mark expected switch fall-throughs
firewire: mark expected switch fall-throughs
Pull s390 updates from Heiko Carstens:
- Add ABI to kernel image file which allows e.g. the file utility to
figure out the kernel version.
- Wire up clone3 system call.
- Add support for kasan bitops instrumentation.
- uapi header cleanup: use __u{16,32,64} instead of uint{16,32,64}_t.
- Provide proper ARCH_ZONE_DMA_BITS so the s390 DMA zone is correctly
defined with 2 GB instead of the default value of 1 MB.
- Farhan Ali leaves the group of vfio-ccw maintainers.
- Various small vfio-ccw fixes.
- Add missing locking for airq_areas array in virtio code.
- Minor qdio improvements.
* tag 's390-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
MAINTAINERS: vfio-ccw: Remove myself as the maintainer
s390/mm: use shared variables for sysctl range check
virtio/s390: fix race on airq_areas[]
s390/dma: provide proper ARCH_ZONE_DMA_BITS value
s390/kasan: add bitops instrumentation
s390/bitops: make test functions return bool
s390: wire up clone3 system call
kbuild: enable arch/s390/include/uapi/asm/zcrypt.h for uapi header test
s390: use __u{16,32,64} instead of uint{16,32,64}_t in uapi header
s390/hypfs: fix a typo in the name of a function
s390/qdio: restrict QAOB usage to IQD unicast queues
s390/qdio: add sanity checks to the fast-requeue path
s390: enable detection of kernel version from bzImage
Documentation: fix vfio-ccw doc
vfio-ccw: Update documentation for csch/hsch
vfio-ccw: Don't call cp_free if we are processing a channel program
vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
vfio-ccw: Fix memory leak and don't call cp_free in cp_init
vfio-ccw: Fix misleading comment when setting orb.cmd.c64
Pull Devicetree fixes from Rob Herring:
"The nvmem changes would typically go thru Greg's tree, but they were
missed in the merge window. [ Acked by Greg ]
Summary:
- Fix mismatches in $id values and actual filenames. Now checked by
tools.
- Convert nvmem binding to DT schema
- Fix a typo in of_property_read_bool() kerneldoc
- Remove some redundant description in al-fic interrupt-controller"
* tag 'devicetree-fixes-for-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Fix more $id value mismatches filenames
dt-bindings: nvmem: SID: Fix the examples node names
dt-bindings: nvmem: Add YAML schemas for the generic NVMEM bindings
of: Fix typo in kerneldoc
dt-bindings: interrupt-controller: al-fic: remove redundant binding
dt-bindings: clk: allwinner,sun4i-a10-ccu: Correct path in $id
Pull libnvdimm fixes from Dan Williams:
"A collection of locking and async operations fixes for v5.3-rc2. These
had been soaking in a branch targeting the merge window, but missed
due to a regression hunt. This fixed up version has otherwise been in
-next this past week with no reported issues.
In order to gain confidence in the locking changes the pull also
includes a debug / instrumentation patch to enable lockdep coverage
for libnvdimm subsystem operations that depend on the device_lock for
exclusion. As mentioned in the changelog it is a hack, but it works
and documents the locking expectations of the sub-system in a way that
others can use lockdep to verify. The driver core touches got an ack
from Greg.
Summary:
- Fix duplicate device_unregister() calls (multiple threads competing
to do unregister work when scheduling device removal from a sysfs
attribute of the self-same device).
- Fix badblocks registration order bug. Ensure region badblocks are
initialized in advance of namespace registration.
- Fix a deadlock between the bus lock and probe operations.
- Export device-core infrastructure to coordinate async operations
via the device ->dead state.
- Add device-core infrastructure to validate device_lock() usage with
lockdep"
* tag 'libnvdimm-fixes-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
driver-core, libnvdimm: Let device subsystems add local lockdep coverage
libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
libnvdimm/region: Register badblocks before namespaces
libnvdimm/bus: Prevent duplicate device_unregister() calls
drivers/base: Introduce kill_device()
There are two references to the generic counter ABI, with was added
on a separate patch. Both point to a non-existing file.
Fix them.
Fixes: ea2b23b895 ("counter: Documentation: Add Generic Counter sysfs documentation")
Fixes: 09e7d4ed89 ("docs: Add Generic Counter interface documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Distribution installation images such as Debian include different sets
of modules which can be downloaded dynamically. Such images may notably
include the hda sound modules but not the i915 DRM module, even if the
latter was enabled at build time, as reported on
https://bugs.debian.org/931507
In such a case hdac_i915 would be linked in and try to load the i915
module, fail since it is not there, but still wait for a whole minute
before giving up binding with it.
This fixes such as case by only waiting for the binding if the module
was properly loaded (or module support is disabled, in which case i915
is already compiled-in anyway).
Fixes: f9b54e1961 ("ALSA: hda/i915: Allow delayed i915 audio component binding")
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This is unused since commit 9f69a496f1 ("kbuild: split out *.mod out
of {single,multi}-used-m rules").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Running gen_compile_commands.py after building the kernel with
allnoconfig gave this:
$ ./scripts/gen_compile_commands.py
WARNING: Found 449 entries. Have you compiled the kernel?
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This file is used by clangd to use language server protocol.
It can be generated at each compile using scripts/gen_compile_commands.py.
Therefore it is different depending on the environment and should be
ignored.
Signed-off-by: Toru Komatsu <k0ma@utam0k.jp>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit 415008af32 ("docs-rst: convert lsm from DocBook to ReST")
removed the last users of this macro.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull block DMA segment fix from Jens Axboe:
"Here's the virtual boundary segment size fix"
* tag 'for-linus-20190726-2' of git://git.kernel.dk/linux-block:
block: fix max segment size handling in blk_queue_virt_boundary
Pull selinux fix from Paul Moore:
"One small SELinux patch to add some proper bounds/overflow checking
when adding a new sid/secid"
* tag 'selinux-pr-20190726' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: check sidtab limit before adding a new entry
f2fs_allocate_data_block() invalidates old block address and enable new block
address. Then, if we try to read old block by f2fs_submit_page_bio(), it will
give WARN due to reading invalid blocks.
Let's make the order sanely back.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Libbpf stores associated BTF FD per each instance of bpf_program. When
program is unloaded, that FD is closed. This is wrong, because leads to
a race and possibly closing of unrelated files, if application
simultaneously opens new files while bpf_programs are unloaded.
It's also unnecessary, because struct btf "owns" that FD, and
btf__free(), called from bpf_object__close() will close it. Thus the fix
is to never have per-program BTF FD and fetch it from obj->btf, when
necessary.
Fixes: 2993e0515b ("tools/bpf: add support to read .BTF.ext sections")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that the examples are validated, the examples in the SID binding
generates an error since the node names aren't one of the valid ones.
Let's switch for one that is ok.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
The nvmem providers and consumers have a bunch of generic properties that
are needed in a device tree. Add a YAML schemas for those.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
[Srini: Changed licence to (GPL-2.0 OR BSD-2-Clause)]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
"Findfrom" is not a word. Replace the function synopsis by something
that makes sense.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-25
This series introduces some fixes to mlx5 driver.
1) Ariel is addressing an issue with enacp flow counter race condition
2) Aya fixes ethtool speed handling
3) Edward fixes modify_cq hw bits alignment
4) Maor fixes RDMA_RX capabilities handling
5) Mark reverses unregister devices order to address an issue with LAG
6) From Tariq,
- wrong max num channels indication regression
- TLS counters naming and documentation as suggested by Jakub
- kTLS, Call WARN_ONCE on netdev mismatch
There is one patch in this series that touches nfp driver to align
TLS statistics names with latest documentation, Jakub is CC'ed.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5: Use reversed order when unregister devices')
For -stable v4.20
('net/mlx5e: Prevent encap flow counter update async to user query')
('net/mlx5: Fix modify_cq_in alignment')
For -stable v5.1
('net/mlx5e: Fix matching of speed to PRM link modes')
For -stable v5.2
('net/mlx5: Add missing RDMA_RX capabilities')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI fixes from James Bottomley:
"Nine fixes: The most important core one is the dma_max_mapping_size
fix that corrects the boot problem Gunter Roeck was having. A couple
of other driver only fixes are significant, like the cxgbi selector
support addition, the alua 2 second delay and the fdomain build fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG
scsi: ibmvfc: fix WARN_ON during event pool release
scsi: fcoe: fix a typo
scsi: megaraid_sas: Make some functions static
scsi: megaraid_sas: fix panic on loading firmware crashdump
scsi: megaraid_sas: fix spelling mistake "megarid_sas" -> "megaraid_sas"
scsi: core: fix the dma_max_mapping_size call
scsi: fdomain: fix building pcmcia front-end
scsi: target: cxgbit: add support for IEEE_8021QAZ_APP_SEL_STREAM selector
The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets
so that the hardware can flip the checksum to 0xFFFF if the computed
checksum is 0 per RFC768.
However, this bit had to be set for IPv6 UDP non fragmented packets
as well per hardware requirements. Otherwise, IPv6 UDP packets
with computed checksum as 0 were transmitted by hardware and were
dropped in the network.
In addition to setting this bit for IPv6 UDP, the field is also
appropriately renamed to udp_ind as part of this change.
Fixes: 5eb5f8608e ("net: qualcomm: rmnet: Add support for TX checksum offload")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
can cause a possible use-after-free accessing iph/ipv6h pointer
since the packet will be 'uncloned' running pskb_expand_head if
it is a cloned gso skb.
Fixes: 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Daniel Vetter:
"Dave seems to collect an entire streak of things happening, so again
me typing pull summary.
Nothing nefarious here, most of the fixes are for new stuff or things
users won't see. The amd-display patches are a bit different, and very
much look like they should have at least some cc: stable tags. Might
be amd is a bit too comfortable with their internal tree and not
enough looking at upstream. Dave&me are looking into this, in case
something needs rectified with process here.
Also no intel fixes pull, but intel CI is general become rather good,
still I guess expect a notch more for -rc3.
Summary:
amdgpu:
- fixes for (new in 5.3) hw support (vega20, navi)
- disable RAS
- lots of display fixes all over (audio, DSC, dongle, clock mgr)
ttm:
- fix dma_free_attrs calls to appease dma debugging
msm:
- fixes for dma-api, locking debug and compiler splats
core:
- fix cmdline mode to not apply rotation if not specified (new in 5.3)
- compiler warn fix"
* tag 'drm-fixes-2019-07-26' of git://anongit.freedesktop.org/drm/drm: (46 commits)
drm/amd/display: Set enabled to false at start of audio disable
drm/amdgpu/smu: move fan rpm query into the asic specific code
drm/amd/powerplay: custom peak clock freq for navi10
drm: silence variable 'conn' set but not used
drm/msm: stop abusing dma_map/unmap for cache
drm/msm/dpu: Correct dpu encoder spinlock initialization
drm/msm: correct NULL pointer dereference in context_init
drm/amd/display: handle active dongle port type is DP++ or DP case
drm/amd/display: do not read link setting if edp not connected
drm/amd/display: Increase size of audios array
drm/amd/display: drop ASSERT() if eDP panel is not connected
drm/amd/display: Only enable audio if speaker allocation exists
drm/amd/display: Fix dc_create failure handling and 666 color depths
drm/amd/display: allocate 4 ddc engines for RV2
drm/amd/display: put back front end initialization sequence
drm/amd/display: Wait for flip to complete
drm/amd/display: Change min_h_sync_width from 8 to 4
drm/amd/display: use encoder's engine id to find matched free audio device
drm/amd/display: fix DMCU hang when going into Modern Standby
drm/amd/display: Disable Audio on reinitialize hardware
...
Make sure the delayed work for stats update is not pending before
wq destruction.
This fixes the module unload path.
The issue is there since day 1.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hw_ver field was initialized to zero. Return the chip revision.
This is relevant for rdma driver.
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BroadMobi BM818 M.2 card uses the QMI protocol
Signed-off-by: Bob Ham <bob.ham@puri.sm>
Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should only set the max segment size to unlimited if we actually
have a virt boundary. Otherwise we accidentally clear that limit
when called from the SCSI midlayer, which always calls
blk_queue_virt_boundary, even if that mask is 0.
Fixes: 7ad388d8e4 ("scsi: core: add a host / host template field for the virt boundary")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull documentation fixes from Jonathan Corbet:
"This is mostly a set of follow-on fixes from Mauro fixing various
fallout from the massive RST conversion; a few other small fixes as
well"
* tag 'docs-5.3-1' of git://git.lwn.net/linux: (21 commits)
docs: phy: Drop duplicate 'be made'
doc:it_IT: translations in process/
docs/vm: transhuge: fix typo in madvise reference
doc:it_IT: rephrase statement
doc:it_IT: align translation to mainline
docs: load_config.py: ensure subdirs end with "/"
docs: virtual: add it to the documentation body
docs: remove extra conf.py files
docs: load_config.py: avoid needing a conf.py just due to LaTeX docs
scripts/sphinx-pre-install: seek for Noto CJK fonts for pdf output
scripts/sphinx-pre-install: cleanup Gentoo checks
scripts/sphinx-pre-install: fix latexmk dependencies
scripts/sphinx-pre-install: don't use LaTeX with CentOS 7
scripts/sphinx-pre-install: fix script for RHEL/CentOS
docs: conf.py: only use CJK if the font is available
docs: conf.py: add CJK package needed by translations
docs: pdf: add all Documentation/*/index.rst to PDF output
docs: fix broken doc references due to renames
docs: power: add it to to the main documentation index
docs: powerpc: convert docs to ReST and rename to *.rst
...
Pull arm64 fixes from Will Deacon:
"There's more here than we usually have at this stage, but that's
mainly down to the stacktrace changes which came in slightly too late
for the merge window.
Summary:
- Big bad batch of MAINTAINERS updates
- Fix handling of SP alignment fault exceptions
- Fix PSTATE.SSBS handling on heterogeneous systems
- Fix fallout from moving to the generic vDSO implementation
- Fix stack unwinding in the face of frame corruption
- Fix off-by-one in IORT code
- Minor SVE cleanups"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id()
arm64: entry: SP Alignment Fault doesn't write to FAR_EL1
arm64: Force SSBS on context switch
MAINTAINERS: Update my email address
MAINTAINERS: Update my email address
MAINTAINERS: Fix spelling mistake in my name
MAINTAINERS: Update my email address to @kernel.org
arm64: mm: Drop pte_huge()
arm64/sve: Fix a couple of magic numbers for the Z-reg count
arm64/sve: Factor out FPSIMD to SVE state conversion
arm64: stacktrace: Better handle corrupted stacks
arm64: stacktrace: Factor out backtrace initialisation
arm64: stacktrace: Constify stacktrace.h functions
arm64: vdso: Cleanup Makefiles
arm64: vdso: fix flip/flop vdso build bug
arm64: vdso: Fix population of AT_SYSINFO_EHDR for compat vdso
Pull btrfs fixes from David Sterba:
"Two regression fixes:
- hangs caused by a missing barrier in the locking code
- memory leaks of extent_state due to bad handling of a cached
pointer"
* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range
btrfs: Fix deadlock caused by missing memory barrier
Pull vfs umount_tree() leak fix from Al Viro:
"Fix braino introduced in 'switch the remnants of releasing the
mountpoint away from fs_pin'.
The most visible result is leaking struct mount when mounting btrfs,
making it impossible to shut down"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the struct mount leak in umount_tree()
Pull block fixes from Jens Axboe:
- Several io_uring fixes/improvements:
- Blocking fix for O_DIRECT (me)
- Latter page slowness for registered buffers (me)
- Fix poll hang under certain conditions (me)
- Defer sequence check fix for wrapped rings (Zhengyuan)
- Mismatch in async inc/dec accounting (Zhengyuan)
- Memory ordering issue that could cause stall (Zhengyuan)
- Track sequential defer in bytes, not pages (Zhengyuan)
- NVMe pull request from Christoph
- Set of hang fixes for wbt (Josef)
- Redundant error message kill for libahci (Ding)
- Remove unused blk_mq_sched_started_request() and related ops (Marcos)
- drbd dynamic alloc shash descriptor to reduce stack use (Arnd)
- blkcg ->pd_stat() non-debug print (Tejun)
- bcache memory leak fix (Wei)
- Comment fix (Akinobu)
- BFQ perf regression fix (Paolo)
* tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits)
io_uring: ensure ->list is initialized for poll commands
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
drbd: dynamically allocate shash descriptor
block: blk-mq: Remove blk_mq_sched_started_request and started_request
bcache: fix possible memory leak in bch_cached_dev_run()
io_uring: track io length in async_list based on bytes
io_uring: don't use iov_iter_advance() for fixed buffers
block: properly handle IOCB_NOWAIT for async O_DIRECT IO
blk-mq: allow REQ_NOWAIT to return an error inline
io_uring: add a memory barrier before atomic_read
rq-qos: use a mb for got_token
rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
rq-qos: don't reset has_sleepers on spurious wakeups
rq-qos: fix missed wake-ups in rq_qos_throttle
wait: add wq_has_single_sleeper helper
block, bfq: check also in-flight I/O in dispatch plugging
block: fix sysfs module parameters directory path in comment
...
Pull sound fixes from Takashi Iwai:
"All relatively small changes:
- a regression fix for PCM link code with CONFIG_REFCOUNT_FULL;
stumbled on a slight difference between atomic_t and refcount_t
- a couple of HD-audio stabilization patches addressing the too slow
PM resume seen on some Intel chips
- a series of ALSA compress-offload API fixes, including the
regression by the previous capture stream support
- trivial LINE6 USB-audio driver fixes, a new Conexant HD-audio chip
coverage, and a fix in AC97 bus error path"
* tag 'sound-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add a conexant codec entry to let mute led work
ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
ALSA: ac97: Fix double free of ac97_codec_device
ALSA: compress: Be more restrictive about when a drain is allowed
ALSA: compress: Don't allow paritial drain operations on capture streams
ALSA: compress: Prevent bypasses of set_params
ALSA: compress: Fix regression on compressed capture streams
ALSA: line6: Fix a typo
ALSA: pcm: Fix refcount_inc() on zero usage
ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
ALSA: hda - Optimize resume for codecs without jack detection
Pull IOMMU fixes from Joerg Roedel:
- revert an Intel VT-d patch that caused boot problems on some machines
- fix AMD IOMMU interrupts with x2apic enabled
- fix a potential crash when Intel VT-d domain allocation fails
- fix crash in Intel VT-d driver when accessing a domain without a
flush queue
- formatting fix for new Intel VT-d debugfs code
- fix for use-after-free bug in IOVA code
- fix for a NULL-pointer dereference in Intel VT-d driver when PCI
hotplug is used
- compilation fix for one of the previous fixes
* tag 'iommu-fixes-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Add support for X2APIC IOMMU interrupts
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
iommu/vt-d: Print pasid table entries MSB to LSB in debugfs
iommu/iova: Remove stale cached32_node
iommu/vt-d: Check if domain->pgd was allocated
iommu/vt-d: Don't queue_iova() if there is no flush queue
iommu/vt-d: Avoid duplicated pci dma alias consideration
Revert "iommu/vt-d: Consolidate domain_init() to avoid duplication"
Pull iscsi_ibft fix from Konrad Rzeszutek Wilk:
"One tiny fix to enable iSCSI IBFT to be compiled under ARM"
* 'for-linus-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: make ISCSI_IBFT depend on ACPI instead of ISCSI_IBFT_FIND
Pull hwmon fixes from Guenter Roeck:
"A couple of hwmon bug fixes:
- Update k8temp documentation URL
- Register address fixes in nct6775 driver
- Fix potential division by zero in occ driver"
* tag 'hwmon-for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (k8temp) documentation: update URL of datasheet
hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
hwmon: (occ) Fix division by zero issue
Picking the changes from:
66bb8a065f ("KVM: x86: PMU Event Filter")
f087a02941 ("KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT")
99adb56763 ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Silencing this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated
tables will be able to translate this new ioctl code into a string:
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 12:32:47.959220236 -0300
+++ after 2019-07-26 12:33:05.766464871 -0300
@@ -79,6 +79,7 @@
[0xac] = "SET_ONE_REG",
[0xad] = "KVMCLOCK_CTRL",
[0xb0] = "GET_REG_LIST",
+ [0xb2] = "SET_PMU_EVENT_FILTER",
[0xb7] = "SMI",
[0xba] = "MEMORY_ENCRYPT_OP",
[0xbb] = "MEMORY_ENCRYPT_REG_REGION",
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Eric Hankland <ehankland@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/n/tip-py1gcmt6rboehlwg6zvagfg2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We've added two ESR exception classes for new ARM hardware extensions:
ESR_ELx_EC_PAC and ESR_ELx_EC_SVE, but failed to update the strings
used in tracing and other debug.
Let's update "kvm_arm_exception_class" for these two EC, which the
new EC will be visible to user-space via kvm_exit trace events
Also update to "esr_class_str" for ESR_ELx_EC_PAC, by which we can
get more readable debug info.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When fall-through warnings was enabled by default the following warnings
was starting to show up:
../virt/kvm/arm/hyp/vgic-v3-sr.c: In function ‘__vgic_v3_save_aprs’:
../virt/kvm/arm/hyp/vgic-v3-sr.c:351:24: warning: this statement may fall
through [-Wimplicit-fallthrough=]
cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:352:2: note: here
case 6:
^~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:353:24: warning: this statement may fall
through [-Wimplicit-fallthrough=]
cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:354:2: note: here
default:
^~~~~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When fall-through warnings was enabled by default, commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings was starting to show up:
In file included from ../arch/arm64/include/asm/kvm_emulate.h:19,
from ../arch/arm64/kvm/regmap.c:13:
../arch/arm64/kvm/regmap.c: In function ‘vcpu_write_spsr32’:
../arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \
^~~
../arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro ‘write_sysreg_elx’
#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro ‘write_sysreg_el1’
write_sysreg_el1(v, SYS_SPSR);
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:181:2: note: here
case KVM_SPSR_ABT:
^~~~
In file included from ../arch/arm64/include/asm/cputype.h:132,
from ../arch/arm64/include/asm/cache.h:8,
from ../include/linux/cache.h:6,
from ../include/linux/printk.h:9,
from ../include/linux/kernel.h:15,
from ../include/asm-generic/bug.h:18,
from ../arch/arm64/include/asm/bug.h:26,
from ../include/linux/bug.h:5,
from ../include/linux/mmdebug.h:5,
from ../include/linux/mm.h:9,
from ../arch/arm64/kvm/regmap.c:11:
../arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile("msr " __stringify(r) ", %x0" \
^~~
../arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro ‘write_sysreg’
write_sysreg(v, spsr_abt);
^~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:183:2: note: here
case KVM_SPSR_UND:
^~~~
Rework to add a 'break;' in the swich-case since it didn't have that,
leading to an interresting set of bugs.
Cc: stable@vger.kernel.org # v4.17+
Fixes: a892819560 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[maz: reworked commit message, fixed stable range]
Signed-off-by: Marc Zyngier <maz@kernel.org>
The gic_node is still being used in the rza1_irqc_parse_map() call
after the of_node_put() call, which may result in use-after-free.
Fixes: a644ccb819 ("irqchip: Add Renesas RZ/A1 Interrupt Controller driver")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The GPCv2 is a stacked IRQ controller below the ARM GIC. It doesn't
care about the IRQ type itself, but needs to forward the type to the
parent IRQ controller, so this one can be configured correctly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Each iteration of for_each_child_of_node puts the previous node, but
in the case of a return from the middle of the loop, there is no put,
thus causing a memory leak. Add an of_node_put before the return in
three places.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When fall-through warnings was enabled by default the following warning
was starting to show up:
In file included from ../arch/arm64/include/asm/cputype.h:132,
from ../arch/arm64/include/asm/cache.h:8,
from ../include/linux/cache.h:6,
from ../include/linux/printk.h:9,
from ../include/linux/kernel.h:15,
from ../include/linux/list.h:9,
from ../include/linux/kobject.h:19,
from ../include/linux/of.h:17,
from ../include/linux/irqdomain.h:35,
from ../include/linux/acpi.h:13,
from ../drivers/irqchip/irq-gic-v3.c:9:
../drivers/irqchip/irq-gic-v3.c: In function ‘gic_cpu_sys_reg_init’:
../arch/arm64/include/asm/sysreg.h:853:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); \
^~~
../arch/arm64/include/asm/arch_gicv3.h:20:29: note: in expansion of macro ‘write_sysreg_s’
#define write_gicreg(v, r) write_sysreg_s(v, SYS_ ## r)
^~~~~~~~~~~~~~
../drivers/irqchip/irq-gic-v3.c:773:4: note: in expansion of macro ‘write_gicreg’
write_gicreg(0, ICC_AP0R2_EL1);
^~~~~~~~~~~~
../drivers/irqchip/irq-gic-v3.c:774:3: note: here
case 6:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The slot_width is a property for the bus while the constraint for
SNDRV_PCM_HW_PARAM_SAMPLE_BITS is for the in memory format.
Applying slot_width constraint to sample_bits works most of the time, but
it will blacklist valid formats in some cases.
With slot_width 24 we can support S24_3LE and S24_LE formats as they both
look the same on the bus, but a a 24 constraint on sample_bits would not
allow S24_LE as it is stored in 32bits in memory.
Implement a simple hw_rule function to allow all formats which require less
or equal number of bits on the bus as slot_width (if configured).
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20190726064244.3762-2-peter.ujfalusi@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This reverts commit db51707b9c.
Revert "ASoC: rockchip: i2s: Support mono capture"
Previous discussion in
https://patchwork.kernel.org/patch/10147153/
explains the issue of the patch.
While device is configured as 1-ch, hardware is still
generating a 2-ch stream.
When user space reads the data and assumes it is a 1-ch stream,
the rate will be slower by 2x.
Revert the change so 1-ch is not supported.
User space can selectively take one channel data out of two channel
if 1-ch is preferred.
Currently, both channels record identical data.
Signed-off-by: Cheng-Yi Chiang <cychiang@chromium.org>
Link: https://lore.kernel.org/r/20190726044202.26866-1-cychiang@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
When running McASP as master capture alone will not record any audio unless
a parallel playback stream is running. As soon as the playback stops the
captured data is going to be silent again.
In McASP master mode we need to set the PDIR for the clock pins and fix
the mcasp_set_axr_pdir() to skip the bits in the PDIR registers above
AMUTE.
This went unnoticed as most of the boards uses McASP as slave and neither
of these issues are visible (audible) in those setups.
Fixes: ca3d943334 ("ASoC: davinci-mcasp: Update PDIR (pin direction) register handling")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20190725083423.7321-1-peter.ujfalusi@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We need to drop everything we remove from the tree, whether
mnt_has_parent() is true or not. Usually the bug manifests as a slow
memory leak (leaked struct mount for initramfs); it becomes much more
visible in mount_subtree() users, such as btrfs. There we leak
a struct mount for btrfs superblock being mounted, which prevents
fs shutdown on subsequent umount.
Fixes: 56cbb429d9 ("switch the remnants of releasing the mountpoint away from fs_pin")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 33d915d9e8 ("{nl,mac}80211: allow 4addr AP operation on
crypto controlled devices") has introduced a change which allows
4addr operation on crypto controlled devices (ex: ath10k). This
change has inadvertently impacted the interface combinations logic
on such devices.
General rule is that software interfaces like AP/VLAN should not be
listed under supported interface combinations and should not be
considered during validation of these combinations; because of the
aforementioned change, AP/VLAN interfaces(if present) will be checked
against interfaces supported by the device and blocks valid interface
combinations.
Consider a case where an AP and AP/VLAN are up and running; when a
second AP device is brought up on the same physical device, this AP
will be checked against the AP/VLAN interface (which will not be
part of supported interface combinations of the device) and blocks
second AP to come up.
Add a new API cfg80211_iftype_allowed() to fix the problem, this
API works for all devices with/without SW crypto control.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Fixes: 33d915d9e8 ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices")
Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This seems to cause guest and host memory corruption.
Disable for now until we get a better handle on that.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I will not be able to continue with my maintainership responsibilities
going forward, so remove myself as the maintainer.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Since commit eec4844fae ("proc/sysctl: add shared variables for range
check") special shared variables are available for sysctl range check.
Reuse them for /proc/sys/vm/allocate_pgste proc handler.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The access to airq_areas was racy ever since the adapter interrupts got
introduced to virtio-ccw, but since commit 39c7dcb158 ("virtio/s390:
make airq summary indicators DMA") this became an issue in practice as
well. Namely before that commit the airq_info that got overwritten was
still functional. After that commit however the two infos share a
summary_indicator, which aggravates the situation. Which means
auto-online mechanism occasionally hangs the boot with virtio_blk.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 96b14536d9 ("virtio-ccw: virtio-ccw adapter interrupt support.")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
On s390 ZONE_DMA is up to 2G, i.e. ARCH_ZONE_DMA_BITS should be 31 bits.
The current value is 24 and makes __dma_direct_alloc_pages() take a
wrong turn first (but __dma_direct_alloc_pages() recovers then).
Let's correct ARCH_ZONE_DMA_BITS value and avoid wrong turns.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Fixes: c61e963734 ("dma-direct: add support for allocation from ZONE_DMA and ZONE_DMA32")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
btrfs_lock_and_flush_ordered_range() loads given "*cached_state" into
cachedp, which, in general, is NULL. Then, lock_extent_bits() updates
"cachedp", but it never goes backs to the caller. Thus the caller still
see its "cached_state" to be NULL and never free the state allocated
under btrfs_lock_and_flush_ordered_range(). As a result, we will
see massive state leak with e.g. fstests btrfs/005. Fix this bug by
properly handling the pointers.
Fixes: bd80d94efb ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/xen/xen-pciback/conf_space_capability.c: In function pm_ctrl_write:
drivers/xen/xen-pciback/conf_space_capability.c:119:25: warning:
variable old_state set but not used [-Wunused-but-set-variable]
It is never used so can be removed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/gpu/drm/i915/gem/i915_gem_mman.c: In function ‘i915_gem_fault’:
drivers/gpu/drm/i915/gem/i915_gem_mman.c:342:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!i915_terminally_wedged(i915))
^
drivers/gpu/drm/i915/gem/i915_gem_mman.c:345:2: note: here
case -EAGAIN:
^~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c: In function ‘i915_gem_object_map’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c:270:3: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(type);
^~~~~~~~~~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c:272:2: note: here
case I915_MAP_WB:
^~~~
drivers/gpu/drm/i915/i915_gpu_error.c: In function ‘error_record_engine_registers’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/i915_gpu_error.c:1196:5: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(engine->id);
^~~~~~~~~~~~
drivers/gpu/drm/i915/i915_gpu_error.c:1197:4: note: here
case RCS0:
^~~~
drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_get_fia_supported_lane_count’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/display/intel_dp.c:233:3: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(lane_info);
^~~~~~~~~~~~
drivers/gpu/drm/i915/display/intel_dp.c:234:2: note: here
case 1:
^~~~
drivers/gpu/drm/i915/display/intel_display.c: In function ‘check_digital_port_conflicts’:
CC [M] drivers/gpu/drm/nouveau/nvkm/engine/disp/cursgv100.o
drivers/gpu/drm/i915/display/intel_display.c:12043:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (WARN_ON(!HAS_DDI(to_i915(dev))))
^
drivers/gpu/drm/i915/display/intel_display.c:12046:3: note: here
case INTEL_OUTPUT_DP:
^~~~
Also, notice that the Makefile is modified to stop ignoring
fall-through warnings. The -Wimplicit-fallthrough option
will be enabled globally in v5.3.
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, this patch silences
the following warning:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c: In function ‘mqd_manager_init_v10’:
./include/linux/dynamic_debug.h:122:52: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __dynamic_func_call(id, fmt, func, ...) do { \
^
./include/linux/dynamic_debug.h:143:2: note: in expansion of macro ‘__dynamic_func_call’
__dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~
./include/linux/dynamic_debug.h:153:2: note: in expansion of macro ‘_dynamic_func_call’
_dynamic_func_call(fmt, __dynamic_pr_debug, \
^~~~~~~~~~~~~~~~~~
./include/linux/printk.h:336:2: note: in expansion of macro ‘dynamic_pr_debug’
dynamic_pr_debug(fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:432:3: note: in expansion of macro ‘pr_debug’
pr_debug("%s@%i\n", __func__, __LINE__);
^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:433:2: note: here
case KFD_MQD_TYPE_COMPUTE:
^~~~
by removing the call to pr_debug() in KFD_MQD_TYPE_CP:
"The mqd init for CP and COMPUTE will have the same routine." [1]
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
[1] https://lore.kernel.org/lkml/c735a1cc-a545-50fb-44e7-c0ad93ee8ee7@amd.com/
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Add missing break statement in order to prevent the code from falling
through to case AMDGPU_IRQ_STATE_ENABLE.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: a644d85a5c ("drm/amdgpu: add gfx v10 implementation (v10)")
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Add missing break statement in order to prevent the code from falling
through to case CHIP_NAVI10.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: 14328aa58c ("drm/amdkfd: Add navi10 support to amdkfd. (v3)")
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
pmem = true;
~~~~~^~~~~~
arch/x86/events/intel/core.c:4960:2: note: here
case INTEL_FAM6_SKYLAKE_MOBILE:
^~~~
arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
pmem = true;
~~~~~^~~~~~
arch/x86/events/intel/core.c:5009:2: note: here
case INTEL_FAM6_ICELAKE_MOBILE:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warning:
drivers/mtd/nand/onenand/onenand_base.c: In function ‘onenand_check_features’:
drivers/mtd/nand/onenand/onenand_base.c:3264:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
this->options |= ONENAND_HAS_NOP_1;
drivers/mtd/nand/onenand/onenand_base.c:3265:2: note: here
case ONENAND_DEVICE_DENSITY_4Gb:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Cc: Jonathan Bakker <xc-racer2@live.ca>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
Warning level 3 was used: -Wimplicit-fallthrough=3
fs/afs/fsclient.c: In function ‘afs_deliver_fs_fetch_acl’:
fs/afs/fsclient.c:2199:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2202:2: note: here
case 1:
^~~~
fs/afs/fsclient.c:2216:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2219:2: note: here
case 2:
^~~~
fs/afs/fsclient.c:2225:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2228:2: note: here
case 3:
^~~~
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
fs/afs/yfsclient.c: In function ‘yfs_deliver_fs_fetch_opaque_acl’:
fs/afs/yfsclient.c:1984:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:1987:2: note: here
case 1:
^~~~
fs/afs/yfsclient.c:2005:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2008:2: note: here
case 2:
^~~~
fs/afs/yfsclient.c:2014:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2017:2: note: here
case 3:
^~~~
fs/afs/yfsclient.c:2035:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2038:2: note: here
case 4:
^~~~
fs/afs/yfsclient.c:2047:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2050:2: note: here
case 5:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Also, fix some commenting style issues.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warnings:
drivers/net/can/peak_canfd/peak_pciefd_main.c:668:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/spi/mcp251x.c:875:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/usb/peak_usb/pcan_usb.c:422:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/at91_can.c:895:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/at91_can.c:953:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/usb/peak_usb/pcan_usb.c: In function ‘pcan_usb_decode_error’:
drivers/net/can/usb/peak_usb/pcan_usb.c:422:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (n & PCAN_USB_ERROR_BUS_LIGHT) {
^
drivers/net/can/usb/peak_usb/pcan_usb.c:428:2: note: here
case CAN_STATE_ERROR_WARNING:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Notice that in some cases spelling mistakes were fixed.
In other cases, the /* fall through */ comment is placed
at the bottom of the case statement, which is what GCC
is expecting to find.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/firewire/core-device.c: In function ‘set_broadcast_channel’:
drivers/firewire/core-device.c:969:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (data & cpu_to_be32(1 << 31)) {
^
drivers/firewire/core-device.c:974:3: note: here
case RCODE_ADDRESS_ERROR:
^~~~
drivers/firewire/core-iso.c: In function ‘manage_channel’:
drivers/firewire/core-iso.c:308:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((data[0] & bit) == (data[1] & bit))
^
drivers/firewire/core-iso.c:312:3: note: here
default:
^~~~~~~
drivers/firewire/core-topology.c: In function ‘count_ports’:
drivers/firewire/core-topology.c:69:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
(*child_port_count)++;
~~~~~~~~~~~~~~~~~~~^~
drivers/firewire/core-topology.c:70:3: note: here
case SELFID_PORT_PARENT:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that in some cases, the code comment is modified in
accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Cc: Kees Cook <keescook@chromium.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (reworded a comment)
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-07-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix segfault in libbpf, from Andrii.
2) fix gso_segs access, from Eric.
3) tls/sockmap fixes, from Jakub and John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need the same checks introduced by commit cb9f1b7838
("ip: validate header length on virtual device xmit") for
ipip tunnel.
Fixes: cb9f1b7838 ("ip: validate header length on virtual device xmit")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_flowlabel and ipv6_flowlabel_mgr are missing from
gitignore. Quentin points out that the original
commit 3fb321fde2 ("selftests/net: ipv6 flowlabel")
did add ignore entries, they are just missing the "ipv6_"
prefix.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3968d38917 ("bnx2x: Fix Multi-Cos.") which enabled multi-cos
feature after prolonged time in driver added some regression causing
numerous issues (sudden reboots, tx timeout etc.) reported by customers.
We plan to backout this commit and submit proper fix once we have root
cause of issues reported with this feature enabled.
Fixes: 3968d38917 ("bnx2x: Fix Multi-Cos.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The combination of KASAN_STACK and GCC_PLUGIN_STRUCTLEAK_BYREF
leads to much larger kernel stack usage, as seen from the warnings
about functions that now exceed the 2048 byte limit:
drivers/media/i2c/tvp5150.c:253:1: error: the frame size of 3936 bytes is larger than 2048 bytes
drivers/media/tuners/r820t.c:1327:1: error: the frame size of 2816 bytes is larger than 2048 bytes
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16552:1: error: the frame size of 3144 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
fs/ocfs2/aops.c:1892:1: error: the frame size of 2088 bytes is larger than 2048 bytes
fs/ocfs2/dlm/dlmrecovery.c:737:1: error: the frame size of 2088 bytes is larger than 2048 bytes
fs/ocfs2/namei.c:1677:1: error: the frame size of 2584 bytes is larger than 2048 bytes
fs/ocfs2/super.c:1186:1: error: the frame size of 2640 bytes is larger than 2048 bytes
fs/ocfs2/xattr.c:3678:1: error: the frame size of 2176 bytes is larger than 2048 bytes
net/bluetooth/l2cap_core.c:7056:1: error: the frame size of 2144 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
net/bluetooth/l2cap_core.c: In function 'l2cap_recv_frame':
net/bridge/br_netlink.c:1505:1: error: the frame size of 2448 bytes is larger than 2048 bytes
net/ieee802154/nl802154.c:548:1: error: the frame size of 2232 bytes is larger than 2048 bytes
net/wireless/nl80211.c:1726:1: error: the frame size of 2224 bytes is larger than 2048 bytes
net/wireless/nl80211.c:2357:1: error: the frame size of 4584 bytes is larger than 2048 bytes
net/wireless/nl80211.c:5108:1: error: the frame size of 2760 bytes is larger than 2048 bytes
net/wireless/nl80211.c:6472:1: error: the frame size of 2112 bytes is larger than 2048 bytes
The structleak plugin was previously disabled for CONFIG_COMPILE_TEST,
but meant we missed some bugs, so this time we should address them.
The frame size warnings are distracting, and risking a kernel stack
overflow is generally not beneficial to performance, so it may be best
to disallow that particular combination. This can be done by turning
off either one. I picked the dependency in GCC_PLUGIN_STRUCTLEAK_BYREF
and GCC_PLUGIN_STRUCTLEAK_BYREF_ALL, as this option is designed to
make uninitialized stack usage less harmful when enabled on its own,
but it also prevents KASAN from detecting those cases in which it was
in fact needed.
KASAN_STACK is currently implied by KASAN on gcc, but could be made a
user selectable option if we want to allow combining (non-stack) KASAN
with GCC_PLUGIN_STRUCTLEAK_BYREF.
Note that it would be possible to specifically address the files that
print the warning, but presumably the overall stack usage is still
significantly higher than in other configurations, so this would not
address the full problem.
I could not test this with CONFIG_INIT_STACK_ALL, which may or may not
suffer from a similar problem.
Fixes: 81a56f6dcd ("gcc-plugins: structleak: Generalize to all variable types")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190722114134.3123901-1-arnd@arndb.de
Signed-off-by: Kees Cook <keescook@chromium.org>
A netdev mismatch in the processed TLS SKB should not occur,
and indicates a kernel bug.
Add WARN_ONCE to spot such cases.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch prevents a race between user invoked cached counters
query and a neighbor last usage updater.
The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by reducting the last saved stats from the current, cached
stats and then updating the last saved stats with the cached stats.
It also provide the lastuse value for that flow.
Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call is causing the driver to update the last reported bytes and
packets from the cache and therefore, future user queries of the flow
stats will return lower than expected number for bytes and packets
since the last saved stats in the driver was updated async to the last
saved stats in cls_flower.
This causes wrong stats presentation of encapsulation flows to user.
Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without synching between the cached stats and
the last saved stats.
Fixes: f6dfb4c3f2 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Speed translation is performed based on legacy or extended PTYS
register. Translate speed with respect to:
1) Capability bit of extended PTYS table.
2) User request:
a) When auto-negotiation is turned on, inspect advertisement whether it
contains extended link modes.
b) When auto-negotiation is turned off, speed > 100Gbps (maximal
speed supported in legacy mode).
With both conditions fulfilled translation is done with extended PTYS
table otherwise use legacy PTYS table.
Without this patch 25/50/100 Gbps speed cannot be set, since try to
configure in extended mode but read from legacy mode.
Fixes: dd1b9e09c1 ("net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
No XSK support in the enhanced IPoIB driver and representors.
Add a profile property to specify this, and enhance the logic
that calculates the max number of channels to take it into
account.
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix modify_cq_in alignment to match the device specification.
After this fix the 'cq_umem_valid' field will be in the right offset.
Cc: <stable@vger.kernel.org> # 4.19
Fixes: bd37197554 ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
Signed-off-by: Edward Srouji <edwards@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
New flow table type RDMA_RX was added but the MLX5_CAP_FLOW_TABLE_TYPE
didn't handle this new flow table type.
This means that MLX5_CAP_FLOW_TABLE_TYPE returns an empty capability to
this flow table type.
Update both the macro and the maximum supported flow table type to
RDMA_RX.
Fixes: d83eb50e29 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When lag is active, which is controlled by the bonded mlx5e netdev, mlx5
interface unregestering must happen in the reverse order where rdma is
unregistered (unloaded) first, to guarantee all references to the lag
context in hardware is removed, then remove mlx5e netdev interface which
will cleanup the lag context from hardware.
Without this fix during destroy of LAG interface, we observed following
errors:
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xe4ac33)
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xa5aee8).
Fixes: a31208b1e1 ("net/mlx5_core: New init and exit flow for mlx5_core")
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently nouveau_svm_fault expects nouveau_range_fault to never unlock
mmap_sem, but the latter unlocks it for a random selection of error
codes. Fix this up by always unlocking mmap_sem for non-zero return values
in nouveau_range_fault, and only unlocking it in the caller for successful
returns.
Link: https://lore.kernel.org/r/20190724065258.16603-5-hch@lst.de
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These two functions are marked as a legacy APIs to get rid of, but seem to
suit the current nouveau flow. Move it to the only user in preparation
for fixing a locking bug involving caller and callee. All comments
referring to the old API have been removed as this now is a driver private
helper.
Link: https://lore.kernel.org/r/20190724065258.16603-3-hch@lst.de
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
We should not have two different error codes for the same
condition. EAGAIN must be reserved for the FAULT_FLAG_ALLOW_RETRY retry
case and signals to the caller that the mmap_sem has been unlocked.
Use EBUSY for the !valid case so that callers can get the locking right.
Link: https://lore.kernel.org/r/20190724065258.16603-2-hch@lst.de
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
[jgg: elaborated commit message]
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
fm_set_max_frm() existed in the Freescale SDK as a callback for an
early_param. When this code was ported to the upstream kernel the
early_param was converted to a module_param making the reference to the
function incorrect. The rest of the comment already does a good job of
explaining the parameter so removing the reference to the non-existent
function seems like the best thing to do.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
devm_kzalloc may fail and return NULL. So the null check is needed.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devm_kzalloc may fail and return null. So the null check is needed.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
Small packets going out of a tap device go through an optimized code
path that uses build_skb() rather than sock_alloc_send_pskb(). The
latter calls skb_set_owner_w(), but the small packet code path does not.
The net effect is that small packets are not owned by the userland
application's socket (e.g. QEMU), while large packets are.
This can be seen with a TCP session, where packets are not owned when
the window size is small enough (around PAGE_SIZE), while they are once
the window grows (note that this requires the host to support virtio
tso for the guest to offload segmentation).
All this leads to inconsistent behaviour in the kernel, especially on
netfilter modules that uses sk->socket (e.g. xt_owner).
Fixes: 66ccbc9c87 ("tap: use build_skb() for small packet")
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky says:
====================
DIM fixes for 5.3
Those two fixes for recently merged DIM patches, both exposed through
RDMa DIM usage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
DIM causes to the following warnings during kernel compilation
which indicates that tx_profile and rx_profile are supposed to
be declared in *.c and not in *.h files.
In file included from ./include/rdma/ib_verbs.h:64,
from ./include/linux/mlx5/device.h:37,
from ./include/linux/mlx5/driver.h:51,
from ./include/linux/mlx5/vport.h:36,
from drivers/infiniband/hw/mlx5/ib_virt.c:34:
./include/linux/dim.h:326:1: warning: _tx_profile_ defined but not used [-Wunused-const-variable=]
326 | tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
| ^~~~~~~~~~
./include/linux/dim.h:320:1: warning: _rx_profile_ defined but not used [-Wunused-const-variable=]
320 | rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
| ^~~~~~~~~~
Fixes: 4f75da3666 ("linux/dim: Move implementation to .c files")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While using net_dim, a dim_sample was used without ever initializing the
comps value. Added use of DIV_ROUND_DOWN_ULL() to prevent potential
overflow, it should not be a problem to save the final result in an int
because after the division by epms the value should not be larger than a
few thousand.
[ 1040.127124] UBSAN: Undefined behaviour in lib/dim/dim.c:78:23
[ 1040.130118] signed integer overflow:
[ 1040.131643] 134718714 * 100 cannot be represented in type 'int'
Fixes: 398c2b05bb ("linux/dim: Add completions count to dim_sample")
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SPI bus creates a device with the modalias of "xo1.75-ec". This
fixes XO-1.75 EC driver autoloading
Fixes: 0c3d931b3a ("Platform: OLPC: Add XO-1.75 EC driver")
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
GCC8 still complains about *expected* string truncation:
xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
from a string of length 15 [-Werror=stringop-truncation]
strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
This patch gets rid of the issue altogether by using memcpy instead.
There is no performance regression, as strncpy will still copy and fill
all of the bytes anyway.
v1->v2:
- rebase against bpf tree.
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a description for the context parameter in the struct wmi_device_id.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: a48e23385f ("platform/x86: wmi: add context pointer field to struct wmi_device_id")
Signed-off-by: Mattias Jacobsson <2pi@mok.nu>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Pull NVMe fixes from Christoph.
* 'nvme-5.3' of git://git.infradead.org/nvme:
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
Daniel reports that when testing an http server that uses io_uring
to poll for incoming connections, sometimes it hard crashes. This is
due to an uninitialized list member for the io_uring request. Normally
this doesn't trigger and none of the test cases caught it.
Reported-by: Daniel Kozak <kozzi11@gmail.com>
Tested-by: Daniel Kozak <kozzi11@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull power management fixes from Rafael Wysocki
"These fix two issues related to the RAPL MMIO interface support added
recently and one cpufreq driver issue.
Specifics:
- Initialize the power capping subsystem and the RAPL driver earlier
in case the int340X thermal driver is built-in and attempts to
register an MMIO interface for RAPL which must not happen before
the requisite infrastructure is ready (Zhang Rui)
- Fix the int340X thermal driver's RAPL MMIO interface registration
error path (Rafael Wysocki)
- Fix possible use-after-free in the pasemi cpufreq driver (Wen
Yang)"
* tag 'pm-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
int340X/processor_thermal_device: Fix proc_thermal_rapl_remove()
powercap: Invoke powercap_init() and rapl_init() earlier
Pull RISC-V updates from Paul Walmsley:
"Four minor RISC-V-related changes:
- Add support for the new clone3 syscall for RV64, relying on the
generic support
- Add DT data for the gigabit Ethernet controller on the SiFive FU540
and the HiFive Unleashed board
- Update MAINTAINERS to add me to the arch/riscv maintainers' list
- Add support for PCIe message-signaled interrupts by reusing the
generic header file"
* tag 'riscv/for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
riscv: include generic support for MSI irqdomains
MAINTAINERS: Add Paul as a RISC-V maintainer
riscv: enable sys_clone3 syscall for rv64
Pull ktest fixlets from Steven Rostedt:
"This contains only simple spelling fixes"
* tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix some typos in config-bisect.pl
The access() (and faccessat()) credentials change can cause an
unnecessary load on the RCU machinery because every access() call ends
up freeing the temporary access credential using RCU.
This isn't really noticeable on small machines, but if you have hundreds
of cores you can cause huge slowdowns due to RCU storms.
It's easy to avoid: the temporary access crededntials aren't actually
normally accessed using RCU at all, so we can avoid the whole issue by
just marking them as such.
* access-creds:
access: avoid the RCU grace period for the temporary subjective credentials
Commit 06297d8cef ("btrfs: switch extent_buffer blocking_writers from
atomic to int") changed the type of blocking_writers but forgot to
adjust relevant code in btrfs_tree_unlock by converting the
smp_mb__after_atomic to smp_mb. This opened up the possibility of a
deadlock due to re-ordering of setting blocking_writers and
checking/waking up the waiter. This particular lockup is explained in a
comment above waitqueue_active() function.
Fix it by converting the memory barrier to a full smp_mb, accounting
for the fact that blocking_writers is a simple integer.
Fixes: 06297d8cef ("btrfs: switch extent_buffer blocking_writers from atomic to int")
Tested-by: Johannes Thumshirn <jthumshirn@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Port counter objects should be initialized even if alloc_stats is
unsupported, otherwise QP bind operations in user space can trigger a NULL
pointer deference if they try to bind QP on RDMA device which doesn't
support counters.
Fixes: f34a55e497 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
Link: https://lore.kernel.org/r/20190723065733.4899-11-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_counter_init() may fail for a device. In such case while calculating
total sum, ignore NULL hstats.
This fixes below observed call trace.
BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 8000001009b30067 P4D 8000001009b30067 PUD 10549c9067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 55 PID: 20887 Comm: cat Kdump: loaded Not tainted 5.2.0-rc6-jdc+ #13
RIP: 0010:rdma_counter_get_hwstat_value+0xf2/0x150 [ib_core]
Call Trace:
show_hw_stats+0x5e/0x130 [ib_core]
dev_attr_show+0x15/0x50
sysfs_kf_seq_show+0xc6/0x1a0
seq_read+0x132/0x370
vfs_read+0x89/0x140
ksys_read+0x5c/0xd0
do_syscall_64+0x5a/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: f34a55e497 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
Link: https://lore.kernel.org/r/20190723065733.4899-10-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The device requires that memory registration work requests that update the
address translation table of a MR will be fenced if posted together. This
scenario can happen when address ranges are invalidated by the mmu in
separate concurrent calls to the invalidation callback.
We prefer to block concurrent address updates for a single MR over fencing
since making the decision if a WQE needs fencing will be more expensive
and fencing all WQEs is a too radical choice.
Further, it isn't clear that this code can even run safely concurrently,
so a lock is a safer choice.
Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Link: https://lore.kernel.org/r/20190723065733.4899-8-leon@kernel.org
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
While reviewing rwsem down_slowpath, Will noticed ldsem had a copy of
a bug we just found for rwsem.
X = 0;
CPU0 CPU1
rwsem_down_read()
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
X = 1;
rwsem_up_write();
rwsem_mark_wake()
atomic_long_add(adjustment, &sem->count);
smp_store_release(&waiter->task, NULL);
if (!waiter.task)
break;
...
}
r = X;
Allows 'r == 0'.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4898e640ca ("tty: Add timed, writer-prioritized rw semaphore")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While reviewing another read_slowpath patch, both Will and I noticed
another missing ACQUIRE, namely:
X = 0;
CPU0 CPU1
rwsem_down_read()
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
X = 1;
rwsem_up_write();
rwsem_mark_wake()
atomic_long_add(adjustment, &sem->count);
smp_store_release(&waiter->task, NULL);
if (!waiter.task)
break;
...
}
r = X;
Allows 'r == 0'.
Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
LTP mtest06 has been observed to occasionally hit "still mapped when
deleted" and following BUG_ON on arm64.
The extra mapcount originated from pagefault handler, which handled
pagefault for vma that has already been detached. vma is detached
under mmap_sem write lock by detach_vmas_to_be_unmapped(), which
also invalidates vmacache.
When the pagefault handler (under mmap_sem read lock) calls
find_vma(), vmacache_valid() wrongly reports vmacache as valid.
After rwsem down_read() returns via 'queue empty' path (as of v5.2),
it does so without an ACQUIRE on sem->count:
down_read()
__down_read()
rwsem_down_read_failed()
__rwsem_down_read_failed_common()
raw_spin_lock_irq(&sem->wait_lock);
if (list_empty(&sem->wait_list)) {
if (atomic_long_read(&sem->count) >= 0) {
raw_spin_unlock_irq(&sem->wait_lock);
return sem;
The problem can be reproduced by running LTP mtest06 in a loop and
building the kernel (-j $NCPUS) in parallel. It does reproduces since
v4.20 on arm64 HPE Apollo 70 (224 CPUs, 256GB RAM, 2 nodes). It
triggers reliably in about an hour.
The patched kernel ran fine for 10+ hours.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dbueso@suse.de
Fixes: 4b486b535c ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
Link: https://lkml.kernel.org/r/50b8914e20d1d62bb2dee42d342836c2c16ebee7.1563438048.git.jstancek@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For writer, the owner value is cleared on unlock. For reader, it is
left intact on unlock for providing better debugging aid on crash dump
and the unlock of one reader may not mean the lock is free.
As a result, the owner_on_cpu() shouldn't be used on read-owner
as the task pointer value may not be valid and it might have
been freed. That is the case in rwsem_spin_on_owner(), but not in
rwsem_can_spin_on_owner(). This can lead to use-after-free error from
KASAN. For example,
BUG: KASAN: use-after-free in rwsem_down_write_slowpath
(/home/miguel/kernel/linux/kernel/locking/rwsem.c:669
/home/miguel/kernel/linux/kernel/locking/rwsem.c:1125)
Fix this by checking for RWSEM_READER_OWNED flag before calling
owner_on_cpu().
Reported-by: Luis Henriques <lhenriques@suse.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Fixes: 94a9717b3c ("locking/rwsem: Make rwsem->owner an atomic_long_t")
Link: https://lkml.kernel.org/r/81e82d5b-5074-77e8-7204-28479bbe0df0@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
That function now returns ERR_PTR instead of NULL if "hpd-gpio" is not
present in device-tree. The offending patch missed to adapt the Tegra's
DRM driver for the API change.
Fixes: 025bf37725 ("gpio: Fix return value mismatch of function gpiod_get_from_of_node()")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
If CONFIG_PM is not set, build warnings:
drivers/dma/tegra210-adma.c:747:12: warning: tegra_adma_runtime_resume defined but not used [-Wunused-function]
static int tegra_adma_runtime_resume(struct device *dev)
drivers/dma/tegra210-adma.c:715:12: warning: tegra_adma_runtime_suspend defined but not used [-Wunused-function]
static int tegra_adma_runtime_suspend(struct device *dev)
Mark the two function as __maybe_unused.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 3145d73e69 ("dmaengine: tegra210-adma: remove PM_CLK dependency")
Fixes: f46b195799 ("dmaengine: tegra-adma: Add support for Tegra210 ADMA")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20190709083258.57112-1-yuehaibing@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
In test_firmware_init(), the buffer pointed to by the global pointer
'test_fw_config' is allocated through kzalloc(). Then, the buffer is
initialized in __test_firmware_config_init(). In the case that the
initialization fails, the following execution in test_firmware_init() needs
to be terminated with an error code returned to indicate this failure.
However, the allocated buffer is not freed on this execution path, leading
to a memory leak bug.
To fix the above issue, free the allocated buffer before returning from
test_firmware_init().
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Link: https://lore.kernel.org/r/1563084696-6865-1-git-send-email-wang6495@umn.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
misc/eeprom/{at24,at25,eeprom_93xx46} drivers all register their
corresponding devices in the nvmem framework in compat mode which requires
nvmem sysfs interface to be present. The latter, however, has been split
out from nvmem under a separate Kconfig in commit ae0c2d7255 ("nvmem:
core: add NVMEM_SYSFS Kconfig"). As a result, probing certain I2C-attached
EEPROMs now fails with
at24: probe of 0-0050 failed with error -38
because of a stub implementation of nvmem_sysfs_setup_compat()
in drivers/nvmem/nvmem.h. Update the nvmem dependency for these drivers
so they could load again:
at24 0-0050: 32768 byte 24c256 EEPROM, writable, 64 bytes/write
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Link: https://lore.kernel.org/r/20190716111236.27803-1-asolokha@kb.kras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
X86_HYPER_NATIVE isn't accurate for checking if running on native platform,
e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled.
Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's
running on native platform is more accurate.
This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is
unsupported, e.g. VMware, but there is nothing which can be done about this
scenario.
Fixes: 8a4b06d391 ("x86/speculation/mds: Add sysfs reporting for MDS")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.com
Rui reported that on a Pentium D machine which has HPET forced enabled
because it is not advertised by ACPI, the early counter is counting check
leads to a silent boot hang.
The reason is that the ordering of checking the counter first and then
reconfiguring the HPET fails to work on that machine. As the HPET is not
advertised and presumably not initialized by the BIOS the early enable and
the following reconfiguration seems to bring it into a broken state. Adding
clocksource=jiffies to the command line results in the following
clocksource watchdog warning:
clocksource: timekeeping watchdog on CPU1:
Marking clocksource 'tsc-early' as unstable because the skew is too large:
clocksource: 'hpet' wd_now: 33 wd_last: 33 mask: ffffffff
That clearly shows that the HPET is not counting after it got reconfigured
and reenabled. If the counter is not working then the HPET timer is not
expiring either, which explains the boot hang.
Move the counter is counting check after the full configuration again to
unbreak these systems.
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes: 3222daf970 ("x86/hpet: Separate counter check out of clocksource register code")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907250810530.1791@nanos.tec.linutronix.de
A second regression was found in the immediate data transfer (IDT)
support which was added to 5.2 kernel
IDT is used to transfer small amounts of data (up to 8 bytes) in the
field normally used for data dma address, thus avoiding dma mapping.
If the data was not already dma mapped, then IDT support assumed data was
in urb->transfer_buffer, and did not take into accound that even
small amounts of data (8 bytes) can be in a scatterlist instead.
This caused a NULL pointer dereference when sg_dma_len() was used
with non-dma mapped data.
Solve this by not using IDT if scatter gather buffer list is used.
Fixes: 33e39350eb ("usb: xhci: add Immediate Data Transfer support")
Cc: <stable@vger.kernel.org> # v5.2
Reported-by: Maik Stohn <maik.stohn@seal-one.com>
Tested-by: Maik Stohn <maik.stohn@seal-one.com>
CC: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1564044861-1445-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a partial revert of 73d31def1a "usb: usb251xb: Create a ports
field collector method", which broke a existing devicetree
(arch/arm64/boot/dts/freescale/imx8mq.dtsi).
There is no reason why the swap-dx-lanes property should not apply to
the upstream port. The reason given in the breaking commit was that it's
inconsitent with respect to other port properties, but in fact it is not.
All other properties which only apply to the downstream ports explicitly
reject port 0, so there is pretty strong precedence that the driver
referred to the upstream port as port 0. So there is no inconsistency in
this property at all, other than the swapping being also applicable to
the upstream port.
CC: stable@vger.kernel.org #5.2
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Link: https://lore.kernel.org/r/20190719084407.28041-3-l.stach@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "WITH Linux-syscall-note" exception exists for headers exported to
user space. It is strange to add it to non-exported headers.
Commit 687a3e4d8e ("treewide: remove SPDX "WITH Linux-syscall-note"
from kernel-space headers") did cleanups some months ago, but it looks
like we need to do this periodically.
This patch was generated by the following script:
git grep -l -e Linux-syscall-note \
-- :*.h :^arch/*/include/uapi/asm/*.h :^include/uapi/ :^tools |
while read file
do
sed -i -e 's/(\(GPL-[^[:space:]]*\) WITH Linux-syscall-note)/\1/g' \
-e 's/ WITH Linux-syscall-note//g' $file
done
I did not commit drivers/staging/android/uapi/ion.h . This header is
not currently exported, but somebody may plan to move it to include/uapi/
when the time comes. I am not sure. Anyway, it will be better to check
the license inconsistency in drivers/staging/android/uapi/.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UAPI headers licensed under GPL are supposed to have exception
"WITH Linux-syscall-note" so that they can be included into non-GPL
user space application code.
The exception note is missing in some UAPI headers.
Some of them slipped in by the treewide conversion commit b24413180f
("License cleanup: add SPDX GPL-2.0 license identifier to files with
no license"). Just run:
$ git show --oneline b24413180f -- arch/x86/include/uapi/asm/
I believe they are not intentional, and should be fixed too.
This patch was generated by the following script:
git grep -l --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
-- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild |
while read file
do
sed -i -e '/[[:space:]]OR[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
-e '/[[:space:]]or[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
-e '/[[:space:]]OR[[:space:]]/!{/[[:space:]]or[[:space:]]/!s/\(GPL-[^[:space:]]*\)/\1 WITH Linux-syscall-note/g}' $file
done
After this patch is applied, there are 5 UAPI headers that do not contain
"WITH Linux-syscall-note". They are kept untouched since this exception
applies only to GPL variants.
$ git grep --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
-- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild
include/uapi/drm/panfrost_drm.h:/* SPDX-License-Identifier: MIT */
include/uapi/linux/batman_adv.h:/* SPDX-License-Identifier: MIT */
include/uapi/linux/qemu_fw_cfg.h:/* SPDX-License-Identifier: BSD-3-Clause */
include/uapi/linux/vbox_err.h:/* SPDX-License-Identifier: MIT */
include/uapi/linux/virtio_iommu.h:/* SPDX-License-Identifier: BSD-3-Clause */
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes an issue that the following error happens on
swiotlb environment:
xhci-hcd ee000000.usb: swiotlb buffer is full (sz: 524288 bytes), total 32768 (slots), used 1338 (slots)
On the kernel v5.1, block settings of a usb-storage with SuperSpeed
were the following so that the block layer will allocate buffers
up to 64 KiB, and then the issue didn't happen.
max_segment_size = 65536
max_hw_sectors_kb = 1024
After the commit 09324d32d2 ("block: force an unlimited segment
size on queues with a virt boundary") is applied, the block settings
are the following. So, the block layer will allocate buffers up to
1024 KiB, and then the issue happens:
max_segment_size = 4294967295
max_hw_sectors_kb = 1024
To fix the issue, the usb-storage driver checks the maximum size of
a mapping for the device and then adjusts the max_hw_sectors_kb
if required. After this patch is applied, the block settings will
be the following, and then the issue doesn't happen.
max_segment_size = 4294967295
max_hw_sectors_kb = 256
Fixes: 09324d32d2 ("block: force an unlimited segment size on queues with a virt boundary")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1563793105-20597-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_amd_find_chipset_info() is used for chipset detection for
several quirks. It is strange that its return value indicates
the need for the PLL quirk, which means it is often ignored.
This patch adds a function specifically for checking the PLL
quirk like the other ones. Additionally, rename probe_result to
something more appropriate.
Signed-off-by: Ryan Kennedy <ryan5544@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20190704153529.9429-3-ryan5544@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The AMD PLL USB quirk is incorrectly enabled on newer Ryzen
chipsets. The logic in usb_amd_find_chipset_info currently checks
for unaffected chipsets rather than affected ones. This broke
once a new chipset was added in e788787ef. It makes more sense
to reverse the logic so it won't need to be updated as new
chipsets are added. Note that the core of the workaround in
usb_amd_quirk_pll does correctly check the chipset.
Signed-off-by: Ryan Kennedy <ryan5544@gmail.com>
Fixes: e788787ef4 ("usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume")
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20190704153529.9429-2-ryan5544@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conversion to use gpio descriptors broke all gpio lookups as
devm_gpiod_get_index was converted to use dev->driver->name for
the gpio name lookup. Fix this by using the name param. In
addition gpiod_get post-fixes the -gpios to the name so that
shouldn't be included in the call. However this then breaks the
of_find_property call to see if the gpio entry exists as all
fbtft treats all gpios as optional. So use devm_gpiod_get_index_optional
instead which achieves the same thing and is simpler.
Nishad confirmed the changes where only ever compile tested.
Fixes: c440eee1a7 ("Staging: fbtft: Switch to the gpio descriptor interface")
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Jan Sebastian Götte <linux@jaseg.net>
Signed-off-by: Phil Reid <preid@electromag.com.au>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1563236677-5045-2-git-send-email-preid@electromag.com.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This conexant codec isn't in the supported codec list yet, the hda
generic driver can drive this codec well, but on a Lenovo machine
with mute/mic-mute leds, we need to apply CXT_FIXUP_THINKPAD_ACPI
to make the leds work. After adding this codec to the list, the
driver patch_conexant.c will apply THINKPAD_ACPI to this machine.
Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The label is used just once and the code it points at is not reused, no
point in keeping it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in
situations where required data is missing leads to unexpected behaviour
with inverted checks like so:
| meta iifname != eth0 accept
This rule will never match if there is no input interface (or it is not
known) which is not intuitive and, what's worse, breaks consistency of
iptables-nft with iptables-legacy.
Fix this by falling back to placing a value in dreg which never matches
(avoiding accidental matches), i.e. zero for interface index and an
empty string for interface name.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A clang build reported an (obvious) double CLAC while a GCC build did not;
it turns out that objtool only re-visits instructions if the first visit
was with AC=0. If OTOH the first visit was with AC=1, it completely ignores
any subsequent visit, even when it has AC=0.
Fix this by using a visited mask instead of a boolean, and (explicitly)
mark the AC state.
$ ./objtool check -b --no-fp --retpoline --uaccess drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x22: redundant UACCESS disable
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xea: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xd9: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xb2: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x39: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x0: <=== (func)
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/617
Link: https://lkml.kernel.org/r/5359166aad2d53f3145cd442d83d0e5115e0cd17.1564007838.git.jpoimboe@redhat.com
Assembly entry/return abstraction change didn't add asmmacro.h include
statement to coprocessor.S, resulting in references to undefined macros
abi_entry and abi_ret on cores that define XTENSA_HAVE_COPROCESSORS.
Fix that by including asm/asmmacro.h from the coprocessor.S.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get
net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.
The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.
Fixes: 9cc9a5cb17 ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory allocated for the stats array may contain arbitrary data.
Fixes: e4f9ba642f ("net: phy: mscc: add support for VSC8514 PHY.")
Fixes: 00d70d8e0e ("net: phy: mscc: add support for VSC8574 PHY")
Fixes: a5afc16780 ("net: phy: mscc: add support for VSC8584 PHY")
Fixes: f76178dc52 ("net: phy: mscc: add ethtool statistics counters")
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turned out that the recent Intel HD-audio controller chips show a
significant stall during the system PM resume intermittently. It
doesn't happen so often and usually it may read back successfully
after one or more seconds, but in some rare worst cases the driver
went into fallback mode.
After trial-and-error, we found out that the communication stall seems
covered by issuing the sync after each verb write, as already done for
AMD and other chipsets. So this patch enables the write-sync flag for
the recent Intel chips, Skylake and onward, as a workaround.
Also, since Broxton and co have the very same driver flags as Skylake,
refer to the Skylake driver flags instead of defining the same
contents again for simplification.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201901
Reported-and-tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
SFP modules connected using the SGMII interface have their own PHYs which
are handled by the struct phylink's phydev field. On the other hand, for
the modules connected using 1000Base-X interface that field is not set.
Since commit ce0aa27ff3 ("sfp: add sfp-bus to bridge between network
devices and sfp cages") phylink_start() ends up setting the phydev field
using the sfp-bus infrastructure, which eventually calls phy_start() on it,
and then calling phy_start() again on the same phydev from phylink_start()
itself. Similar call sequence holds for phylink_stop(), only in the reverse
order. This results in WARNs during network interface bringup and shutdown
when a copper SFP module is connected, as phy_start() and phy_stop() are
called twice in a row for the same phy_device:
% ip link set up dev eth0
------------[ cut here ]------------
called from state UP
WARNING: CPU: 1 PID: 155 at drivers/net/phy/phy.c:895 phy_start+0x74/0xc0
Modules linked in:
CPU: 1 PID: 155 Comm: backend Not tainted 5.2.0+ #1
NIP: c0227bf0 LR: c0227bf0 CTR: c004d224
REGS: df547720 TRAP: 0700 Not tainted (5.2.0+)
MSR: 00029000 <CE,EE,ME> CR: 24002822 XER: 00000000
GPR00: c0227bf0 df5477d8 df5d7080 00000014 df9d2370 df9d5ac4 1f4eb000 00000001
GPR08: c061fe58 00000000 00000000 df5477d8 0000003c 100c8768 00000000 00000000
GPR16: df486a00 c046f1c8 c046eea0 00000000 c046e904 c0239604 db68449c 00000000
GPR24: e9083204 00000000 00000001 db684460 e9083404 00000000 db6dce00 db6dcc00
NIP [c0227bf0] phy_start+0x74/0xc0
LR [c0227bf0] phy_start+0x74/0xc0
Call Trace:
[df5477d8] [c0227bf0] phy_start+0x74/0xc0 (unreliable)
[df5477e8] [c023cad0] startup_gfar+0x398/0x3f4
[df547828] [c023cf08] gfar_enet_open+0x364/0x374
[df547898] [c029d870] __dev_open+0xe4/0x140
[df5478c8] [c029db70] __dev_change_flags+0xf0/0x188
[df5478f8] [c029dc28] dev_change_flags+0x20/0x54
[df547918] [c02ae304] do_setlink+0x310/0x818
[df547a08] [c02b1eb8] __rtnl_newlink+0x384/0x6b0
[df547c28] [c02b222c] rtnl_newlink+0x48/0x68
[df547c48] [c02ad7c8] rtnetlink_rcv_msg+0x240/0x27c
[df547c98] [c02cc068] netlink_rcv_skb+0x8c/0xf0
[df547cd8] [c02cba3c] netlink_unicast+0x114/0x19c
[df547d08] [c02cbd74] netlink_sendmsg+0x2b0/0x2c0
[df547d58] [c027b668] sock_sendmsg_nosec+0x20/0x40
[df547d68] [c027d080] ___sys_sendmsg+0x17c/0x1dc
[df547e98] [c027df7c] __sys_sendmsg+0x68/0x84
[df547ef8] [c027e430] sys_socketcall+0x1a0/0x204
[df547f38] [c000d1d8] ret_from_syscall+0x0/0x38
--- interrupt: c01 at 0xfd4e030
LR = 0xfd4e010
Instruction dump:
813f0188 38800000 2b890005 419d0014 3d40c046 5529103a 394aa208 7c8a482e
3c60c046 3863a1b8 4cc63182 4be009a1 <0fe00000> 48000030 3c60c046 3863a1d0
---[ end trace d4c095aeaf6ea998 ]---
and
% ip link set down dev eth0
------------[ cut here ]------------
called from state HALTED
WARNING: CPU: 1 PID: 184 at drivers/net/phy/phy.c:858 phy_stop+0x3c/0x88
<...>
Call Trace:
[df581788] [c0228450] phy_stop+0x3c/0x88 (unreliable)
[df581798] [c022d548] sfp_sm_phy_detach+0x1c/0x44
[df5817a8] [c022e8cc] sfp_sm_event+0x4b0/0x87c
[df581848] [c022f04c] sfp_upstream_stop+0x34/0x44
[df581858] [c0225608] phylink_stop+0x7c/0xe4
[df581868] [c023c57c] stop_gfar+0x7c/0x94
[df581888] [c023c5b8] gfar_close+0x24/0x94
[df5818a8] [c0298688] __dev_close_many+0xdc/0xf8
[df5818c8] [c029db58] __dev_change_flags+0xd8/0x188
[df5818f8] [c029dc28] dev_change_flags+0x20/0x54
[df581918] [c02ae304] do_setlink+0x310/0x818
[df581a08] [c02b1eb8] __rtnl_newlink+0x384/0x6b0
[df581c28] [c02b222c] rtnl_newlink+0x48/0x68
[df581c48] [c02ad7c8] rtnetlink_rcv_msg+0x240/0x27c
[df581c98] [c02cc068] netlink_rcv_skb+0x8c/0xf0
[df581cd8] [c02cba3c] netlink_unicast+0x114/0x19c
[df581d08] [c02cbd74] netlink_sendmsg+0x2b0/0x2c0
[df581d58] [c027b668] sock_sendmsg_nosec+0x20/0x40
[df581d68] [c027d080] ___sys_sendmsg+0x17c/0x1dc
[df581e98] [c027df7c] __sys_sendmsg+0x68/0x84
[df581ef8] [c027e430] sys_socketcall+0x1a0/0x204
[df581f38] [c000d1d8] ret_from_syscall+0x0/0x38
<...>
---[ end trace d4c095aeaf6ea999 ]---
SFP modules with the 1000Base-X interface are not affected.
Place explicit calls to phy_start() and phy_stop() before enabling or after
disabling an attached SFP module, where phydev is not yet set (or is
already unset), so they will be made only from the inside of sfp-bus, if
needed.
Fixes: 2179626156 ("net: phy: warn if phy_start is called from invalid state")
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can 2019-07-24
this is a pull reqeust of 7 patches for net/master.
The first patch is by Rasmus Villemoes add a missing netif_carrier_off() to
register_candev() so that generic netdev trigger based LEDs are initially off.
Nikita Yushchenko's patch for the rcar_canfd driver fixes a possible IRQ storm
on high load.
The patch by Weitao Hou for the mcp251x driver add missing error checking to
the work queue allocation.
Both Wen Yang's and Joakim Zhang's patch for the flexcan driver fix a problem
with the stop-mode.
Stephane Grosjean contributes a patch for the peak_usb driver to fix a
potential double kfree_skb().
The last patch is by YueHaibing and fixes the error path in can-gw's
cgw_module_init() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a skip() message function that stops the test, logs an explanation,
and sets the "skip" return code (4).
Before loading a livepatch self-test kernel module, first verify that
we've built and installed it by running a 'modprobe --dry-run'. This
should catch a few environment issues, including !CONFIG_LIVEPATCH and
!CONFIG_TEST_LIVEPATCH. In these cases, exit gracefully with the new
skip() function.
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Shuah Khan <shuah@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
can't be accessed by userspace.
This ensures that nothing can continue to access the MR once it has been
placed in the kernels cache for reuse.
MRs in the cache continue to have their HW state, including DMA tables,
present. Even though the MR has been invalidated, changing the PD provides
an additional layer of protection against use of the MR.
Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use a direct firmware command to destroy the mkey in case the unreg UMR
operation has failed.
This prevents a case that a mkey will leak out from the cache post a
failure to be destroyed by a UMR WR.
In case the MR cache limit didn't reach a call to add another entry to the
cache instead of the destroyed one is issued.
In addition, replaced a warn message to WARN_ON() as this flow is fatal
and can't happen unless some bug around.
Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit 6935224da2 ("spi: bcm2835: enable support of 3-wire mode")
added 3-wire support to the BCM2835 SPI driver by setting the REN bit
(Read Enable) in the CS register when receiving data. The REN bit puts
the transmitter in high-impedance state. The driver recognizes that
data is to be received by checking whether the rx_buf of a transfer is
non-NULL.
Commit 3ecd37edaa ("spi: bcm2835: enable dma modes for transfers
meeting certain conditions") subsequently broke 3-wire support because
it set the SPI_MASTER_MUST_RX flag which causes spi_map_msg() to replace
rx_buf with a dummy buffer if it is NULL. As a result, rx_buf is
*always* non-NULL if DMA is enabled.
Reinstate 3-wire support by not only checking whether rx_buf is non-NULL,
but also checking that it is not the dummy buffer.
Fixes: 3ecd37edaa ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Reported-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.2+
Cc: Martin Sperl <kernel@martin.sperl.org>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/328318841455e505370ef8ecad97b646c033dc8a.1562148527.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.
The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.
Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.
But it turns out that all of this is entirely unnecessary. Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all. Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.
So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this). We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.
Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards. It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.
It is possible that we should just remove the whole RCU markings for
->cred entirely. Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.
But this is a "minimal semantic changes" change for the immediate
problem.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
"An assortment of non-regression fixes that have accumulated since the
start of the merge window.
- A fix for a user triggerable oops on machines where transactional
memory is disabled, eg. Power9 bare metal, Power8 with TM disabled
on the command line, or all Power7 or earlier machines.
- Three fixes for handling of PMU and power saving registers when
running nested KVM on Power9.
- Two fixes for bugs found while stress testing the XIVE interrupt
controller code, also on Power9.
- A fix to allow guests to boot under Qemu/KVM on Power9 using the
the Hash MMU with >= 1TB of memory.
- Two fixes for bugs in the recent DMA cleanup, one of which could
lead to checkstops.
- And finally three fixes for the PAPR SCM nvdimm driver.
Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater,
Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling,
Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar
Singh, Vaibhav Jain"
* tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
powerpc/pseries: Update SCM hcall op-codes in hvcall.h
powerpc/tm: Fix oops on sigreturn on systems without TM
powerpc/dma: Fix invalid DMA mmap behavior
KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails
powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA
KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR
KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
powerpc/mm: Limit rma_size to 1TB when running without HV mode
Pull KVM fixes from Paolo Bonzini:
"Bugfixes, a pvspinlock optimization, and documentation moving"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
Documentation: move Documentation/virtual to Documentation/virt
KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
KVM: X86: Dynamically allocate user_fpu
KVM: X86: Fix fpu state crash in kvm guest
Revert "kvm: x86: Use task structs fpu field for user"
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Pull dma-mapping regression fix from Christoph Hellwig:
"Ensure that dma_addressing_limited doesn't crash on devices without a
dma mask (Eric Auger)"
* tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: use dma_get_mask in dma_addressing_limited
The DMA API requires that 32-bit DMA masks are always supported, but on
arm LPAE configs they do not currently work when memory is present
above 4GB. Wire up the swiotlb code like for all other architectures
to provide the bounce buffering in that case.
Fixes: 21e07dba9f ("scsi: reduce use of block bounce buffers").
Reported-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vignesh Raghavendra <vigneshr@ti.com>
Check that the pfn returned from arch_dma_coherent_to_pfn refers to
a valid page and reject the mmap / get_sgtable requests otherwise.
Based on the arm implementation of the mmap and get_sgtable methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vignesh Raghavendra <vigneshr@ti.com>
We need to error out when trying to add an entry above SIDTAB_MAX in
sidtab_reverse_lookup() to avoid overflow on the odd chance that this
happens.
Cc: stable@vger.kernel.org
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 11752adb (locking/pvqspinlock: Implement hybrid PV queued/unfair locks)
introduces hybrid PV queued/unfair locks
- queued mode (no starvation)
- unfair mode (good performance on not heavily contended lock)
The lock waiter goes into the unfair mode especially in VMs with over-commit
vCPUs since increaing over-commitment increase the likehood that the queue
head vCPU may have been preempted and not actively spinning.
However, reschedule queue head vCPU timely to acquire the lock still can get
better performance than just depending on lock stealing in over-subscribe
scenario.
Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M
vanilla boosting improved
1VM 23520 25040 6%
2VM 8000 13600 70%
3VM 3100 5400 74%
The lock holder vCPU yields to the queue head vCPU when unlock, to boost queue
head vCPU which is involuntary preemption or the one which is voluntary halt
due to fail to acquire the lock after a short spin in the guest.
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit a0d14b8909 ("x86/mm, tracing: Fix CR2 corruption") added the
address parameter to do_async_page_fault(), but does not pass it from the
32-bit entry point. To plumb it through, factor-out
common_exception_read_cr2 in the same fashion as common_exception, and uses
it from both page_fault and async_page_fault.
For a 32-bit KVM guest, this fixes:
Run /sbin/init as init process
Starting init: /sbin/init exists but couldn't execute it (error -14)
Fixes: a0d14b8909 ("x86/mm, tracing: Fix CR2 corruption")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190724042058.24506-1-mmullins@fb.com
This avoids a warning when building with -Wimplicit-fallthrough.
Fixes: 883a2a80f7 ("Input: elantech - enable SMBus on new (2018+) systems")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This patch add error path for cgw_module_init to avoid possible crash if
some error occurs.
Fixes: c1aabdf379 ("can-gw: add netlink based CAN routing")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When closing the CAN device while tx skbs are inflight, echo skb could
be released twice. By calling close_candev() before unlinking all
pending tx urbs, then the internal echo_skb[] array is fully and
correctly cleared before the USB write callback and, therefore,
can_get_echo_skb() are called, for each aborted URB.
Fixes: bb4785551f ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
To enter stop mode, the CPU should manually assert a global Stop Mode
request and check the acknowledgment asserted by FlexCAN. The CPU must
only consider the FlexCAN in stop mode when both request and
acknowledgment conditions are satisfied.
Fixes: de3578c198 ("can: flexcan: add self wakeup support")
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v5.0
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The gpr_np variable is still being used in dev_dbg() after the
of_node_put() call, which may result in use-after-free.
Fixes: de3578c198 ("can: flexcan: add self wakeup support")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: linux-stable <stable@vger.kernel.org> # >= v5.0
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
add error check when workqueue alloc failed, and remove redundant code
to make it clear.
Fixes: e0000163e3 ("can: Driver for the Microchip MCP251x SPI CAN controllers")
Signed-off-by: Weitao Hou <houweitaoo@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Tested-by: Sean Nyekjaer <sean@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
We have observed rcar_canfd driver entering IRQ storm under high load,
with following scenario:
- rcar_canfd_global_interrupt() in entered due to Rx available,
- napi_schedule_prep() is called, and sets NAPIF_STATE_SCHED in state
- Rx fifo interrupts are masked,
- rcar_canfd_global_interrupt() is entered again, this time due to
error interrupt (e.g. due to overflow),
- since scheduled napi poller has not yet executed, condition for calling
napi_schedule_prep() from rcar_canfd_global_interrupt() remains true,
thus napi_schedule_prep() gets called and sets NAPIF_STATE_MISSED flag
in state,
- later, napi poller function rcar_canfd_rx_poll() gets executed, and
calls napi_complete_done(),
- due to NAPIF_STATE_MISSED flag in state, this call does not clear
NAPIF_STATE_SCHED flag from state,
- on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
interrutps,
- Rx interrupt happens, rcar_canfd_global_interrupt() gets called
and calls napi_schedule_prep(),
- since NAPIF_STATE_SCHED is set in state at this time, this call
returns false,
- due to that false return, rcar_canfd_global_interrupt() returns
without masking Rx interrupt
- and this results into IRQ storm: unmasked Rx interrupt happens again
and again is misprocessed in the same way.
This patch fixes that scenario by unmasking Rx interrupts only when
napi_complete_done() returns true, which means it has cleared
NAPIF_STATE_SCHED in state.
Fixes: dd3bd23eb4 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
CONFIG_CAN_LEDS is deprecated. When trying to use the generic netdev
trigger as suggested, there's a small inconsistency with the link
property: The LED is on initially, stays on when the device is brought
up, and then turns off (as expected) when the device is brought down.
Make sure the LED always reflects the state of the CAN device.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories. We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory. Fix up the documentation
to match that.
Fixes: ed16648eb5 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are currently using a wrong register for dcan revision. Although
this is currently only used for detecting the dcan module, let's
fix it to avoid confusion.
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The ti,no-idle-on-init and ti,no-reset-on-init flags need to be at
the interconnect target module level for the modules that have it
defined. Otherwise we get the following warnings:
dts flag should be at module level for ti,no-idle-on-init
dts flag should be at module level for ti,no-reset-on-init
Reviewed-by: Suman Anna <s-anna@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We have cases where there are no softreset bits like with am335x lcdc.
In that case ti,sysc-mask = <0> needs to be handled properly.
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
For some devices we can get the following warning on boot:
ti-sysc 48485200.target-module: sysc_disable_module: invalid midlemode
Fix this by treating SYSC_IDLE_FORCE like we do for the other bits
for idlemodes mask.
Fixes: d59b60564c ("bus: ti-sysc: Add generic enable/disable functions")
Cc: Roger Quadros <rogerq@ti.com>
Reviewed-by: Suman Anna <s-anna@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
TRM says PWMSS_SYSCONFIG bit for SOFTRESET changes to zero when
reset is completed. Let's configure it as otherwise we get warnings
on boot when we check the data against dts provided data. Eventually
the legacy platform data will be just dropped, but let's fix the
warning first.
Reviewed-by: Suman Anna <s-anna@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Retrying immediately after we've received a 'transitioning' sense code is
pretty much pointless, we should always use a delay before retrying. So
ensure the default delay is applied before retrying.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Tested-by: Zhangguanghui <zhang.guanghui@h3c.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
#define relative to FCOE CTLR start with FCOE_CTLR, except
FCOE_CTRL_SOL_TOV.
This is likely a typo and CTRL should be CTLR here as well.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/megaraid/megaraid_sas_fusion.c:541:1: warning: symbol 'megasas_alloc_cmdlist_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:580:1: warning: symbol 'megasas_alloc_request_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:661:1: warning: symbol 'megasas_alloc_reply_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:738:1: warning: symbol 'megasas_alloc_rdpq_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:920:1: warning: symbol 'megasas_alloc_cmds_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:1740:1: warning: symbol 'megasas_init_adapter_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:1966:1: warning: symbol 'map_cmd_status' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:2379:1: warning: symbol 'megasas_set_pd_lba' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:2718:1: warning: symbol 'megasas_build_ldio_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:3215:1: warning: symbol 'megasas_build_io_fusion' was not declared. Should it be static?
drivers/scsi/megaraid/megaraid_sas_fusion.c:3328:6: warning: symbol 'megasas_prepare_secondRaid1_IO' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While loading fw crashdump in function fw_crash_buffer_show(), left bytes
in one dma chunk was not checked, if copying size over it, overflow access
will cause kernel panic.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull parisc fixes from Helge Deller:
- Fix build issues when kprobes are enabled
- Speed up ITLB/DTLB cache flushes when running on machines with
combined TLBs
* 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
parisc: add kprobe_fault_handler()
'channels.max_combined' initialized only on ioctl success and
errno is only valid on ioctl failure.
The code doesn't produce any runtime issues, but makes memory
sanitizers angry:
Conditional jump or move depends on uninitialised value(s)
at 0x55C056F: xsk_get_max_queues (xsk.c:336)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
Additionally fixed warning on uninitialized bytes in ioctl arguments:
Syscall param ioctl(SIOCETHTOOL) points to uninitialised byte(s)
at 0x648D45B: ioctl (in /usr/lib64/libc-2.28.so)
by 0x55C0546: xsk_get_max_queues (xsk.c:330)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Address 0x1ffefff378 is on thread 1's stack
in frame #1, created by xsk_get_max_queues (xsk.c:318)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf.data:
Alexey Budankov:
- Fix loading of compressed data split across adjacent records
Jiri Olsa:
- Fix buffer size setting for processing CPU topology perf.data header.
perf stat:
Jiri Olsa:
- Fix segfault for event group in repeat mode
Cong Wang:
- Always separate "stalled cycles per insn" line, it was being appended to
the "instructions" line.
perf script:
Andi Kleen:
- Fix --max-blocks man page description.
- Improve man page description of metrics.
- Fix off by one in brstackinsn IPC computation.
perf probe:
Arnaldo Carvalho de Melo:
- Avoid calling freeing routine multiple times for same pointer.
perf build:
- Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings
treated as errors, breaking the build.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Dumazet says:
====================
First patch changes the kernel, second patch
adds a new test.
Note that other patches might be needed to take
care of similar issues in sock_ops_convert_ctx_access()
and SOCK_OPS_GET_FIELD()
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use BPF_REG_1 for source and destination of gso_segs read,
to exercise "bpf: fix access to skb_shared_info->gso_segs" fix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace the custom implementation with fwnode_get_mac_address,
which works on both DT and ACPI platforms.
While here, replace memcpy() by ether_addr_copy().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.
The preprocessed code of the check is:
{
__u8 tmp = *((volatile __u8 *)&skb->len +
((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};
clang generates the following code for it:
0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3)
1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0)
2: 57 30 00 00 00 00 00 ff r3 &= 255
3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10>
Finally, verifier transforms it to:
0: (61) r2 = *(u32 *)(r1 +104)
1: (bc) w2 = w2
2: (74) w2 >>= 24
3: (bc) w2 = w2
4: (54) w2 &= 255
5: (bc) w2 = w2
The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.
Adjust shift count calculation to account for endianness.
Fixes: 31fd85816d ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The onboard sky2 NIC on ASUS P6T WS PRO doesn't work after PM resume
due to the infamous IRQ problem. Disabling MSI works around it, so
let's add it to the blacklist.
Unfortunately the BIOS on the machine doesn't fill the standard
DMI_SYS_* entry, so we pick up DMI_BOARD_* entries instead.
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1142496
Reported-and-tested-by: Marcus Seyfarth <m.seyfarth@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each iteration of for_each_available_child_of_node puts the previous
node, but in the case of a return from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
return.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
In an effort to stop redundant calls to dce110_disable_audio_stream
the audio->enabled flag was added to the audio resource struct. While
this state probably shouldn't have been tracked on the audio struct
itself it still works fine for some sequences.
However, it does not work for cases where we're freeing the audio
resource (such as hotplugs) or when dynamic audio is enabled.
In these cases the pipe_ctx->stream_res.audio = NULL before we can
set audio->enabled = false. The next time we acquire the audio resource
such as on hotplug the audio will not be enabled for the stream since
DC thinks it's still enabled.
Audio state tracking should cover this sequence.
[How]
Set audio->enabled = false at the start as long as we have
pipe_ctx->stream_res.audio.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ido Schimmel says:
====================
selftests: forwarding: GRE multipath fixes
Patch #1 ensures IPv4 forwarding is enabled during the test.
Patch #2 fixes the flower filters used to measure the distribution of
the traffic between the two nexthops, so that the test will pass
regardless if traffic is offloaded or not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The TC filters used in the test do not work with veth devices because the
outer Ethertype is 802.1Q and not IPv4. The test passes with mlxsw
netdevs since the hardware always looks at "The first Ethertype that
does not point to either: VLAN, CNTAG or configurable Ethertype".
Fix this by matching on the VLAN ID instead, but on the ingress side.
The reason why this is not performed at egress is explained in the
commit cited below.
Fixes: 541ad323db ("selftests: forwarding: gre_multipath: Update next-hop statistics match criteria")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Tested-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test did not enable IPv4 forwarding during its setup phase, which
causes the test to fail on machines where IPv4 forwarding is disabled.
Fixes: 54818c4c4b ("selftests: forwarding: Test multipath tunneling")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Tested-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
i.MX fixes for 5.3:
- Fix i.MX8MM SAI3 RXC/TXFS pinmux configuration.
- Fix i.MX7ULP usb-phy unit address to drop extra '0x' notation.
- Fix typo of clock frequency property name in a few i.MX6UL board
I2C buses.
- Drop "fsl,imx6sx-sai" from i.MX8M SAI device, as it's not compatible
with i.MX6SX SAI.
* tag 'imx-fixes-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mq: fix SAI compatible
arm64: dts: imx8mm: Correct SAI3 RXC/TXFS pin's mux option #1
ARM: dts: imx6ul: fix clock frequency property name of I2C buses
ARM: dts: imx7ulp: Fix usb-phy unit address format
Link: https://lore.kernel.org/r/20190723090827.GU15632@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
This enables the new or updates driver options for U8500
that got merged into v5.3-rc1:
- CMA, MCDE driver, LIMA driver and the Samsung S6D16D0 driver
enabled by default bringing up the new graphics support.
Include the LOGO so we can see when the graphics are live.
- We use the IIO hwmon bridge for reflecting temperature
in the system.
- Set MUSB to PIO mode as this is the one working most stable
for the time being.
- HWSPINLOCK needs to be set to get the hardware semaphore
driver to compile and link properly.
Link: https://lore.kernel.org/r/20190723081523.13079-2-linus.walleij@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
When building a multiplatform kernel that includes armv4 support,
the default target CPU does not support the blx instruction,
which leads to a build failure:
arch/arm/mach-davinci/sleep.S: Assembler messages:
arch/arm/mach-davinci/sleep.S:56: Error: selected processor does not support `blx ip' in ARM mode
Add a .arch statement in the sources to make this file build.
Link: https://lore.kernel.org/r/20190722145211.1154785-1-arnd@arndb.de
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
This reverts commit 0298d54352.
With this patch, set 'poll_queues > hard queues' will lead to 'nr_read_queues = 0'
in nvme_calc_irq_sets. Then poll_queues setting can fail since dev->tagset.nr_maps
equals to 2 and nvme_pci_map_queues will not do map for poll queues.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When freeing the subsystem after finding another match with
__nvme_find_get_subsystem(), use put_device() instead of
__nvme_release_subsystem() which calls kfree() directly.
Per the documentation, put_device() should always be used
after device_initialization() is called. Otherwise, leaks
like the one below which was detected by kmemleak may occur.
Once the call of __nvme_release_subsystem() is removed it no
longer makes sense to keep the helper, so fold it back
into nvme_release_subsystem().
unreferenced object 0xffff8883d12bfbc0 (size 16):
comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
hex dump (first 16 bytes):
6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2....
backtrace:
[<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
[<0000000081169e5f>] kvasprintf+0xad/0x130
[<0000000025626f25>] kvasprintf_const+0x47/0x120
[<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
[<000000004881f8b3>] dev_set_name+0x98/0xc0
[<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
[<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
[<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
[<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
[<000000002259b3d5>] __vfs_write+0x66/0x120
[<000000002f6df81e>] vfs_write+0x154/0x490
[<000000007e8cfc19>] ksys_write+0x10a/0x240
[<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
[<00000000fee6d692>] do_syscall_64+0xaa/0x470
[<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ab9e00cc72 ("nvme: track subsystems")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a
system with more than one of these SSDs will only have one usable.
[ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
[ 0.943017] nvme nvme1: Removing after probe failure status: -22
02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
There are no firmware updates available from the vendor, unfortunately.
Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves
the issue, and they all work after this patch:
/dev/nvme0n1 2J1120050420 ADATA SX6000LNP [...]
/dev/nvme1n1 2J1120050540 ADATA SX6000LNP [...]
Signed-off-by: Misha Nasledov <misha@nasledov.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We currently have cases where the dma_addressing_limited() gets
called with dma_mask unset. This causes a NULL pointer dereference.
Use dma_get_mask() accessor to prevent the crash.
Fixes: b866455423 ("dma-mapping: add a dma_addressing_limited helper")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
AMD IOMMU requires IntCapXT registers to be setup in order to generate
its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit
APIC destination ID. Without this support, AMD IOMMU MSI interrupts
will not be routed correctly when booting the system in X2APIC mode.
Cc: Joerg Roedel <joro@8bytes.org>
Fixes: 90fcffd9cf ('iommu/amd: Add support for IOMMU XT mode')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are some new HP laptops with Elantech touchpad that don't support
multitouch.
Currently we use ETP_NEW_IC_SMBUS_HOST_NOTIFY() to check if SMBus is supported,
but in addition to firmware version, the bus type also informs us whether the IC
can support SMBus. To avoid breaking old ICs, we will only enable SMbus support
based the bus type on systems manufactured after 2018.
Lastly, let's consolidate all checks into elantech_use_host_notify() and use it
to determine whether to use PS/2 or SMBus.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This fixes a typo in the keyboard_protocol description.
coodinate -> coordinate.
Signed-off-by: Nikolas Nyby <nikolas@gnu.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
In some rare randconfig builds, CRC16 is disabled, which leads
to a link error:
drivers/input/keyboard/applespi.o: In function `applespi_send_cmd_msg':
applespi.c:(.text+0x449f): undefined reference to `crc16'
drivers/input/keyboard/applespi.o: In function `applespi_verify_crc':
applespi.c:(.text+0x7538): undefined reference to `crc16'
This symbol is meant to be selected for each user in Kconfig,
so do that here as well.
Fixes: 038b1a05ea ("Input: add Apple SPI keyboard and trackpad driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Building with clang and KASAN, we get a warning about an overly large
stack frame on 32-bit architectures:
drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
[-Werror,-Wframe-larger-than=]
We already allocate other data dynamically in this function, so
just do the same for the shash descriptor, which makes up most of
this memory.
Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_sched_completed_request is a function that checks if the elevator
related to the request has started_request implemented, but currently, none of
the available IO schedulers implement started_request, so remove both.
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
put_device will call ac97_codec_release to free
ac97_codec_device and other resources, so remove the kfree
and other redundant code.
Fixes: 74426fbff6 ("ALSA: ac97: add an ac97 bus")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When perf_add_probe_events() we call cleanup_perf_probe_events() for the
pev pointer it receives, then, as part of handling this failure the main
'perf probe' goes on and calls cleanup_params() and that will again call
cleanup_perf_probe_events()for the same pointer, so just set nevents to
zero when handling the failure of perf_add_probe_events() to avoid the
double free.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x8qgma4g813z96dvtw9w219q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that, when perf_add_probe_events() fails, like in:
# perf probe icmp_rcv:64 "type=icmph->type"
Failed to find 'icmph' in this function.
Error: Failed to add events.
Segmentation fault (core dumped)
#
We don't segfault.
clear_perf_probe_event() was zeroing the whole pev, and since the switch
to zfree() for the members in the pev, that memset() was removed, which
left nargs with its original value, in the above case 1.
With the memset the same pev could be passed to clear_perf_probe_event()
multiple times, since all it would have would be zeroes, and free()
accepts zero, the loop would not happen and we would just memset it
again to zeroes.
Without it we got that segfault, so zero nargs to keep it like it was,
next cset will avoid calling clear_perf_probe_event() for the same pevs
in case of failure.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: d8f9da2404 ("perf tools: Use zfree() where applicable")
Link: https://lkml.kernel.org/n/tip-802f2jypnwqsvyavvivs8464@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix decompression failure found during the loading of compressed trace
collected on larger scale systems (>48 cores).
The error happened due to lack of decompression space for a mmaped
buffer data chunk split across adjacent PERF_RECORD_COMPRESSED records.
$ perf report -i bt.16384.data --stats
failed to decompress (B): 63869 -> 0 : Destination buffer is too small
user stack dump failure
Can't parse sample, err = -14
0x2637e436 [0x4080]: failed to process type: 9
Error:
failed to process sample
$ perf test 71
71: Zstd perf.data compression/decompression : Ok
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4d839e1b-9c48-89c4-9702-a12217420611@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "stalled cycles per insn" is appended to "instructions" when the CPU
has this hardware counter directly. We should always make it a separate
line, which also aligns to the output when we hit the "if (total &&
avg)" branch.
Before:
$ sudo perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
4565048704,,instructions,64114578096,100.00,1.34,insn per cycle,,
3396325133,,cycles,64146628546,100.00,,
After:
$ sudo ./tools/perf/perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
6721924,,instructions,24026790339,100.00,0.22,insn per cycle
,,,,,0.00,stalled cycles per insn
30939953,,cycles,24025512526,100.00,,
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20190517221039.8975-1-xiyou.wangcong@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Numfor Mbiziwo-Tiapo reported segfault on stat of event group in repeat
mode:
# perf stat -e '{cycles,instructions}' -r 10 ls
It's caused by memory corruption due to not cleaned evsel's id array and
index, which needs to be rebuilt in every stat iteration. Currently the
ids index grows, while the array (which is also not freed) has the same
size.
Fixing this by releasing id array and zeroing ids index in
perf_evsel__close function.
We also need to keep the evsel_list alive for stat record (which is
disabled in repeat mode).
Reported-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190715142121.GC6032@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
apq8016_sbc_parse_of() sets up multiple DAI links, depending on the
number of nodes in the device tree. However, at the moment
CPU and platform components are only allocated for the first link.
This causes an oops when more than one link is defined:
Internal error: Oops: 96000044 [#1] SMP
CPU: 0 PID: 1015 Comm: kworker/0:2 Not tainted 5.3.0-rc1 #4
Call trace:
apq8016_sbc_platform_probe+0x1a8/0x3f0
platform_drv_probe+0x50/0xa0
...
Move the allocation inside the loop to ensure that each link is
properly initialized.
Fixes: 98b232ca9e ("ASoC: qcom: apq8016_sbc: use modern dai_link style")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/20190722130352.95874-1-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Draining makes little sense in the situation of hardware overrun, as the
hardware will have consumed all its available samples. Additionally,
draining whilst the stream is paused would presumably get stuck as no
data is being consumed on the DSP side.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Partial drain and next track are intended for gapless playback and
don't really have an obvious interpretation for a capture stream, so
makes sense to not allow those operations on capture streams.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Currently, whilst in SNDRV_PCM_STATE_OPEN it is possible to call
snd_compr_stop, snd_compr_drain and snd_compr_partial_drain, which
allow a transition to SNDRV_PCM_STATE_SETUP. The stream should
only be able to move to the setup state once it has received a
SNDRV_COMPRESS_SET_PARAMS ioctl. Fix this issue by not allowing
those ioctls whilst in the open state.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A previous fix to the stop handling on compressed capture streams causes
some knock on issues. The previous fix updated snd_compr_drain_notify to
set the state back to PREPARED for capture streams. This causes some
issues however as the handling for snd_compr_poll differs between the
two states and some user-space applications were relying on the poll
failing after the stream had been stopped.
To correct this regression whilst still fixing the original problem the
patch was addressing, update the capture handling to skip the PREPARED
state rather than skipping the SETUP state as it has done until now.
Fixes: 4f2ab5e1d1 ("ALSA: compress: Fix stop handling on compressed capture streams")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add KASAN instrumentation of architecture-specific asm implementation
of bitops. It also covers s390 specific *_inv functions.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Make s390/bitops test functions return bool values. That enforces return
value range to 0 and 1 and matches with asm-generic/bitops-instrumented.h
declarations as well as some other architectures implementations.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Masahiro Yamada changed the zcrypt.h header file to use __u{16,32,64}
instead of uint{16,32,64}_t with ("s390: use __u{16,32,64} instead of
uint{16,32,64}_t in uapi header").
This makes all s390 header files pass - remove zcrypt.h from the blacklist.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
make sure they can be included from user-space.
Currently, zcrypt.h is excluded from the test coverage. To make it
join the compile-test, we need to fix the build errors attached below.
For a case like this, we decided to use __u{8,16,32,64} variable types
in this discussion:
https://lkml.org/lkml/2019/6/5/18
Build log:
CC usr/include/asm/zcrypt.h.s
In file included from <command-line>:32:0:
./usr/include/asm/zcrypt.h:163:2: error: unknown type name ‘uint16_t’
uint16_t cprb_len;
^~~~~~~~
./usr/include/asm/zcrypt.h:168:2: error: unknown type name ‘uint32_t’
uint32_t source_id;
^~~~~~~~
./usr/include/asm/zcrypt.h:169:2: error: unknown type name ‘uint32_t’
uint32_t target_id;
^~~~~~~~
./usr/include/asm/zcrypt.h:170:2: error: unknown type name ‘uint32_t’
uint32_t ret_code;
^~~~~~~~
./usr/include/asm/zcrypt.h:171:2: error: unknown type name ‘uint32_t’
uint32_t reserved1;
^~~~~~~~
./usr/include/asm/zcrypt.h:172:2: error: unknown type name ‘uint32_t’
uint32_t reserved2;
^~~~~~~~
./usr/include/asm/zcrypt.h:173:2: error: unknown type name ‘uint32_t’
uint32_t payload_len;
^~~~~~~~
./usr/include/asm/zcrypt.h:182:2: error: unknown type name ‘uint16_t’
uint16_t ap_id;
^~~~~~~~
./usr/include/asm/zcrypt.h:183:2: error: unknown type name ‘uint16_t’
uint16_t dom_id;
^~~~~~~~
./usr/include/asm/zcrypt.h:198:2: error: unknown type name ‘uint16_t’
uint16_t targets_num;
^~~~~~~~
./usr/include/asm/zcrypt.h:199:2: error: unknown type name ‘uint64_t’
uint64_t targets;
^~~~~~~~
./usr/include/asm/zcrypt.h:200:2: error: unknown type name ‘uint64_t’
uint64_t weight;
^~~~~~~~
./usr/include/asm/zcrypt.h:201:2: error: unknown type name ‘uint64_t’
uint64_t req_no;
^~~~~~~~
./usr/include/asm/zcrypt.h:202:2: error: unknown type name ‘uint64_t’
uint64_t req_len;
^~~~~~~~
./usr/include/asm/zcrypt.h:203:2: error: unknown type name ‘uint64_t’
uint64_t req;
^~~~~~~~
./usr/include/asm/zcrypt.h:204:2: error: unknown type name ‘uint64_t’
uint64_t resp_len;
^~~~~~~~
./usr/include/asm/zcrypt.h:205:2: error: unknown type name ‘uint64_t’
uint64_t resp;
^~~~~~~~
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The IQD mcast queue doesn't support QAOB mode, so skip the
qdio_enable_async_operation() setup call for this queue. This avoids
the allocation of an unneeded QAOB pointer array, and sets up q->use_cq
properly so that drivers are prohibited from using QAOBs for mcast
traffic.
Take this opportunity to streamline the q->use_cq and aob != 0 checks.
The path to qdio_siga_output() is straight-forward, we don't need to
worry about being called with bad operands.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If the device driver were to send out a full queue's worth of SBALs,
current code would end up discovering the last of those SBALs as PRIMED
and erroneously skip the SIGA-w. This immediately stalls the queue.
Add a check to not attempt fast-requeue in this case. While at it also
make sure that the state of the previous SBAL was successfully extracted
before inspecting it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Extend "parmarea" to include an offset of the version string, which is
stored as 8-byte big endian value.
To retrieve version string from bzImage reliably, one should check the
presence of "S390EP" ascii string at 0x10008 (available since v3.2),
then read the version string offset from 0x10428 (which has been 0
since v3.2 up to now). The string is null terminated.
Could be retrieved with the following "file" command magic (requires
file v5.34):
8 string \x02\x00\x00\x18\x60\x00\x00\x50\x02\x00\x00\x68\x60\x00\x00\x50\x40\x40\x40\x40\x40\x40\x40\x40 Linux S390
>0x10008 string S390EP
>>0x10428 bequad >0
>>>(0x10428.Q) string >\0 \b, version %s
Reported-by: Petr Tesarik <ptesarik@suse.com>
Suggested-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We use "pmc->idx" and the "chained" bitmap to determine if the pmc is
chained, in kvm_pmu_pmc_is_chained(). But idx might be uninitialized
(and random) when we doing this decision, through a KVM_ARM_VCPU_INIT
ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random
idx will potentially hit a KASAN BUG [1].
In general, idx is the static property of a PMU counter that is not
expected to be modified across resets, as suggested by Julien. It
looks more reasonable if we can setup the PMU counter idx for a vcpu
in its creation time. Introduce a new function - kvm_pmu_vcpu_init()
for this basic setup. Oh, and the KASAN BUG will get fixed this way.
[1] https://www.spinics.net/lists/kvm-arm/msg36700.html
Fixes: 80f393a23b ("KVM: arm/arm64: Support chained PMU counters")
Suggested-by: Andrew Murray <andrew.murray@arm.com>
Suggested-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
I mistakenly dropped the inline while resolving the patch conflicts in
the previous fix patch. Without inline, we get compiler warnings wrt
unused functions.
Note that Mauro's original patch contained the correct changes; it's
all my fault to submit a patch before a morning coffee.
Fixes: c8917b8ff0 ("firmware: fix build errors in paged buffer handling code")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20190723081159.22624-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The stub function for !CONFIG_IOMMU_IOVA needs to be
'static inline'.
Fixes: effa467870 ('iommu/vt-d: Don't queue_iova() if there is no flush queue')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The cpu variable is still being used in the of_get_property() call
after the of_node_put() call, which may result in use-after-free.
Fixes: a9acc26b75 ("cpufreq/pasemi: fix possible object reference leak")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The i.MX8M SAI block is not compatible with the i.MX6SX one, as the
register layout has changed due to two version registers being added
at the beginning of the address map. Remove the bogus compatible.
Fixes: 8c61538dc9 ("arm64: dts: imx8mq: Add SAI2 node")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Passing 0 to cpuhp_remove_state() triggers the BUG_ON() in
__cpuhp_remove_state_cpuslocked() and the argument passed to
powercap_unregister_control_type() is expected to be a valid
pointer, so avoid calling these functions with incorrect
arguments from proc_thermal_rapl_remove().
Fixes: 555c45fe0d ("int340X/processor_thermal_device: add support for MMIO RAPL")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
According to i.MX8MM reference manual Rev.1, 03/2019:
SAI3_RXC pin's mux option #1 should be GPT1_CLK, NOT GPT1_CAPTURE2;
SAI3_TXFS pin's mux option #1 should be GPT1_CAPTURE2, NOT GPT1_CLK.
Fixes: c1c9d41319 ("dt-bindings: imx: Add pinctrl binding doc for imx8mm")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
We should only call dma_max_mapping_size for devices that have a DMA mask
set, otherwise we can run into a NULL pointer dereference that will crash
the system.
Also we need to do right shift to get the sectors from the size in bytes,
not a left shift.
Fixes: bdd17bdef7 ("scsi: core: take the DMA max mapping size into account")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We get a warning when CONFIG_SCSI_LOWLEVEL is disabled here:
WARNING: unmet direct dependencies detected for SCSI_FDOMAIN
Depends on [n]: SCSI_LOWLEVEL [=n] && SCSI [=y]
Selected by [m]:
- PCMCIA_FDOMAIN [=m] && SCSI_LOWLEVEL_PCMCIA [=y] && SCSI [=y] && PCMCIA [=y] && m && MODULES [=y]
Move all of SCSI_LOWLEVEL_PCMCIA inside of the existing SCSI_LOWLEVEL
section. Very few people use the PCMCIA support these days, and they likely
don't mind having to turn on SCSI_LOWLEVEL as well. This way we avoid the
link error and get a more sensible structure.
Fixes: 7d47fa065e62 ("scsi: fdomain: Add PCMCIA support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some glue logic drivers support 1G without having GMAC/GMAC4/XGMAC.
Let's allow this speed by default.
Reported-by: Ondrej Jirman <megi@xff.cz>
Tested-by: Ondrej Jirman <megi@xff.cz>
Fixes: 5b0d7d7da6 ("net: stmmac: Add the missing speeds that XGMAC supports")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu says:
====================
net: stmmac: Two fixes
Two fixes targeting -net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need the memory to be zeroed upon allocation so use kcalloc()
instead.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RX Descriptors are being cleaned after setting the buffers which may
lead to buffer addresses being wiped out.
Fix this by clearing earlier the RX Descriptors.
Fixes: 2af6106ae9 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit(net: phy: marvell: change default m88e1510 LED configuration),
the active LED of Hip07 devices is always off, because Hip07 just
use 2 LEDs.
This patch adds a phy_register_fixup_for_uid() for m88e1510 to
correct the LED configuration.
Fixes: 077772468e ("net: phy: marvell: change default m88e1510 LED configuration")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: linyunsheng <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPv2's XLGMAC can wait for 3 idle frames before triggering a link up
event. This can cause the link to be stuck low when there's traffic on
the interface, so disable this feature.
Fixes: 4bb0432628 ("net: mvpp2: phylink support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The module reset code in the Renesas CPG/MSSR driver uses
read-modify-write (RMW) operations to write to a Software Reset Register
(SRCRn), and simple writes to write to a Software Reset Clearing
Register (SRSTCLRn), as was mandated by the R-Car Gen2 and Gen3 Hardware
User's Manuals.
However, this may cause a race condition when two devices are reset in
parallel: if the reset for device A completes in the middle of the RMW
operation for device B, device A may be reset again, causing subtle
failures (e.g. i2c timeouts):
thread A thread B
-------- --------
val = SRCRn
val |= bit A
SRCRn = val
delay
val = SRCRn (bit A is set)
SRSTCLRn = bit A
(bit A in SRCRn is cleared)
val |= bit B
SRCRn = val (bit A and B are set)
This can be reproduced on e.g. Salvator-XS using:
$ while true; do i2cdump -f -y 4 0x6A b > /dev/null; done &
$ while true; do i2cdump -f -y 2 0x10 b > /dev/null; done &
i2c-rcar e6510000.i2c: error -110 : 40000002
i2c-rcar e66d8000.i2c: error -110 : 40000002
According to the R-Car Gen3 Hardware Manual Errata for Rev.
0.80 of Feb 28, 2018, reflected in Rev. 1.00 of the R-Car Gen3 Hardware
User's Manual, writes to SRCRn do not require read-modify-write cycles.
Note that the R-Car Gen2 Hardware User's Manual has not been updated
yet, and still says a read-modify-write sequence is required. According
to the hardware team, the reset hardware block is the same on both R-Car
Gen2 and Gen3, though.
Hence fix the issue by replacing the read-modify-write operations on
SRCRn by simple writes.
Reported-by: Yao Lihua <Lihua.Yao@desay-svautomotive.com>
Fixes: 6197aa65c4 ("clk: renesas: cpg-mssr: Add support for reset control")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Linh Phung <linh.phung.jy@renesas.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Make REGMAP_MMIO selected to avoid undefined reference to regmap symbols.
Fixes: d41f59fd92 ("clk: sprd: Add common infrastructure")
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
DT node for SiFive FU540-C000 GEMGXL Ethernet controller driver added
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Reviewed-by: Sagar Kadam <sagar.kadam@sifive.com>
Cc: Andrew Lunn <andrew@lunn.ch>
[paul.walmsley@sifive.com: changed "phy1" to "phy0" at Andrew Lunn's
suggestion]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
The 13MHz clock should be registered before clocksource driver is
initialized. Use CLK_OF_DECLARE_DRIVER() to guarantee.
Fixes: acddfc2c26 ("clk: mediatek: Add MT8183 clock support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Weiyi Lu <weiyi.lu@mediatek.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Fix an incomplete devm_clk_bulk_get_optional() function documentation
by adding description of the num_clks argument as in other *clk_bulk*
functions.
Fixes: 9bd5ef0bd8 ("clk: Add devm_clk_bulk_get_optional() function")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
In clk_generated_determine_rate(), if the divisor is greater than
GENERATED_MAX_DIV + 1, then the wrong best_rate will be returned.
If clk_generated_set_rate() will be called later with this wrong
rate, it will return -EINVAL, so the generated clock won't change
its value. Do no let the divisor be greater than GENERATED_MAX_DIV + 1.
Fixes: 8c7aa63289 ("clk: at91: clk-generated: remove useless divisor loop")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
IEEE_8021QAZ_APP_SEL_STREAM is a valid selector for iSCSI connections, so
add code to use IEEE_8021QAZ_APP_SEL_STREAM selector to get priority mask.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch add translations for:
- programming-languages
- kernel-docs (It is better to not translate this since English is
a requirement to get something useful out of it)
Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Fix an off-by-one typo in the transparent huge pages admin
documentation.
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
On vega20, there is an SMU message to query it. On navi, it's fetched
from the metrics table.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2:
add function smu_default_set_performance_level as default dpm level handler.
change function name smu_set_performance_level to smu_asic_set_performance_level
v1:
1.NAVI10_PEAK_SCLK_XTX 1830 Mhz
2.NAVI10_PEAK_SCLK_XT 1755 Mhz
3.NAVI10_PEAK_SCLK_XL 1625 Mhz
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some RISC-V systems include PCIe host controllers that support PCIe
message-signaled interrupts. For this to work on Linux, we need to
enable PCI_MSI_IRQ_DOMAIN and define struct msi_alloc_info. Support
for the latter is enabled by including the architecture-generic msi.h
include.
Signed-off-by: Wesley Terpstra <wesley@sifive.com>
[paul.walmsley@sifive.com: split initial patch into one arch/riscv
patch and one drivers/pci patch]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
The "struct drm_connector" iteration cursor from
"for_each_new_connector_in_state" is never used in atomic_remove_fb()
which generates a compilation warning,
drivers/gpu/drm/drm_framebuffer.c: In function 'atomic_remove_fb':
drivers/gpu/drm/drm_framebuffer.c:838:24: warning: variable 'conn' set
but not used [-Wunused-but-set-variable]
Silence it by marking "conn" __maybe_unused.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1563822886-13570-1-git-send-email-cai@lca.pw
The RISC-V port has grown significantly over the past year. Paul's been
helping out for a while ago. We agreed in person that he'd take over
collecting the patches and submitting the PRs, but it looks like I
forgot to make it official.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
The statement sounds more like a literal translation
Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The patch translates the following patches in Italian:
d9d7c0c497 docs: Note that :c:func: should no longer be used
83e8b971f8 sphinx.rst: Add note about code snippets embedded in the text
cca5e0b8a4 Documentation: PGP: update for newer HW devices
Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The multicast code uses the lists bat_priv->mcast.want_all_rtr*_list to
store all all originator nodes which don't have the flag no-RTR4 or no-RTR6
set. When an originator is purged, it has to be removed from these lists.
Since all entries without the BATADV_MCAST_WANT_NO_RTR4/6 are stored in
these lists, they have to be handled like entries which have these flags
set to force the update routines to remove them from the lists when purging
the originator.
Not doing so will leave a pointer to a freed memory region inside the list.
Trying to operate on these lists will then cause an use-after-free error:
BUG: KASAN: use-after-free in batadv_mcast_want_rtr4_update+0x335/0x3a0 [batman_adv]
Write of size 8 at addr ffff888007b41a38 by task swapper/0/0
Fixes: 61caf3d109 ("batman-adv: mcast: detect, distribute and maintain multicast router presence")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The bucket variable is only updated outside the loop over the mcast_flags
buckets. It will only be updated during a dumping run when the dumping has
to be interrupted and a new message has to be started.
This could result in repeated or missing entries when the multicast flags
are dumped to userspace.
Fixes: d2d489b7d8 ("batman-adv: Add inconsistent multicast netlink dump detection")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The following scenario was encountered during testing of logical
partition mobility on pseries partitions with bonded ibmvnic
adapters in LACP mode.
1. Driver receives a signal that the device has been
swapped, and it needs to reset to initialize the new
device.
2. Driver reports loss of carrier and begins initialization.
3. Bonding driver receives NETDEV_CHANGE notifier and checks
the slave's current speed and duplex settings. Because these
are unknown at the time, the bond sets its link state to
BOND_LINK_FAIL and handles the speed update, clearing
AD_PORT_LACP_ENABLE.
4. Driver finishes recovery and reports that the carrier is on.
5. Bond receives a new notification and checks the speed again.
The speeds are valid but miimon has not altered the link
state yet. AD_PORT_LACP_ENABLE remains off.
Because the slave's link state is still BOND_LINK_FAIL,
no further port checks are made when it recovers. Though
the slave devices are operational and have valid speed
and duplex settings, the bond will not send LACPDU's. The
simplest fix I can see is to force another speed check
in bond_miimon_commit. This way the bond will update
AD_PORT_LACP_ENABLE if needed when transitioning from
BOND_LINK_FAIL to BOND_LINK_UP.
CC: Jarod Wilson <jarod@redhat.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a TID sequence error occurs while receiving TID RDMA READ RESP
packets, all packets after flow->flow_state.r_next_psn should be dropped,
including those response packets for subsequent segments.
The current implementation will drop the subsequent response packets for
the segment to complete next, but may accept packets for subsequent
segments and therefore mistakenly advance the r_next_psn fields for the
corresponding software flows. This may result in failures to complete
subsequent segments after the current segment is completed.
The fix is to only use the flow pointed by req->clear_tail for checking
KDETH PSN instead of finding a flow from the request's flow array.
Fixes: b885d5be9c ("IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164540.74174.54702.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The field flow->resync_npkts is added for TID RDMA WRITE request and
zero-ed when a TID RDMA WRITE RESP packet is received by the requester.
This field is used to rewind a request during retry in the function
hfi1_tid_rdma_restart_req() shared by both TID RDMA WRITE and TID RDMA
READ requests. Therefore, when a TID RDMA READ request is retried, this
field may not be initialized at all, which causes the retry to start at an
incorrect psn, leading to the drop of the retry request by the responder.
This patch fixes the problem by zeroing out the field when the flow memory
is allocated.
Fixes: 838b6fd2d9 ("IB/hfi1: TID RDMA RcvArray programming and TID allocation")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164534.74174.6177.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When an OPFN request is flushed, the request is completed without
unreserving itself from the send queue. Subsequently, when a new
request is post sent, the following warning will be triggered:
WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt]
Call Trace:
[<ffffffffbbb61e41>] dump_stack+0x19/0x1b
[<ffffffffbb497688>] __warn+0xd8/0x100
[<ffffffffbb4977cd>] warn_slowpath_null+0x1d/0x20
[<ffffffffc01c941a>] rvt_post_send+0x72a/0x880 [rdmavt]
[<ffffffffbb4dcabe>] ? account_entity_dequeue+0xae/0xd0
[<ffffffffbb61d645>] ? __kmalloc+0x55/0x230
[<ffffffffc04e1a4c>] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs]
[<ffffffffc04e5e36>] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs]
[<ffffffffc04dbce6>] ib_uverbs_write+0x286/0x460 [ib_uverbs]
[<ffffffffbb6f9457>] ? security_file_permission+0x27/0xa0
[<ffffffffbb641650>] vfs_write+0xc0/0x1f0
[<ffffffffbb64246f>] SyS_write+0x7f/0xf0
[<ffffffffbbb74ddb>] system_call_fastpath+0x22/0x27
This patch fixes the problem by moving rvt_qp_wqe_unreserve() into
rvt_qp_complete_swqe() to simplify the code and make it less
error-prone.
Fixes: ca95f802ef ("IB/hfi1: Unreserve a reserved request when it is completed")
Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When run perftest in many times, the system will report a BUG as follows:
BUG: Bad rss-counter state mm:(____ptrval____) idx:0 val:-1
BUG: Bad rss-counter state mm:(____ptrval____) idx:1 val:1
We tested with different kernel version and found it started from the the
following commit:
commit d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in
SGEs")
In this commit, the sg->offset is always 0 when sg_set_page() is called in
ib_umem_get() and the drivers are not allowed to change the sgl, otherwise
it will get bad page descriptor when unfolding SGEs in __ib_umem_release()
as sg_page_count() will get wrong result while sgl->offset is not 0.
However, there is a weird sgl usage in the current hns driver, the driver
modified sg->offset after calling ib_umem_get(), which caused we iterate
past the wrong number of pages in for_each_sg_page iterator.
This patch fixes it by correcting the non-standard sgl usage found in the
hns_roce_db_map_user() function.
Fixes: d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Link: https://lore.kernel.org/r/1562808737-45723-1-git-send-email-oulijun@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In snd_soc_dapm_new_control_unlocked(), a kernel buffer is allocated in
dapm_cnew_widget() to hold the new dapm widget. Then, different actions are
taken according to the id of the widget, i.e., 'w->id'. If any failure
occurs during this process, snd_soc_dapm_new_control_unlocked() should be
terminated by going to the 'request_failed' label. However, the allocated
kernel buffer is not freed on this code path, leading to a memory leak bug.
To fix the above issue, free the buffer before returning from
snd_soc_dapm_new_control_unlocked() through the 'request_failed' label.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Link: https://lore.kernel.org/r/1563803864-2809-1-git-send-email-wang6495@umn.edu
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull preemption Kconfig fix from Thomas Gleixner:
"The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
and PREEMPT_RT.
Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
obviously can't work anymore. oldconfig builds are affected as well,
but it's more obvious as the user gets asked. [old]defconfig silently
fixes it up and selects PREEMPT_NONE.
Unbreak it by undoing the rename and adding a intermediate config
symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
to chase down a few #ifdefs, but it's better than tweaking 114
defconfigs and annoying users"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
Pull pidfd polling fix from Christian Brauner:
"A fix for pidfd polling. It ensures that the task's exit state is
visible to all waiters"
* tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
pidfd: fix a poll race when setting exit_state
Pull btrfs fixes from David Sterba:
- fixes for leaks caused by recently merged patches
- one build fix
- a fix to prevent mixing of incompatible features
* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't leak extent_map in btrfs_get_io_geometry()
btrfs: free checksum hash on in close_ctree
btrfs: Fix build error while LIBCRC32C is module
btrfs: inode: Don't compress if NODATASUM or NODATACOW set
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
the preemption mode choice wich defaults to NONE. This also affects
oldconfig builds.
So rather than changing 114 defconfig files and being an annoyance to
users, revert the rename and select a new config symbol PREEMPTION. That
keeps everything working smoothly and the revelant ifdef's are going to be
fixed up step by step.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a50a3f4b6a ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull media fixes from Mauro Carvalho Chehab:
"For two regressions in media core:
- v4l2-subdev: fix regression in check_pad()
- videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
in use"
* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
media: v4l2-subdev: fix regression in check_pad()
Commit dd5142ca5d ("iommu/vt-d: Add debugfs support to show scalable mode
DMAR table internals") prints content of pasid table entries from LSB to
MSB where as other entries are printed MSB to LSB. So, to maintain
uniformity among all entries and to not confuse the user, print MSB first.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Fixes: dd5142ca5d ("iommu/vt-d: Add debugfs support to show scalable mode DMAR table internals")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since the cached32_node is allowed to be advanced above dma_32bit_pfn
(to provide a shortcut into the limited range), we need to be careful to
remove the to be freed node if it is the cached32_node.
[ 48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
[ 48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
[ 48.477843]
[ 48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G U 5.2.0+ #735
[ 48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 48.478047] Workqueue: i915 __i915_gem_free_work [i915]
[ 48.478075] Call Trace:
[ 48.478111] dump_stack+0x5b/0x90
[ 48.478137] print_address_description+0x67/0x237
[ 48.478178] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478212] __kasan_report.cold.3+0x1c/0x38
[ 48.478240] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478280] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478308] __cached_rbnode_delete_update+0x68/0x110
[ 48.478344] private_free_iova+0x2b/0x60
[ 48.478378] iova_magazine_free_pfns+0x46/0xa0
[ 48.478403] free_iova_fast+0x277/0x340
[ 48.478443] fq_ring_free+0x15a/0x1a0
[ 48.478473] queue_iova+0x19c/0x1f0
[ 48.478597] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.478712] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.478826] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.478940] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.479053] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.479081] ? __sg_free_table+0x9e/0xf0
[ 48.479116] ? kfree+0x7f/0x150
[ 48.479234] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.479352] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.479465] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.479579] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.479607] process_one_work+0x495/0x710
[ 48.479642] worker_thread+0x4c7/0x6f0
[ 48.479687] ? process_one_work+0x710/0x710
[ 48.479724] kthread+0x1b2/0x1d0
[ 48.479774] ? kthread_create_worker_on_cpu+0xa0/0xa0
[ 48.479820] ret_from_fork+0x1f/0x30
[ 48.479864]
[ 48.479907] Allocated by task 631:
[ 48.479944] save_stack+0x19/0x80
[ 48.479994] __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 48.480038] kmem_cache_alloc+0x91/0xf0
[ 48.480082] alloc_iova+0x2b/0x1e0
[ 48.480125] alloc_iova_fast+0x58/0x376
[ 48.480166] intel_alloc_iova+0x90/0xc0
[ 48.480214] intel_map_sg+0xde/0x1f0
[ 48.480343] i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
[ 48.480465] huge_get_pages+0x232/0x2b0 [i915]
[ 48.480590] ____i915_gem_object_get_pages+0x40/0xb0 [i915]
[ 48.480712] __i915_gem_object_get_pages+0x90/0xa0 [i915]
[ 48.480834] i915_gem_object_prepare_write+0x2d6/0x330 [i915]
[ 48.480955] create_test_object.isra.54+0x1a9/0x3e0 [i915]
[ 48.481075] igt_shared_ctx_exec+0x365/0x3c0 [i915]
[ 48.481210] __i915_subtests.cold.4+0x30/0x92 [i915]
[ 48.481341] __run_selftests.cold.3+0xa9/0x119 [i915]
[ 48.481466] i915_live_selftests+0x3c/0x70 [i915]
[ 48.481583] i915_pci_probe+0xe7/0x220 [i915]
[ 48.481620] pci_device_probe+0xe0/0x180
[ 48.481665] really_probe+0x163/0x4e0
[ 48.481710] device_driver_attach+0x85/0x90
[ 48.481750] __driver_attach+0xa5/0x180
[ 48.481796] bus_for_each_dev+0xda/0x130
[ 48.481831] bus_add_driver+0x205/0x2e0
[ 48.481882] driver_register+0xca/0x140
[ 48.481927] do_one_initcall+0x6c/0x1af
[ 48.481970] do_init_module+0x106/0x350
[ 48.482010] load_module+0x3d2c/0x3ea0
[ 48.482058] __do_sys_finit_module+0x110/0x180
[ 48.482102] do_syscall_64+0x62/0x1f0
[ 48.482147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.482190]
[ 48.482224] Freed by task 37:
[ 48.482273] save_stack+0x19/0x80
[ 48.482318] __kasan_slab_free+0x12e/0x180
[ 48.482363] kmem_cache_free+0x70/0x140
[ 48.482406] __free_iova+0x1d/0x30
[ 48.482445] fq_ring_free+0x15a/0x1a0
[ 48.482490] queue_iova+0x19c/0x1f0
[ 48.482624] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.482749] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.482873] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.482999] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.483123] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.483250] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.483378] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.483500] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.483622] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.483659] process_one_work+0x495/0x710
[ 48.483704] worker_thread+0x4c7/0x6f0
[ 48.483748] kthread+0x1b2/0x1d0
[ 48.483787] ret_from_fork+0x1f/0x30
[ 48.483831]
[ 48.483868] The buggy address belongs to the object at ffff88870fc19000
[ 48.483868] which belongs to the cache iommu_iova of size 40
[ 48.483920] The buggy address is located 32 bytes inside of
[ 48.483920] 40-byte region [ffff88870fc19000, ffff88870fc19028)
[ 48.483964] The buggy address belongs to the page:
[ 48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
[ 48.484045] flags: 0x8000000000010200(slab|head)
[ 48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
[ 48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
[ 48.484188] page dumped because: kasan: bad access detected
[ 48.484230]
[ 48.484265] Memory state around the buggy address:
[ 48.484314] ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484361] ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[ 48.484451] ^
[ 48.484494] ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484530] ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
Fixes: e60aa7b538 ("iommu/iova: Extend rbtree node caching")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull networking fixes from David Miller:
1) Several netfilter fixes including a nfnetlink deadlock fix from
Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
proper block sharing.
3) Fix r8169 PHY init from Thomas Voegtle.
4) Fix memory leak in mac80211, from Lorenzo Bianconi.
5) Missing NULL check on object allocation in cxgb4, from Navid
Emamdoost.
6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
7) Check that there is actually an ip header to access in skb->data in
VRF, from Peter Kosyh.
8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
9) One more tweak the the TCP fragmentation memory limit changes, to be
less harmful to applications setting small SO_SNDBUF values. From
Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
tcp: be more careful in tcp_fragment()
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
vrf: make sure skb->data contains ip header to make routing
connector: remove redundant input callback from cn_dev
qed: Prefer pcie_capability_read_word()
igc: Prefer pcie_capability_read_word()
cxgb4: Prefer pcie_capability_read_word()
be2net: Synchronize be_update_queues with dev_watchdog
bnx2x: Prevent load reordering in tx completion processing
net: phy: sfp: hwmon: Fix scaling of RX power
net: sched: verify that q!=NULL before setting q->flags
chelsio: Fix a typo in a function name
allocate_flower_entry: should check for null deref
net: hns3: typo in the name of a constant
kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
tipc: Fix a typo
mac80211: don't warn about CW params when not using them
mac80211: fix possible memory leak in ieee80211_assign_beacon
nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
nl80211: fix VENDOR_CMD_RAW_DATA
...
There is a couple of places where on domain_init() failure domain_exit()
is called. While currently domain_init() can fail only if
alloc_pgtable_page() has failed.
Make domain_exit() check if domain->pgd present, before calling
domain_unmap(), as it theoretically should crash on clearing pte entries
in dma_pte_clear_level().
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.
Before deferring a flush, the queue should be allocated and initialized.
Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
queue. It's probably worth to init it for static or unmanaged domains
too, but it may be arguable - I'm leaving it to iommu folks.
Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems to be worth to keep even if queue would be
initialized for all kinds of domains. And is easy backportable.
On 4.19.43 stable kernel it has a user-visible effect: previously for
devices in si domain there were crashes, on sata devices:
BUG: spinlock bad magic on CPU#6, swapper/0/1
lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
Call Trace:
<IRQ>
dump_stack+0x61/0x7e
spin_bug+0x9d/0xa3
do_raw_spin_lock+0x22/0x8e
_raw_spin_lock_irqsave+0x32/0x3a
queue_iova+0x45/0x115
intel_unmap+0x107/0x113
intel_unmap_sg+0x6b/0x76
__ata_qc_complete+0x7f/0x103
ata_qc_complete+0x9b/0x26a
ata_qc_complete_multiple+0xd0/0xe3
ahci_handle_port_interrupt+0x3ee/0x48a
ahci_handle_port_intr+0x73/0xa9
ahci_single_level_irq_intr+0x40/0x60
__handle_irq_event_percpu+0x7f/0x19a
handle_irq_event_percpu+0x32/0x72
handle_irq_event+0x38/0x56
handle_edge_irq+0x102/0x121
handle_irq+0x147/0x15c
do_IRQ+0x66/0xf2
common_interrupt+0xf/0xf
RIP: 0010:__do_softirq+0x8c/0x2df
The same for usb devices that use ehci-pci:
BUG: spinlock bad magic on CPU#0, swapper/0/1
lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
Call Trace:
<IRQ>
dump_stack+0x61/0x7e
spin_bug+0x9d/0xa3
do_raw_spin_lock+0x22/0x8e
_raw_spin_lock_irqsave+0x32/0x3a
queue_iova+0x77/0x145
intel_unmap+0x107/0x113
intel_unmap_page+0xe/0x10
usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
usb_hcd_unmap_urb_for_dma+0x17/0x100
unmap_urb_for_dma+0x22/0x24
__usb_hcd_giveback_urb+0x51/0xc3
usb_giveback_urb_bh+0x97/0xde
tasklet_action_common.isra.4+0x5f/0xa1
tasklet_action+0x2d/0x30
__do_softirq+0x138/0x2df
irq_exit+0x7d/0x8b
smp_apic_timer_interrupt+0x10f/0x151
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: <stable@vger.kernel.org> # 4.14+
Fixes: 13cf017446 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
clang-9 points out that there are two variables that depending on the
configuration may only be used in an ARRAY_SIZE() expression but not
referenced:
drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs[] = {
^
drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs_chan[] = {
Mark these __maybe_unused to shut up the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
When building with 'make C=1', sparse reports an endianess bug:
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:60:30: warning: cast removes address space of expression
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
The current code is clearly wrong, as it passes an endian-swapped word
into a register function where it gets swapped again. Just pass the variables
directly into lower_32_bits()/upper_32_bits().
Fixes: 7e4b8a4fbe ("dmaengine: Add Synopsys eDMA IP version 0 support")
Link: https://lore.kernel.org/lkml/20190617131820.2470686-1-arnd@arndb.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/20190722124457.1093886-3-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The new driver mixes up dma_addr_t and __iomem pointers, which results
in warnings on some 32-bit architectures, like:
drivers/dma/dw-edma/dw-edma-v0-core.c: In function '__dw_regs':
drivers/dma/dw-edma/dw-edma-v0-core.c:28:9: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
return (struct dw_edma_v0_regs __iomem *)dw->rg_region.vaddr;
Make it use __iomem pointers consistently here, and avoid using dma_addr_t
for __iomem tokens altogether.
A small complication here is the debugfs code, which passes an __iomem
token as the private data for debugfs files, requiring the use of
extra __force.
Fixes: 7e4b8a4fbe ("dmaengine: Add Synopsys eDMA IP version 0 support")
Link: https://lore.kernel.org/lkml/20190617131918.2518727-1-arnd@arndb.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190722124457.1093886-2-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Putting large constant data on the stack causes unnecessary overhead
and stack usage:
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:285:6: error: stack frame size of 1376 bytes in function 'dw_edma_v0_debugfs_on' [-Werror,-Wframe-larger-than=]
Mark the variable 'static const' in order for the compiler to move it
into the .rodata section where it does no such harm.
Fixes: 305aebeff8 ("dmaengine: Add Synopsys eDMA IP version 0 debugfs support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/20190722124457.1093886-1-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Comparing the arm-arm's pseudocode for AArch64.PCAlignmentFault() with
AArch64.SPAlignmentFault() shows that SP faults don't copy the faulty-SP
to FAR_EL1, but this is where we read from, and the address we provide
to user-space with the BUS_ADRALN signal.
For user-space this value will be UNKNOWN due to the previous ERET to
user-space. If the last value is preserved, on systems with KASLR or KPTI
this will be the user-space link-register left in FAR_EL1 by tramp_exit().
Fix this to retrieve the original sp_el0 value, and pass this to
do_sp_pc_fault().
SP alignment faults from EL1 will cause us to take the fault again when
trying to store the pt_regs. This eventually takes us to the overflow
stack. Remove the ESR_ELx_EC_SP_ALIGN check as we will never make it
this far.
Fixes: 60ffc30d56 ("arm64: Exception handling")
Signed-off-by: James Morse <james.morse@arm.com>
[will: change label name and fleshed out comment]
Signed-off-by: Will Deacon <will@kernel.org>
A #GP is reported in the guest when requesting balloon inflation via
virtio-balloon. The reason is that the virtio-balloon driver has
removed the page from its internal page list (via balloon_page_pop),
but balloon_page_enqueue_one also calls "list_del" to do the removal.
This is necessary when it's used from balloon_page_enqueue_list, but
not from balloon_page_enqueue.
Move list_del to balloon_page_enqueue, and update comments accordingly.
Fixes: 418a3ab1e7 (mm/balloon_compaction: List interfaces)
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As we have abandoned the home-made lazy domain allocation
and delegated the DMA domain life cycle up to the default
domain mechanism defined in the generic iommu layer, we
needn't consider pci alias anymore when mapping/unmapping
the context entries. Without this fix, we see kernel NULL
pointer dereference during pci device hot-plug test.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Fixes: fa954e6831 ("iommu/vt-d: Delegate the dma domain to upper layer")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reported-and-tested-by: Xu Pengfei <pengfei.xu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On a CPU that doesn't support SSBS, PSTATE[12] is RES0. In a system
where only some of the CPUs implement SSBS, we end-up losing track of
the SSBS bit across task migration.
To address this issue, let's force the SSBS bit on context switch.
Fixes: 8f04e8e6e2 ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: inverted logic and added comments]
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit 123b2ffc37.
This commit reportedly caused boot failures on some systems
and needs to be reverted for now.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in
sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to
leading IPv6 address digits, which is not the case on big-endian
machines.
Since checking bitwise operations doesn't seem to be the point of the
test, replace two short comparisons with a single int comparison.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
memory malloced in bch_cached_dev_run() and should be freed before
leaving from the error handling cases, otherwise it will cause
memory leak.
Fixes: 0b13efecf5 ("bcache: add return value check to bch_cached_dev_run()")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jakub Kicinski says:
====================
John says:
Resolve a series of splats discovered by syzbot and an unhash
TLS issue noted by Eric Dumazet.
The main issues revolved around interaction between TLS and
sockmap tear down. TLS and sockmap could both reset sk->prot
ops creating a condition where a close or unhash op could be
called forever. A rare race condition resulting from a missing
rcu sync operation was causing a use after free. Then on the
TLS side dropping the sock lock and re-acquiring it during the
close op could hang. Finally, sockmap must be deployed before
tls for current stack assumptions to be met. This is enforced
now. A feature series can enable it.
To fix this first refactor TLS code so the lock is held for the
entire teardown operation. Then add an unhash callback to ensure
TLS can not transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. Then apply three fixes to sockmap to fix up races
on tear down around map free and close. Finally, if sockmap
is destroyed before TLS we add a new ULP op update to inform
the TLS stack it should not call sockmap ops. This last one
appears to be the most commonly found issue from syzbot.
v4:
- fix some use after frees;
- disable disconnect work for offload (ctx lifetime is much
more complex);
- remove some of the dead code which made it hard to understand
(for me) that things work correctly (e.g. the checks TLS is
the top ULP);
- add selftets.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add test which sends some data with MSG_MORE and then
closes the socket (never calling send without MSG_MORE).
This should make sure we clean up open records correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When a map free is called and in parallel a socket is closed we
have two paths that can potentially reset the socket prot ops, the
bpf close() path and the map free path. This creates a problem
with which prot ops should be used from the socket closed side.
If the map_free side completes first then we want to call the
original lowest level ops. However, if the tls path runs first
we want to call the sockmap ops. Additionally there was no locking
around prot updates in TLS code paths so the prot ops could
be changed multiple times once from TLS path and again from sockmap
side potentially leaving ops pointed at either TLS or sockmap
when psock and/or tls context have already been destroyed.
To fix this race first only update ops inside callback lock
so that TLS, sockmap and lowest level all agree on prot state.
Second and a ULP callback update() so that lower layers can
inform the upper layer when they are being removed allowing the
upper layer to reset prot ops.
This gets us close to allowing sockmap and tls to be stacked
in arbitrary order but will save that patch for *next trees.
v4:
- make sure we don't free things for device;
- remove the checks which swap the callbacks back
only if TLS is at the top.
Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
Fixes: 02c558b2d5 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sockmap does not currently support adding sockets after TLS has been
enabled. There never was a real use case for this so it was never
added. But, we lost the test for ULP at some point so add it here
and fail the socket insert if TLS is enabled. Future work could
make sockmap support this use case but fixup the bug here.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We need to have a synchronize_rcu before free'ing the sockmap because
any outstanding psock references will have a pointer to the map and
when they use this could trigger a use after free.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
__sock_map_delete() may be called from a tcp event such as unhash or
close from the following trace,
tcp_bpf_close()
tcp_bpf_remove()
sk_psock_unlink()
sock_map_delete_from_link()
__sock_map_delete()
In this case the sock lock is held but this only protects against
duplicate removals on the TCP side. If the map is free'd then we have
this trace,
sock_map_free
xchg() <- replaces map entry
sock_map_unref()
sk_psock_put()
sock_map_del_link()
The __sock_map_delete() call however uses a read, test, null over the
map entry which can result in both paths trying to free the map
entry.
To fix use xchg in TCP paths as well so we avoid having two references
to the same map entry.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call the tls close callback. Because of this a user could
disconnect a socket then put it in a LISTEN state which would break
our assumptions about sockets always being ESTABLISHED state.
More directly because close() can call unhash() and unhash is
implemented by sockmap if a sockmap socket has TLS enabled we can
incorrectly destroy the psock from unhash() and then call its close
handler again. But because the psock (sockmap socket representation)
is already destroyed we call close handler in sk->prot. However,
in some cases (TLS BASE/BASE case) this will still point at the
sockmap close handler resulting in a circular call and crash reported
by syzbot.
To fix both above issues implement the unhash() routine for TLS.
v4:
- add note about tls offload still needing the fix;
- move sk_proto to the cold cache line;
- split TX context free into "release" and "free",
otherwise the GC work itself is in already freed
memory;
- more TX before RX for consistency;
- reuse tls_ctx_free();
- schedule the GC work after we're done with context
to avoid UAF;
- don't set the unhash in all modes, all modes "inherit"
TLS_BASE's callbacks anyway;
- disable the unhash hook for TLS_HW.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The tls close() callback currently drops the sock lock to call
strp_done(). Split up the RX cleanup into stopping the strparser
and releasing most resources, syncing strparser and finally
freeing the context.
To avoid the need for a strp_done() call on the cleanup path
of device offload make sure we don't arm the strparser until
we are sure init will be successful.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The tls close() callback currently drops the sock lock, makes a
cancel_delayed_work_sync() call, and then relocks the sock.
By restructuring the code we can avoid droping lock and then
reclaiming it. To simplify this we do the following,
tls_sk_proto_close
set_bit(CLOSING)
set_bit(SCHEDULE)
cancel_delay_work_sync() <- cancel workqueue
lock_sock(sk)
...
release_sock(sk)
strp_done()
Setting the CLOSING bit prevents the SCHEDULE bit from being
cleared by any workqueue items e.g. if one happens to be
scheduled and run between when we set SCHEDULE bit and cancel
work. Then because SCHEDULE bit is set now no new work will
be scheduled.
Tested with net selftests and bpf selftests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The deprecated TOE offload doesn't actually do anything in
tls_sk_proto_close() - all TLS code is skipped and context
not freed. Remove the callback to make it easier to refactor
tls_sk_proto_close().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In tls_set_device_offload_rx() we prepare the software context
for RX fallback and proceed to add the connection to the device.
Unfortunately, software context prep includes arming strparser
so in case of a later error we have to release the socket lock
to call strp_done().
In preparation for not releasing the socket lock half way through
callbacks move arming strparser into a separate function.
Following patches will make use of that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Fixes: b53b0b9d9a ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
Update MAINTAINERS and .mailmap with my @linaro.org address, since I
don't have access to my @arm.com address anymore.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Picking up 7aaddd96d5 ("drm/modes: Don't apply cmdline's rotation if it wasn't specified")
from drm-misc-next-fixes. It missed the merge window.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
When a pin is active-low, logical trigger edge should be inverted to match
the same interrupt opportunity.
For example, a button pushed triggers falling edge in ACTIVE_HIGH case; in
ACTIVE_LOW case, the button pushed triggers rising edge. For user space the
IRQ requesting doesn't need to do any modification except to configuring
GPIOHANDLE_REQUEST_ACTIVE_LOW.
For example, we want to catch the event when the button is pushed. The
button on the original board drives level to be low when it is pushed, and
drives level to be high when it is released.
In user space we can do:
req.handleflags = GPIOHANDLE_REQUEST_INPUT;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
while (1) {
read(fd, &dat, sizeof(dat));
if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
printf("button pushed\n");
}
Run the same logic on another board which the polarity of the button is
inverted; it drives level to be high when pushed, and level to be low when
released. For this inversion we add flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
req.handleflags = GPIOHANDLE_REQUEST_INPUT |
GPIOHANDLE_REQUEST_ACTIVE_LOW;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
At the result, there are no any events caught when the button is pushed.
By the way, button releasing will emit a "falling" event. The timing of
"falling" catching is not expected.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Wu <michael.wu@vatics.com>
Tested-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
While using the mmc_spi driver occasionally errors like this popped up:
mmcblk0: error -84 transferring data end_request: I/O error, dev mmcblk0, sector 581756
I looked on the Internet for occurrences of the same problem and came
across a helpful post [1]. It includes source code to reproduce the bug.
There is also an analysis about the cause. During transmission data in the
supplied buffer is being modified. Thus the previously calculated checksum
is not correct anymore.
After some digging I found out that device drivers are supposed to report
they need stable writes. To fix this I set the appropriate flag at queue
initialization if CRC checksumming is enabled for that SPI host.
[1]
https://groups.google.com/forum/#!msg/sim1/gLlzWeXGFr8/KevXinUXfc8J
Signed-off-by: Andreas Koop <andreas.koop@zf.com>
[shihpo: Rebase on top of v5.3-rc1]
Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
CC: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In some cases initial bind of scm memory for an lpar can fail if
previously it wasn't released using a scm-unbind hcall. This situation
can arise due to panic of the previous kernel or forced lpar
fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with EBUSY error. In case scm-bind operation again fails after the
forced scm-unbind then we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and indicate it as a EBUSY error back to the caller.
Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
The new hcall named H_SCM_UNBIND_ALL has been introduce that can
unbind all or specific scm memory assigned to an lpar. This is
more efficient than using H_SCM_UNBIND_MEM as currently we don't
support partial unbind of scm memory.
Hence this patch proposes following changes to drc_pmem_unbind():
* Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to
H_SCM_UNBIND_ALL.
* Update drc_pmem_unbind() to handles cases when PHYP asks the guest
kernel to wait for specific amount of time before retrying the
hcall via the 'LONG_BUSY' return value.
* Ensure appropriate error code is returned back from the function
in case of an error.
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
Update the hvcalls.h to include op-codes for new hcalls introduce to
manage SCM memory. Also update existing hcall definitions to reflect
current papr specification for SCM.
The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
transient proposals and there support was never implemented by
Power-VM nor they were used anywhere in Linux kernel. Hence we don't
expect anyone to be impacted by this change.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
My @arm.com address will stop working in a couple of weeks. Update
MAINTAINERS and .mailmap files with an address I'll have access to.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
I will soon lose access to my @arm.com email address, so let's
update the MAINTAINERS file to point to my @kernel.org address,
as well as .mailmap for good measure.
Note that my @arm.com address will still work, but someone else
will be reading whatever is sent there. Don't say you didn't know!
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In commit 46d179525a ("mmc: dw_mmc: Wait for data transfer after
response errors.") we fixed a tuning-induced hang that I saw when
stress testing tuning on certain SD cards. I won't re-hash that whole
commit, but the summary is that as a normal part of tuning you need to
deal with transfer errors and there were cases where these transfer
errors was putting my system into a bad state causing all future
transfers to fail. That commit fixed handling of the transfer errors
for me.
In downstream Chrome OS my fix landed and had the same behavior for
all SD/MMC commands. However, it looks like when the commit landed
upstream we limited it to only SD tuning commands. Presumably this
was to try to get around problems that Alim Akhtar reported on exynos
[1].
Unfortunately while stress testing reboots (and suspend/resume) on
some rk3288-based Chromebooks I found the same problem on the eMMC on
some of my Chromebooks (the ones with Hynix eMMC). Since the eMMC
tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
same situation.
I'm hoping that whatever problems exynos was having in the past are
somehow magically fixed now and we can make the behavior the same for
all commands.
[1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
Fixes: 46d179525a ("mmc: dw_mmc: Wait for data transfer after response errors.")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Alim Akhtar <alim.akhtar@gmail.com>
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
When the SD host controller tries to probe again due to the derferred
probe mechanism, it will always keep the SD host device as runtime
resume state due to missing the runtime put operation in error path
last time.
Thus add the pm_runtime_put_noidle() in error path to make the PM runtime
counter balance, which can make the SD host device's PM runtime work well.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: fb8bd90f83 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Oded writes:
This tag contains the following fixes:
- Fix to VRHOT event handling from the ASIC. No need to reset the ASIC,
just notify it in dmesg.
- Fix printk specifier for printing dma address
* tag 'misc-habanalabs-fixes-2019-07-22' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: don't reset device when getting VRHOT
habanalabs: use %pad for printing a dma_addr_t
Don't undo the PM initialization if we error out before we managed to
initialize it. The call to pm_runtime_disable() without being preceded
by pm_runtime_enable() would disturb the balance of the Force.
In practice, this happens if we fail to allocate any of the GPIOS ("cs",
"ready") due to -EPROBE_DEFER because we're getting probled before the
GPIO driver.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Link: https://lore.kernel.org/r/20190719122713.3444318-1-lkundrak@v3.sk
Signed-off-by: Mark Brown <broonie@kernel.org>
When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
make sure they can be included from user-space.
Currently, header.h and fw.h are excluded from the test coverage.
To make them join the compile-test, we need to fix the build errors
attached below.
For a case like this, we decided to use __u{8,16,32,64} variable types
in this discussion:
https://lkml.org/lkml/2019/6/5/18
Build log:
CC usr/include/sound/sof/header.h.s
CC usr/include/sound/sof/fw.h.s
In file included from <command-line>:32:0:
./usr/include/sound/sof/header.h:19:2: error: unknown type name ‘uint32_t’
uint32_t magic; /**< 'S', 'O', 'F', '\0' */
^~~~~~~~
./usr/include/sound/sof/header.h:20:2: error: unknown type name ‘uint32_t’
uint32_t type; /**< component specific type */
^~~~~~~~
./usr/include/sound/sof/header.h:21:2: error: unknown type name ‘uint32_t’
uint32_t size; /**< size in bytes of data excl. this struct */
^~~~~~~~
./usr/include/sound/sof/header.h:22:2: error: unknown type name ‘uint32_t’
uint32_t abi; /**< SOF ABI version */
^~~~~~~~
./usr/include/sound/sof/header.h:23:2: error: unknown type name ‘uint32_t’
uint32_t reserved[4]; /**< reserved for future use */
^~~~~~~~
./usr/include/sound/sof/header.h:24:2: error: unknown type name ‘uint32_t’
uint32_t data[0]; /**< Component data - opaque to core */
^~~~~~~~
In file included from <command-line>:32:0:
./usr/include/sound/sof/fw.h:49:2: error: unknown type name ‘uint32_t’
uint32_t size; /* bytes minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:50:2: error: unknown type name ‘uint32_t’
uint32_t offset; /* offset from base */
^~~~~~~~
./usr/include/sound/sof/fw.h:64:2: error: unknown type name ‘uint32_t’
uint32_t size; /* bytes minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:65:2: error: unknown type name ‘uint32_t’
uint32_t num_blocks; /* number of blocks */
^~~~~~~~
./usr/include/sound/sof/fw.h:73:2: error: unknown type name ‘uint32_t’
uint32_t file_size; /* size of file minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:74:2: error: unknown type name ‘uint32_t’
uint32_t num_modules; /* number of modules */
^~~~~~~~
./usr/include/sound/sof/fw.h:75:2: error: unknown type name ‘uint32_t’
uint32_t abi; /* version of header format */
^~~~~~~~
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20190721142308.30306-1-yamada.masahiro@socionext.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The TS3A227E says that the headset keypress detection needs the MICBIAS
power in order to report the key events to ensure proper operation
The headset keypress detection needs the MICBIAS power in order to report
the key events all the time as long as MIC is present. So MICBIAS pin
is forced on when a MICROPHONE is detected.
On Veyron Minnie I observed that if the MICBIAS power is not present and
the key press detection is activated (just because it is enabled when you
insert a headset), it randomly reports a keypress on insert.
E.g. (KEY_PLAYPAUSE)
Event: (SW_HEADPHONE_INSERT), value 1
Event: (SW_MICROPHONE_INSERT), value 1
Event: -------------- SYN_REPORT ------------
Event: (KEY_PLAYPAUSE), value 1
Userspace thinks that KEY_PLAYPAUSE is pressed and produces the annoying
effect that the media player starts a play/pause loop.
Note that, although most of the time the key reported is the one
associated with BTN_0, not always this is true. On my tests I also saw
different keys reported
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20190719173929.24065-1-enric.balletbo@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When sample rate of TX is different with sample rate of RX in
async mode, the MFreq selection will be wrong.
For example, sysclk = 24.576MHz, TX rate = 96000Hz, RX rate = 48000Hz.
Then ratio of TX = 256, ratio of RX = 512, For MFreq is shared by TX
and RX instance, the correct value of MFreq is 2 for both TX and RX.
But original method will cause MFreq = 0 for TX, MFreq = 2 for RX.
If TX is started after RX, RX will be impacted, RX work abnormal with
MFreq = 0.
This patch is to select proper MFreq value according to TX rate and
RX rate.
Fixes: 0c516b4ff8 ("ASoC: cs42xx8: Add codec driver support for CS42448/CS42888")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/20190716094547.46787-1-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
DPCM uses snd_soc_dapm_dai_get_connected_widgets to build a
list of the widgets connected to a specific front end DAI so it
can search through this list for available back end DAIs. The
custom_stop_condition was added to is_connected_ep to facilitate this
list not containing more widgets than is necessary. Doing so both
speeds up the DPCM handling as less widgets need to be searched and
avoids issues with CODEC to CODEC links as these would be confused
with back end DAIs if they appeared in the list of available widgets.
custom_stop_condition was implemented by aborting the graph walk
when the condition is triggered, however there is an issue with this
approach. Whilst walking the graph is_connected_ep should update the
endpoints cache on each widget, if the walk is aborted the number
of attached end points is unknown for that sub-graph. When the stop
condition triggered, the original patch ignored the triggering widget
and returned zero connected end points; a later patch updated this
to set the triggering widget's cache to 1 and return that. Both of
these approaches result in inaccurate values being stored in various
end point caches as the values propagate back through the graph,
which can result in later issues with widgets powering/not powering
unexpectedly.
As the original goal was to reduce the size of the widget list passed
to the DPCM code, the simplest solution is to limit the functionality
of the custom_stop_condition to the widget list. This means the rest
of the graph will still be processed resulting in correct end point
caches, but only widgets up to the stop condition will be added to the
returned widget list.
Fixes: 6742064aef ("ASoC: dapm: support user-defined stop condition in dai_get_connected_widgets")
Fixes: 5fdd022c20 ("ASoC: dpcm: play nice with CODEC<->CODEC links")
Fixes: 09464974ea ("ASoC: dapm: Fix to return correct path list in is_connected_ep.")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20190718084333.15598-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
After reverting commit 240c35a378 (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.
This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The idea before commit 240c35a37 (which has just been reverted)
was that we have the following FPU states:
userspace (QEMU) guest
---------------------------------------------------------------------------
processor vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
vcpu->arch.user_fpu processor
>>> preempt out
vcpu->arch.user_fpu current->thread.fpu
>>> preempt in
vcpu->arch.user_fpu processor
>>> back to userspace
>>> kvm_put_guest_fpu
processor vcpu->arch.guest_fpu
---------------------------------------------------------------------------
With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.
Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 240c35a378
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.
Fixes: 240c35a378 ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Letting this pend may cause nested_get_vmcs12_pages to run against an
invalid state, corrupting the effective vmcs of L1.
This was triggerable in QEMU after a guest corruption in L2, followed by
a L1 reset.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 7f7f1ba33c ("KVM: x86: do not load vmcs12 pages while still in SMM")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This helper is required from generic huge_pte_alloc() which is available
when arch subscribes ARCH_WANT_GENERAL_HUGETLB. arm64 implements it's own
huge_pte_alloc() and does not depend on the generic definition. Drop this
helper which is redundant on arm64.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
There are some hand-written instances of "32" to express the number
of SVE Z-registers.
Since this code was written a #define was added for this, so
convert trivial instances of this magic number as appropriate.
No functional change.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Currently we convert from FPSIMD to SVE register state in memory in
two places.
To ease future maintenance, let's consolidate this in one place.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The arm64 stacktrace code is careful to only dereference frame records
in valid stack ranges, ensuring that a corrupted frame record won't
result in a faulting access.
However, it's still possible for corrupt frame records to result in
infinite loops in the stacktrace code, which is also undesirable.
This patch ensures that we complete a stacktrace in finite time, by
keeping track of which stacks we have already completed unwinding, and
verifying that if the next frame record is on the same stack, it is at a
higher address.
As this has turned out to be particularly subtle, comments are added to
explain the procedure.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tengfei Fan <tengfeif@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
Some common code is required by each stacktrace user to initialise
struct stackframe before the first call to unwind_frame().
In preparation for adding to the common code, this patch factors it
out into a separate function start_backtrace(), and modifies the
stacktrace callers appropriately.
No functional change.
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: drop tsk argument, update more callsites]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
on_accessible_stack() and on_task_stack() shouldn't (and don't)
modify their task argument, so it can be const.
This patch adds the appropriate modifiers. Whitespace violations in the
parameter lists are fixed at the same time.
No functional change.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: fixup const location, whitespace]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The recent changes to the vdso library for arm64 and the introduction of
the compat vdso library have generated some misalignment in the
Makefiles.
Cleanup the Makefiles for vdso and vdso32 libraries:
* Removing unused rules.
* Unifying the displayed compilation messages.
* Simplifying the generic library inclusion path for
arm64 vdso.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Running "make" on an already compiled kernel tree will rebuild the kernel
even without any modifications:
$ make ARCH=arm64 CROSS_COMPILE=/usr/bin/aarch64-unknown-linux-gnu-
arch/arm64/Makefile:58: CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
VDSOCHK arch/arm64/kernel/vdso/vdso.so.dbg
VDSOSYM include/generated/vdso-offsets.h
CHK include/generated/compile.h
CC arch/arm64/kernel/signal.o
CC arch/arm64/kernel/vdso.o
CC arch/arm64/kernel/signal32.o
LD arch/arm64/kernel/vdso/vdso.so.dbg
OBJCOPY arch/arm64/kernel/vdso/vdso.so
AS arch/arm64/kernel/vdso/vdso.o
AR arch/arm64/kernel/vdso/built-in.a
AR arch/arm64/kernel/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
This is the same bug fixed in commit 92a4728608 ("x86/boot: Fix
if_changed build flip/flop bug"). We cannot use two "if_changed" in one
target. Fix this build bug by merging two commands into one function.
Fixes: a7f71a2c89 ("arm64: compat: Add vDSO")
Fixes: 28b1a824a4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
[will: merged in compat fix from Vincenzo and made rule names consistent]
Signed-off-by: Will Deacon <will@kernel.org>
Prior to the introduction of Unified vDSO support and compat layer for
vDSO on arm64, AT_SYSINFO_EHDR was not defined for compat tasks.
In the current implementation, AT_SYSINFO_EHDR is defined even if the
compat vdso layer is not built, which has been shown to break Android
applications using bionic:
| 01-01 01:22:14.097 755 755 F libc : Fatal signal 11 (SIGSEGV),
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 755 (cameraserver),
| pid 755 (cameraserver)
| 01-01 01:22:14.112 759 759 F libc : Fatal signal 11 (SIGSEGV),
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 759
| (android.hardwar), pid 759 (android.hardwar)
| 01-01 01:22:14.120 756 756 F libc : Fatal signal 11 (SIGSEGV)
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 756 (drmserver),
| pid 756 (drmserver)
Restore the old behaviour by making sure that AT_SYSINFO_EHDR for compat
tasks is defined only when CONFIG_COMPAT_VDSO is enabled.
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The MMIO RAPL interface driver depends on both powercap subsystem and
the intel_rapl_common code.
But when all of them are built-in, the MMIO RAPL interface driver can
be loaded before the other two and this breaks the system during boot.
Fix this by adjusting the init order of the powercap subsystem and the
intel_rapl_common code, so that it can be initialized first.
Fixes: 555c45fe0d ("int340X/processor_thermal_device: add support for MMIO RAPL")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
advertise a landscape resolution and pitch, resulting in a messed up
display if the kernel tries to show anything on the efifb (because of the
wrong pitch).
Fix this by adding a new DMI match table for devices which need to have
their width and height swapped.
At first it was tried to use the existing table for overriding some of the
efifb parameters, but some of the affected devices have variants with
different LCD resolutions which will not work with hardcoded override
values.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.com
When arch_stack_walk_user() is called from atomic contexts, access_ok() can
trigger the following warning if compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y.
Reproducer:
// CONFIG_DEBUG_ATOMIC_SLEEP=y
# cd /sys/kernel/debug/tracing
# echo 1 > options/userstacktrace
# echo 1 > events/irq/irq_handler_entry/enable
WARNING: CPU: 0 PID: 2649 at arch/x86/kernel/stacktrace.c:103 arch_stack_walk_user+0x6e/0xf6
CPU: 0 PID: 2649 Comm: bash Not tainted 5.3.0-rc1+ #99
RIP: 0010:arch_stack_walk_user+0x6e/0xf6
Call Trace:
<IRQ>
stack_trace_save_user+0x10a/0x16d
trace_buffer_unlock_commit_regs+0x185/0x240
trace_event_buffer_commit+0xec/0x330
trace_event_raw_event_irq_handler_entry+0x159/0x1e0
__handle_irq_event_percpu+0x22d/0x440
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x12f/0x3f0
handle_irq+0x34/0x40
do_IRQ+0xa6/0x1f0
common_interrupt+0xf/0xf
</IRQ>
Fix it by calling __range_not_ok() directly instead of access_ok() as
copy_from_user_nmi() does. This is fine here because the actual copy is
inside a pagefault disabled region.
Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190722083216.16192-2-devel@etsukata.com
On x86-32 with PTI enabled, parts of the kernel page-tables are not shared
between processes. This can cause mappings in the vmalloc/ioremap area to
persist in some page-tables after the region is unmapped and released.
When the region is re-used the processes with the old mappings do not fault
in the new mappings but still access the old ones.
This causes undefined behavior, in reality often data corruption, kernel
oopses and panics and even spontaneous reboots.
Fix this problem by activly syncing unmaps in the vmalloc/ioremap area to
all page-tables in the system before the regions can be re-used.
References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org
With huge-page ioremap areas the unmappings also need to be synced between
all page-tables. Otherwise it can cause data corruption when a region is
unmapped and later re-used.
Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.
Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
A few boards set clock frequency of their I2C buses with
"clock_frequency" property. The right property is "clock-frequency".
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
fw_{grow,map}_paged_buf() need to be defined as static inline
when CONFIG_FW_LOADER_PAGED_BUF is not enabled,
infact fw_free_paged_buf() is also defined as static inline
when CONFIG_FW_LOADER_PAGED_BUF is not enabled.
Fixes the following mutiple definition building errors for Android kernel:
drivers/base/firmware_loader/fallback_efi.o: In function `fw_grow_paged_buf':
fallback_efi.c:(.text+0x0): multiple definition of `fw_grow_paged_buf'
drivers/base/firmware_loader/main.o:(.text+0x73b): first defined here
drivers/base/firmware_loader/fallback_efi.o: In function `fw_map_paged_buf':
fallback_efi.c:(.text+0xf): multiple definition of `fw_map_paged_buf'
drivers/base/firmware_loader/main.o:(.text+0x74a): first defined here
[ slightly corrected the patch description -- tiwai ]
Fixes: 5342e7093f ("firmware: Factor out the paged buffer handling code")
Fixes: 82fd7a8142 ("firmware: Add support for loading compressed files")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20190722055536.15342-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are using PAGE_SIZE as the unit to determine if the total len in
async_list has exceeded max_pages, it's not fair for smaller io sizes.
For example, if we are doing 1k-size io streams, we will never exceed
max_pages since len >>= PAGE_SHIFT always gets zero. So use original
bytes to make it more accurate.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hrvoje reports that when a large fixed buffer is registered and IO is
being done to the latter pages of said buffer, the IO submission time
is much worse:
reading to the start of the buffer: 11238 ns
reading to the end of the buffer: 1039879 ns
In fact, it's worse by two orders of magnitude. The reason for that is
how io_uring figures out how to setup the iov_iter. We point the iter
at the first bvec, and then use iov_iter_advance() to fast-forward to
the offset within that buffer we need.
However, that is abysmally slow, as it entails iterating the bvecs
that we setup as part of buffer registration. There's really no need
to use this generic helper, as we know it's a BVEC type iterator, and
we also know that each bvec is PAGE_SIZE in size, apart from possibly
the first and last. Hence we can just use a shift on the offset to
find the right index, and then adjust the iov_iter appropriately.
After this fix, the timings are:
reading to the start of the buffer: 10135 ns
reading to the end of the buffer: 1377 ns
Or about an 755x improvement for the tail page.
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A caller is supposed to pass in REQ_NOWAIT if we can't block for any
given operation, but O_DIRECT for block devices just ignore this. Hence
we'll block for various resource shortages on the block layer side,
like having to wait for requests.
Use the new REQ_NOWAIT_INLINE to ask for this error to be returned
inline, so we can handle it appropriately and return -EAGAIN to the
caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By default, if a caller sets REQ_NOWAIT and we need to block, we'll
return -EAGAIN through the bio->bi_end_io() callback. For some use
cases, this makes it hard to use.
Allow a caller to ask for inline return of errors related to
blocking by also setting REQ_NOWAIT_INLINE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac6 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.
Fixes: 345ac08990 ("hv_netvsc: pass netvsc_device to receive callback")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:
Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8)
MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000
CFAR: c0000000000022e0 IRQMASK: 0
GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
LR [00007fffb2d67e48] 0x7fffb2d67e48
Call Trace:
Instruction dump:
e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.
This means any local user can crash the system.
Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.
Found with sigfuz kernel selftest on P9.
This fixes CVE-2019-13648.
Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
Fixed address of third NCT6106_REG_WEIGHT_DUTY_STEP, and
added missed NCT6106_REG_TOLERANCE_H.
Fixes: 6c009501ff ("hwmon: (nct6775) Add support for NCT6102D/6106D")
Signed-off-by: Bjoern Gerhart <gerhart@posteo.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The following warning is seen when building with W=1:
arch/arm/boot/dts/imx7ulp.dtsi:189.31-195.5: Warning (simple_bus_reg): /bus@40000000/usb-phy@0x40350000: simple-bus unit address format error, expected "40350000"
Fix it as suggested by removing the extra "0x" notation.
Fixes: 5b7bd45631 ("ARM: dts: imx7ulp: add imx7ulp USBOTG1 support")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).
Case:
1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.
So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.
Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.
Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an issue seen on Power systems with bnx2x which results
in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In compat_do_replace(), a temporary buffer is allocated through vmalloc()
to hold entries copied from the user space. The buffer address is firstly
saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then
the entries in this temporary buffer is copied to the internal kernel
structure through compat_copy_entries(). If this copy process fails,
compat_do_replace() should be terminated. However, the allocated temporary
buffer is not freed on this path, leading to a memory leak.
To fix the bug, free the buffer before returning from compat_do_replace().
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The RX power read from the SFP uses units of 0.1uW. This must be
scaled to units of uW for HWMON. This requires a divide by 10, not the
current 100.
With this change in place, sensors(1) and ethtool -m agree:
sff2-isa-0000
Adapter: ISA adapter
in0: +3.23 V
temp1: +33.1 C
power1: 270.00 uW
power2: 200.00 uW
curr1: +0.01 A
Laser output power : 0.2743 mW / -5.62 dBm
Receiver signal average optical power : 0.2014 mW / -6.96 dBm
Reported-by: chris.healy@zii.aero
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 1323061a01 ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: David S. Miller <davem@davemloft.net>
It is likely that 'my3216_poll()' should be 'my3126_poll()'. (1 and 2
switched in 3126.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
allocate_flower_entry does not check for allocation success, but tries
to deref the result. I only moved the spin_lock under null check, because
the caller is checking allocation's status at line 652.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All constant in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except
'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched)
s/HLC/HCL/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/
(r and s switched in topsrv)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
We have a handful of fixes:
* ignore bad CW parameters if we aren't using them,
instead of warning
* fix operation (and then build) with the new netlink vendor
command policy requirement
* fix a memory leak in an error path when setting beacons
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sure that we convert from on-wire to CPU endianness in
applespi_debug_update_dimensions() and also marks as "static" as it is not
needed to be visible outside of the driver.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
flush_tlb_all_local() flushes the ITLB and DTLB of the CPU.
In case the machine does not have separate ITLBs and DTLBs, use the
alternative functionality to replace the code which flushes the ITLB
with nops while keeping the code which flushes the DTLB.
Signed-off-by: Helge Deller <deller@gmx.de>
Add kprobe_fault_handler() to fix compilation for PA-RISC.
On PA-RISC we actually don't need that function as the recovery counter
is restored after interrupt. See the PA-RISC 2.0 Architecture Manual,
pg. 4-8, Figure 4-4: "Interruption Processing".
Fixes: b98cca444d ("mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
ieee80211_set_wmm_default() normally sets up the initial CW min/max for
each queue, except that it skips doing this if the driver doesn't
support ->conf_tx. We still end up calling drv_conf_tx() in some cases
(e.g., ieee80211_reconfig()), which also still won't do anything
useful...except it complains here about the invalid CW parameters.
Let's just skip the WARN if we weren't going to do anything useful with
the parameters.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Link: https://lore.kernel.org/r/20190718015712.197499-1-briannorris@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since ERR_PTR() is an inline, not a macro, just open-code it
here so it's usable as an initializer, fixing the build in
brcmfmac.
Reported-by: Arend Van Spriel <arend.vanspriel@broadcom.com>
Fixes: 901bb98918 ("nl80211: require and validate vendor command policy")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In my previous commit to validate a policy I neglected to
actually add one to the few drivers using vendor commands,
fix that now.
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Fixes: 901bb98918 ("nl80211: require and validate vendor command policy")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This fixes a copy&paste error in the original patch. Setting the wrong
register resulted in massive packet loss on some systems.
Fixes: a2928d2864 ("r8169: use paged versions of phylib MDIO access functions")
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
flow_offload fixes
The following patchset contains fixes for the flow_offload infrastructure:
1) Fix possible build breakage before patch 3/4. Both the flow_offload
infrastructure and OVS define the flow_stats structure. Patch 3/4 in
this batch indirectly pulls in the flow_stats definition from
include/net/flow_offload.h into OVS, leading to structure redefinition
compile-time errors.
2) Remove netns parameter from flow_block_cb_alloc(), this is not
required as Jiri suggests. The flow_block_cb_is_busy() function uses
the per-driver block list to check for used blocks which was the
original intention for this parameter.
3) Rename tc_setup_cb_t to flow_setup_cb_t. This callback is not
exclusive of tc anymore, this might confuse the reader as Jiri
suggests, fix this semantic inconsistency.
Add #include <linux/list.h> to include/net/netfilter/nf_tables_offload.h
to avoid a compile break with CONFIG_HEADER_TEST=y.
4) Fix block sharing feature: Add flow_block structure and use it,
update flow_block_cb_lookup() to use this flow_block object.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes: da3eeb904f ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.
Fixes: d63db30c85 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a flow_stats structure defined in include/net/flow_offload.h
and a follow up patch adds #include <net/flow_offload.h> to
net/sch_generic.h.
This breaks compilation since OVS codebase includes net/sock.h which
pulls in linux/filter.h which includes net/sch_generic.h.
In file included from ./include/net/sch_generic.h:18:0,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:59,
from ./include/linux/tcp.h:19,
from net/openvswitch/datapath.c:24
This definition takes precedence on OVS since it is placed in the
networking core, so rename flow_stats in OVS to sw_flow_stats since
this structure is contained in sw_flow.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix a deadlock when module is requested via netlink_bind()
in nfnetlink, from Florian Westphal.
2) Fix ipt_rpfilter and ip6t_rpfilter with VRF, from Miaohe Lin.
3) Skip master comparison in SIP helper to fix expectation clash
under two valid scenarios, from xiao ruizhu.
4) Remove obsolete comments in nf_conntrack codebase, from
Yonatan Goldschmidt.
5) Fix redirect extension module autoload, from Christian Hesse.
6) Fix incorrect mssg option sent to client in synproxy,
from Fernando Fernandez.
7) Fix incorrect window calculations in TCP conntrack, from
Florian Westphal.
8) Don't bail out when updating basechain policy due to recent
offload works, also from Florian.
9) Allow symhash to use modulus 1 as other hash extensions do,
from Laura.Garcia.
10) Missing NAT chain module autoload for the inet family,
from Phil Sutter.
11) Fix missing adjustment of TCP RST packet in synproxy,
from Fernando Fernandez.
12) Skip EAGAIN path when nft_meta_bridge is built-in or
not selected.
13) Conntrack bridge does not depend on nf_tables_bridge.
14) Turn NF_TABLES_BRIDGE into tristate to fix possible
link break of nft_meta_bridge, from Arnd Bergmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
iscsi_ibft can use ACPI to find the iBFT entry during bootup,
currently, ISCSI_IBFT depends on ISCSI_IBFT_FIND which is
a X86 legacy way to find the iBFT by searching through the
low memory. This patch changes the dependency so that other
arch like ARM64 can use ISCSI_IBFT as long as the arch supports
ACPI.
ibft_init() needs to use the global variable ibft_addr declared
in iscsi_ibft_find.c. A #ifndef CONFIG_ISCSI_IBFT_FIND is needed
to declare the variable if CONFIG_ISCSI_IBFT_FIND is not selected.
Moving ibft_addr into the iscsi_ibft.c does not work because if
ISCSI_IBFT is selected as a module, the arch/x86/kernel/setup.c won't
be able to find the variable at compile time.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If VAR in non-sanitized BTF was size less than 4, converting such VAR
into an INT with size=4 will cause BTF validation failure due to
violationg of STRUCT (into which DATASEC was converted) member size.
Fix by conservatively using size=1.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In case when BTF loading fails despite sanitization, but BPF object has
.BTF.ext loaded as well, we free and null obj->btf, but not
obj->btf_ext. This leads to an attempt to relocate .BTF.ext later on
during bpf_object__load(), which assumes obj->btf is present. This leads
to SIGSEGV on null pointer access. Fix bug by freeing and nulling
obj->btf_ext as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The new nft_meta_bridge code fails to link as built-in when NF_TABLES
is a loadable module.
net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval':
nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval'
net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init':
nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init'
nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register'
nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store'
net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit':
nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr'
net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init':
nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr'
net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump'
net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval'
This can happen because the NF_TABLES_BRIDGE dependency itself is just a
'bool'. Make the symbol a 'tristate' instead so Kconfig can propagate the
dependencies correctly.
Fixes: 30e103fe24 ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The recent rewrite of PCM link lock management introduced the refcount
in snd_pcm_group object, managed by the kernel refcount_t API. This
caused unexpected kernel warnings when the kernel is built with
CONFIG_REFCOUNT_FULL=y. As the warning line indicates, the problem is
obviously that we start with refcount=0 and do refcount_inc() for
adding each PCM link, while refcount_t API doesn't like refcount_inc()
performed on zero.
For adapting the proper refcount_t usage, this patch changes the logic
slightly:
- The initial refcount is 1, assuming the single list entry
- The refcount is incremented / decremented at each PCM link addition
and deletion
- ... which allows us concentrating only on the refcount as a release
condition
Fixes: f57f3df03a ("ALSA: pcm: More fine-grained PCM link locking")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204221
Reported-and-tested-by: Duncan Overbruck <kernel@duncano.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The logic with seeks for a subdir passed via SPHINXDIRS is
incomplete: if one uses something like:
make SPHINXDIRS=arm pdfdocs
It will find both "arm" and "arm64" directories. Worse than
that, it will convert "arm64/index" to "4/index".
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The refactor of powerpc DMA functions in commit 6666cc17d7
("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly
changes the way DMA mappings are handled on powerpc.
Since this change, all mapped pages are marked as cache-inhibited
through the default implementation of arch_dma_mmap_pgprot.
This differs from the previous behavior of only marking pages
in noncoherent mappings as cache-inhibited and has resulted in
sporadic system crashes in certain hardware configurations and
workloads (see Bugzilla).
This commit restores the previous correct behavior by providing
an implementation of arch_dma_mmap_pgprot that only marks
pages in noncoherent mappings as cache-inhibited. As this behavior
should be universal for all powerpc platforms a new file,
dma-generic.c, was created to store it.
Fixes: 6666cc17d7 ("powerpc/dma: remove dma_nommu_mmap_coherent")
# NOTE: fixes commit 6666cc17d7 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
# NOTE: fixes commit 6666cc17d7 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
The XIVE device structure is now allocated in kvmppc_xive_get_device()
and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
will result in a double free and corrupt the host memory.
Fixes: 5422e95103 ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ea6998b-a890-2511-01d1-747d7621eb19@kaod.org
For good reason, the standard device_lock() is marked
lockdep_set_novalidate_class() because there is simply no sane way to
describe the myriad ways the device_lock() ordered with other locks.
However, that leaves subsystems that know their own local device_lock()
ordering rules to find lock ordering mistakes manually. Instead,
introduce an optional / additional lockdep-enabled lock that a subsystem
can acquire in all the same paths that the device_lock() is acquired.
A conversion of the NFIT driver and NVDIMM subsystem to a
lockdep-validate device_lock() scheme is included. The
debug_nvdimm_lock() implementation implements the correct lock-class and
stacking order for the libnvdimm device topology hierarchy.
Yes, this is a hack, but hopefully it is a useful hack for other
subsystems device_lock() debug sessions. Quoting Greg:
"Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
using it as much as anything else, so user beware :)
I don't object to it if it makes things easier for you to debug."
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable@vger.kernel.org>
Fixes: bf9bccc14c ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Namespace activation expects to be able to reference region badblocks.
The following warning sometimes triggers when asynchronous namespace
activation races in front of the completion of namespace probing. Move
all possible namespace probing after region badblocks initialization.
Otherwise, lockdep sometimes catches the uninitialized state of the
badblocks seqlock with stack trace signatures like:
INFO: trying to register non-static key.
pmem2: detected capacity change from 0 to 136365211648
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 9 PID: 358 Comm: kworker/u80:5 Tainted: G OE 5.2.0-rc4+ #3382
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: events_unbound async_run_entry_fn
Call Trace:
dump_stack+0x85/0xc0
pmem1.12: detected capacity change from 0 to 8589934592
register_lock_class+0x56a/0x570
? check_object+0x140/0x270
__lock_acquire+0x80/0x1710
? __mutex_lock+0x39d/0x910
lock_acquire+0x9e/0x180
? nd_pfn_validate+0x28f/0x440 [libnvdimm]
badblocks_check+0x93/0x1f0
? nd_pfn_validate+0x28f/0x440 [libnvdimm]
nd_pfn_validate+0x28f/0x440 [libnvdimm]
? lockdep_hardirqs_on+0xf0/0x180
nd_dax_probe+0x9a/0x120 [libnvdimm]
nd_pmem_probe+0x6d/0x180 [nd_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
Fixes: 48af2f7e52 ("libnvdimm, pfn: during init, clear errors...")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/156341208365.292348.1547528796026249120.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com>
Fixes: 4d88a97aa9 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable@vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Tested-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Fixes: 4d88a97aa9 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable@vger.kernel.org>
Tested-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
There's an intentional switch case fall-through in Cavium Octeon USB
code, which triggers compile errors with -Wimplicit-fallthrough due to
-Werror being enabled for arch/mips.
This can be encountered when building cavium_octeon_defconfig.
Fix the build issue by annotating the intentional fall-through.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
kvm_compute_return_epc contains a switch statement with an intentional
fall-through from a case handling jal (jump and link) instructions to
one handling j (jump) instructions. With -Wimplicit-fallthrough this
triggers a compile error (due to -Werror being enabled for arch/mips).
This can be reproduced using malta_kvm_defconfig.
Fix this by annotating the intentional fall-through.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Because CONFIG_OF defined for MIPS, cacheinfo attempts to fill information
from DT, ignoring data filled by architecture routine. This leads to error
reported
cacheinfo: Unable to detect cache hierarchy for CPU 0
Way to fix this provided in
commit fac5148257 ("drivers: base: cacheinfo: fix x86 with
CONFIG_OF enabled")
Utilize same mechanism to report that cacheinfo set by architecture
specific function
Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@linux.intel.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
[Why]:
Some active dongles have DP++ port and DP port at the same time. Current
code doesn't cover DP++ case and processes as default DVI case, in which
audio is disabled. Because of dual mode, DP case is also treat as DVI case
for the other port.
[How]:
According DP 1.4 spec, add DP++ procedure similar with HDMI case. Also
add None dongle type for DP case.
Signed-off-by: Dale Zhao <dale.zhao@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Previously assume eDP sink present if connector present. Do not
need to enforce this restriction. Fix issue where driver attempt
to read link setting even though no edp connected.
{How]
Only read link setting after reading connection status.
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The audios array defined in "struct resource_pool" is only 6 (MAX_PIPES)
but the max number of audio devices (num_audio) is 7. In some projects,
it will run out of audios array.
[How]
Incraese the audios array size to 7.
Signed-off-by: Tai Man <taiman.wong@amd.com>
Reviewed-by: Joshua Aberback <Joshua.Aberback@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
For boards that support eDP but do not have a physical eDP
display connected an ASSERT will be thrown. This is not a
critical failure and shouldn't be treated as such.
[How]
Drop the assertion.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In dm_helpers_parse_edid_caps, there is a corner case where no speakers
can be allocated even though the audio mode count is greater than 0.
Enabling audio when no speaker allocations exists can cause issues in
the video stream.
[How]
Add a check to not enable audio unless one or more speaker allocations
exist (since doing this can cause issues in the video stream).
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
It is possible (but very unlikely) that constructing dc fails
before current_state is created.
We support 666 color depth in some scenarios, but this
isn't handled in get_norm_pix_clk. It uses exactly the
same pixel clock as the 888 case.
[How]
Check for non null current_state before destructing.
Add case for 666 color depth to get_norm_pix_clk to
avoid assertion.
Signed-off-by: Julian Parkin <julian.parkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Driver will create 0, 1, and 2 ddc engines for RV2,
but some platforms used 0, 1, and 3.
[How]
Still allocate 4 ddc engines for RV2.
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Seamless boot optimization removed proper front end power off sequence.
In driver disable enable case, this causes driver to power gate hubp
and dpp while there is still memory fetching going on, this can cause
invalid memory requests to be generated which will hang data fabric.
[How]
Put back proper front end power off sequence
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Tony Cheng <Tony.Cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
In pipe split issue occurs when we program immediate flip while vsync flip is pending
[how]
Don't program immediate flip until flip is no longer pending
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jaehyun Chung <Jaehyun.Chung@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Some display's hsync width is lower than the minimum dcn20 is set
to support right now. This will cause optc1_validate_timing to fail which
eventually will result in wrong set mode. This was set to 8 as per
HW team's request for no valid reason.
[How]
Changing min_h_sync_width to 4 will let us validate timing for
preffered mode and light up the headset. This change was made
to Vega 10 before for a similar issue.
Signed-off-by: Fatemeh Darbehani <fatemeh.darbehani@amd.com>
Reviewed-by: Joshua Aberback <Joshua.Aberback@amd.com>
Acked-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
On some platforms, the encoder id 3 is not populated. So the encoders
are not stored in right order as index (id: 0, 1, 2, 4, 5) at pool. This
would cause encoders id 4 & id 5 to fail when finding corresponding
audio device, defaulting to the first available audio device. As result,
we cannot stream audio into two DP ports with encoders id 4 & id 5.
[How]
It need to create enough audio device objects (0 - 5) to perform matching.
Then use encoder engine id to find matched audio device.
Signed-off-by: Tai Man <taiman.wong@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
When the system is going into suspend, set_backlight gets called
after the eDP got blanked. Since smooth brightness is enabled,
the driver will make a call into the DMCU to ramp the brightness.
The DMCU would try to enable ABM to do so. But since the display is
blanked, this ends up causing ABM1_ACE_DBUF_REG_UPDATE_PENDING to
get stuck at 1, which results in a dead lock in the DMCU firmware.
[how]
Disable brightness ramping when the eDP display is blanked.
Signed-off-by: Zi Yu Liao <ziyu.liao@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When we recover from hang, we do not want to skip the audio enable call.
[How]
Disable audio in dc_reinitialize_hardware
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When launch D10.2, driver will write DPCD 0x107 with 0x00
[How]
Read MAX_DOWNSPREAD (0x0003h) then keep in current
link settings
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Currently we don't wait for blacklight programming completion in DMCU
when setting backlight level. Some sequences such as PSR static screen
event trigger reprogramming requires it to be complete.
[How]
Add generic wait for dmcu command completion in set backlight level.
Signed-off-by: SivapiriyanKumarasamy <sivapiriyan.kumarasamy@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Hardware docs state that we must wait until the GPUVM context is ready
after programming it.
[How]
Poll until the valid bit of PAGE_TABLE_BASE_ADDR_LO32 is set to 1 after
programming it.
v2: fix include for udelay (Alex)
Signed-off-by: Julian Parkin <julian.parkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
The current code will not wait for the entire frame
after global unlock.
This causes dsc dynamic target bpp update corruption when
there is a surface update immediately happens after this.
[how]
Wait for the entire whole frame after unlock before continuing
the rest of stream and surface update.
Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
For DCE110, DCE112 and DCE120 the max_clks_by_state for the clk_mgr are
copied from their respective table before the call to
dce_clk_mgr_construct, but then dce_clk_mgr_construct overwrites
these with the dce80_max_clks_by_state.
[How]
Copy these after we call dce_clk_mgr_construct so we're using the
right tables.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
We reset the optimized_required in atomic_plane_disable
flag immediately after it is set in atomic_plane_disconnect, causing us to
never have flag set during next flip in UpdatePlanes.
[how]
Optimize directly after each time plane is removed.
Signed-off-by: Murton Liu <murton.liu@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Fixes issue when we have a display connected using a passive
dongle and then emulate over it using a DP connection at 1 x 1.62 Ghz.
System hangs because register bus returns back 0xFFFFFFFF for all
register reads after setting register DIG_BE_CNTL in
dcn10_link_encoder_connect_dig_be_to_fe(). Hang occurs later
when trying to do a register read.
[How]
At the start of the emulation, dc_link_set_preferred_link_settings()
and dp_retrain_link_dp_test() is called, even though it is connected
using a passive dongle.
Add an extra condition in dp_retrain_link_dp_test() to check for
link->dongle_max_pix_clk > 0. This is the only way we know if the
connection is using passive dongle so we don't retrain DP.
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Currently logical values are swapped in HW, causing
system aperture to be undefined, so VA and PA cannot co-exist
[how]
program values correctly
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
'second_line_offset_adj' was mistakenly left at zero, even though DSC spec
v1.2a recommends setting this field to 512 for 4:2:0.
[how]
Set 'second_line_offset_adj' to 512 for 4:2:0 and leave at zero otherwise
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There are certain MST displays (i.e. Dell P2715Q)
that although have the MST feature set to off may still
report it is a branch device and a non-zero
value for downstream port present.
This can lead to us incorrectly classifying a
dp dongle connection as being active and
disabling the audio endpoint for the display.
[How]
Modified the placement and
condition used to assign
the is_branch_dev bit.
Signed-off-by: Harmanprit Tatla <harmanprit.tatla@amd.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Due to limitation in SMU/PPLIB, it is not possible to know Fmax @ Vmin for DCFCLK.
This causes issues at high display configurations where extra headroom of DCFCLK
can enable P-state switching
[how]
Use existing override logic. If override not defined, then force
min = 507
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Some values were not being converted or bit-shifted properly for
HW registers, causing black screen
[how]
Fix up the values before programming HW
Signed-off-by: Jun Lei <jun.lei@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
On some modes SMU will be in infinite loop state at boot, this is
because driver assumes p_state_support is false, but this is the
opposite of the assumed boot state by SMU. we optimize away
notifying SMU about no pstate, and so they will get stuck
[how]
when we init clk manager, init pstate to true, so it matches driver load
assumption
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since we are using the signed FW now, and also using PSP firmware loading,
but it's still potential to break driver when loading FW directly
instead of PSP, so we should add offset.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In function __ttm_dma_alloc_page(), d_page->addr is allocated
by dma_alloc_attrs() but freed with use dma_free_coherent() in
__ttm_dma_free_page().
Use the correct dma_free_attrs() to free d_page->vaddr.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2:
set average clock value on level 1 when current clock equal
min or max clock (fine grained dpm support).
the navi10 gfxclk (sclk) support fine grained DPM,
so use level 1 to show current dpm freq in sysfs pp_dpm_xxx
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
this function is not needed any more. error injection is
the only way to validate ras but it can't be executed in
amdgpu_ras_init, where gpu is even not initialized
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
error injection to other IP blocks (except UMC) will be enabled
until RAS feature stablize on those IP blocks
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The GDS and GWS blocks default to allowing all VMIDs to
access all entries. Graphics VMIDs can handle setting
these limits when the driver launches work. However,
compute workloads under HWS control don't go through the
kernel driver. Instead, HWS firmware should set these
limits when a process is put into a VMID slot.
Disable access to these devices by default by turning off
all mask bits (for OA) and setting BASE=SIZE=0 (for GDS
and GWS) for all compute VMIDs. If a process wants to use
these resources, they can request this from the HWS
firmware (when such capabilities are enabled). HWS will
then handle setting the base and limit for the process when
it is assigned to a VMID.
This will also prevent user kernels from getting 'stuck' in
GWS by accident if they write GWS-using code but HWS
firmware is not set up to handle GWS reset. Until HWS is
enabled to handle GWS properly, all GWS accesses will
MEM_VIOL fault the kernel.
v2: Move initialization outside of SRBM mutex
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Place NF_CONNTRACK_BRIDGE away from the NF_TABLES_BRIDGE dependency.
Fixes: 3c171f496e ("netfilter: bridge: add connection tracking system")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If it is a module, request this module. Otherwise, if it is compiled
built-in or not selected, skip this.
Fixes: 0ef1efd135 ("netfilter: nf_tables: force module load in case select_ops() returns -EAGAIN")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14:51:00.024418 IP 192.168.122.1.41462 > netfilter.90: Flags [S], seq
4023580551,
14:51:00.024454 IP netfilter.90 > 192.168.122.1.41462: Flags [S.], seq
727560212, ack 4023580552,
14:51:00.024524 IP 192.168.122.1.41462 > netfilter.90: Flags [.], ack 1,
Note: here, synproxy will send a SYN to the real server, as the 3whs was
completed sucessfully. Instead of a syn/ack that we can intercept, we instead
received a reset packet from the real backend, that we forward to the original
client. However, we don't use the correct sequence number, so the reset is not
effective in closing the connection coming from the client.
14:51:00.024550 IP netfilter.90 > 192.168.122.1.41462: Flags [R.], seq
3567407084,
14:51:00.231196 IP 192.168.122.1.41462 > netfilter.90: Flags [.], ack 1,
14:51:00.647911 IP 192.168.122.1.41462 > netfilter.90: Flags [.], ack 1,
14:51:01.474395 IP 192.168.122.1.41462 > netfilter.90: Flags [.], ack 1,
Fixes: 48b1de4c11 ("netfilter: add SYNPROXY core/target")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Trying to create an inet family nat chain would not cause
nft_chain_nat.ko module to auto-load due to missing module alias. Add a
proper one with hard-coded family value 1 for the pseudo-family
NFPROTO_INET.
Fixes: d164385ec5 ("netfilter: nat: add inet family nat support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is a hang issue while using fio to do some basic test. The issue
can be easily reproduced using the below script:
while true
do
fio --ioengine=io_uring -rw=write -bs=4k -numjobs=1 \
-size=1G -iodepth=64 -name=uring --filename=/dev/zero
done
After several minutes (or more), fio would block at
io_uring_enter->io_cqring_wait in order to waiting for previously
committed sqes to be completed and can't return to user anymore until
we send a SIGTERM to fio. After receiving SIGTERM, fio hangs at
io_ring_ctx_wait_and_kill with a backtrace like this:
[54133.243816] Call Trace:
[54133.243842] __schedule+0x3a0/0x790
[54133.243868] schedule+0x38/0xa0
[54133.243880] schedule_timeout+0x218/0x3b0
[54133.243891] ? sched_clock+0x9/0x10
[54133.243903] ? wait_for_completion+0xa3/0x130
[54133.243916] ? _raw_spin_unlock_irq+0x2c/0x40
[54133.243930] ? trace_hardirqs_on+0x3f/0xe0
[54133.243951] wait_for_completion+0xab/0x130
[54133.243962] ? wake_up_q+0x70/0x70
[54133.243984] io_ring_ctx_wait_and_kill+0xa0/0x1d0
[54133.243998] io_uring_release+0x20/0x30
[54133.244008] __fput+0xcf/0x270
[54133.244029] ____fput+0xe/0x10
[54133.244040] task_work_run+0x7f/0xa0
[54133.244056] do_exit+0x305/0xc40
[54133.244067] ? get_signal+0x13b/0xbd0
[54133.244088] do_group_exit+0x50/0xd0
[54133.244103] get_signal+0x18d/0xbd0
[54133.244112] ? _raw_spin_unlock_irqrestore+0x36/0x60
[54133.244142] do_signal+0x34/0x720
[54133.244171] ? exit_to_usermode_loop+0x7e/0x130
[54133.244190] exit_to_usermode_loop+0xc0/0x130
[54133.244209] do_syscall_64+0x16b/0x1d0
[54133.244221] entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that we had added a req to ctx->pending_async at the very
end, but it didn't get a chance to be processed. How could this happen?
fio#cpu0 wq#cpu1
io_add_to_prev_work io_sq_wq_submit_work
atomic_read() <<< 1
atomic_dec_return() << 1->0
list_empty(); <<< true;
list_add_tail()
atomic_read() << 0 or 1?
As atomic_ops.rst states, atomic_read does not guarantee that the
runtime modification by any other thread is visible yet, so we must take
care of that with a proper implicit or explicit memory barrier.
This issue was detected with the help of Jackie's <liuyun01@kylinos.cn>
Fixes: 31b5151064 ("io_uring: allow workqueue item to handle multiple buffered requests")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Oleg noticed that our checking of data.got_token is unsafe in the
cleanup case, and should really use a memory barrier. Use a wmb on the
write side, and a rmb() on the read side. We don't need one in the main
loop since we're saved by set_current_state().
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case we get a spurious wakeup we need to make sure to re-set
ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we raced with somebody else getting an inflight counter we could fail
to get an inflight counter with no sleepers on the list, and thus need
to go to sleep. In this case has_sleepers should be true because we are
now relying on the waker to get our inflight counter for us. And in the
case of spurious wakeups we'd still want this to be the case. So set
has_sleepers to true if we went to sleep to make sure we're woken up the
proper way.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO. This is because of the
has_sleepers optimization that is used to make sure we don't steal an
inflight counter for new submitters when there are people already on the
list.
We can race with our check to see if the waitqueue has any waiters (this
is done locklessly) and the time we actually add ourselves to the
waitqueue. If this happens we'll go to sleep and never be woken up
because nobody is doing IO to wake us up.
Fix this by checking if the waitqueue has a single sleeper on the list
after we add ourselves, that way we have an uptodate view of the list.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
rq-qos sits in the io path so we want to take locks as sparingly as
possible. To accomplish this we try not to take the waitqueue head lock
unless we are sure we need to go to sleep, and we have an optimization
to make sure that we don't starve out existing waiters. Since we check
if there are existing waiters locklessly we need to be able to update
our view of the waitqueue list after we've added ourselves to the
waitqueue. Accomplish this by adding this helper to see if there is
more than just ourselves on the list.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
xive_find_target_in_mask() has the following for(;;) loop which has a
bug when @first == cpumask_first(@mask) and condition 1 fails to hold
for every CPU in @mask. In this case we loop forever in the for-loop.
first = cpu;
for (;;) {
if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
return cpu;
cpu = cpumask_next(cpu, mask);
if (cpu == first) // condition 2
break;
if (cpu >= nr_cpu_ids) // condition 3
cpu = cpumask_first(mask);
}
This is because, when @first == cpumask_first(@mask), we never hit the
condition 2 (cpu == first) since prior to this check, we would have
executed "cpu = cpumask_next(cpu, mask)" which will set the value of
@cpu to a value greater than @first or to nr_cpus_ids. When this is
coupled with the fact that condition 1 is not met, we will never exit
this loop.
This was discovered by the hard-lockup detector while running LTP test
concurrently with SMT switch tests.
watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
watchdog: CPU 68 Hard LOCKUP
watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le)
MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000
CFAR: c0000000006f558c IRQMASK: 1
GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
NIP [c0000000006f5578] find_next_bit+0x38/0x90
LR [c000000000cba9ec] cpumask_next+0x2c/0x50
Call Trace:
[c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
[c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
[c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
[c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
[c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
[c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
[c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
[c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
[c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
[c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
[c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
[c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
[c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
[c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
[c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
[c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
[c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
Instruction dump:
78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
To fix this, move the check for condition 2 after the check for
condition 3, so that we are able to break out of the loop soon after
iterating through all the CPUs in the @mask in the problem case. Use
do..while() to achieve this.
Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
Consider a sync bfq_queue Q that remains empty while in service, and
suppose that, when this happens, there is a fair amount of already
in-flight I/O not belonging to Q. In such a situation, I/O dispatching
may need to be plugged (until new I/O arrives for Q), for the
following reason.
The drive may decide to serve in-flight non-Q's I/O requests before
Q's ones, thereby delaying the arrival of new I/O requests for Q
(recall that Q is sync). If I/O-dispatching is not plugged, then,
while Q remains empty, a basically uncontrolled amount of I/O from
other queues may be dispatched too, possibly causing the service of
Q's I/O to be delayed even longer in the drive. This problem gets more
and more serious as the speed and the queue depth of the drive grow,
because, as these two quantities grow, the probability to find no
queue busy but many requests in flight grows too.
If Q has the same weight and priority as the other queues, then the
above delay is unlikely to cause any issue, because all queues tend to
undergo the same treatment. So, since not plugging I/O dispatching is
convenient for throughput, it is better not to plug. Things change in
case Q has a higher weight or priority than some other queue, because
Q's service guarantees may simply be violated. For this reason,
commit 1de0c4cd9e ("block, bfq: reduce idling only in symmetric
scenarios") does plug I/O in such an asymmetric scenario. Plugging
minimizes the delay induced by already in-flight I/O, and enables Q to
recover the bandwidth it may lose because of this delay.
Yet the above commit does not cover the case of weight-raised queues,
for efficiency concerns. For weight-raised queues, I/O-dispatch
plugging is activated simply if not all bfq_queues are
weight-raised. But this check does not handle the case of in-flight
requests, because a bfq_queue may become non busy *before* all its
in-flight requests are completed.
This commit performs I/O-dispatch plugging for weight-raised queues if
there are some in-flight requests.
As a practical example of the resulting recover of control, under
write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5
seconds after this fix, against 15 seconds before the fix (as a
reference, gnome-terminal takes about 35 seconds to start with any of
the other I/O schedulers).
Fixes: 1de0c4cd9e ("block, bfq: reduce idling only in symmetric scenarios")
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The codecs without jack detection also don't have to be resumed
forcibly because, obviously, they have no jack. Skip the forced
resume in such a case as optimization as well.
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The GPIO SPI master has some code in its local CS
callback to set the initial sck GPIO value. This was
lost in the commit converting it to use SPI core
GPIO handling as this callback isn't called if the
internal GPIO handling is active.
Add the special SPI_MASTER_GPIO_SS to ascertain it
gets called anyway so we get the initial SCK setting
right. There is some platform provided GPIO handling
there as well but this will be skipped as the cs_gpios
will be NULL.
My test targets seem not to care about the initial
SCK value so I am uncertain if this is a regression,
but to preserve the previous semantic we better do
this.
Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Fixes: 249e2632dc ("spi: gpio: Don't request CS GPIO in DT use-case")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20190716204651.7743-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
btrfs_get_io_geometry() calls btrfs_get_chunk_map() to acquire a reference
on a extent_map, but on normal operation it does not drop this reference
anymore.
This leads to excessive kmemleak reports.
Always call free_extent_map(), not just in the error case.
Fixes: 5f1411265e ("btrfs: Introduce btrfs_io_geometry infrastructure")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If CONFIG_BTRFS_FS is y and CONFIG_LIBCRC32C is m,
building fails:
fs/btrfs/super.o: In function `btrfs_mount_root':
super.c:(.text+0xb7f9): undefined reference to `crc32c_impl'
fs/btrfs/super.o: In function `init_btrfs_fs':
super.c:(.init.text+0x3465): undefined reference to `crc32c_impl'
fs/btrfs/extent-tree.o: In function `hash_extent_data_ref':
extent-tree.c:(.text+0xe60): undefined reference to `crc32c'
extent-tree.c:(.text+0xe78): undefined reference to `crc32c'
extent-tree.c:(.text+0xe8b): undefined reference to `crc32c'
fs/btrfs/dir-item.o: In function `btrfs_insert_xattr_item':
dir-item.c:(.text+0x291): undefined reference to `crc32c'
fs/btrfs/dir-item.o: In function `btrfs_insert_dir_item':
dir-item.c:(.text+0x429): undefined reference to `crc32c'
Select LIBCRC32C to fix it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d5178578bc ("btrfs: directly call into crypto framework for checksumming")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs(5) specified:
Note
If nodatacow or nodatasum are enabled, compression is disabled.
If NODATASUM or NODATACOW set, we should not compress the extent.
Normally NODATACOW is detected properly in run_delalloc_range() so
compression won't happen for NODATACOW.
However for NODATASUM we don't have any check, and it can cause
compressed extent without csum pretty easily, just by:
mkfs.btrfs -f $dev
mount $dev $mnt -o nodatasum
touch $mnt/foobar
mount -o remount,datasum,compress $mnt
xfs_io -f -c "pwrite 0 128K" $mnt/foobar
And in fact, we have a bug report about corrupted compressed extent
without proper data checksum so even RAID1 can't recover the corruption.
(https://bugzilla.kernel.org/show_bug.cgi?id=199707)
Running compression without proper checksum could cause more damage when
corruption happens, as compressed data could make the whole extent
unreadable, so there is no need to allow compression for
NODATACSUM.
The fix will refactor the inode compression check into two parts:
- inode_can_compress()
As the hard requirement, checked at btrfs_run_delalloc_range(), so no
compression will happen for NODATASUM inode at all.
- inode_need_compress()
As the soft requirement, checked at btrfs_run_delalloc_range() and
compress_file_range().
Reported-by: James Harvey <jamespharvey20@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Somehow the swapgs mitigation entry code patch ended up with a JMPQ
instruction instead of JMP, where only the short jump is needed. Some
assembler versions apparently fail to optimize JMPQ into a two-byte JMP
when possible, instead always using a 7-byte JMP with relocation. For
some reason that makes the entry code explode with a #GP during boot.
Change it back to "JMP" as originally intended.
Fixes: 18ec54fdd6 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enable force feedback for the Thrustmaster Dual Trigger 2 in 1 Rumble Force
gamepad. Compared to other Thrustmaster devices, left and right rumble
motors here are swapped.
Signed-off-by: Ilya Trukhanov <lahvuun@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This should help people identify the receiver. there are several receivers used
in gaming mice. the "lightspeed" technology is pretty well advertise so this
won't just be an obscure name.
Signed-off-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Now that the latex_documents are handled automatically, we can
remove those extra conf.py files.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Right now, for every directory that we need to have LaTeX output,
a conf.py file is required.
That causes an extra overhead and it is actually a hack, as
the latex_documents line there are usually a copy of the ones
that are there already at the main conf.py.
So, instead, re-use the global latex_documents var, just
adjusting the path to be relative ones.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The translations guide need Noto CJK fonts. So, add a logic that
would suggest its install for distros.
It also fix a few other issues while testing the script
with several distributions.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
On Gentoo, the portage changes for ImageMagick to work are
always suggested, even if already applied. While the two
extra commands should be harmless, add a check to avoid
reporting it without need.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The name of the package with carries latexmk is different
on two distros:
- On OpenSUSE, latexmk is packaged as "texlive-latexmk-bin"
- On Mageia, latexmk is packaged at "texlive-collection-basic"
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
There aren't enough texlive packages for LaTeX-based builds
to work on CentOS/RHEL <= 7.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
There's a missing parenthesis at the script, with causes it to
fail to detect non-Fedora releases (e. g. RHEL/CentOS).
Tested with Centos 7.6.1810.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
If we try to build a book with asian characters with XeLaTeX
and the font is not available, it will produce an error.
So, instead, add a logic at conf.py to detect if the proper
font is installed.
This will avoid an error while building the document, although
the result may not be readable.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
In order to be able to output Asian symbols with XeLaTeX, we
need the xeCJK package, and a default font for CJK symbols.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Currently, all index files should be manually added to the
latex_documents array at conf.py.
While this allows fine-tuning some LaTeX specific things, like
the name of the output file and the name of the document, it
is not uncommon to forget adding new documents there.
So, add a logic that will seek for all Documentation/*/index.rst.
If the index is not yet at latex_documents, it includes using
a reasonable default.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Some files got renamed but probably due to some merge conflicts,
a few references still point to the old locations.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The power docs are orphaned at the documentation body.
While it could likely be moved to be inside some guide, I'm opting to just
adding it to the main index.rst, removing the :orphan: and adding the SPDX
header.
The reason is similar to what it was done for other driver-specific
subsystems: the docs there contain a mix of Kernelspace, uAPI and
admin-guide. So, better to keep them on its own directory,
while the docs there are not properly classified.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Convert docs to ReST and add them to the arch-specific
book.
The conversion here was trivial, as almost every file there
was already using an elegant format close to ReST standard.
The changes were mostly to mark literal blocks and add a few
missing section title identifiers.
One note with regards to "--": on Sphinx, this can't be used
to identify a list, as it will format it badly. This can be
used, however, to identify a long hyphen - and "---" is an
even longer one.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
*Really* mark the literal block as such.
Fixes: 127e621740 ("vfio-ccw: Update documentation for csch/hsch")
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
With the current implementation, phydev cannot be removed:
$ ip link add dummy type dummy
$ ip link add xfrm1 type xfrm dev dummy if_id 1
$ ip l d dummy
kernel:[77938.465445] unregister_netdevice: waiting for dummy to become free. Usage count = 1
Manage it like in ip tunnels, ie just keep the ifindex. Not that the side
effect, is that the phydev is now optional.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
dev_net(dev) is the netns of the device and xi->net is the link netns,
where the device has been linked.
changelink() must operate in the link netns to avoid a corruption of
the xfrm lists.
Note that xi->net and dev_net(xi->physdev) are always the same.
Before the patch, the xfrmi lists may be corrupted and can later trigger a
kernel panic.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The ifname is copied when the interface is created, but is never updated
later. In fact, this property is used only in one error message, where the
netdevice pointer is available, thus let's use it.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The new parameters must not be stored in the netdev_priv() before
validation, it may corrupt the interface. Note also that if data is NULL,
only a memset() is done.
$ ip link add xfrm1 type xfrm dev lo if_id 1
$ ip link add xfrm2 type xfrm dev lo if_id 2
$ ip link set xfrm1 type xfrm dev lo if_id 2
RTNETLINK answers: File exists
$ ip -d link list dev xfrm1
5: xfrm1@lo: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 68 maxmtu 1500
xfrm if_id 0x2 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
=> "if_id 0x2"
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Refactoring of axp20x driver introduced a bug in AXP803's DCDC6
regulator definition. AXP803_DCDC6_1120mV_STEPS was obtained by
subtracting 0x47 and 0x33. This should be 0x14 (hex) and not 14
(dec).
Refactoring also carried over a bug in DCDC5 regulator definition.
Number of possible voltages must be for 1 bigger than maximum valid
voltage index, because 0 is also valid and it means lowest voltage.
Fixes: 1dbe0ccb06 ("regulator: axp20x-regulator: add support for AXP803")
Fixes: db4a555f7c ("regulator: axp20x: use defines for masks")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20190713090717.347-3-jernej.skrabec@siol.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Refactoring of the driver introduced bugs in AXP806's DCDCA and DCDCD
regulator definitions.
In DCDCA case, AXP806_DCDCA_1120mV_STEPS was obtained by subtracting
0x47 and 0x33. This should be 0x14 (hex) and not 14 (dec).
In DCDCD case, axp806_dcdcd_ranges[] contains two ranges with same
start and end macros, which is clearly wrong. Second range starts at
1.6V so it should use AXP806_DCDCD_1600mV_[START|END] macros. They are
already defined but unused.
Fixes: db4a555f7c ("regulator: axp20x: use defines for masks")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20190713090717.347-2-jernej.skrabec@siol.net
Signed-off-by: Mark Brown <broonie@kernel.org>
The runtime configurable module parameter files are located under
/sys/module/MODULENAME/parameters, not /sys/module/MODULENAME.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, ->pd_stat() is called only when moduleparam
blkcg_debug_stats is set which prevents it from printing non-debug
policy-specific statistics. Let's move debug testing down so that
->pd_stat() can print non-debug stat too. This patch doesn't cause
any visible behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We could queue a work for each req in defer and link list without
increasing async_list->cnt, so we shouldn't decrease it while exiting
from workqueue as well if we didn't process the req in async list.
Thanks to Jens Axboe <axboe@kernel.dk> for his guidance.
Fixes: 31b5151064 ("io_uring: allow workqueue item to handle multiple buffered requests")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sq->cached_sq_head and cq->cached_cq_tail are both unsigned int. If
cached_sq_head overflows before cached_cq_tail, then we may miss a
barrier req. As cached_cq_tail always follows cached_sq_head, the NQ
should be enough.
Cc: stable@vger.kernel.org
Fixes: de0617e467 ("io_uring: add support for marking commands as draining")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The rule below doesn't work as the kernel raises -ERANGE.
nft add rule netdev nftlb lb01 ip daddr set \
symhash mod 1 map { 0 : 192.168.0.10 } fwd to "eth0"
This patch allows to use the symhash modulus with one
element, in the same way that the other types of hashes and
algorithms that uses the modulus parameter.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The following nftables test case fails on nf-next:
tests/shell/run-tests.sh tests/shell/testcases/transactions/0011chain_0
The test case contains:
add chain x y { type filter hook input priority 0; }
add chain x y { policy drop; }"
The new test
if (chain->flags ^ flags)
return -EOPNOTSUPP;
triggers here, because chain->flags has NFT_BASE_CHAIN set, but flags
is 0 because no flag attribute was present in the policy update.
Just fetch the current flag settings of a pre-existing chain in case
userspace did not provide any.
Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jakub Jankowski reported following oddity:
After 3 way handshake completes, timeout of new connection is set to
max_retrans (300s) instead of established (5 days).
shortened excerpt from pcap provided:
25.070622 IP (flags [DF], proto TCP (6), length 52)
10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
26.070462 IP (flags [DF], proto TCP (6), length 48)
10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
27.070449 IP (flags [DF], proto TCP (6), length 40)
10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
Turns out the last_win is of u16 type, but we store the scaled value:
512 << 8 (== 0x20000) becomes 0 window.
The Fixes tag is not correct, as the bug has existed forever, but
without that change all that this causes might cause is to mistake a
window update (to-nonzero-from-zero) for a retransmit.
Fixes: fbcd253d24 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
Reported-by: Jakub Jankowski <shasta@toxcorp.com>
Tested-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now synproxy sends the mss value set by the user on client syn-ack packet
instead of the mss value that client announced.
Fixes: 48b1de4c11 ("netfilter: add SYNPROXY core/target")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.") the new
generic nf_conntrack was introduced, and it came to supersede the old
ip_conntrack.
This change updates (some) of the obsolete comments referring to old
file/function names of the ip_conntrack mechanism, as well as removes a
few self-referencing comments that we shouldn't maintain anymore.
I did not update any comments referring to historical actions (e.g,
comments like "this file was derived from ..." were left untouched, even
if the referenced file is no longer here).
Signed-off-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When conntracks change during a dialog, SDP messages may be sent from
different conntracks to establish expects with identical tuples. In this
case expects conflict may be detected for the 2nd SDP message and end up
with a process failure.
The fixing here is to reuse an existing expect who has the same tuple for a
different conntrack if any.
Here are two scenarios for the case.
1)
SERVER CPE
| INVITE SDP |
5060 |<----------------------|5060
| 100 Trying |
5060 |---------------------->|5060
| 183 SDP |
5060 |---------------------->|5060 ===> Conntrack 1
| PRACK |
50601 |<----------------------|5060
| 200 OK (PRACK) |
50601 |---------------------->|5060
| 200 OK (INVITE) |
5060 |---------------------->|5060
| ACK |
50601 |<----------------------|5060
| |
|<--- RTP stream ------>|
| |
| INVITE SDP (t38) |
50601 |---------------------->|5060 ===> Conntrack 2
With a certain configuration in the CPE, SIP messages "183 with SDP" and
"re-INVITE with SDP t38" will go through the sip helper to create
expects for RTP and RTCP.
It is okay to create RTP and RTCP expects for "183", whose master
connection source port is 5060, and destination port is 5060.
In the "183" message, port in Contact header changes to 50601 (from the
original 5060). So the following requests e.g. PRACK and ACK are sent to
port 50601. It is a different conntrack (let call Conntrack 2) from the
original INVITE (let call Conntrack 1) due to the port difference.
In this example, after the call is established, there is RTP stream but no
RTCP stream for Conntrack 1, so the RTP expect created upon "183" is
cleared, and RTCP expect created for Conntrack 1 retains.
When "re-INVITE with SDP t38" arrives to create RTP&RTCP expects, current
ALG implementation will call nf_ct_expect_related() for RTP and RTCP. The
expects tuples are identical to those for Conntrack 1. RTP expect for
Conntrack 2 succeeds in creation as the one for Conntrack 1 has been
removed. RTCP expect for Conntrack 2 fails in creation because it has
idential tuples and 'conflict' with the one retained for Conntrack 1. And
then result in a failure in processing of the re-INVITE.
2)
SERVER A CPE
| REGISTER |
5060 |<------------------| 5060 ==> CT1
| 200 |
5060 |------------------>| 5060
| |
| INVITE SDP(1) |
5060 |<------------------| 5060
| 300(multi choice) |
5060 |------------------>| 5060 SERVER B
| ACK |
5060 |<------------------| 5060
| INVITE SDP(2) |
5060 |-------------------->| 5060 ==> CT2
| 100 |
5060 |<--------------------| 5060
| 200(contact changes)|
5060 |<--------------------| 5060
| ACK |
5060 |-------------------->| 50601 ==> CT3
| |
|<--- RTP stream ---->|
| |
| BYE |
5060 |<--------------------| 50601
| 200 |
5060 |-------------------->| 50601
| INVITE SDP(3) |
5060 |<------------------| 5060 ==> CT1
CPE sends an INVITE request(1) to Server A, and creates a RTP&RTCP expect
pair for this Conntrack 1 (CT1). Server A responds 300 to redirect to
Server B. The RTP&RTCP expect pairs created on CT1 are removed upon 300
response.
CPE sends the INVITE request(2) to Server B, and creates an expect pair
for the new conntrack (due to destination address difference), let call
CT2. Server B changes the port to 50601 in 200 OK response, and the
following requests ACK and BYE from CPE are sent to 50601. The call is
established. There is RTP stream and no RTCP stream. So RTP expect is
removed and RTCP expect for CT2 retains.
As BYE request is sent from port 50601, it is another conntrack, let call
CT3, different from CT2 due to the port difference. So the BYE request will
not remove the RTCP expect for CT2.
Then another outgoing call is made, with the same RTP port being used (not
definitely but possibly). CPE firstly sends the INVITE request(3) to Server
A, and tries to create a RTP&RTCP expect pairs for this CT1. In current ALG
implementation, the RTCP expect for CT1 fails in creation because it
'conflicts' with the residual one for CT2. As a result the INVITE request
fails to send.
Signed-off-by: xiao ruizhu <katrina.xiaorz@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
ipv4/ipv6 packets will be dropped. Vrf device will pass
through netfilter hook twice. One with enslaved device
and another one with l3 master device. So in device may
dismatch witch out device because out device is always
enslaved device.So failed with the check of the rpfilter
and drop the packets by mistake.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is a small window where it's possible that we could be working
on an interrupt (queued in the workqueue) and setting up a channel
program (i.e allocating memory, pinning pages, translating address).
This can lead to allocating and freeing the channel program at the
same time and can cause memory corruption.
Let's not call cp_free if we are currently processing a channel program.
The only way we know for sure that we don't have a thread setting
up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
Fixes: d5afd5d135 ("vfio-ccw: add handling for async channel instructions")
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <62e87bf67b38dc8d5760586e7c96d400db854ebe.1562854091.git.alifm@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The comment is misleading because it tells us that
we should set orb.cmd.c64 before calling ccwchain_calc_length,
otherwise the function ccwchain_calc_length would return an
error. This is not completely accurate.
We want to allow an orb without cmd.c64, and this is fine
as long as the channel program does not use IDALs. But we do
want to reject any channel program that uses IDALs and does
not set the flag, which is what we do in ccwchain_calc_length.
After we have done the ccw processing, we need to set cmd.c64,
as we use IDALs for all translated channel programs.
Also for better code readability let's move the setting of
cmd.c64 within the non error path.
Fixes: fb9e7880af ("vfio: ccw: push down unsupported IDA check")
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <f68636106aef0faeb6ce9712584d102d1b315ff8.1562854091.git.alifm@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided following analysis:
conntrack -e NEW -E
netlink_bind()
netlink_lock_table() -> increases "nl_table_users"
nfnetlink_bind()
# does not unlock the table as it's locked by netlink_bind()
__request_module()
call_usermodehelper_exec()
This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
won't return until modprobe process is done.
"modprobe xfrm_user":
xfrm_user_init()
register_pernet_subsys()
-> grab pernet_ops_rwsem
..
netlink_table_grab()
calls schedule() as "nl_table_users" is non-zero
so modprobe is blocked because netlink_bind() increased
nl_table_users while also holding pernet_ops_rwsem.
"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
ctnetlink_init()
register_pernet_subsys()
-> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
both modprobe processes wait on one another -- neither can make
progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
25078dc1f7 first introduced an off by
one error in the ZONE_DMA initialization of PPC_BOOK3E_64=y and since
9739ab7eda the off by one applies to
PPC32=y too. This simply corrects the off by one and should resolve
crashes like below:
[ 65.179101] page 0x7fff outside node 0 zone DMA [ 0x0 - 0x7fff ]
Unfortunately in various MM places "max" means a non inclusive end of
range. free_area_init_nodes max_zone_pfn parameter is one case and
MAX_ORDER is another one (unrelated) that comes by memory.
Reported-by: Zorro Lang <zlang@redhat.com>
Fixes: 25078dc1f7 ("powerpc: use mm zones more sensibly")
Fixes: 9739ab7eda ("powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190625141727.2883-1-aarcange@redhat.com
The Performance Stop Status and Control Register (PSSCR) is used to
control the power saving facilities of the processor. This register
has various fields, some of which can be modified only in hypervisor
state, and others which can be modified in both hypervisor and
privileged non-hypervisor state. The bits which can be modified in
privileged non-hypervisor state are referred to as guest visible.
Currently the L0 hypervisor saves and restores both it's own host
value as well as the guest value of the PSSCR when context switching
between the hypervisor and guest. However a nested hypervisor running
it's own nested guests (as indicated by kvmhv_on_pseries()) doesn't
context switch the PSSCR register. That means if a nested (L2) guest
modifies the PSSCR then the L1 guest hypervisor will run with that
modified value, and if the L1 guest hypervisor modifies the PSSCR and
then goes to run the nested (L2) guest again then the L2 PSSCR value
will be lost.
Fix this by having the (L1) nested hypervisor save and restore both
its host and the guest PSSCR value when entering and exiting a
nested (L2) guest. Note that only the guest visible parts of the PSSCR
are context switched since this is all the L1 nested hypervisor can
access, this is fine however as these are the only fields the L0
hypervisor provides guest control of anyway and so all other fields
are ignored.
This could also have been implemented by adding the PSSCR register to
the hv_regs passed to the L0 hypervisor as input to the H_ENTER_NESTED
hcall, however this would have meant updating the structure layout and
thus required modifications to both the L0 and L1 kernels. Whereas the
approach used doesn't require L0 kernel modifications while achieving
the same result.
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190703012022.15644-3-sjitindarsingh@gmail.com
The ability to run nested guests under KVM means that a guest can also
act as a hypervisor for it's own nested guest. Currently
ppc_set_pmu_inuse() assumes that either FW_FEATURE_LPAR is set,
indicating a guest environment, and so sets the pmcregs_in_use flag in
the lppaca, or that it isn't set, indicating a hypervisor environment,
and so sets the pmcregs_in_use flag in the paca.
The pmcregs_in_use flag in the lppaca is used to communicate this
information to a hypervisor and so must be set in a guest environment.
The pmcregs_in_use flag in the paca is used by KVM code to determine
whether the host state of the performance monitoring unit (PMU) must
be saved and restored when running a guest.
Thus when a guest also acts as a hypervisor it must set this bit in
both places since it needs to ensure both that the real hypervisor
saves it's PMU registers when it runs (requires pmcregs_in_use flag in
lppaca), and that it saves it's own PMU registers when running a
nested guest (requires pmcregs_in_use flag in paca).
Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in
both the lppaca and the paca when a guest (LPAR) is running with the
capability of running it's own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE).
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190703012022.15644-2-sjitindarsingh@gmail.com
The performance monitoring unit (PMU) registers are saved on guest
exit when the guest has set the pmcregs_in_use flag in its lppaca, if
it exists, or unconditionally if it doesn't. If a nested guest is
being run then the hypervisor doesn't, and in most cases can't, know
if the PMU registers are in use since it doesn't know the location of
the lppaca for the nested guest, although it may have one for its
immediate guest. This results in the values of these registers being
lost across nested guest entry and exit in the case where the nested
guest was making use of the performance monitoring facility while it's
nested guest hypervisor wasn't.
Further more the hypervisor could interrupt a guest hypervisor between
when it has loaded up the PMU registers and it calling H_ENTER_NESTED
or between returning from the nested guest to the guest hypervisor and
the guest hypervisor reading the PMU registers, in
kvmhv_p9_guest_entry(). This means that it isn't sufficient to just
save the PMU registers when entering or exiting a nested guest, but
that it is necessary to always save the PMU registers whenever a guest
is capable of running nested guests to ensure the register values
aren't lost in the context switch.
Ensure the PMU register values are preserved by always saving their
value into the vcpu struct when a guest is capable of running nested
guests.
This should have minimal performance impact however any impact can be
avoided by booting a guest with "-machine pseries,cap-nested-hv=false"
on the qemu commandline.
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190703012022.15644-1-sjitindarsingh@gmail.com
The virtual real mode addressing (VRMA) mechanism is used when a
partition is using HPT (Hash Page Table) translation and performs real
mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode
effective address bits 0:23 are treated as zero (i.e. the access is
aliased to 0) and the access is performed using an implicit 1TB SLB
entry.
The size of the RMA (Real Memory Area) is communicated to the guest as
the size of the first memory region in the device tree. And because of
the mechanism described above can be expected to not exceed 1TB. In
the event that the host erroneously represents the RMA as being larger
than 1TB, guest accesses in real mode to memory addresses above 1TB
will be aliased down to below 1TB. This means that a memory access
performed in real mode may differ to one performed in virtual mode for
the same memory address, which would likely have unintended
consequences.
To avoid this outcome have the guest explicitly limit the size of the
RMA to the current maximum, which is 1TB. This means that even if the
first memory block is larger than 1TB, only the first 1TB should be
accessed in real mode.
Fixes: c610d65c0a ("powerpc/pseries: lift RTAS limit for hash")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.com
Driver only supports 3-axis gyro and/or 3-axis accel.
For icm20602, temp data is mandatory for all configurations.
Fix all single and double axis configurations (almost never used) and more
importantly fix 3-axis gyro and 6-axis accel+gyro buffer on icm20602 when
temp data is not enabled.
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com>
Fixes: 1615fe41a1 ("iio: imu: mpu6050: Fix FIFO layout for ICM20602")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The SADC component can run at up to 8 MHz on JZ4725B, but is fed
a 12 MHz input clock (EXT). Divide it by two to get 6 MHz, then
set up another divider to match, to produce a 10us clock.
If the clock dividers are left on their power-on defaults (a divider
of 1), the SADC mostly works, but will occasionally produce erroneous
readings. This led to button presses being detected out of nowhere on
the RS90 every few minutes. With this change, no ghost button presses
were logged in almost a day worth of testing.
The ADCLK register for configuring clock dividers doesn't exist on
JZ4740, so avoid writing it there.
A function has been introduced rather than a flag because there is a lot
of variation between the ADCLK registers on JZ47xx SoCs, both in
the internal layout of the register and in the frequency range
supported by the SADC. So this solution should make it easier
to add support for other JZ47xx SoCs later.
Fixes: 1a78daea10 ("iio: adc: probe should set clock divider")
Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Now that f2fs_ioc_setflags() and f2fs_ioc_fssetxattr() call the VFS
helper functions which check for permission to change the immutable and
append-only flags, it's no longer needed to do this check in
f2fs_setflags_common() too. So remove it.
This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").
Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make the f2fs implementation of FS_IOC_FSSETXATTR use the new VFS helper
function vfs_ioc_fssetxattr_check(), and remove the project quota check
since it's now done by the helper function.
This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").
Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make the f2fs implementation of FS_IOC_SETFLAGS use the new VFS helper
function vfs_ioc_setflags_prepare().
This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").
Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The V4L2_PIX_FMT_BGRA444 define clashed with the pre-existing V4L2_PIX_FMT_SGRBG12
which strangely enough used the same fourcc, even though that fourcc made no sense
for a Bayer format. In any case, you can't have duplicates, so change the fourcc of
V4L2_PIX_FMT_BGRA444.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: <stable@vger.kernel.org> # for v5.2 and up
Fixes: 6c84f9b1d2 ("media: v4l: Add definitions for missing 16-bit RGB4444 formats")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
commit c152f8491a ("ASoC: audio-graph-card: fix an use-after-free in
graph_get_dai_id()") fixups use-after-free issue,
but, it need to use "const" for reg. This patch adds it.
We will have below without this patch
LINUX/sound/soc/generic/audio-graph-card.c: In function 'graph_get_dai_id':
LINUX/sound/soc/generic/audio-graph-card.c:87:7: warning: assignment discards\
'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
reg = of_get_property(node, "reg", NULL);
Fixes: c152f8491a ("ASoC: audio-graph-card: fix an use-after-free in graph_get_dai_id()")
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Wen Yang <wen.yang99@zte.com.cn>
Link: https://lore.kernel.org/r/87sgrd43ja.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
commit 3461473998 ("ASoC: soc-core: support dai_link with
platforms_num != 1") supports multi Platform, and
commit 9f3eb91753 ("ASoC: simple-card-utils: consider CPU-Platform
possibility") removed no Platform from simple-card.
Multi Platform is now checking both Platform name/of_node are NULL case.
But in normal case, DPCM be doesn't have Platform.
asoc_simple_canonicalize_platform() try to use CPU of_node
to Platform (This is needed for DMAEngine platform case),
but it still might be NULL at DPCM be.
This patch try to use no Platform after that if Platform of_node
is still NULL. It can't probe without this patch.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87muhmgw2o.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
max98357a_daiops_trigger() is possible to be called in atomic context if
the .nonatomic flag is equal to 0 in the DAI links.
When cancel_delayed_work_sync() in max98357a_daiops_trigger() is called
in atomic context, kernel emits the following message: "BUG: sleeping
function called from invalid context".
According to the DT binding document, value less than or equal to 5ms of
sdmod-delay should be sufficient to avoid the pop noise. Use mdelay
(i.e. busy loop) for such low delay should be acceptable.
Fixes: cec5b01f8f ("ASoC: max98357a: avoid speaker pop when playback
startup")
Signed-off-by: Tzung-Bi Shih <tzungbi@google.com>
Link: https://lore.kernel.org/r/20190708141901.68797-1-tzungbi@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The previous commit added macro calls in the entry code which mitigate the
Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
enabled. Enable those features where applicable.
The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
There are different features which can affect the risk of attack:
- When FSGSBASE is enabled, unprivileged users are able to place any
value in GS, using the wrgsbase instruction. This means they can
write a GS value which points to any value in kernel space, which can
be useful with the following gadget in an interrupt/exception/NMI
handler:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
// dependent load or store based on the value of %reg
// for example: mov %(reg1), %reg2
If an interrupt is coming from user space, and the entry code
speculatively skips the swapgs (due to user branch mistraining), it
may speculatively execute the GS-based load and a subsequent dependent
load or store, exposing the kernel data to an L1 side channel leak.
Note that, on Intel, a similar attack exists in the above gadget when
coming from kernel space, if the swapgs gets speculatively executed to
switch back to the user GS. On AMD, this variant isn't possible
because swapgs is serializing with respect to future GS-based
accesses.
NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
doesn't exist quite yet.
- When FSGSBASE is disabled, the issue is mitigated somewhat because
unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
restricts GS values to user space addresses only. That means the
gadget would need an additional step, since the target kernel address
needs to be read from user space first. Something like:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
mov (%reg1), %reg2
// dependent load or store based on the value of %reg2
// for example: mov %(reg2), %reg3
It's difficult to audit for this gadget in all the handlers, so while
there are no known instances of it, it's entirely possible that it
exists somewhere (or could be introduced in the future). Without
tooling to analyze all such code paths, consider it vulnerable.
Effects of SMAP on the !FSGSBASE case:
- If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
susceptible to Meltdown), the kernel is prevented from speculatively
reading user space memory, even L1 cached values. This effectively
disables the !FSGSBASE attack vector.
- If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
still prevents the kernel from speculatively reading user space
memory. But it does *not* prevent the kernel from reading the
user value from L1, if it has already been cached. This is probably
only a small hurdle for an attacker to overcome.
Thanks to Dave Hansen for contributing the speculative_smap() function.
Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
is serializing on AMD.
[ tglx: Fixed the USER fence decision and polished the comment as suggested
by Dave Hansen ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Spectre v1 isn't only about array bounds checks. It can affect any
conditional checks. The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks. Those may be problematic in
the context of Spectre v1, as kernel code can speculatively run with a user
GS.
For example:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg
mov (%reg), %reg1
When coming from user space, the CPU can speculatively skip the swapgs, and
then do a speculative percpu load using the user GS value. So the user can
speculatively force a read of any kernel value. If a gadget exists which
uses the percpu value as an address in another load/store, then the
contents of the kernel value may become visible via an L1 side channel
attack.
A similar attack exists when coming from kernel space. The CPU can
speculatively do the swapgs, causing the user GS to get used for the rest
of the speculative window.
The mitigation is similar to a traditional Spectre v1 mitigation, except:
a) index masking isn't possible; because the index (percpu offset)
isn't user-controlled; and
b) an lfence is needed in both the "from user" swapgs path and the
"from kernel" non-swapgs path (because of the two attacks described
above).
The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
CR3 write when PTI is enabled. Since CR3 writes are serializing, the
lfences can be skipped in those cases.
On the other hand, the kernel entry swapgs paths don't depend on PTI.
To avoid unnecessary lfences for the user entry case, create two separate
features for alternative patching:
X86_FEATURE_FENCE_SWAPGS_USER
X86_FEATURE_FENCE_SWAPGS_KERNEL
Use these features in entry code to patch in lfences where needed.
The features aren't enabled yet, so there's no functional change.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
If UHS speed modes are enabled, a compatible SD card switches down to
1.8V during enumeration. If after this a software reboot/crash takes
place and on-chip ROM tries to enumerate the SD card, the difference in
IO voltages (host @ 3.3V and card @ 1.8V) may end up damaging the card.
The fix for this is to have support for power cycling the card in
hardware (with a PORz/soft-reset line causing a power cycle of the
card). Because the beaglebone X15 (rev A,B and C), am57xx-idks and
am57xx-evms don't have this capability, disable voltage switching for
these boards.
The major effect of this is that the maximum supported speed
mode is now high speed(50 MHz) down from SDR104(200 MHz).
commit 88a748419b ("ARM: dts: am57xx-idk: Remove support for voltage
switching for SD card") did this only for idk boards. Do it for all
affected boards.
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
VRHOT event from the F/W indicates the device has reached a temperature of
100 Celsius degrees. In this case, the driver should only print this
information to the kernel log. The device will shutdown itself
automatically when reaching 125 degrees.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
dma_addr_t might be different sizes depending on the configuration,
so we cannot print it as %llx:
drivers/misc/habanalabs/goya/goya.c: In function 'goya_sw_init':
drivers/misc/habanalabs/goya/goya.c:698:21: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
Use the special %pad format string. This requires passing the
argument by reference.
Fixes: 2a51558c8c ("habanalabs: remove DMA mask hack for Goya")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* Per inserire blocchi di testo con caratteri a dimensione fissa (codici di
esempio, casi d'uso, eccetera): utilizzate ``::`` quando non è necessario
evidenziare la sintassi, specialmente per piccoli frammenti; invece,
utilizzate ``.. code-block:: <language>`` per blocchi di più lunghi che
potranno beneficiare dell'avere la sintassi evidenziata.
utilizzate ``.. code-block:: <language>`` per blocchi più lunghi che
beneficeranno della sintassi evidenziata. Per un breve pezzo di codice da
inserire nel testo, usate \`\`.
Il dominio C
@ -267,12 +268,14 @@ molto comune come ``open`` o ``ioctl``:
Il nome della funzione (per esempio ioctl) rimane nel testo ma il nome del suo
riferimento cambia da ``ioctl`` a ``VIDIOC_LOG_STATUS``. Anche la voce
nell'indice cambia in ``VIDIOC_LOG_STATUS`` e si potrà quindi fare riferimento
a questa funzione scrivendo:
nell'indice cambia in ``VIDIOC_LOG_STATUS``.
..code-block::rst
:c:func:`VIDIOC_LOG_STATUS`
Notate che per una funzione non c'è bisogno di usare ``c:func:`` per generarne
i riferimenti nella documentazione. Grazie a qualche magica estensione a
Sphinx, il sistema di generazione della documentazione trasformerà
automaticamente un riferimento ad una ``funzione()`` in un riferimento
incrociato quando questa ha una voce nell'indice. Se trovate degli usi di
``c:func:`` nella documentazione del kernel, sentitevi liberi di rimuoverli.
Tabelle a liste
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.