Compare commits

...

39 Commits

Author SHA1 Message Date
ab35d16f66 Linux 4.12.2 2017-07-15 13:09:20 +02:00
2607611714 x86/mm/pat: Don't report PAT on CPUs that don't support it
commit 99c13b8c88 upstream.

The pat_enabled() logic is broken on CPUs which do not support PAT and
where the initialization code fails to call pat_init(). Due to this, the
enabled flag stays true and pat_enabled() wrongly returns true.

As a consequence the mappings, e.g. for Xorg, are set up with the wrong
caching mode and the required MTRR setups are omitted.

To cure this the following changes are required:

  1) Make pat_enabled() return true only if PAT initialization was
     invoked and successful.

  2) Invoke init_cache_modes() unconditionally in setup_arch() and
     remove the extra callsites in pat_disable() and the pat disabled
     code path in pat_init().

Also rename __pat_enabled to pat_disabled to reflect the real purpose of
this variable.
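
A condensed view of the resulting logic, taken from the pat.c hunks
further below:

    static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
    static bool __read_mostly pat_initialized;

    bool pat_enabled(void)
    {
        /* true only once pat_bsp_init() has programmed the PAT MSR */
        return pat_initialized;
    }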

Fixes: 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:05 +02:00
20417e9a54 ext4: check return value of kstrtoull correctly in reserved_clusters_store
commit 1ea1516fbb upstream.

kstrtoull returns 0 on success; however, reserved_clusters_store returned
-EINVAL when kstrtoull returned 0, making it impossible to update the
reserved_clusters value through sysfs.

Fixes: 76d33bca55
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:05 +02:00
f85a3c8f00 crypto: rsa-pkcs1pad - use constant time memory comparison for MACs
commit fec17cb223 upstream.

Otherwise, we enable all sorts of forgeries via timing attack.
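
As an illustration (the kernel's real helper is crypto_memneq()), a
constant-time comparison accumulates the difference instead of returning
at the first mismatching byte:

    static int ct_memneq(const void *a, const void *b, size_t n)
    {
        const unsigned char *pa = a, *pb = b;
        unsigned char diff = 0;
        size_t i;

        /* runtime depends only on n, not on where the buffers differ */
        for (i = 0; i < n; i++)
            diff |= pa[i] ^ pb[i];
        return diff != 0;
    }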

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Stephan Müller <smueller@chronox.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:05 +02:00
d56e029fc2 crypto: caam - fix gfp allocation flags (part I)
commit 42cfcafb91 upstream.

Changes in the SW cts (ciphertext stealing) code in
commit 0605c41cc5 ("crypto: cts - Convert to skcipher")
revealed a problem in the CAAM driver:
when cts(cbc(aes)) is executed and cts runs in SW,
cbc(aes) is offloaded in CAAM; cts encrypts the last block
in atomic context and CAAM incorrectly decides to use GFP_KERNEL
for memory allocation.

Fix this by allowing GFP_KERNEL (sleeping) only when MAY_SLEEP flag is
set, i.e. remove MAY_BACKLOG flag.

We split the fix in two parts: the first is sent to -stable, while the
second is not (since there is no known failure case).
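
The change itself is a one-liner per allocation site (see the caamalg.c
hunk below); MAY_BACKLOG is simply dropped from the test:

    gfp_t flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
                  GFP_KERNEL : GFP_ATOMIC;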

Link: http://lkml.kernel.org/g/20170602122446.2427-1-david@sigma-star.at
Reported-by: David Gstir <david@sigma-star.at>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:05 +02:00
03cbf4a306 staging: comedi: fix clean-up of comedi_class in comedi_init()
commit a9332e9ad0 upstream.

There is a clean-up bug in the core comedi module initialization
function, `comedi_init()`.  If the `comedi_num_legacy_minors` module
parameter is non-zero (and valid), it creates that many "legacy" devices
and registers them in SysFS.  A failure causes the function to clean up
and return an error.  Unfortunately, it fails to destroy the "comedi"
class that was created earlier.  Fix it by adding a call to
`class_destroy(comedi_class)` at the appropriate place in the clean-up
sequence.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:04 +02:00
09d05eb32d staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
commit dc32190f2c upstream.

The key table is not initialized correctly without this call.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:04 +02:00
0155f201e2 locking/rwsem-spinlock: Fix EINTR branch in __down_write_common()
commit a0c4acd2c2 upstream.

If a writer could have been woken up, the branch above

	if (sem->count == 0)
		break;

would have moved us to taking the sem. So this is not the time to
wake a writer; only readers are allowed now, and thus 0 must be
passed to __rwsem_do_wake().

Next, __rwsem_do_wake() wakes readers unconditionally. But we must
not do that while the sem is owned by a writer at that moment;
otherwise writer and reader would own the sem at the same time,
which leads to memory corruption in callers.

rwsem-xadd.c does not need this, as:

  1) a similar check is made locklessly there,
  2) __rwsem_mark_wake::try_reader_grant tests that the sem is not
     owned by a writer.
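
The fixed EINTR path (see the rwsem-spinlock.c hunk below) becomes:

    out_nolock:
        list_del(&waiter.list);
        /* wake only readers, and only if no writer owns the sem */
        if (!list_empty(&sem->wait_list) && sem->count >= 0)
            __rwsem_do_wake(sem, 0);
        raw_spin_unlock_irqrestore(&sem->wait_lock, flags);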

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklas.cassel@axis.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 17fcbd590d ("locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y")
Link: http://lkml.kernel.org/r/149762063282.19811.9129615532201147826.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:04 +02:00
dd236b3c9a proc: Fix proc_sys_prune_dcache to hold a sb reference
commit 2fd1d2c4ce upstream.

Andrei Vagin writes:
FYI: This bug has been reproduced on 4.11.7
> BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
> CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
> Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
> Workqueue: events proc_cleanup_work
> Call Trace:
>  dump_stack+0x63/0x86
>  __warn+0xcb/0xf0
>  warn_slowpath_null+0x1d/0x20
>  umount_check+0x6e/0x80
>  d_walk+0xc6/0x270
>  ? dentry_free+0x80/0x80
>  do_one_tree+0x26/0x40
>  shrink_dcache_for_umount+0x2d/0x90
>  generic_shutdown_super+0x1f/0xf0
>  kill_anon_super+0x12/0x20
>  proc_kill_sb+0x40/0x50
>  deactivate_locked_super+0x43/0x70
>  deactivate_super+0x5a/0x60
>  cleanup_mnt+0x3f/0x90
>  mntput_no_expire+0x13b/0x190
>  kern_unmount+0x3e/0x50
>  pid_ns_release_proc+0x15/0x20
>  proc_cleanup_work+0x15/0x20
>  process_one_work+0x197/0x450
>  worker_thread+0x4e/0x4a0
>  kthread+0x109/0x140
>  ? process_one_work+0x450/0x450
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x2c/0x40
> ---[ end trace e1c109611e5d0b41 ]---
> VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: _raw_spin_lock+0xc/0x30
> PGD 0

Fix this by taking a reference to the super block in proc_sys_prune_dcache.

The superblock reference is the core of the fix; in addition, the
sysctl_inodes list is converted to an hlist so that hlist_del_init_rcu
may be used.  This allows proc_sys_prune_dcache to remove inodes from
the sysctl_inodes list without causing problems for proc_sys_evict_inode
if it later chooses to remove the inode from the sysctl_inodes list
itself.  Removing inodes from the sysctl_inodes list gives
proc_sys_prune_dcache a progress guarantee, while still being able to
drop all locks.  The fact that head->unregistering is set in
start_unregistering ensures that no more inodes will be added to the
sysctl_inodes list.

Previously the code did a dance where it delayed calling iput until the
next entry in the list was being considered, to ensure the inode remained
on the sysctl_inodes list until the next entry was walked to.  The
structure of the loop in this patch does not need that, so it is much
easier to understand and maintain.
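
A condensed sketch of the new loop (the full hunk is below; the
sysctl_lock taken around the list manipulation is elided here):

    rcu_read_lock();
    for (;;) {
        node = hlist_first_rcu(&head->inodes);
        if (!node)
            break;
        ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
        hlist_del_init_rcu(&ei->sysctl_inodes);  /* progress guarantee */
        sb = ei->vfs_inode.i_sb;
        if (!atomic_inc_not_zero(&sb->s_active)) /* pin the super block */
            continue;
        inode = igrab(&ei->vfs_inode);
        rcu_read_unlock();
        if (inode) {
            d_prune_aliases(inode);
            iput(inode);
        }
        deactivate_super(sb);                    /* drop the sb reference */
        rcu_read_lock();
    }
    rcu_read_unlock();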

Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@openvz.org>
Fixes: ace0c791e6 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Fixes: d6cffbbe9a ("proc/sysctl: prune stale dentries during unregistering")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:04 +02:00
6399cc16dd imx-serial: RX DMA startup latency
commit 4dec2f119e upstream.

Commit 18a4208 introduced a change to reduce the RX DMA latency on the
first reception when the serial port was opened for reading. However, it
claimed a hardirq-unsafe lock after a hardirq-safe lock, which is not
allowed and caused lockdep to complain verbosely.

This patch changes the code to always start RX DMA early, instead of
relying on the flags used to open the serial port, and removes the code
that inspected the serial file flags.

Signed-off-by: Peter Senna Tschudin <peter.senna@collabora.com>
Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:04 +02:00
34bfc89473 mqueue: fix a use-after-free in sys_mq_notify()
commit f991af3daa upstream.

The retry logic for netlink_attachskb() inside sys_mq_notify()
is nasty and vulnerable:

1) the sock refcnt is already released when a retry is needed;
2) the fd is controllable by user-space because we have already
   released the file refcnt.

So when we retry but the fd has just been closed by user-space
during this small window, we end up calling netlink_detachskb()
on the error path, which releases the sock again; later, when
user-space closes this socket, a use-after-free can be triggered.

Setting 'sock' to NULL here should be sufficient to fix it.
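
The fix (see the mqueue.c hunk below) clears the stale pointer before
retrying:

    ret = netlink_attachskb(sock, nc, &timeo, NULL);
    if (ret == 1) {
        sock = NULL;    /* reference already consumed; don't free twice */
        goto retry;
    }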

Reported-by: GeneBlue <geneblue.mail@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 13:09:03 +02:00
cb66218588 Linux 4.12.1 2017-07-12 16:55:36 +02:00
ead4cb80db crypto: drbg - Fixes panic in wait_for_completion call
commit b61929c654 upstream.

Initialise ctr_completion variable before use.

Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:26 +02:00
618ebd550b xen: avoid deadlock in xenbus driver
commit 1a3fc2c402 upstream.

There has been a report about a deadlock in the xenbus driver:

[  247.979498] ======================================================
[  247.985688] WARNING: possible circular locking dependency detected
[  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
[  247.997040] ------------------------------------------------------
[  248.003232] xenbus/91 is trying to acquire lock:
[  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
xenbus_dev_queue_reply+0x3c/0x230
[  248.017163]
[  248.017163] but task is already holding lock:
[  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
xenbus_thread+0x5f0/0x798
[  248.031267]
[  248.031267] which lock already depends on the new lock.
[  248.031267]
[  248.039615]
[  248.039615] the existing dependency chain (in reverse order) is:
[  248.047176]
[  248.047176] -> #1 (xb_write_mutex){+.+...}:
[  248.052943]        __lock_acquire+0x1728/0x1778
[  248.057498]        lock_acquire+0xc4/0x288
[  248.061630]        __mutex_lock+0x84/0x868
[  248.065755]        mutex_lock_nested+0x3c/0x50
[  248.070227]        xs_send+0x164/0x1f8
[  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
[  248.079427]        xenbus_file_write+0x260/0x420
[  248.084073]        __vfs_write+0x48/0x138
[  248.088113]        vfs_write+0xa8/0x1b8
[  248.091983]        SyS_write+0x54/0xb0
[  248.095768]        el0_svc_naked+0x24/0x28
[  248.099897]
[  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
[  248.106088]        print_circular_bug+0x80/0x2e0
[  248.110730]        __lock_acquire+0x1768/0x1778
[  248.115288]        lock_acquire+0xc4/0x288
[  248.119417]        __mutex_lock+0x84/0x868
[  248.123545]        mutex_lock_nested+0x3c/0x50
[  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
[  248.133005]        xenbus_thread+0x788/0x798
[  248.137306]        kthread+0x110/0x140
[  248.141087]        ret_from_fork+0x10/0x40

It is rather easy to avoid by dropping xb_write_mutex before calling
xenbus_dev_queue_reply().
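
In the reworked process_msg() (hunks below), only the lookup and
list_del happen under xb_write_mutex; the reply callback or kfree runs
after the mutex is dropped. Condensed:

    mutex_lock(&xb_write_mutex);
    list_for_each_entry(req, &xs_reply_list, list) {
        if (req->msg.req_id == state.msg.req_id) {
            list_del(&req->list);   /* unlink only */
            err = 0;
            break;
        }
    }
    mutex_unlock(&xb_write_mutex);

    /* safe now: the callback may take u->msgbuffer_mutex */
    if (req->state == xb_req_state_wait_reply)
        req->cb(req);
    else
        kfree(req);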

Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:26 +02:00
1b4ba31bb8 sched/numa: Hide numa_wake_affine() from UP build
commit ff801b716e upstream.

Stephen reported the following build warning in UP:

kernel/sched/fair.c:2657:9: warning: 'struct sched_domain' declared inside
parameter list
         ^
/home/sfr/next/next/kernel/sched/fair.c:2657:9: warning: its scope is only this
definition or declaration, which is probably not what you want

Hide the numa_wake_affine() inline stub on UP builds to get rid of it.

Fixes: 3fed382b46 ("sched/numa: Implement NUMA node level wake_affine()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:26 +02:00
dc427c08fd sched/fair: Remove effective_load()
commit 815abf5af4 upstream.

The effective_load() function was only used by the NUMA balancing
code, and not by the regular load balancing code. Now that the
NUMA balancing code no longer uses it either, get rid of it.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jhladky@redhat.com
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20170623165530.22514-5-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
ac74b66369 sched/numa: Implement NUMA node level wake_affine()
commit 3fed382b46 upstream.

Since select_idle_sibling() can place a task anywhere on a socket,
comparing loads between individual CPU cores makes no real sense
for deciding whether to do an affine wakeup across sockets, either.

Instead, compare the load between the sockets in a similar way to what
the load balancer and the NUMA balancing code do.
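
The core of the node-level comparison (from numa_wake_affine() in the
fair.c hunk below): each side's load is scaled by the other side's
compute capacity, with prev_cpu's side given the usual imbalance_pct
slack:

    this_eff_load = 100;
    this_eff_load *= prev_load.compute_capacity;
    this_eff_load *= this_load.load + task_load;

    prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
    prev_eff_load *= this_load.compute_capacity;
    prev_eff_load *= prev_load.load - task_load;

    /* affine wakeup is allowed when it does not tip the balance */
    return this_eff_load <= prev_eff_load;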

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jhladky@redhat.com
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20170623165530.22514-4-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
5a51f2febc sched/fair: Simplify wake_affine() for the single socket case
commit 7d894e6e34 upstream.

When 'this_cpu' and 'prev_cpu' are in the same socket, select_idle_sibling()
will do its thing regardless of the return value of wake_affine().

Just return true and don't look at all the other things.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jhladky@redhat.com
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20170623165530.22514-3-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
6cd951eefd sched/numa: Override part of migrate_degrades_locality() when idle balancing
commit 739294fb03 upstream.

Several tests in the NAS benchmark seem to run a lot slower with
NUMA balancing enabled, than with NUMA balancing disabled. The
slower run time corresponds with increased idle time.

Overriding the final test of migrate_degrades_locality (but still
doing the other NUMA tests first) seems to improve performance
of those benchmarks.

Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20170623165530.22514-2-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
7084a918af sched/numa: Use down_read_trylock() for the mmap_sem
commit 8655d54977 upstream.

A customer has reported a soft-lockup when running an intensive
memory stress test, where the trace on multiple CPUs looks like this:

 RIP: 0010:[<ffffffff810c53fe>]
  [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190
...
 Call Trace:
  [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa
  [<ffffffff811bc331>] change_protection_range+0x3b1/0x930
  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
  [<ffffffff81098322>] task_work_run+0x72/0x90

Further investigation showed that the lock contention here is pmd_lock().

The task_numa_work() function makes sure that only one thread is allowed to
perform the work in a single scan period (via cmpxchg), but if a thread holds
mmap_sem locked for writing for several periods, multiple threads in
task_numa_work() can build up a convoy waiting for mmap_sem for read and then
all get unblocked at once.

This patch changes the down_read() to the trylock version, which prevents the
build up. For a workload experiencing mmap_sem contention, it's probably better
to postpone the NUMA balancing work anyway. This seems to have fixed the soft
lockups involving pmd_lock(), which is in line with the convoy theory.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170515131316.21909-1-vbabka@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
c329c44099 sched/core: Implement new approach to scale select_idle_cpu()
commit 1ad3aaf3fc upstream.

Hackbench recently suffered a bunch of pain, first by commit:

  4c77b18cf8 ("sched/fair: Make select_idle_cpu() more aggressive")

and then by commit:

  c743f0a5c5 ("sched/fair, cpumask: Export for_each_cpu_wrap()")

which fixed a bug in the initial for_each_cpu_wrap() implementation
that made select_idle_cpu() even more expensive. The bug was that it
would skip over CPUs when bits were consecutive in the bitmask.

This however gave me an idea to fix select_idle_cpu(); where the old
scheme was a cliff-edge throttle on idle scanning, this introduces a
more gradual approach. Instead of stopping to scan entirely, we limit
how many CPUs we scan.
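
A sketch of the approach, assuming the upstream heuristic (treat the
details as illustrative): the scan budget is derived from the ratio of
the run-queue's average idle time (avg_idle) to the domain's average
scan cost (avg_cost), so cheap scans on mostly-idle machines still cover
the whole LLC:

    int cpu, nr = INT_MAX;

    if (sched_feat(SIS_PROP)) {
        u64 span_avg = sd->span_weight * avg_idle;
        if (span_avg > 4 * avg_cost)
            nr = div_u64(span_avg, avg_cost);
        else
            nr = 4;             /* always probe a few CPUs */
    }

    for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
        if (!--nr)
            return -1;          /* scan budget exhausted, give up */
        if (idle_cpu(cpu))
            break;
    }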

Initial benchmarks show that it mostly recovers hackbench while not
hurting anything else, except Mason's schbench, but not as bad as the
old thing.

It also appears to recover the tbench high-end, which also suffered like
hackbench.

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: kitsunyan <kitsunyan@inbox.ru>
Cc: linux-kernel@vger.kernel.org
Cc: lvenanci@redhat.com
Cc: riel@redhat.com
Cc: xiaolong.ye@intel.com
Link: http://lkml.kernel.org/r/20170517105350.hk5m4h4jb6dfr65a@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
c6508a3964 sched/fair, cpumask: Export for_each_cpu_wrap()
commit c743f0a5c5 upstream.

More users for for_each_cpu_wrap() have appeared. Promote the construct
to the generic cpumask interface.

The implementation is slightly modified to reduce arguments.
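
Usage is what you would expect; for example, visiting every online CPU
exactly once, starting at (and wrapping around from) the current CPU:

    int cpu;

    for_each_cpu_wrap(cpu, cpu_online_mask, smp_processor_id())
        pr_info("visiting cpu %d\n", cpu);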

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Lauro Ramos Venancio <lvenanci@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lwang@redhat.com
Link: http://lkml.kernel.org/r/20170414122005.o35me2h5nowqkxbv@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:25 +02:00
1272418996 x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
commit 236222d393 upstream.

According to the Intel datasheet, the REP MOVSB instruction
exposes a pretty heavy setup cost (50 ticks), which hurts
short string copy operations.

This change tries to avoid this cost by calling the explicit
loop available in the unrolled code for strings shorter
than 64 bytes.

The 64 bytes cutoff value is arbitrary from the code logic
point of view - it has been selected based on measurements,
as the largest value that still ensures a measurable gain.
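
In C terms the added dispatch amounts to the following (a purely
illustrative rendering; the real change is two assembly instructions in
the copy_user_64.S hunk below, and copy_short_string / rep_movsb here
stand in for the assembly labels):

    if (len < 64)
        goto copy_short_string;     /* reuse the unrolled copy loop */
    /* long copy: the ~50-cycle REP MOVSB setup cost amortizes */
    rep_movsb(dst, src, len);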

Micro benchmarks of the __copy_from_user() function with
lengths in the [0-63] range show this performance gain
(the shorter the string, the larger the gain):

 - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on Intel Core i7-4810MQ

Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
8216 - show no difference, because they do not expose the
ERMS feature bit.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
2017-07-12 16:55:25 +02:00
e1ebb00c1b powerpc/powernv: Fix CPU_HOTPLUG=n idle.c compile error
commit 67d2041808 upstream.

Fixes: a7cd88da97 ("powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c")
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
e9d8f71bda tpm: fix a kernel memory leak in tpm-sysfs.c
commit 13b47cfcfc upstream.

While cleaning up the sysfs callback that prints the EK, we discovered a
kernel memory leak. This commit fixes the issue by zeroing the buffer
used for the TPM command/response.

The leak happens when we use either tpm_vtpm_proxy, tpm_ibmvtpm or
xen-tpmfront.

Fixes: 0883743825 ("TPM: sysfs functions consolidation")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
e520cb6c01 tpm: Issue a TPM2_Shutdown for TPM2 devices.
commit d1bd4a792d upstream.

If a TPM2 loses power without a TPM2_Shutdown command being issued (a
"disorderly reboot"), it may lose some state that has yet to be
persisted to NVRam, and will increment the DA counter. After the DA
counter gets sufficiently large, the TPM will lock the user out.

NOTE: This only changes behavior on TPM2 devices. Since TPM1 uses sysfs,
and sysfs relies on implicit locking on chip->ops, it is not safe to
allow this code to run in TPM1, or to add sysfs support to TPM2, until
that locking is made explicit.

Signed-off-by: Josh Zimmerman <joshz@google.com>
Fixes: 74d6b3ceaa ("tpm: fix suspend/resume paths for TPM 2.0")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
23f2134fb9 Add "shutdown" to "struct class".
commit f77af15165 upstream.

The TPM class has some common shutdown code that must be executed for
all drivers. This adds some needed functionality for that.

Signed-off-by: Josh Zimmerman <joshz@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 74d6b3ceaa ("tpm: fix suspend/resume paths for TPM 2.0")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
132a4d817b gfs2: Fix glock rhashtable rcu bug
commit 961ae1d83d upstream.

Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
3cd2a58902 xhci: Limit USB2 port wake support for AMD Promontory hosts
commit dec08194ff upstream.

For AMD Promontory xHCI hosts, although you can disable USB 2.0 ports in
the BIOS settings, those ports will be enabled anyway after you remove a
device on that port and re-plug it. It's a known limitation of the chip.
As a workaround we can clear the PORT_WAKE_BITS.

This disables wake on connect, disconnect and overcurrent on
AMD Promontory USB2 ports.

[checkpatch cleanup and commit message reword -Mathias]
Cc: Tsai Nicholas <nicholas.tsai@amd.com>
Signed-off-by: Jiahau Chang <Lars_Chang@asmedia.com.tw>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
bf1befcaa5 USB: serial: qcserial: new Sierra Wireless EM7305 device ID
commit 996fab55d8 upstream.

Add a new Sierra Wireless EM7305 device ID, used in a Toshiba laptop.

Reported-by: Petr Kloc <petr_kloc@yahoo.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
b555cf8458 USB: serial: option: add two Longcheer device ids
commit 8fb060da71 upstream.

Add two Longcheer device-id entries which specifically enable a
Telewell TW-3G HSPA+ branded modem (0x9801).

Reported-by: Teemu Likonen <tlikonen@iki.fi>
Reported-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Lars Melin <larsm17@gmail.com>
Tested-by: Teemu Likonen <tlikonen@iki.fi>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
1e35d149be USB: core: fix device node leak
commit e271b2c909 upstream.

Make sure to release any OF device-node reference taken when creating
the USB device.

Note that we currently do not hold a reference to the root hub
device-tree node (i.e. the parent controller node).

Fixes: 69bec72598 ("USB: core: let USB device know device node")
Acked-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:24 +02:00
c3a90ecddf usb: Fix typo in the definition of Endpoint[out]Request
commit 7cf916bd63 upstream.

The current definition is wrong. This breaks my upcoming
Aspeed virtual hub driver.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
1ec27490ee Add USB quirk for HVR-950q to avoid intermittent device resets
commit 6836796de4 upstream.

The USB core and sysfs will attempt to enumerate certain parameters
which are unsupported by the au0828 - causing inconsistent behavior
and sometimes causing the chip to reset.  Avoid making these calls.

This problem manifested as intermittent cases where the au8522 would
be reset on analog video startup, in particular when starting up ALSA
audio streaming in parallel - the sysfs entries created by
snd-usb-audio on streaming startup would result in unsupported control
messages being sent during tuning which would put the chip into an
unknown state.

Signed-off-by: Devin Heitmueller <dheitmueller@kernellabs.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
9bef7d690e usb: usbip: set buffer pointers to NULL after free
commit b3b51417d0 upstream.

The usbip stack dynamically allocates the transfer_buffer and
setup_packet of each urb generated by the tcp-to-usb stub code.
As these pointers are only used once, set them to NULL after use,
as the free_urb code in vudc_dev.c already does.
This patch fixes double-kfree situations where the usbip remote side
added the URB_FREE_BUFFER flag.
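
Both free paths now clear the pointers right after freeing (see the
hunks below):

    kfree(urb->transfer_buffer);
    urb->transfer_buffer = NULL;  /* URB_FREE_BUFFER must not free it again */

    kfree(urb->setup_packet);
    urb->setup_packet = NULL;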

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
1338f79220 USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
commit fd90f73a99 upstream.

Add the USB serial device ID for the CEL EM3588 ZigBee
radio stick.

Signed-off-by: Jeremie Rapin <rapinj@gmail.com>
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
9a11d1f9c4 usb: dwc3: replace %p with %pK
commit 04fb365c45 upstream.

%p will leak kernel pointers, so let's not expose the information on
dmesg and instead use %pK. %pK will only show the actual addresses if
explicitly enabled under /proc/sys/kernel/kptr_restrict.
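
For example (from the dwc3 gadget hunk below):

    dev_err(dwc->dev, "request %pK was not queued to %s\n",
            request, ep->name);

With kptr_restrict at its default, %pK prints zeros instead of the raw
kernel address.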

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
d4ca0cfa3f RDMA/uverbs: Check port number supplied by user verbs cmds
commit 5ecce4c9b1 upstream.

The ib_uverbs_create_ah() and ib_uverbs_modify_qp() calls receive
the port number from user input as part of their attributes and assume
it is valid. Down the stack, that parameter is used to access kernel
data structures.  If the value is invalid, the kernel accesses memory
it should not.  To prevent this, verify the port number before using it.

BUG: KASAN: use-after-free in ib_uverbs_create_ah+0x6d5/0x7b0
Read of size 4 at addr ffff880018d67ab8 by task syz-executor/313

BUG: KASAN: slab-out-of-bounds in modify_qp.isra.4+0x19d0/0x1ef0
Read of size 4 at addr ffff88006c40ec58 by task syz-executor/819

Fixes: 67cdb40ca4 ("[IB] uverbs: Implement more commands")
Fixes: 189aba99e7 ("IB/uverbs: Extend modify_qp and support packet pacing")
Cc: Yevgeny Kliteynik <kliteyn@mellanox.com>
Cc: Tziporet Koren <tziporet@mellanox.com>
Cc: Alex Polak <alexpo@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
d0ccfd55b9 driver core: platform: fix race condition with driver_override
commit 6265539776 upstream.

The driver_override implementation is susceptible to a race condition
when different threads read and store different driver overrides
concurrently. Add locking to avoid the race.
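
The store side now swaps the pointer under device_lock() and frees the
old value outside it (see the platform.c hunks below):

    device_lock(dev);
    old = pdev->driver_override;
    if (strlen(driver_override)) {
        pdev->driver_override = driver_override;
    } else {
        kfree(driver_override);
        pdev->driver_override = NULL;
    }
    device_unlock(dev);

    kfree(old);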

Fixes: 3d713e0e38 ("driver core: platform: add device binding path 'driver_override'")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Salido <salidoa@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12 16:55:23 +02:00
44 changed files with 360 additions and 313 deletions

View File

@ -1,6 +1,6 @@
VERSION = 4
PATCHLEVEL = 12
-SUBLEVEL = 0
+SUBLEVEL = 2
EXTRAVERSION =
NAME = Fearless Coyote

View File

@ -261,6 +261,7 @@ static u64 pnv_deepest_stop_psscr_val;
static u64 pnv_deepest_stop_psscr_mask;
static bool deepest_stop_found;
#ifdef CONFIG_HOTPLUG_CPU
/*
* pnv_cpu_offline: A function that puts the CPU into the deepest
* available platform idle state on a CPU-Offline.
@ -293,6 +294,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
return srr1;
}
#endif
/*
* Power ISA 3.0 idle initialization.

View File

@ -7,6 +7,7 @@
bool pat_enabled(void);
void pat_disable(const char *reason);
extern void pat_init(void);
extern void init_cache_modes(void);
extern int reserve_memtype(u64 start, u64 end,
enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);

View File

@ -1075,6 +1075,13 @@ void __init setup_arch(char **cmdline_p)
max_possible_pfn = max_pfn;
/*
* This call is required when the CPU does not support PAT. If
* mtrr_bp_init() invoked it already via pat_init() the call has no
* effect.
*/
init_cache_modes();
/*
* Define random base addresses for memory sections after max_pfn is
* defined and before each memory section base is used.

View File

@ -37,7 +37,7 @@ ENTRY(copy_user_generic_unrolled)
movl %edx,%ecx
andl $63,%edx
shrl $6,%ecx
-jz 17f
+jz .L_copy_short_string
1: movq (%rsi),%r8
2: movq 1*8(%rsi),%r9
3: movq 2*8(%rsi),%r10
@ -58,7 +58,8 @@ ENTRY(copy_user_generic_unrolled)
leaq 64(%rdi),%rdi
decl %ecx
jnz 1b
-17: movl %edx,%ecx
+.L_copy_short_string:
+movl %edx,%ecx
andl $7,%edx
shrl $3,%ecx
jz 20f
@ -174,6 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
*/
ENTRY(copy_user_enhanced_fast_string)
ASM_STAC
+cmpl $64,%edx
+jb .L_copy_short_string /* less then 64 bytes, avoid the costly 'rep' */
movl %edx,%ecx
1: rep
movsb

View File

@ -37,14 +37,14 @@
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
-static bool boot_cpu_done;
-static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
-static void init_cache_modes(void);
+static bool __read_mostly boot_cpu_done;
+static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __read_mostly pat_initialized;
+static bool __read_mostly init_cm_done;
void pat_disable(const char *reason)
{
-if (!__pat_enabled)
+if (pat_disabled)
return;
if (boot_cpu_done) {
@ -52,10 +52,8 @@ void pat_disable(const char *reason)
return;
}
-__pat_enabled = 0;
+pat_disabled = true;
pr_info("x86/PAT: %s\n", reason);
-init_cache_modes();
}
static int __init nopat(char *str)
@ -67,7 +65,7 @@ early_param("nopat", nopat);
bool pat_enabled(void)
{
-return !!__pat_enabled;
+return pat_initialized;
}
EXPORT_SYMBOL_GPL(pat_enabled);
@ -205,6 +203,8 @@ static void __init_cache_modes(u64 pat)
update_cache_mode_entry(i, cache);
}
pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
init_cm_done = true;
}
#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
@ -225,6 +225,7 @@ static void pat_bsp_init(u64 pat)
}
wrmsrl(MSR_IA32_CR_PAT, pat);
pat_initialized = true;
__init_cache_modes(pat);
}
@ -242,10 +243,9 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}
-static void init_cache_modes(void)
+void init_cache_modes(void)
{
u64 pat = 0;
-static int init_cm_done;
if (init_cm_done)
return;
@ -287,8 +287,6 @@ static void init_cache_modes(void)
}
__init_cache_modes(pat);
-init_cm_done = 1;
}
/**
@ -306,10 +304,8 @@ void pat_init(void)
u64 pat;
struct cpuinfo_x86 *c = &boot_cpu_data;
-if (!pat_enabled()) {
-init_cache_modes();
+if (pat_disabled)
return;
-}
if ((c->x86_vendor == X86_VENDOR_INTEL) &&
(((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||

View File

@ -1691,6 +1691,7 @@ static int drbg_init_sym_kernel(struct drbg_state *drbg)
return PTR_ERR(sk_tfm);
}
drbg->ctr_handle = sk_tfm;
init_completion(&drbg->ctr_completion);
req = skcipher_request_alloc(sk_tfm, GFP_KERNEL);
if (!req) {

View File

@ -496,7 +496,7 @@ static int pkcs1pad_verify_complete(struct akcipher_request *req, int err)
goto done;
pos++;
-if (memcmp(out_buf + pos, digest_info->data, digest_info->size))
+if (crypto_memneq(out_buf + pos, digest_info->data, digest_info->size))
goto done;
pos += digest_info->size;

View File

@ -2667,7 +2667,11 @@ void device_shutdown(void)
pm_runtime_get_noresume(dev);
pm_runtime_barrier(dev);
-if (dev->bus && dev->bus->shutdown) {
+if (dev->class && dev->class->shutdown) {
+if (initcall_debug)
+dev_info(dev, "shutdown\n");
+dev->class->shutdown(dev);
+} else if (dev->bus && dev->bus->shutdown) {
if (initcall_debug)
dev_info(dev, "shutdown\n");
dev->bus->shutdown(dev);

View File

@ -866,7 +866,7 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
-char *driver_override, *old = pdev->driver_override, *cp;
+char *driver_override, *old, *cp;
if (count > PATH_MAX)
return -EINVAL;
@ -879,12 +879,15 @@ static ssize_t driver_override_store(struct device *dev,
if (cp)
*cp = '\0';
device_lock(dev);
old = pdev->driver_override;
if (strlen(driver_override)) {
pdev->driver_override = driver_override;
} else {
kfree(driver_override);
pdev->driver_override = NULL;
}
device_unlock(dev);
kfree(old);
@ -895,8 +898,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct platform_device *pdev = to_platform_device(dev);
+ssize_t len;
-return sprintf(buf, "%s\n", pdev->driver_override);
+device_lock(dev);
+len = sprintf(buf, "%s\n", pdev->driver_override);
+device_unlock(dev);
+return len;
}
static DEVICE_ATTR_RW(driver_override);

View File

@ -142,6 +142,39 @@ static void tpm_devs_release(struct device *dev)
put_device(&chip->dev);
}
/**
* tpm_class_shutdown() - prepare the TPM device for loss of power.
* @dev: device to which the chip is associated.
*
* Issues a TPM2_Shutdown command prior to loss of power, as required by the
* TPM 2.0 spec.
* Then, calls bus- and device- specific shutdown code.
*
* XXX: This codepath relies on the fact that sysfs is not enabled for
* TPM2: sysfs uses an implicit lock on chip->ops, so this could race if TPM2
* has sysfs support enabled before TPM sysfs's implicit locking is fixed.
*/
static int tpm_class_shutdown(struct device *dev)
{
struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
if (chip->flags & TPM_CHIP_FLAG_TPM2) {
down_write(&chip->ops_sem);
tpm2_shutdown(chip, TPM2_SU_CLEAR);
chip->ops = NULL;
up_write(&chip->ops_sem);
}
/* Allow bus- and device-specific code to run. Note: since chip->ops
* is NULL, more-specific shutdown code will not be able to issue TPM
* commands.
*/
if (dev->bus && dev->bus->shutdown)
dev->bus->shutdown(dev);
else if (dev->driver && dev->driver->shutdown)
dev->driver->shutdown(dev);
return 0;
}
/**
* tpm_chip_alloc() - allocate a new struct tpm_chip instance
* @pdev: device to which the chip is associated
@ -181,6 +214,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
device_initialize(&chip->devs);
chip->dev.class = tpm_class;
chip->dev.class->shutdown = tpm_class_shutdown;
chip->dev.release = tpm_dev_release;
chip->dev.parent = pdev;
chip->dev.groups = chip->groups;

View File

@ -36,9 +36,10 @@ static ssize_t pubek_show(struct device *dev, struct device_attribute *attr,
ssize_t err;
int i, rc;
char *str = buf;
struct tpm_chip *chip = to_tpm_chip(dev);
memset(&tpm_cmd, 0, sizeof(tpm_cmd));
tpm_cmd.header.in = tpm_readpubek_header;
err = tpm_transmit_cmd(chip, NULL, &tpm_cmd, READ_PUBEK_RESULT_SIZE,
READ_PUBEK_RESULT_MIN_BODY_SIZE, 0,
@ -294,6 +295,9 @@ static const struct attribute_group tpm_dev_group = {
void tpm_sysfs_add_device(struct tpm_chip *chip)
{
/* XXX: If you wish to remove this restriction, you must first update
* tpm_sysfs to explicitly lock chip->ops.
*/
if (chip->flags & TPM_CHIP_FLAG_TPM2)
return;

View File

@ -1475,8 +1475,7 @@ static struct ablkcipher_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
struct device *jrdev = ctx->jrdev;
-gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
-CRYPTO_TFM_REQ_MAY_SLEEP)) ?
+gfp_t flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
GFP_KERNEL : GFP_ATOMIC;
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
struct ablkcipher_edesc *edesc;

View File

@ -1931,6 +1931,11 @@ static int modify_qp(struct ib_uverbs_file *file,
goto out;
}
if (!rdma_is_port_valid(qp->device, cmd->base.port_num)) {
ret = -EINVAL;
goto release_qp;
}
attr->qp_state = cmd->base.qp_state;
attr->cur_qp_state = cmd->base.cur_qp_state;
attr->path_mtu = cmd->base.path_mtu;
@ -2541,6 +2546,9 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
if (!rdma_is_port_valid(ib_dev, cmd.attr.port_num))
return -EINVAL;
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long)cmd.response + sizeof(resp),
in_len - sizeof(cmd), out_len - sizeof(resp));

View File

@ -2915,6 +2915,7 @@ static int __init comedi_init(void)
dev = comedi_alloc_board_minor(NULL);
if (IS_ERR(dev)) {
comedi_cleanup_board_minors();
class_destroy(comedi_class);
cdev_del(&comedi_cdev);
unregister_chrdev_region(MKDEV(COMEDI_MAJOR, 0),
COMEDI_NUM_MINORS);

View File

@ -513,6 +513,9 @@ static int vnt_start(struct ieee80211_hw *hw)
goto free_all;
}
if (vnt_key_init_table(priv))
goto free_all;
priv->int_interval = 1; /* bInterval is set to 1 */
vnt_int_start_interrupt(priv);

View File

@ -1340,29 +1340,13 @@ static int imx_startup(struct uart_port *port)
imx_enable_ms(&sport->port);
/*
-* If the serial port is opened for reading start RX DMA immediately
-* instead of waiting for RX FIFO interrupts. In our iMX53 the average
-* delay for the first reception dropped from approximately 35000
-* microseconds to 1000 microseconds.
+* Start RX DMA immediately instead of waiting for RX FIFO interrupts.
+* In our iMX53 the average delay for the first reception dropped from
+* approximately 35000 microseconds to 1000 microseconds.
*/
if (sport->dma_is_enabled) {
-struct tty_struct *tty = sport->port.state->port.tty;
-struct tty_file_private *file_priv;
-int readcnt = 0;
-spin_lock(&tty->files_lock);
-if (!list_empty(&tty->tty_files))
-list_for_each_entry(file_priv, &tty->tty_files, list)
-if (!(file_priv->file->f_flags & O_WRONLY))
-readcnt++;
-spin_unlock(&tty->files_lock);
-if (readcnt > 0) {
-imx_disable_rx_int(sport);
-start_rx_dma(sport);
-}
+imx_disable_rx_int(sport);
+start_rx_dma(sport);
}
spin_unlock_irqrestore(&sport->port.lock, flags);

View File

@ -223,6 +223,10 @@ static const struct usb_device_id usb_quirk_list[] = {
/* Blackmagic Design UltraStudio SDI */
{ USB_DEVICE(0x1edb, 0xbd4f), .driver_info = USB_QUIRK_NO_LPM },
/* Hauppauge HVR-950q */
{ USB_DEVICE(0x2040, 0x7200), .driver_info =
USB_QUIRK_CONFIG_INTF_STRINGS },
/* INTEL VALUE SSD */
{ USB_DEVICE(0x8086, 0xf1a5), .driver_info = USB_QUIRK_RESET_RESUME },

View File

@ -416,6 +416,8 @@ static void usb_release_dev(struct device *dev)
usb_destroy_configuration(udev);
usb_release_bos_descriptor(udev);
if (udev->parent)
of_node_put(dev->of_node);
usb_put_hcd(hcd);
kfree(udev->product);
kfree(udev->manufacturer);

View File

@ -230,7 +230,7 @@ static int st_dwc3_probe(struct platform_device *pdev)
dwc3_data->syscfg_reg_off = res->start;
-dev_vdbg(&pdev->dev, "glue-logic addr 0x%p, syscfg-reg offset 0x%x\n",
+dev_vdbg(&pdev->dev, "glue-logic addr 0x%pK, syscfg-reg offset 0x%x\n",
dwc3_data->glue_base, dwc3_data->syscfg_reg_off);
dwc3_data->rstc_pwrdn =

View File

@ -1215,12 +1215,9 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req)
return -ESHUTDOWN;
}
-if (WARN(req->dep != dep, "request %p belongs to '%s'\n",
-&req->request, req->dep->name)) {
-dev_err(dwc->dev, "%s: request %p belongs to '%s'\n",
-dep->name, &req->request, req->dep->name);
+if (WARN(req->dep != dep, "request %pK belongs to '%s'\n",
+&req->request, req->dep->name))
return -EINVAL;
-}
pm_runtime_get(dwc->dev);
@ -1396,7 +1393,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
}
goto out1;
}
-dev_err(dwc->dev, "request %p was not queued to %s\n",
+dev_err(dwc->dev, "request %pK was not queued to %s\n",
request, ep->name);
ret = -EINVAL;
goto out0;

View File

@ -1461,6 +1461,9 @@ int xhci_bus_suspend(struct usb_hcd *hcd)
t2 |= PORT_WKOC_E | PORT_WKCONN_E;
t2 &= ~PORT_WKDISC_E;
}
if ((xhci->quirks & XHCI_U2_DISABLE_WAKE) &&
(hcd->speed < HCD_USB3))
t2 &= ~PORT_WAKE_BITS;
} else
t2 &= ~PORT_WAKE_BITS;

View File

@ -54,6 +54,11 @@
#define PCI_DEVICE_ID_INTEL_APL_XHCI 0x5aa8
#define PCI_DEVICE_ID_INTEL_DNV_XHCI 0x19d0
#define PCI_DEVICE_ID_AMD_PROMONTORYA_4 0x43b9
#define PCI_DEVICE_ID_AMD_PROMONTORYA_3 0x43ba
#define PCI_DEVICE_ID_AMD_PROMONTORYA_2 0x43bb
#define PCI_DEVICE_ID_AMD_PROMONTORYA_1 0x43bc
static const char hcd_name[] = "xhci_hcd";
static struct hc_driver __read_mostly xhci_pci_hc_driver;
@ -135,6 +140,13 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_AMD)
xhci->quirks |= XHCI_TRUST_TX_LENGTH;
if ((pdev->vendor == PCI_VENDOR_ID_AMD) &&
((pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_4) ||
(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_3) ||
(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_2) ||
(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_1)))
xhci->quirks |= XHCI_U2_DISABLE_WAKE;
if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
xhci->quirks |= XHCI_LPM_SUPPORT;
xhci->quirks |= XHCI_INTEL_HOST;

View File

@ -1819,6 +1819,7 @@ struct xhci_hcd {
/* For controller with a broken Port Disable implementation */
#define XHCI_BROKEN_PORT_PED (1 << 25)
#define XHCI_LIMIT_ENDPOINT_INTERVAL_7 (1 << 26)
#define XHCI_U2_DISABLE_WAKE (1 << 27)
unsigned int num_active_eps;
unsigned int limit_active_eps;

View File

@ -141,6 +141,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x10C4, 0x8977) }, /* CEL MeshWorks DevKit Device */
{ USB_DEVICE(0x10C4, 0x8998) }, /* KCF Technologies PRN */
{ USB_DEVICE(0x10C4, 0x8A2A) }, /* HubZ dual ZigBee and Z-Wave dongle */
{ USB_DEVICE(0x10C4, 0x8A5E) }, /* CEL EM3588 ZigBee USB Stick Long Range */
{ USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */
{ USB_DEVICE(0x10C4, 0xEA61) }, /* Silicon Labs factory default */
{ USB_DEVICE(0x10C4, 0xEA70) }, /* Silicon Labs factory default */

View File

@ -1877,6 +1877,10 @@ static const struct usb_device_id option_ids[] = {
.driver_info = (kernel_ulong_t)&four_g_w100_blacklist
},
{ USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, SPEEDUP_PRODUCT_SU9800, 0xff) },
{ USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, 0x9801, 0xff),
.driver_info = (kernel_ulong_t)&net_intf3_blacklist },
{ USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, 0x9803, 0xff),
.driver_info = (kernel_ulong_t)&net_intf4_blacklist },
{ USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) },
{ USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) },
{ USB_DEVICE(HAIER_VENDOR_ID, HAIER_PRODUCT_CE100) },

View File

@ -158,6 +158,7 @@ static const struct usb_device_id id_table[] = {
{DEVICE_SWI(0x1199, 0x9056)}, /* Sierra Wireless Modem */
{DEVICE_SWI(0x1199, 0x9060)}, /* Sierra Wireless Modem */
{DEVICE_SWI(0x1199, 0x9061)}, /* Sierra Wireless Modem */
{DEVICE_SWI(0x1199, 0x9063)}, /* Sierra Wireless EM7305 */
{DEVICE_SWI(0x1199, 0x9070)}, /* Sierra Wireless MC74xx */
{DEVICE_SWI(0x1199, 0x9071)}, /* Sierra Wireless MC74xx */
{DEVICE_SWI(0x1199, 0x9078)}, /* Sierra Wireless EM74xx */

View File

@ -262,7 +262,11 @@ void stub_device_cleanup_urbs(struct stub_device *sdev)
kmem_cache_free(stub_priv_cache, priv);
kfree(urb->transfer_buffer);
urb->transfer_buffer = NULL;
kfree(urb->setup_packet);
urb->setup_packet = NULL;
usb_free_urb(urb);
}
}

View File

@ -28,7 +28,11 @@ static void stub_free_priv_and_urb(struct stub_priv *priv)
struct urb *urb = priv->urb;
kfree(urb->setup_packet);
urb->setup_packet = NULL;
kfree(urb->transfer_buffer);
urb->transfer_buffer = NULL;
list_del(&priv->list);
kmem_cache_free(stub_priv_cache, priv);
usb_free_urb(urb);

View File

@ -299,17 +299,7 @@ static int process_msg(void)
mutex_lock(&xb_write_mutex);
list_for_each_entry(req, &xs_reply_list, list) {
if (req->msg.req_id == state.msg.req_id) {
-if (req->state == xb_req_state_wait_reply) {
-req->msg.type = state.msg.type;
-req->msg.len = state.msg.len;
-req->body = state.body;
-req->state = xb_req_state_got_reply;
-list_del(&req->list);
-req->cb(req);
-} else {
-list_del(&req->list);
-kfree(req);
-}
+list_del(&req->list);
err = 0;
break;
}
@ -317,6 +307,15 @@ static int process_msg(void)
mutex_unlock(&xb_write_mutex);
if (err)
goto out;
+if (req->state == xb_req_state_wait_reply) {
+req->msg.type = state.msg.type;
+req->msg.len = state.msg.len;
+req->body = state.body;
+req->state = xb_req_state_got_reply;
+req->cb(req);
+} else
+kfree(req);
}
mutex_unlock(&xs_response_mutex);

View File

@ -100,7 +100,7 @@ static ssize_t reserved_clusters_store(struct ext4_attr *a,
int ret;
ret = kstrtoull(skip_spaces(buf), 0, &val);
-if (!ret || val >= clusters)
+if (ret || val >= clusters)
return -EINVAL;
atomic64_set(&sbi->s_resv_clusters, val);

View File

@ -80,9 +80,9 @@ static struct rhashtable_params ht_parms = {
static struct rhashtable gl_hash_table;
-void gfs2_glock_free(struct gfs2_glock *gl)
+static void gfs2_glock_dealloc(struct rcu_head *rcu)
{
-struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+struct gfs2_glock *gl = container_of(rcu, struct gfs2_glock, gl_rcu);
if (gl->gl_ops->go_flags & GLOF_ASPACE) {
kmem_cache_free(gfs2_glock_aspace_cachep, gl);
@ -90,6 +90,13 @@ void gfs2_glock_free(struct gfs2_glock *gl)
kfree(gl->gl_lksb.sb_lvbptr);
kmem_cache_free(gfs2_glock_cachep, gl);
}
}
void gfs2_glock_free(struct gfs2_glock *gl)
{
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
if (atomic_dec_and_test(&sdp->sd_glock_disposal))
wake_up(&sdp->sd_glock_wait);
}

View File

@ -374,6 +374,7 @@ struct gfs2_glock {
loff_t end;
} gl_vm;
};
struct rcu_head gl_rcu;
struct rhash_head gl_node;
};

View File

@ -67,7 +67,7 @@ struct proc_inode {
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
-struct list_head sysctl_inodes;
+struct hlist_node sysctl_inodes;
const struct proc_ns_operations *ns_ops;
struct inode vfs_inode;
};

View File

@ -191,7 +191,7 @@ static void init_header(struct ctl_table_header *head,
head->set = set;
head->parent = NULL;
head->node = node;
-INIT_LIST_HEAD(&head->inodes);
+INIT_HLIST_HEAD(&head->inodes);
if (node) {
struct ctl_table *entry;
for (entry = table; entry->procname; entry++, node++)
@ -261,25 +261,42 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
}
/* called under sysctl_lock */
static void proc_sys_prune_dcache(struct ctl_table_header *head)
{
-struct inode *inode, *prev = NULL;
+struct inode *inode;
struct proc_inode *ei;
+struct hlist_node *node;
+struct super_block *sb;
rcu_read_lock();
-list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
-inode = igrab(&ei->vfs_inode);
-if (inode) {
-rcu_read_unlock();
-iput(prev);
-prev = inode;
-d_prune_aliases(inode);
+for (;;) {
+node = hlist_first_rcu(&head->inodes);
+if (!node)
+break;
+ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
+spin_lock(&sysctl_lock);
+hlist_del_init_rcu(&ei->sysctl_inodes);
+spin_unlock(&sysctl_lock);
+inode = &ei->vfs_inode;
+sb = inode->i_sb;
+if (!atomic_inc_not_zero(&sb->s_active))
+continue;
+inode = igrab(inode);
+rcu_read_unlock();
+if (unlikely(!inode)) {
+deactivate_super(sb);
+rcu_read_lock();
+continue;
+}
+d_prune_aliases(inode);
+iput(inode);
+deactivate_super(sb);
+rcu_read_lock();
}
rcu_read_unlock();
-iput(prev);
}
/* called under sysctl_lock, will reacquire if has to wait */
@ -461,7 +478,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
}
ei->sysctl = head;
ei->sysctl_entry = table;
-list_add_rcu(&ei->sysctl_inodes, &head->inodes);
+hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
head->count++;
spin_unlock(&sysctl_lock);
@ -489,7 +506,7 @@ out:
void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
-list_del_rcu(&PROC_I(inode)->sysctl_inodes);
+hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
if (!--head->count)
kfree_rcu(head, rcu);
spin_unlock(&sysctl_lock);

View File

@ -236,6 +236,23 @@ unsigned int cpumask_local_spread(unsigned int i, int node);
(cpu) = cpumask_next_zero((cpu), (mask)), \
(cpu) < nr_cpu_ids;)
extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap);
/**
* for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location
* @cpu: the (optionally unsigned) integer iterator
* @mask: the cpumask pointer
* @start: the start location
*
* The implementation does not assume any bit in @mask is set (including @start).
*
* After the loop, cpu is >= nr_cpu_ids.
*/
#define for_each_cpu_wrap(cpu, mask, start) \
for ((cpu) = cpumask_next_wrap((start)-1, (mask), (start), false); \
(cpu) < nr_cpumask_bits; \
(cpu) = cpumask_next_wrap((cpu), (mask), (start), true))
/**
* for_each_cpu_and - iterate over every cpu in both masks
* @cpu: the (optionally unsigned) integer iterator

View File

@ -378,6 +378,7 @@ int subsys_virtual_register(struct bus_type *subsys,
* @suspend: Used to put the device to sleep mode, usually to a low power
* state.
* @resume: Used to bring the device from the sleep mode.
* @shutdown: Called at shut-down time to quiesce the device.
* @ns_type: Callbacks so sysfs can determine namespaces.
* @namespace: Namespace of the device belongs to this class.
* @pm: The default device power management operations of this class.
@ -407,6 +408,7 @@ struct class {
int (*suspend)(struct device *dev, pm_message_t state);
int (*resume)(struct device *dev);
int (*shutdown)(struct device *dev);
const struct kobj_ns_type_operations *ns_type;
const void *(*namespace)(struct device *dev);

View File

@ -143,7 +143,7 @@ struct ctl_table_header
struct ctl_table_set *set;
struct ctl_dir *parent;
struct ctl_node *node;
struct list_head inodes; /* head for proc_inode->sysctl_inodes */
struct hlist_head inodes; /* head for proc_inode->sysctl_inodes */
};
struct ctl_dir {


@ -565,9 +565,9 @@ extern void usb_ep0_reinit(struct usb_device *);
((USB_DIR_IN|USB_TYPE_STANDARD|USB_RECIP_INTERFACE)<<8)
#define EndpointRequest \
((USB_DIR_IN|USB_TYPE_STANDARD|USB_RECIP_INTERFACE)<<8)
((USB_DIR_IN|USB_TYPE_STANDARD|USB_RECIP_ENDPOINT)<<8)
#define EndpointOutRequest \
((USB_DIR_OUT|USB_TYPE_STANDARD|USB_RECIP_INTERFACE)<<8)
((USB_DIR_OUT|USB_TYPE_STANDARD|USB_RECIP_ENDPOINT)<<8)
/* class requests from the USB 2.0 hub spec, table 11-15 */
#define HUB_CLASS_REQ(dir, type, request) ((((dir) | (type)) << 8) | (request))


@ -1253,8 +1253,10 @@ retry:
timeo = MAX_SCHEDULE_TIMEOUT;
ret = netlink_attachskb(sock, nc, &timeo, NULL);
if (ret == 1)
if (ret == 1) {
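/*
 * netlink_attachskb() returning 1 means it already dropped the sock
 * reference; clear the stale pointer before retrying so the error
 * paths cannot release it a second time (CVE-2017-11176).
 */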
sock = NULL;
goto retry;
}
if (ret) {
sock = NULL;
nc = NULL;


@ -231,8 +231,8 @@ int __sched __down_write_common(struct rw_semaphore *sem, int state)
out_nolock:
list_del(&waiter.list);
if (!list_empty(&sem->wait_list))
__rwsem_do_wake(sem, 1);
if (!list_empty(&sem->wait_list) && sem->count >= 0)
__rwsem_do_wake(sem, 0);
raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
return -EINTR;


@ -1381,7 +1381,6 @@ static unsigned long weighted_cpuload(const int cpu);
static unsigned long source_load(int cpu, int type);
static unsigned long target_load(int cpu, int type);
static unsigned long capacity_of(int cpu);
static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
/* Cached statistics for all CPUs within a node */
struct numa_stats {
@ -2469,7 +2468,8 @@ void task_numa_work(struct callback_head *work)
return;
down_read(&mm->mmap_sem);
if (!down_read_trylock(&mm->mmap_sem))
return;
vma = find_vma(mm, start);
if (!vma) {
reset_ptenuma_scan(p);
@ -2584,6 +2584,60 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr)
}
}
}
/*
* Can a task be moved from prev_cpu to this_cpu without causing a load
* imbalance that would trigger the load balancer?
*/
static inline bool numa_wake_affine(struct sched_domain *sd,
struct task_struct *p, int this_cpu,
int prev_cpu, int sync)
{
struct numa_stats prev_load, this_load;
s64 this_eff_load, prev_eff_load;
update_numa_stats(&prev_load, cpu_to_node(prev_cpu));
update_numa_stats(&this_load, cpu_to_node(this_cpu));
/*
* If sync wakeup then subtract the (maximum possible)
* effect of the currently running task from the load
* of the current CPU:
*/
if (sync) {
unsigned long current_load = task_h_load(current);
if (this_load.load > current_load)
this_load.load -= current_load;
else
this_load.load = 0;
}
/*
* In low-load situations, where this_cpu's node is idle due to the
* sync cause above having dropped this_load.load to 0, move the task.
* Moving to an idle socket will not create a bad imbalance.
*
* Otherwise check if the nodes are near enough in load to allow this
* task to be woken on this_cpu's node.
*/
if (this_load.load > 0) {
unsigned long task_load = task_h_load(p);
this_eff_load = 100;
this_eff_load *= prev_load.compute_capacity;
prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
prev_eff_load *= this_load.compute_capacity;
this_eff_load *= this_load.load + task_load;
prev_eff_load *= prev_load.load - task_load;
return this_eff_load <= prev_eff_load;
}
return true;
}
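To make the comparison above concrete, an illustrative walk-through with assumed numbers (the capacities and loads below are invented, not taken from the patch): with imbalance_pct = 125, the prev side is granted a 112% allowance, so the wakeup is pulled to this_cpu's node only if the destination's post-move load stays within that margin of the source's.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* assumed figures, not taken from the patch */
	int64_t imbalance_pct = 125;
	int64_t prev_capacity = 1024, this_capacity = 1024;
	int64_t prev_load = 2560, this_load = 1024, task_load = 512;

	/* destination cost, scaled by the source node's capacity */
	int64_t this_eff_load = 100 * prev_capacity
				* (this_load + task_load);

	/* source cost, relaxed by half of (imbalance_pct - 100) */
	int64_t prev_eff_load = (100 + (imbalance_pct - 100) / 2)
				* this_capacity * (prev_load - task_load);

	/* 100*1536 = 153600 vs 112*2048 = 229376 (times capacity) */
	printf("pull to waking node: %s\n",
	       this_eff_load <= prev_eff_load ? "yes" : "no");
	return 0;
}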
#else
static void task_tick_numa(struct rq *rq, struct task_struct *curr)
{
@ -2596,6 +2650,15 @@ static inline void account_numa_enqueue(struct rq *rq, struct task_struct *p)
static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p)
{
}
#ifdef CONFIG_SMP
static inline bool numa_wake_affine(struct sched_domain *sd,
struct task_struct *p, int this_cpu,
int prev_cpu, int sync)
{
return true;
}
#endif /* CONFIG_SMP */
#endif /* CONFIG_NUMA_BALANCING */
static void
@ -2982,8 +3045,7 @@ __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
* differential update where we store the last value we propagated. This in
* turn allows skipping updates if the differential is 'small'.
*
* Updating tg's load_avg is necessary before update_cfs_share() (which is
* done) and effective_load() (which is not done because it is too costly).
* Updating tg's load_avg is necessary before update_cfs_share().
*/
static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
{
@ -5215,126 +5277,6 @@ static unsigned long cpu_avg_load_per_task(int cpu)
return 0;
}
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
* effective_load() calculates the load change as seen from the root_task_group
*
* Adding load to a group doesn't make a group heavier, but can cause movement
* of group shares between cpus. Assuming the shares were perfectly aligned one
* can calculate the shift in shares.
*
* Calculate the effective load difference if @wl is added (subtracted) to @tg
* on this @cpu and results in a total addition (subtraction) of @wg to the
* total group weight.
*
* Given a runqueue weight distribution (rw_i) we can compute a shares
* distribution (s_i) using:
*
* s_i = rw_i / \Sum rw_j (1)
*
* Suppose we have 4 CPUs and our @tg is a direct child of the root group and
* has 7 equal weight tasks, distributed as below (rw_i), with the resulting
* shares distribution (s_i):
*
* rw_i = { 2, 4, 1, 0 }
* s_i = { 2/7, 4/7, 1/7, 0 }
*
* As per wake_affine() we're interested in the load of two CPUs (the CPU the
* task used to run on and the CPU the waker is running on), we need to
* compute the effect of waking a task on either CPU and, in case of a sync
* wakeup, compute the effect of the current task going to sleep.
*
* So for a change of @wl to the local @cpu with an overall group weight change
* of @wl we can compute the new shares distribution (s'_i) using:
*
* s'_i = (rw_i + @wl) / (@wg + \Sum rw_j) (2)
*
* Suppose we're interested in CPUs 0 and 1, and want to compute the load
* differences in waking a task to CPU 0. The additional task changes the
* weight and shares distributions like:
*
* rw'_i = { 3, 4, 1, 0 }
* s'_i = { 3/8, 4/8, 1/8, 0 }
*
* We can then compute the difference in effective weight by using:
*
* dw_i = S * (s'_i - s_i) (3)
*
* Where 'S' is the group weight as seen by its parent.
*
* Therefore the effective change in loads on CPU 0 would be 5/56 (3/8 - 2/7)
* times the weight of the group. The effect on CPU 1 would be -4/56 (4/8 -
* 4/7) times the weight of the group.
*/
static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
{
struct sched_entity *se = tg->se[cpu];
if (!tg->parent) /* the trivial, non-cgroup case */
return wl;
for_each_sched_entity(se) {
struct cfs_rq *cfs_rq = se->my_q;
long W, w = cfs_rq_load_avg(cfs_rq);
tg = cfs_rq->tg;
/*
* W = @wg + \Sum rw_j
*/
W = wg + atomic_long_read(&tg->load_avg);
/* Ensure \Sum rw_j >= rw_i */
W -= cfs_rq->tg_load_avg_contrib;
W += w;
/*
* w = rw_i + @wl
*/
w += wl;
/*
* wl = S * s'_i; see (2)
*/
if (W > 0 && w < W)
wl = (w * (long)scale_load_down(tg->shares)) / W;
else
wl = scale_load_down(tg->shares);
/*
* Per the above, wl is the new se->load.weight value; since
* those are clipped to [MIN_SHARES, ...) do so now. See
* calc_cfs_shares().
*/
if (wl < MIN_SHARES)
wl = MIN_SHARES;
/*
* wl = dw_i = S * (s'_i - s_i); see (3)
*/
wl -= se->avg.load_avg;
/*
* Recursively apply this logic to all parent groups to compute
* the final effective load change on the root group. Since
* only the @tg group gets extra weight, all parent groups can
* only redistribute existing shares. @wl is the shift in shares
* resulting from this level per the above.
*/
wg = 0;
}
return wl;
}
#else
static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
{
return wl;
}
#endif
static void record_wakee(struct task_struct *p)
{
/*
@ -5385,67 +5327,25 @@ static int wake_wide(struct task_struct *p)
static int wake_affine(struct sched_domain *sd, struct task_struct *p,
int prev_cpu, int sync)
{
s64 this_load, load;
s64 this_eff_load, prev_eff_load;
int idx, this_cpu;
struct task_group *tg;
unsigned long weight;
int balanced;
idx = sd->wake_idx;
this_cpu = smp_processor_id();
load = source_load(prev_cpu, idx);
this_load = target_load(this_cpu, idx);
int this_cpu = smp_processor_id();
bool affine = false;
/*
* If sync wakeup then subtract the (maximum possible)
* effect of the currently running task from the load
* of the current CPU:
* Common case: CPUs are in the same socket, and select_idle_sibling()
* will do its thing regardless of what we return:
*/
if (sync) {
tg = task_group(current);
weight = current->se.avg.load_avg;
this_load += effective_load(tg, this_cpu, -weight, -weight);
load += effective_load(tg, prev_cpu, 0, -weight);
}
tg = task_group(p);
weight = p->se.avg.load_avg;
/*
* In low-load situations, where prev_cpu is idle and this_cpu is idle
* due to the sync cause above having dropped this_load to 0, we'll
* always have an imbalance, but there's really nothing you can do
* about that, so that's good too.
*
* Otherwise check if either cpus are near enough in load to allow this
* task to be woken on this_cpu.
*/
this_eff_load = 100;
this_eff_load *= capacity_of(prev_cpu);
prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
prev_eff_load *= capacity_of(this_cpu);
if (this_load > 0) {
this_eff_load *= this_load +
effective_load(tg, this_cpu, weight, weight);
prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
}
balanced = this_eff_load <= prev_eff_load;
if (cpus_share_cache(prev_cpu, this_cpu))
affine = true;
else
affine = numa_wake_affine(sd, p, this_cpu, prev_cpu, sync);
schedstat_inc(p->se.statistics.nr_wakeups_affine_attempts);
if (affine) {
schedstat_inc(sd->ttwu_move_affine);
schedstat_inc(p->se.statistics.nr_wakeups_affine);
}
if (!balanced)
return 0;
schedstat_inc(sd->ttwu_move_affine);
schedstat_inc(p->se.statistics.nr_wakeups_affine);
return 1;
return affine;
}
static inline int task_util(struct task_struct *p);
@ -5640,43 +5540,6 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
/*
* Implement a for_each_cpu() variant that starts the scan at a given cpu
* (@start), and wraps around.
*
* This is used to scan for idle CPUs; such that not all CPUs looking for an
* idle CPU find the same CPU. The down-side is that tasks tend to cycle
* through the LLC domain.
*
* Especially tbench is found sensitive to this.
*/
static int cpumask_next_wrap(int n, const struct cpumask *mask, int start, int *wrapped)
{
int next;
again:
next = find_next_bit(cpumask_bits(mask), nr_cpumask_bits, n+1);
if (*wrapped) {
if (next >= start)
return nr_cpumask_bits;
} else {
if (next >= nr_cpumask_bits) {
*wrapped = 1;
n = -1;
goto again;
}
}
return next;
}
#define for_each_cpu_wrap(cpu, mask, start, wrap) \
for ((wrap) = 0, (cpu) = (start)-1; \
(cpu) = cpumask_next_wrap((cpu), (mask), (start), &(wrap)), \
(cpu) < nr_cpumask_bits; )
#ifdef CONFIG_SCHED_SMT
static inline void set_idle_cores(int cpu, int val)
@ -5736,7 +5599,7 @@ unlock:
static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int core, cpu, wrap;
int core, cpu;
if (!static_branch_likely(&sched_smt_present))
return -1;
@ -5746,7 +5609,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);
for_each_cpu_wrap(core, cpus, target, wrap) {
for_each_cpu_wrap(core, cpus, target) {
bool idle = true;
for_each_cpu(cpu, cpu_smt_mask(core)) {
@ -5809,27 +5672,38 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
{
struct sched_domain *this_sd;
u64 avg_cost, avg_idle = this_rq()->avg_idle;
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
int cpu, wrap;
int cpu, nr = INT_MAX;
this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
return -1;
avg_cost = this_sd->avg_scan_cost;
/*
* Due to large variance we need a large fuzz factor; hackbench in
* particular is sensitive here.
*/
if (sched_feat(SIS_AVG_CPU) && (avg_idle / 512) < avg_cost)
avg_idle = this_rq()->avg_idle / 512;
avg_cost = this_sd->avg_scan_cost + 1;
if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
return -1;
if (sched_feat(SIS_PROP)) {
u64 span_avg = sd->span_weight * avg_idle;
if (span_avg > 4*avg_cost)
nr = div_u64(span_avg, avg_cost);
else
nr = 4;
}
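/*
 * Worked example with assumed numbers: after the /512 scaling above,
 * avg_idle = 2000 and avg_cost = 100 with sd->span_weight = 16 give
 * span_avg = 32000 > 4 * 100, so up to nr = 32000 / 100 = 320 CPUs
 * may be scanned; had avg_idle been 20, span_avg = 320 would fall
 * below 400 and the budget would be clamped to the minimum of 4.
 */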
time = local_clock();
for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
if (!--nr)
return -1;
if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
continue;
if (idle_cpu(cpu))
@ -6011,11 +5885,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
if (affine_sd) {
sd = NULL; /* Prefer wake_affine over balance flags */
if (cpu != prev_cpu && wake_affine(affine_sd, p, prev_cpu, sync))
if (cpu == prev_cpu)
goto pick_cpu;
if (wake_affine(affine_sd, p, prev_cpu, sync))
new_cpu = cpu;
}
if (!sd) {
pick_cpu:
if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
@ -6686,6 +6564,10 @@ static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
if (dst_nid == p->numa_preferred_nid)
return 0;
/* Leaving a core idle is often worse than degrading locality. */
if (env->idle != CPU_NOT_IDLE)
return -1;
if (numa_group) {
src_faults = group_faults(p, src_nid);
dst_faults = group_faults(p, dst_nid);


@ -55,6 +55,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
* When doing wakeups, attempt to limit superfluous scans of the LLC domain.
*/
SCHED_FEAT(SIS_AVG_CPU, false)
SCHED_FEAT(SIS_PROP, true)
/*
* Issue a WARN when we do multiple update_rq_clock() calls


@ -43,6 +43,38 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
}
EXPORT_SYMBOL(cpumask_any_but);
/**
* cpumask_next_wrap - helper to implement for_each_cpu_wrap
* @n: the cpu prior to the place to search
* @mask: the cpumask pointer
* @start: the start point of the iteration
* @wrap: assume @n crossing @start terminates the iteration
*
* Returns >= nr_cpu_ids on completion
*
* Note: the @wrap argument is required for the start condition when
* we cannot assume @start is set in @mask.
*/
int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap)
{
int next;
again:
next = cpumask_next(n, mask);
if (wrap && n < start && next >= start) {
return nr_cpumask_bits;
} else if (next >= nr_cpumask_bits) {
wrap = true;
n = -1;
goto again;
}
return next;
}
EXPORT_SYMBOL(cpumask_next_wrap);
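/*
 * Illustrative trace, assuming nr_cpu_ids = 8, CPUs {1, 3, 4, 6} set
 * in @mask and start = 4: the call sequence made by for_each_cpu_wrap()
 * is
 *
 *   cpumask_next_wrap(3, mask, 4, false) -> 4
 *   cpumask_next_wrap(4, mask, 4, true)  -> 6
 *   cpumask_next_wrap(6, mask, 4, true)  -> 1   (wrapped past the end)
 *   cpumask_next_wrap(1, mask, 4, true)  -> 3
 *   cpumask_next_wrap(3, mask, 4, true)  -> 8   (crossed @start: done)
 */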
/* These are not inline because of header tangles. */
#ifdef CONFIG_CPUMASK_OFFSTACK
/**